自我总结

帧同步要点如下 :

确定性 : 去除随机数
缓冲 : 因为数据包并不是均匀地到达, 所以要做一个缓冲区, 然后再均匀地取出
不用TCP : 因为我们的数据对时间非常敏感, 不接受到第n个输入包就无法继续模拟第n帧, 而TCP的确认机制以及重传机制当我们丢包时, 我们只能暂停等待它重发造成卡顿
用UDP :
- 发送冗余数据 : 因为帧同步只发送玩家input数据, 而input包是很小的, 所以发冗余也不会很大
- 增量包 : 加一个bit来标志跟上一个包的比较结果, 如果这个包跟上个包一致则只发送一个1, 如果不一致则发送0和这个包的完整数据
帧同步的缺点 :
- 等的人太多 : 因为你要收到所有玩家对应帧的输入才能对这一帧进行模拟.在实践中，这意味着每个人必须等待最滞后的那个玩家.人越多等得越久, 所以帧同步不适合mmo.
- 比较耗性能 : 因为帧同步技术的话, 在客户端中，每个对象都要执行所有的物理之类的运算; 而状态同步可以只同步当前玩家周围对象的状态, 不需要同步所有对象

. . .

原文

原文标题 : Deterministic Lockstep (Keeping simulations in sync by sending only inputs)

Introduction

Hi, I’m Glenn Fiedler and welcome to Networked Physics.

In the previous article we explored the physics simulation we’re going to network in this article series. In this article specifically, we’re going to network this physics simulation using deterministic lockstep.

Deterministic lockstep is a method of networking a system from one computer to another by sending only the inputs that control that system, rather than the state of that system. In the context of networking a physics simulation, this means we send across a small amount of input, while avoiding sending state like position, orientation, linear velocity and angular velocity per-object.

The benefit is that bandwidth is proportional to the size of the input, not the number of objects in the simulation. Yes, with deterministic lockstep you can network a physics simulation of one million objects with the same bandwidth as just one.

While this sounds great in theory, in practice it’s difficult to implement deterministic lockstep because most physics simulations are not deterministic. Differences in floating point behavior between compilers, OS’s and even instruction sets make it almost impossible to guarantee determinism for floating point calculations.

Determinism

Determinism means that given the same initial condition and the same set of inputs your simulation gives exactly the same result. And I do mean exactly the same result.

Not close. Not near enough. Exactly the same. Exact down to the bit-level. So exact, you could take a checksum of your entire physics state at the end of each frame and it would be identical.

Above you can see a simulation that is almost deterministic. The simulation on the left is controlled by the player. The simulation on the right has exactly the same inputs applied with a two second delay starting from the same initial condition. Both simulations step forward with the same delta time (a necessary precondition to ensure exactly the same result) and both simulations apply the same inputs. Notice how after the smallest divergence the simulation gets further and further out of sync. This simulation is non-deterministic.

What’s going on is that the physics engine I’m using (Open Dynamics Engine) uses a random number generator inside its solver to randomize the order of constraint processing to improve stability. It’s open source. Take a look and see! Unfortunately this breaks determinism because the simulation on the left processes constraints in a different order to the simulation on the right, leading to slightly different results.

Luckily all that is required to make ODE deterministic on the same machine, with the same complied binary and on the same OS (is that enough qualifications?) is to set its internal random seed to the current frame number before running the simulation via dSetRandomSeed. Once this is done ODE gives exactly the same result and the left and right simulations stay in sync.

And now a word of warning. Even though the simulation above is deterministic on the same machine, that does not necessarily mean it would also be deterministic across different compilers, a different OS or different machine architectures (eg. PowerPC vs. Intel). In fact, it’s probably not even deterministic between debug and release builds due to floating point optimizations.

Floating point determinism is a complicated subject and there’s no silver bullet.

For more information please refer to this article.

Networking Inputs

Now let’s get down to implementation.

Our example physics simulation is driven by keyboard input: arrow keys apply forces to make the player cube move, holding space lifts the cube up and blows other cubes around, and holding ‘z’ enables katamari mode.

How can we network these inputs? Must we send the entire state of the keyboard? No. It’s not necessary to send the entire keyboard state, only the state of the keys that affect the simulation. What about key press and release events then? No. This is also not a good strategy. We need to ensure that exactly the same input is applied on the right side, at exactly the same time, so we can’t just send ‘key pressed’, and ‘key released’ events over TCP.

What we do instead is represent the input with a struct and at the beginning of each simulation frame on the left side, sample this struct from the keyboard:

    struct Input
    {
        bool left;
        bool right;
        bool up;
        bool down;
        bool space;
        bool z;
    };

Next we send that input from the left simulation to the right simulation in a way that the simulation on the right side knows that the input belongs to frame n.

And here’s the key part: the simulation on the right can only simulate frame n when it has the input for that frame. If it doesn’t have the input, it has to wait.

For example, if you were sending across using TCP you could simply send the inputs and nothing else, and on the other side you could read the packets coming in, and each input received corresponds to one frame for the simulation to step forward. If no input arrives for a given render frame, the right side can’t advance forward, it has to wait for the next input to arrive.

So let’s move forward with TCP, you’ve disabled Nagle’s Algorithm, and you’re sending inputs from the left to the right simulation once per-frame (60 times per-second).

Here it gets a little complicated. Since we can’t simulate forward unless we have the input for the next frame, it’s not enough to just take whatever inputs arrive over the network and then run the simulation on inputs as they arrive because the result would be very jittery. Data sent across the network at 60HZ doesn’t typically arrive nicely spaced, 1/60th of a second between each packet.

If you want this sort of behavior, you have to implement it yourself.

Playout Delay Buffer

Such a device is called a playout delay buffer.

Unfortunately, the subject of playout delay buffers is a patent minefield. I would not advise searching for “playout delay buffer” or “adaptive playout delay” while at work. But in short, what you want to do is buffer packets for a short amount of time so they appear to be arriving at a steady rate even though in reality they arrive somewhat jittered.

What you’re doing here is similar to what Netflix does when you stream a video. You pause a little bit initially so you have a buffer in case some packets arrive late and then once the delay has elapsed video frames are presented spaced the correct time apart. If your buffer isn’t large enough then the video playback will be hitchy. With deterministic lockstep your simulation behaves exactly the same way: showing hitches when the buffer isn’t large enough to smooth out the jitter. Of course, the cost of increasing the buffer size is additional latency, so you can’t just buffer your way out of all problems. At some point the user says enough! That’s too much latency added. No sir, I will not play your game with 1 second of extra delay :)

My playout delay buffer implementation is really simple. You add inputs to it indexed by frame, and when the very first input is received, it stores the current local time on the receiver machine and from that point on delivers packets assuming they should play at that time + 100ms. You’ll likely need to something more complex for a real world situation, perhaps something that handles clock drift, and detecting when the simulation should slightly speed up or slow down to maintain a nice amount of buffering safety (being “adaptive”) while minimizing overall latency, but this is reasonably complicated and probably worth an article in itself.

The goal is that under average conditions the playout delay buffer provides a steady stream of inputs for frame n, n+1, n+2 and so on, nicely spaced 1/60th of a second apart with no drama. In the worst case the time arrives for frame n and the input hasn’t arrived yet it returns null and the simulation is forced to wait. If packets get bunched up and delivered late, it’s possibly to have multiple inputs ready to dequeue per-frame. In this case I limit to 4 simulated frames per-render frame so the simulation has a chance to catch up, but doesn’t simulate for so long that it falls further behind, aka. the “spiral of death”.

Is TCP good enough?

Using this playout buffer strategy and sending inputs across TCP we ensure that all inputs arrive reliably and in-order. This is convenient, and after all, TCP is designed for exactly this situation: reliable-ordered data.

In fact, It’s a common thing out there on the Internet for pundits to say stuff like:

If you need reliable-ordered, you can’t do better than TCP!

Your game doesn’t need UDP (yet)

But I’m here to tell you this kind of thinking is dead wrong.

Above you can see the simulation networked using deterministic lockstep over TCP at 100ms latency and 1% packet loss. If you look closely on the right side you can see hitches every few seconds. What’s happening here is that each time a packet is lost, TCP has to wait RTT*2 while it is resent (actually it can be much worse, but I’m being generous…). The hitches happen because with deterministic lockstep the right simulation can’t simulate frame n without input n, so it has to pause to wait for input n to be resent!

That’s not all. It gets significantly worse as latency and packet loss increase. Here is the same simulation networked using deterministic lockstep over TCP at 250ms latency and 5% packet loss:

Now I will concede that if you have no packet loss and/or a very small amount of latency then you very well may get acceptable results with TCP. But please be aware that if you use TCP it behaves terribly under bad network conditions.

Can we do better than TCP?

Can we beat TCP at its own game. Reliable-ordered delivery?

The answer is an emphatic YES. But only if we change the rules of the game.

Here’s the trick. We need to ensure that all inputs arrive reliably and in order. But if we send inputs in UDP packets, some of those packets will be lost. What if, instead of detecting packet loss after the fact and resending lost packets, we redundantly include all inputs in each UDP packet until we know for sure the other side has received them?

Inputs are very small (6 bits). Let’s say we’re sending 60 inputs per-second (60fps simulation) and round trip time we know is going the be somewhere in 30-250ms range. Let’s say just for fun that it could be up to 2 seconds worst case and at this point we’ll time out the connection (screw that guy). This means that on average we only need to include between 2-15 frames of input and worst case we’ll need 120 inputs. Worst case is 120 x 6 = 720 bits. That’s only 90 bytes of input! That’s totally reasonable.

We can do even better. It’s not common for inputs to change every frame. What if when we send our packet instead we start with the sequence number of the most recent input, and the 6 bits of the first (oldest) input, and the number of un-acked inputs. Then as we iterate across these inputs to write them to the packet we can write a single bit (1) if the next input is different to the previous, and (0) if the input is the same. So if the input is different from the previous frame we write 7 bits (rare). If the input is identical we write just one (common). Where inputs change infrequently this is a big win and in the worst case this really isn’t that bad. 120 bits of extra data sent. Just 15 bytes overhead worst case.

Of course another packet is required from the right simulation to the left so the left side knows which inputs have been received. Each frame the right simulation reads input packets from the network before adding them to the playout delay buffer and keeps track of the most recent input it has received and sends this back to the left as an “ack” or acknowledgment for inputs.

When the left side receives this ack it discards any inputs older than the most recent received input. This way we have only a small number of inputs in flight proportional to the round trip time between the two simulations.

Flawless Victory

We have beaten TCP by changing the rules of the game.

Instead of “implementing 95% of TCP on top of UDP” we have implemented something totally different and better suited to our requirements. A protocol that redundantly sends inputs because we know they are small, so we never have to wait for retransmission.

So exactly how much better is this approach than sending inputs over TCP?

Let’s take a look…

The video above shows deterministic lockstep synchronized over UDP using this technique with 2 seconds of latency and 25% packet loss. Imagine how awful TCP would look under these conditions.

So in conclusion, even where TCP should have the most advantage, in the only networking model that relies on reliable-ordered data, we can still easily whip its ass with a simple protocol built on top of UDP.

译文

译文出处

翻译：张乾光（星际迷航）审校：陈敬凤(nunu)

介绍

大家好，我是格伦·菲德勒。欢迎大家阅读系列教程《网络物理仿真》，这个系列教程的目的是将物理仿真的状态通过网络进行广播。

在之前的文章中，我们讨论了物理仿真需要在网络上进行广播的各种属性。在这篇文章中，我们将使用具有确定性的帧同步技术来将物理仿真通过网络进行传递和广播。

具有确定性的帧同步是一种用来在一台电脑和其他电脑之间进行同步的方法，这种方法发送的是控制仿真状态变化的输入，而不是像其他方法那样发送的是仿真过程中物体的状态变化。这种方法的背后思想是给定一个初始状态，不妨设为S(n)，我们通过使用输入信息I(n)来运行仿真就能得到S(n+1)这个状态。然后我们可以通过S(n+1)这个状态和输入信息I(n+1)来运行仿真就能得到S(n+2)这个状态，我们可以一直重复这个过程得到S(n+3)、S(n+4)以及其后的各个状态。这看上去有点像是数学归纳法，我们可以只通过输入信息和之前的仿真状态就能得到后面的仿真状态-而且得到的仿真状态是高度一致，并且也不需要发送任何状态方面的同步。

这个网络模型的主要优点是所需的带宽仅仅用来传递输入信息，而输入信息所占的带宽其实是与仿真中物体的数目是完全无关的。你可以通过网络来对一百万个物体进行物理仿真，它所需的带宽会跟只对一个物体进行物理仿真所需的带宽完全相同。可以很容易的看到物理物体的状态通常是包含位置、方向、线性速度和角速度（如果是未压缩的话，这些状态一共需要52字节，在这里面假设方向使用的是四元数而其他所有的变量都是用vec3来表示），所以当你有大量的物体需要进行物理仿真的时候，这是一个非常具有吸引力的方案。

确定性

如果要采用具有确定性的帧同步这个方案来将物理仿真网络化，首先要做的第一件事就是要确保你的仿真具有确定性。在这个上下文中，确定性其实和自由意志之类的没有关系。它只是意味着给定相同的初始条件和相同的一组输入，仿真能够给出完全相同的结果。而且我在这里要着重强调下是完全相同的结果。而不是说的什么在在浮点数容忍度内足够接近。这种精确是精确到比特位的。所以这种精确性使得你可以在每帧的末尾对整个物理状态做一个校验和，不同机器上面同一帧得到的校验和是完全一致的。

从上面的图中可以看到，这里面的仿真几乎是具有确定性的，但是不完全具有确定性。左边的仿真由玩家进行控制，而右边的仿真有完全一致的初始状态，输入信息也和左边完全相同，但是要有2秒钟的延迟。这两个仿真使用相同的间隔时间进行更新（使用相同的间隔时间进行更新也是确保得到完全一致结果的一个必要前提条件），并且在每一帧前对相同的输入信息进行相应。你可以注意到随着仿真的进行，那些一开始很微小的差异是如何一点点被扩大，最后导致两个仿真完全不同步。所以说这个仿真其实不具有确定性。

上面到底发生了什么?最后会导致两个仿真的结果差的这么大？这是因为我使用的物理引擎（ODE）在它的内部使用了一个随机数生成器来对约束处理的顺序进行随机化来提高稳定性。这个物理引擎是完全开源的，所以可以看看它的内部实现！不幸的是，由于左边的仿真处理约束的顺序和右边的仿真处理约束的顺序不同，这导致有一些轻微不同的结果。

幸运的是我们还是能找到让ODE这个物理引擎具有确定性的条件：要在同一台机器上、使用同一个编译好的二进制文件、并且在完全相同的操作系统上运行（这是必要的限制条件么？），还有就是在运行仿真之前通过dSetRandomSeed把随机数的种子设为当前帧的帧数。一旦满足这些条件的话，ODE这个物理引擎能够给出完全相同的结果，并且左边和右边的仿真能够保持高度一致的同步。

现在让我们针对上面这个情况给出一个警告。即使ODE这个物理引擎能够在相同的机器上得到确定性的结果，但是这并不一定意味着在不同编译器、不同的操作系统甚至不同的机器架构上（比如说在PowerPC架构上和在Intel架构上）它能够得到确定性的结果。事实上，由于浮点数的优化，在程序的debug版本和release版本之间可能都没有办法得到确定性的结果。浮点数的确定性是一个非常复杂的问题，而且这个问题没有银弹（意味着这个问题没有什么简单可行的解决办法）。要了解更多这方面的信息，请参考这篇文章。

网络输入 Inputs

让我们讨论下具有确定性的帧同步的具体实现方法。

你可能想知道在我们这个示例仿真中输入信息到底是啥，以及我们该如何吧这些输入信息进行网络化。我们这个示例仿真是由键盘输入进行驱动的：方向键会给代表玩家的立方体施加一个力让他进行移动、按下空格键会把代表玩家的立方体提起来并把碰到的立方体四处滚落、按下‘z’键会启动katamari模式。

但是我们该如何对这些输入信息进行网络化呢？我们需要把整个键盘的状态在网络上进行传输么？在这些键被按下和释放的时候我们要发送这些事件么？不，整个键盘的状态不需要在网络上进行传输，我们只需要传输那些会影响仿真的按键。那么被按下和释放的键的事件需要在网络上进行传输么？不，这也不是一个好的策略。我们需要确保的是在仿真第n帧的时候右边的仿真能够应用完全相同的输入信息，所以我们不能仅仅是通过TCP来发送“按键按下”和“按键释放”的事件，因为这些事件到达网络的另外一侧的时间如果早于或者晚于第n帧的时候都会给仿真造成偏差。

相反我们做的事情是用一个结构来表示整个输入信息，并且在左边一侧仿真开始的时候，通过键盘的访问来填充这个结构并把填充好的结构放到一个滑动窗口中，我们在后面可以根据帧号来对这个输入进行访问。

struct Input
{
    bool left;
    bool right;
    bool up;
    bool down;
    bool space;
    bool z;
};

现在我们就可以通过上面的方法来把左边仿真的输入信息发送到右边仿真中去，这样右边的仿真就知道属于第n帧的输入信息到底是怎么样的。举个简单的例子来说，如果你在通过TCP进行发送的话，你可以简单的只发送输入信息而不发送其他的内容，而发送的输入信息的顺序隐含着帧号N。而在网络的另外一侧，你可以读取传送过来的数据包，并且对输入信息进行处理并把输入信息应用到仿真中去。我不推荐这种方法，但我们可以从这里开始，然后再向你展示如何把这种方法变得更好。

在进一步对这个方法进行优化之前，让我们先统一下使用的网络环境，让我们假设下我们是通过TCP进行数据传输，已经禁止了Nagle算法并且每帧都会从左边的仿真向右边的仿真发送一次输入信息（频率是每秒60次）。

这里面有一个问题会变得比较复杂。把左边仿真发生的输入信息通过网络进行传输，然后右边仿真并没有足够的时间来从网络上收到输入信息并利用这些到达的输入信息来模拟仿真，因为这个过程需要一定的时间。你不能按照某个频率在网络上发送信息并且期望它们能够按照完全相同一致的频率到达网络的另外一侧(比如说，每六十分之一秒到达一个数据包)。互联网并不是按照这个方式工作的。根本就没有这样的保证。

播放延迟缓冲区

如果你想要做到这一点的话，你必须实现一个叫做播放延迟缓冲区的东西。不幸的是，播放延迟缓冲区收到了专利保护，也就是一个专利雷区。我不建议读者在实际使用确定性的帧同步模型的时候搜索“播放延迟缓冲区”或者是“自适应性延迟缓冲区”。但简而言之，你所需要做的事情是缓存收到的数据包一小段时间以便让这些数据包表现的像是以一个稳定的速度到达那样，即使实际上它们的到达时间是充满抖动的。

你现在所做的事情就跟你在看一个视频流的时候，Netflix所做的事情是很类似的。你在最初开始的时候停顿了一下以便你可以拥有一个缓冲区，这样即使一些数据包的到达时间有点晚，但是这种延迟不会对视频帧按正确时间间距的表现有什么影响，视频帧仍然会按照正确的时间间隔一帧帧的播放。当然如果你的缓冲区没有足够大的话，那么这些视频帧的播放可能还是会充满一些抖动。有了确定性的帧同步机制，你的模拟仿真将会以完全相同的方式执行。我建议在播放的时候最好在一开始有100毫秒-250毫秒的延迟。在下面的例子中，我使用的是100毫秒的延迟，这是因为我让延迟最小化来增强响应性。

我的播放延迟缓冲区的实现非常的简单。是将输入信息按照帧序号进行添加，当收到第一个输入信息的时候，它保存了接收方机器上的当前本地时间，并且从那一个时刻起假设所有到达的数据包都会带上100毫秒的延迟。你可能需要一些更加复杂的机制来适应真实世界的情况，比如说可能需要处理时钟漂移、检测在什么时候应该适当的加速或者减慢模拟的速度来让缓冲区的大小在能够保证整体延迟最小的情况下保持在一个适度的情形（这就是所谓的“自适应”），但是这些内容可能会相当的复杂并且可能需要一整篇文章来专门对这些情况进行专门论述。而且如前所述，这些内容还涉及到了专利保护方面的内容，所以这些内容我就不详细展开了，把如何处理这些东西全部托付给你自己实现。

在平均情况下，播放延迟缓冲区给帧n、n+1、n+2以及后续的帧提供了一个稳定的输入信息流，非常完美的以六十分之一秒的间隔依次到达。在最坏的情况下，就是已经该执行第N帧的模拟仿真了，但是这一帧的输入信息还没有到达，那么它就会返回一个空指针，这样整个模拟仿真就必须在那里进行等待了。如果数据包被集中起来发送并且到达接收方的时候已经比预期时间延迟了，这可能会导致多个帧的输入信息同时准备好等待出列进行计算。如果是这种情况的话，我会限制在一个渲染帧的时间最多只能进行4次模拟仿真，这样给模拟仿真一个追上来的机会。如果你把这个值设置的更高的话，那么可能会引起更多其他的问题，比如卡顿，因为你可能需要超过六十分之一秒的时间来运行这些帧（这可能会造成一个非常不好的反馈体验）。总而言之，重要的是确保你的模拟仿真在使用确定性的帧同步这个方案的时候性能不是在中央处理器这一端受限的，否则的话，你在运行更多的模拟帧来追上正常的模拟速度的时候会遇到很多麻烦。

TCP足够好了吗

通过使用这种延迟缓冲区的策略以及通过TCP协议来发送输入信息，我们可以很轻松的确保所有的输入信息会有序的到达并且传输是可信赖的。这就是一开始TCP协议在设计的时候希望达到的目标。实际上，下面这些东西就是互联网的专家常说的一些东西：

· 如果你需要一个可以信赖的有序的发送信息的方法，你不可能找到一种比通过TCP协议进行传输更好的方法！

· 你的游戏根本就不会需要UDP协议。

我在这里将告诉你上面这些想法都是大错特错的。

在上面的视频中，你可以看到如果网络同步模型使用基于TCP协议的确定性的帧同步模型的话，模拟仿真的网络延迟大概是100毫秒，并且有百分之一的丢包率。如果你仔细看右边的话，你可以每隔几秒就会出现一些抖动。如果你在两边都出现这种的情况，那么很抱歉这意味着你的电脑的性能对于播放这些视频而言可能有些艰难。如果是这种情况的话，我建议下载这个视频然后离线观看。无论如何，这里所发生的事情是当一个数据包丢失的时候，TCP协议需要等待至少２个往返时延才会重新发送这个数据包（实际上这里面的等待时间可能会更糟，但是我很慷慨的设定了一个非常理想的情况。。。）。所以上面发生的抖动原因是确定性的帧同步模型要求右边的模拟仿真在没有第n帧的输入信息的时候不能执行第N帧的模拟仿真计算，所以整个模拟仿真就停下来等待对应帧的输入信息的到达!

这还不是全部！随着延迟时间的增大和丢包率的增加，整个情况会变得更加的糟糕。这是在250毫秒延迟和百分之五丢包率的情况下，使用基于TCP协议的确定性的帧同步模型进行相同的仿真模拟运算导致的结果：

现在我要承认一个事情，如果延迟时间设置的非常低的话同时不存在丢包的情况下，那么使用TCP协议进行输入信息的传输会是一个非常可以接受的结果。但是请注意，如果你使用TCP协议来发送时间敏感的数据的话，随着延迟时间的增大和丢包率的增加，整个结果会急剧恶化。

我们能比TCP做得更好吗

我们可以做得更好吗?我们能在自己的游戏里面找到一种比使用TCP协议更好的办法。同时还能实现可信赖的有序传递？

答案是肯定的。但前提是我们需要改变游戏的规则。

下面将具体描述下我们将使用的技巧。我们需要确保所有的输入信息能够可靠地按顺序到达。但是如果我们只发送UDP数据包输入。但是如果我们只使用UDP数据包来发送输入信息的话，这里面的一些数据包会丢失。那么如果我们不采用事后检测的方法来判断哪些数据包丢失并发送这些丢失的数据包的话，我们采用另外一种方法，只是把我们有的所有输入信息都冗余的发送直到我们知道这些输入信息成功的到达另外一侧怎么样？

输入信息都非常非常的小（只有6比特这么大）。让我们假设下我们在每秒需要发送60个输入信息（因为模拟仿真的频率是60fps ），而且我们知道一个往返的时间大概是在30毫秒到250毫秒之间。纯粹为了好玩，让我们假设下载最糟糕的情况下，一个往返的时间可以高达2秒，如果出现这种情况的话，那么整个连接就会超时。这意味着在平均情况我们只需要包括大概2到15帧的输入信息，而在最坏情况下，我们大概需要120帧的输入信息。那么最坏情况下，输入信息的大小是120 x 6 = 720比特。这只是90字节的输入信息!这是安全合理的。

我们还能做的更好。在每一帧中都出现输入信息的变化是非常不常见的。我们可以用最近的那个input的序列号和第一个input的6比特还有所有未被确认的输入信息的数目来做一些事。然后，当我们对这些输入信息进行遍历将它们写入数据包的时候，如果发现这一帧的输入信息如果和之前帧的输入信息不同的话，我们可以写入一个单独的比特位（1），如果发现这一帧的输入信息如果和之前帧的输入信息相同的话，我们可以写入一个单独的比特位（0）。所以这一帧的输入信息如果和之前帧的输入信息不同的话（这种情况比较少），我们需要写入7个比特位，这一帧的输入信息如果和之前帧的输入信息相同的话（这种情况其实非常常见），我们只需要写入1个比特位。在输入信息很少发生变化的情况，这是一个重大的胜利，而在最坏的情况下出现的情况也不会非常糟糕。只需要发送额外120个比特的数据，也就是说在最坏情况下，也只有15字节的额外开销.

当然在这种情况下，需要从右边的模拟仿真中发送一个数据包到左边的模拟仿真中去，这样左边的模拟仿真才知道哪些输入信息被成功收到了。在每一帧，右边的模拟仿真都会从网络中读取输入的数据包，然后才会把这些数据包添加到延迟播放缓冲区，并且通过帧号记录它已经收到的最近那一帧的输入信息，或者如果你想容易一点处理这个问题的话，那么使用一个16比特的序列号就能很好的包装这个信息。在所有的输入数据包都被处理以后，如果右边的模拟仿真收到任何帧的输入信息以后都会回复一个数据包给左边的模拟仿真，告诉它最近收到的最新序列号是多少，这基本就是一个“ack”包，也就是确认包。

当左边的模拟仿真收到这个“ack”包，也就是确认包以后，它会滑动输入信息窗口并且丢弃比已经确认的序列号还老的输入信息包。已经没有必要再发送这些输入信息包给右边的模拟仿真了，因为已经知道右边的模拟仿真成功的接受到了这些输入信息包。通过这种方式，我们通常只有少量的输入信息正在传输过程中，而且这个数量还是与数据包的往返时间成正比的。

完美胜利

我们通过改变游戏的规则成功了找到了一种比TCP协议更好的办法。我们并不是通过在UDP协议纸上构建了实现TCP协议百分之九十五的功能的新协议，而是实现了一种完全不同的方法，而且更加适合我们的要求：数据对时间非常敏感。我们开发了一个自定义的协议，可以冗余的发送input，因为我们知道这些input非常小, 所以我们不必去等待重传它们。

所以这种方法到底比通过TCP协议来发送数据好多少呢？

让我们通过一个例子来看一下。

上面的视频是基于UDP协议来使用具有确定性的帧同步模型，延迟时间是２秒，并且有百分之二十五的丢包率。想象下如果我们是使用基于TCP协议的具有确定性的帧同步模型，我们该看到多么可怕的场景！

所以最后我们能得到这么一个结论：即使这是一个TCP协议最具有优势的情况下，这是唯一一个依赖可靠性、有序性数据传输的网络模型，我们还是可以很容易的通过一个自定义的协议基于UDP来发送我们的数据包，并且得到的效果更好。

原文作者未做权利声明，视为共享知识产权进入公共领域，自动获得授权。