构建游戏网络协议二之序列化策略

自我总结本篇概要

  • 读取数据的时候要特别小心,因为可能收到攻击者发送的恶意数据包或者格式错误的包;写入数据的时候会轻松很多,因为如果有任何事情出错,那几乎肯定是你自己的代码导致的
  • 统一的数据包序列化功能 :诀窍在于让序列化函数以流类型为模板参数。在我的系统中有两个流类型:ReadStream类和WriteStream类。每个类都有相同的一套方法,但实际上它们没有任何继承关系。一个类负责从比特流读取值到变量中,另外一个类负责把变量的值写到流中。
    在模板里类似这样写,通过 Stream::IsWriting 和 Stream::IsReading 这两个编译期常量,模板会自动区分读写两种情况,生成你想要的代码,简洁漂亮:

    class WriteStream
    {
    public:

        enum { IsWriting = 1 };
        enum { IsReading = 0 };

        // ...
    };

    class ReadStream
    {
    public:

        enum { IsWriting = 0 };
        enum { IsReading = 1 };

        // ...
    };

    template <typename Stream>
    bool serialize( Stream & stream, float & value )
    {
        union FloatInt
        {
            float float_value;
            uint32_t int_value;
        };

        FloatInt tmp;
        if ( Stream::IsWriting )
            tmp.float_value = value;

        bool result = stream.SerializeBits( tmp.int_value, 32 );

        if ( Stream::IsReading )
            value = tmp.float_value;

        return result;
    }
  • 边界检查和终止读取 : 把允许的大小范围也传给序列化函数而不仅仅是所需的比特数量。

  • 序列化浮点数和向量 : 计算机根本不知道内存中的这个32位的值到底是一个整数还是一个浮点数还是一个字符串的部分。它知道的就是这仅仅是一个32位的值。代码如下(可以通过一个联合体来访问看上去是整数的浮点数).
    有些时候,你并不想把一个完整精度的浮点数进行传递。那么该如何压缩这个浮点值?第一步是将它的值限制在某个确定的范围内然后用一个整数表示方式来将它量化。
    举个例子来说,如果你知道一个浮点类型的值是在区间[-10,+10],对于这个值来说可以接受的精确度是0.01,那么你可以把这个浮点数乘以100.0让它的值在区间[-1000,+1000]并在网络上将其作为一个整数进行序列化。而在接收的那一端,仅仅需要将它除以100.0来得到最初的浮点值.

    union FloatInt
    {
        float float_value;
        uint32_t int_value;
    };

    FloatInt tmp;
    tmp.float_value = 10.0f;
    printf( "float value as an integer: %x\n", tmp.int_value );
  • 序列化字符串和数组 : 为什么要费那么大精力把一个字节数组按比特打包到你的比特流里?为什么不在写入之前先按字节对齐,这样就可以直接用 memcpy 把字节数组拷贝进数据包?
    如何将比特流按字节对齐?只需要在流的当前位置做些计算就可以了,找出还差写入多少个比特就能让当前比特流的比特数量被8整除,然后按照这个数字插入填充比特(比如当前比特流的比特数量是323,那么323+5才能被8整除,所以需要插入5个填充比特)。对于填充比特来说,填充的比特值都是0,这样当你序列化读取的时候你可以进行检测,如果检测的结果是正确的,那么就确实是在读取填充的部分,并且填充的部分确实是0。一直读取到下一个完整字节的比特起始位置(可以被8整除的位置)。如果检测的结果是在应该填充的地方发现了非0的比特值,那么就中止序列化读取并丢弃这个数据包。

  • 序列化数组的子集 : 当实现一个游戏网络协议的时候,或早或晚总会需要序列化一个对象数组然后在网络上传递。比如说服务器也许需要把所有的物体发送给客户端,或者有时候需要发送一组事件或者消息。如果你要发送所有的物体到客户端,这是相当简单直观的,但是如果你只是想发送一个数组的一个子集怎么办?
    最先想到也是最容易的办法是遍历数组的所有物体,并为每个物体序列化一个bool值,标记对应的物体是否需要通过网络发送。如果bool值为1,那么紧接着就是物体的数据;否则物体被省略,流中下一个值就是下一个物体的bool值。
    如果有大量的物体需要发送,举个例子来说,整个场景中有4000个物体,其中一半也就是2000个需要通过网络发送。每个被发送的物体需要一个序号,那么就需要2000个序号,每个序号需要12比特……这就是说数据包里面有24000比特也就是3000字节(接近3K)的数据被序号占用了。
    可以把序号的编码方式修改下来节省数据,序号不再是全局序号,而是相对上一个物体的相对序号。
  • 如何应对恶意数据包和错误包 : 如果某些人发送一些包含随机信息的恶意数据包给你的服务器。你会不会在解析的时候把服务器弄崩溃掉?
    有三种技术应对 :
    • 协议ID : 在你的数据包里面包含协议ID。一般典型的做法是,头4个字节你可以设定一些比较罕见而且独特的值,你可以通过这32比特的数据判断出来根本就不是你的应用程序的包,然后就可以直接丢弃了。
    • CRC32 : 对你的数据包整体做一个CRC32的校验,并把这个校验码放到数据包的包头。可以不发送这个协议ID,但是发送方和接收方提前确认过这个协议ID是什么,并在计算数据包CRC32值的时候装作这个数据包带上了这个协议ID的前缀来参与计算。这样如果发送方使用的协议ID与接收方不一致的时候,CRC32的校验就会失败,这将为每个数据包节省4个字节.
    • 序列化检测 : 是在包的中间,在一段复杂的序列化写入之前或者之后写上一个已知的32比特整数,并在另外一端序列化读取的时候用相同的值进行检测判断。如果序列化检查值是不正确的,那么就中止序列化读取并丢弃这个数据包。

原文

原文出处

原文标题 : Serialization Strategies (Smart tricks that unify packet read and write)


Introduction

Hi, I’m Glenn Fiedler and welcome to Building a Game Network Protocol.


In the previous article, we created a bitpacker but it required manual checking to make sure reading a packet from the network is safe. This is a real problem because the stakes are particularly high - a single missed check creates a vulnerability that an attacker can use to crash your server.


In this article, we’re going to transform the bitpacker into a system where this checking is automatic. We’re going to do this with minimal runtime overhead, and in such a way that we don’t have to code separate read and write functions, performing both read and write with a single function.


This is called a serialize function.


Serializing Bits


Let’s start with the goal. Here’s where we want to end up:


struct PacketA
{
    int x,y,z;

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_bits( stream, x, 32 );
        serialize_bits( stream, y, 32 );
        serialize_bits( stream, z, 32 );
        return true;
    }
};

Above you can see a simple serialize function. We serialize three integer variables x,y,z with 32 bits each.


struct PacketB
{
    int numElements;
    int elements[MaxElements];

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_int( stream, numElements, 0, MaxElements );
        for ( int i = 0; i < numElements; ++i )
        {
            serialize_bits( stream, elements[i], 32 );
        }
        return true;
    }
};

And now something more complicated. We serialize a variable length array, making sure that the array length is in the range [0,MaxElements].


Next, we serialize a rigid body with a simple optimization: while it's at rest, we serialize just one bit in place of its linear and angular velocity:


struct RigidBody
{
    vec3f position;
    quat4f orientation;
    vec3f linear_velocity;
    vec3f angular_velocity;

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_vector( stream, position );
        serialize_quaternion( stream, orientation );
        bool at_rest = Stream::IsWriting ? ( linear_velocity.length() == 0 && angular_velocity.length() == 0 ) : 1;
        serialize_bool( stream, at_rest );
        if ( !at_rest )
        {
            serialize_vector( stream, linear_velocity );
            serialize_vector( stream, angular_velocity );
        }
        else if ( Stream::IsReading )
        {
            linear_velocity = vec3f(0,0,0);
            angular_velocity = vec3f(0,0,0);
        }
        return true;
    }
};

Notice how we’re able to branch on Stream::IsWriting and Stream::IsReading to write code for each case. These branches are removed by the compiler when the specialized read and write serialize functions are generated.


As you can see, serialize functions are flexible and expressive. They’re also safe, with each serialize call performing checks and aborting read if anything is wrong (eg. a value out of range, going past the end of the buffer). Most importantly, this checking is automatic, so you can’t forget to do it!
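To make the pattern concrete before diving into the real implementation, here's a minimal sketch of the unified-serialize idea. Note that `WriteByteStream`/`ReadByteStream` and `SerializeU32` are simplified stand-ins invented for this sketch, moving whole 32 bit values instead of packed bits; they are not the article's bitpacked streams:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-ins for the article's streams: byte granularity,
// but the unified-serialize pattern is identical.
class WriteByteStream
{
public:
    enum { IsWriting = 1 };
    enum { IsReading = 0 };
    std::vector<uint8_t> data;
    bool SerializeU32( uint32_t & value )
    {
        const uint8_t * p = reinterpret_cast<const uint8_t*>( &value );
        data.insert( data.end(), p, p + 4 );
        return true;
    }
};

class ReadByteStream
{
public:
    enum { IsWriting = 0 };
    enum { IsReading = 1 };
    const uint8_t * data;
    int size;
    int offset = 0;
    ReadByteStream( const uint8_t * d, int s ) : data( d ), size( s ) {}
    bool SerializeU32( uint32_t & value )
    {
        if ( offset + 4 > size )        // abort read at end of buffer
            return false;
        memcpy( &value, data + offset, 4 );
        offset += 4;
        return true;
    }
};

// One function serves both read and write, branching on the
// compile-time constants, just like Stream::IsWriting/IsReading above.
struct Packet
{
    uint32_t x, y, z;
    template <typename Stream> bool Serialize( Stream & stream )
    {
        if ( !stream.SerializeU32( x ) ) return false;
        if ( !stream.SerializeU32( y ) ) return false;
        if ( !stream.SerializeU32( z ) ) return false;
        return true;
    }
};

inline bool round_trip( const Packet & in, Packet & out )
{
    WriteByteStream writer;
    Packet copy = in;
    if ( !copy.Serialize( writer ) )
        return false;
    ReadByteStream reader( writer.data.data(), (int) writer.data.size() );
    return out.Serialize( reader );
}
```

The same `Serialize` template instantiates twice, once per stream type, with the dead branches removed by the compiler.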


Implementation in C++


The trick to making this all work is to create two stream classes that share the same interface: ReadStream and WriteStream.


The write stream implementation writes values using the bitpacker:


class WriteStream
{
public:

    enum { IsWriting = 1 };
    enum { IsReading = 0 };

    WriteStream( uint8_t * buffer, int bytes ) : m_writer( buffer, bytes ) {}

    bool SerializeInteger( int32_t value, int32_t min, int32_t max )
    {
        assert( min < max );
        assert( value >= min );
        assert( value <= max );
        const int bits = bits_required( min, max );
        uint32_t unsigned_value = value - min;
        m_writer.WriteBits( unsigned_value, bits );
        return true;
    }

    // ...

private:

    BitWriter m_writer;
};

And the read stream implementation reads values in:


class ReadStream
{
public:

    enum { IsWriting = 0 };
    enum { IsReading = 1 };

    ReadStream( const uint8_t * buffer, int bytes ) : m_reader( buffer, bytes ) {}

    bool SerializeInteger( int32_t & value, int32_t min, int32_t max )
    {
        assert( min < max );
        const int bits = bits_required( min, max );
        if ( m_reader.WouldReadPastEnd( bits ) )
        {
            return false;
        }
        uint32_t unsigned_value = m_reader.ReadBits( bits );
        value = (int32_t) unsigned_value + min;
        return true;
    }

    // ...

private:

    BitReader m_reader;
};
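Both stream classes call a `bits_required( min, max )` helper that isn't defined in this article. A minimal sketch might look like this (the loop-based form is an assumption chosen for clarity; a real implementation would likely use an intrinsic such as `__builtin_clz`):

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent any value in [min,max] once it
// has been shifted into the unsigned range [0, max - min].
inline int bits_required( int64_t min, int64_t max )
{
    assert( min <= max );
    uint64_t range = (uint64_t)( max - min );
    int bits = 0;
    while ( range > 0 )
    {
        range >>= 1;
        bits++;
    }
    return bits;
}
```

For example, a range of [0,MaxElements] with MaxElements = 1000 needs 10 bits, since 1000 < 1024.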

With the magic of C++ templates, we leave it up to the compiler to specialize the serialize function to the stream class passed in, producing optimized read and write functions.


To handle errors safely, serialize calls are not actually functions at all. They're macros that return false on error, unwinding the stack on failure without the need for exceptions.


For example, this macro serializes an integer in a given range:


#define serialize_int( stream, value, min, max )                     \
    do                                                               \
    {                                                                \
        assert( min < max );                                         \
        int32_t int32_value;                                         \
        if ( Stream::IsWriting )                                     \
        {                                                            \
            assert( value >= min );                                  \
            assert( value <= max );                                  \
            int32_value = (int32_t) value;                           \
        }                                                            \
        if ( !stream.SerializeInteger( int32_value, min, max ) )     \
        {                                                            \
            return false;                                            \
        }                                                            \
        if ( Stream::IsReading )                                     \
        {                                                            \
            value = int32_value;                                     \
            if ( value < min || value > max )                        \
            {                                                        \
                return false;                                        \
            }                                                        \
        }                                                            \
    } while (0)

If a value read in from the network is outside the expected range, or we read past the end of the buffer, the packet read is aborted.


Serializing Floating Point Values


We’re used to thinking about floating point numbers as being different to integers, but in memory they’re just a 32 bit value like any other.


The C++ language lets us work with this fundamental property, allowing us to directly access the bits of a float value as if it were an integer:


union FloatInt
{
    float float_value;
    uint32_t int_value;
};

FloatInt tmp;
tmp.float_value = 10.0f;
printf( "float value as an integer: %x\n", tmp.int_value );

You may prefer to do this with an aliased uint32_t pointer, but this breaks with GCC -O2. Friends of mine point out that the only truly standard way to get the float as an integer is to cast a pointer to the float value to char*, and reconstruct the integer from the byte values accessed through the char pointer.


Meanwhile in the past 5 years I’ve had no problems in the field with the union trick. Here’s how I use it to serialize an uncompressed float value:


template <typename Stream>
bool serialize_float_internal( Stream & stream, float & value )
{
    union FloatInt
    {
        float float_value;
        uint32_t int_value;
    };
    FloatInt tmp;
    if ( Stream::IsWriting )
    {
        tmp.float_value = value;
    }
    bool result = stream.SerializeBits( tmp.int_value, 32 );
    if ( Stream::IsReading )
    {
        value = tmp.float_value;
    }
    return result;
}

This is of course wrapped with a serialize_float macro for error checking:


#define serialize_float( stream, value )                             \
    do                                                               \
    {                                                                \
        if ( !serialize_float_internal( stream, value ) )            \
        {                                                            \
            return false;                                            \
        }                                                            \
    } while (0)

We can now transmit full precision floating point values over the network.


But what about situations where you don’t need full precision? What about a floating point value in the range [0,10] with an acceptable precision of 0.01? Is there a way to send this over the network using less bits?


Yes there is. The trick is to simply divide by 0.01 to get an integer in the range [0,1000] and send that value over the network. On the other side, convert back to a float by multiplying by 0.01.
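As a quick sanity check of that arithmetic, here's the round trip in isolation (a standalone sketch with made-up helper names, not the article's code, which follows below):

```cpp
#include <cassert>
#include <cmath>

// Quantize a float in [0,10] at 0.01 resolution: divide by the
// resolution and round to the nearest integer step in [0,1000].
inline int quantize( float value )
{
    return (int) std::floor( value / 0.01f + 0.5f );
}

// Recover the (approximate) float on the receiving side.
inline float dequantize( int integer )
{
    return integer * 0.01f;
}
```

The recovered value is within half a resolution step of the original, which is exactly the precision we declared acceptable.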


Here’s a general purpose implementation of this basic idea:


template <typename Stream>
bool serialize_compressed_float_internal( Stream & stream,
                                          float & value,
                                          float min,
                                          float max,
                                          float res )
{
    const float delta = max - min;
    const float values = delta / res;
    const uint32_t maxIntegerValue = (uint32_t) ceil( values );
    const int bits = bits_required( 0, maxIntegerValue );
    uint32_t integerValue = 0;
    if ( Stream::IsWriting )
    {
        float normalizedValue =
            clamp( ( value - min ) / delta, 0.0f, 1.0f );
        integerValue = (uint32_t) floor( normalizedValue * maxIntegerValue + 0.5f );
    }
    if ( !stream.SerializeBits( integerValue, bits ) )
    {
        return false;
    }
    if ( Stream::IsReading )
    {
        const float normalizedValue =
            integerValue / float( maxIntegerValue );
        value = normalizedValue * delta + min;
    }
    return true;
}

Of course we need error checking, so we wrap this with a macro:


#define serialize_compressed_float( stream, value, min, max, res )   \
    do                                                               \
    {                                                                \
        if ( !serialize_compressed_float_internal( stream, value,    \
                                                   min, max, res ) ) \
        {                                                            \
            return false;                                            \
        }                                                            \
    } while (0)

And now the basic interface is complete. We can serialize both compressed and uncompressed floating point values over the network.


Serializing Vectors and Quaternions


Once you can serialize float values it’s trivial to serialize vectors over the network. I use a modified version of the vectorial library in my projects and implement serialization for its vector type like this:


template <typename Stream>
bool serialize_vector_internal( Stream & stream, vec3f & vector )
{
    float values[3];
    if ( Stream::IsWriting )
    {
        vector.store( values );
    }
    serialize_float( stream, values[0] );
    serialize_float( stream, values[1] );
    serialize_float( stream, values[2] );
    if ( Stream::IsReading )
    {
        vector.load( values );
    }
    return true;
}

#define serialize_vector( stream, value )                            \
    do                                                               \
    {                                                                \
        if ( !serialize_vector_internal( stream, value ) )           \
        {                                                            \
            return false;                                            \
        }                                                            \
    } while (0)

If your vector is bounded in some range, then you can compress it:


template <typename Stream>
bool serialize_compressed_vector_internal( Stream & stream,
                                           vec3f & vector,
                                           float min,
                                           float max,
                                           float res )
{
    float values[3];
    if ( Stream::IsWriting )
    {
        vector.store( values );
    }
    serialize_compressed_float( stream, values[0], min, max, res );
    serialize_compressed_float( stream, values[1], min, max, res );
    serialize_compressed_float( stream, values[2], min, max, res );
    if ( Stream::IsReading )
    {
        vector.load( values );
    }
    return true;
}

Notice how we are able to build more complex serialization using the primitives we've already created. Using this approach you can easily extend the serialization to support anything you need.


Serializing Strings and Arrays


What if you need to serialize a string over the network?


Is it a good idea to send a string over the network with null termination? Not really. You’re just asking for trouble! Instead, serialize the string as an array of bytes with the string length in front. Therefore, in order to send a string over the network, we have to work out how to send an array of bytes.


First observation. Why waste effort bitpacking an array of bytes into your bit stream just so they are randomly shifted by [0,7] bits? Why not align to byte so you can memcpy the array of bytes directly into the packet?


To align a bitstream just work out your current bit index in the stream and how many bits of padding are needed until the current bit index divides evenly into 8, then insert that number of padding bits.


For bonus points, pad with zero bits so that on read you can verify that yes, you are reading at a byte-aligned position, and yes, the stream is indeed padded with zero bits up to the next whole byte. If a non-zero bit is discovered in the padding, abort serialize read and discard the packet.


Here’s my code to align a bit stream to byte:


void BitWriter::WriteAlign()
{
    const int remainderBits = m_bitsWritten % 8;
    if ( remainderBits != 0 )
    {
        uint32_t zero = 0;
        WriteBits( zero, 8 - remainderBits );
        assert( ( m_bitsWritten % 8 ) == 0 );
    }
}

bool BitReader::ReadAlign()
{
    const int remainderBits = m_bitsRead % 8;
    if ( remainderBits != 0 )
    {
        uint32_t value = ReadBits( 8 - remainderBits );
        assert( m_bitsRead % 8 == 0 );
        if ( value != 0 )
            return false;
    }
    return true;
}

#define serialize_align( stream )                                    \
    do                                                               \
    {                                                                \
        if ( !stream.SerializeAlign() )                              \
            return false;                                            \
    } while (0)

Now we can align to byte prior to writing an array of bytes, letting us use memcpy for the bulk of the array data. The only wrinkle is because the bitpacker works at the word level, it’s necessary to have special handling for the head and tail portions. Because of this, the code is quite complex and is omitted for brevity. You can find it in the sample code for this article.


The end result of all this is a serialize_bytes primitive that we can use to serialize a string as a length followed by the string data, like so:


template <typename Stream>
bool serialize_string_internal( Stream & stream,
                                char * string,
                                int buffer_size )
{
    uint32_t length;
    if ( Stream::IsWriting )
    {
        length = strlen( string );
        assert( length < buffer_size - 1 );
    }
    serialize_int( stream, length, 0, buffer_size - 1 );
    serialize_bytes( stream, (uint8_t*) string, length );
    if ( Stream::IsReading )
    {
        string[length] = '\0';
    }
    return true;
}

#define serialize_string( stream, string, buffer_size )              \
    do                                                               \
    {                                                                \
        if ( !serialize_string_internal( stream, string,             \
                                         buffer_size ) )             \
        {                                                            \
            return false;                                            \
        }                                                            \
    } while (0)

This is an ideal string format because it lets us quickly reject malicious data, vs. having to scan through to the end of the packet searching for ‘\0’ before giving up. This is important because otherwise protocol level attacks could be crafted to degrade your server’s performance by making it do extra work.


Serializing Array Subsets


When implementing a game network protocol, sooner or later you need to serialize an array of objects over the network. Perhaps the server needs to send object state down to the client, or there is an array of messages to be sent.


This is straightforward if you are sending all objects in the array - just iterate across the array and serialize each object in turn. But what if you want to send a subset of the array?


The simplest approach is to iterate across all objects in the array and serialize a bit per-object indicating whether that object is to be sent. If the value of the bit is 1 then the object data follows in the bit stream, otherwise it's omitted:


template <typename Stream>
bool serialize_scene_a( Stream & stream, Scene & scene )
{
    for ( int i = 0; i < MaxObjects; ++i )
    {
        serialize_bool( stream, scene.objects[i].send );
        if ( !scene.objects[i].send )
        {
            if ( Stream::IsReading )
            {
                memset( &scene.objects[i], 0, sizeof( Object ) );
            }
            continue;
        }
        serialize_object( stream, scene.objects[i] );
    }
    return true;
}

This approach breaks down as the array gets larger. For example, with an array of size 4096, 4096 / 8 = 512 bytes are spent on skip bits. That's not good. Can we switch it around so the overhead is proportional to the number of objects sent, instead of the total number of objects in the array?


We can, but now we've done something interesting: we're walking one set of objects on serialize write (all objects in the array) and a different set on serialize read (only the subset of objects sent).


At this point the unified serialize function concept starts to break down, and in my opinion it's best to separate the read and write back into separate functions, because they have so little in common:


bool write_scene_b( WriteStream & stream, Scene & scene )
{
    int num_objects_sent = 0;
    for ( int i = 0; i < MaxObjects; ++i )
    {
        if ( scene.objects[i].send )
            num_objects_sent++;
    }
    write_int( stream, num_objects_sent, 0, MaxObjects );
    for ( int i = 0; i < MaxObjects; ++i )
    {
        if ( !scene.objects[i].send )
        {
            continue;
        }
        write_int( stream, i, 0, MaxObjects - 1 );
        write_object( stream, scene.objects[i] );
    }
    return true;
}

bool read_scene_b( ReadStream & stream, Scene & scene )
{
    memset( &scene, 0, sizeof( scene ) );
    int num_objects_sent;
    read_int( stream, num_objects_sent, 0, MaxObjects );
    for ( int i = 0; i < num_objects_sent; ++i )
    {
        int index;
        read_int( stream, index, 0, MaxObjects - 1 );
        read_object( stream, scene.objects[index] );
    }
    return true;
}

One more point. The code above walks the set of objects twice on serialize write: once to determine the number of changed objects, and a second time to actually serialize them. Can we do it in one pass instead? Absolutely! Rather than serializing the number of objects in the array up front, use a sentinel value to indicate the end of the array:


bool write_scene_c( WriteStream & stream, Scene & scene )
{
    for ( int i = 0; i < MaxObjects; ++i )
    {
        if ( !scene.objects[i].send )
        {
            continue;
        }
        write_int( stream, i, 0, MaxObjects );
        write_object( stream, scene.objects[i] );
    }
    write_int( stream, MaxObjects, 0, MaxObjects );
    return true;
}

bool read_scene_c( ReadStream & stream, Scene & scene )
{
    memset( &scene, 0, sizeof( scene ) );
    while ( true )
    {
        int index;
        read_int( stream, index, 0, MaxObjects );
        if ( index == MaxObjects )
        {
            break;
        }
        read_object( stream, scene.objects[index] );
    }
    return true;
}

The above technique works great if the objects sent are a small percentage of the total. But what if a large number of objects are sent, let's say half of the 4000 objects in the scene? That's 2000 object indices, each costing 12 bits… 24000 bits, or 3000 bytes (almost 3k!) of your packet wasted on indexing.


You can reduce this overhead by encoding each object index relative to the previous object index. Think about it: you're walking from left to right along the array, so object indices start at 0 and go up to MaxObjects - 1. Statistically speaking, you're quite likely to have objects that are close to each other, and if the next index is +1, or even +10 or +30 from the previous one, you'll need on average far fewer bits to represent that difference than an absolute index.


Here’s one way to encode the object index as an integer relative to the previous object index, while spending less bits on statistically more likely values:


template <typename Stream>
bool serialize_object_index_internal( Stream & stream,
                                      int & previous,
                                      int & current )
{
    uint32_t difference;
    if ( Stream::IsWriting )
    {
        assert( previous < current );
        difference = current - previous;
        assert( difference > 0 );
    }

    // +1 (1 bit)
    bool plusOne;
    if ( Stream::IsWriting )
    {
        plusOne = difference == 1;
    }
    serialize_bool( stream, plusOne );
    if ( plusOne )
    {
        if ( Stream::IsReading )
        {
            current = previous + 1;
        }
        previous = current;
        return true;
    }

    // [+2,5] -> [0,3] (2 bits)
    bool twoBits;
    if ( Stream::IsWriting )
    {
        twoBits = difference <= 5;
    }
    serialize_bool( stream, twoBits );
    if ( twoBits )
    {
        serialize_int( stream, difference, 2, 5 );
        if ( Stream::IsReading )
        {
            current = previous + difference;
        }
        previous = current;
        return true;
    }

    // [6,13] -> [0,7] (3 bits)
    bool threeBits;
    if ( Stream::IsWriting )
    {
        threeBits = difference <= 13;
    }
    serialize_bool( stream, threeBits );
    if ( threeBits )
    {
        serialize_int( stream, difference, 6, 13 );
        if ( Stream::IsReading )
        {
            current = previous + difference;
        }
        previous = current;
        return true;
    }

    // [14,29] -> [0,15] (4 bits)
    bool fourBits;
    if ( Stream::IsWriting )
    {
        fourBits = difference <= 29;
    }
    serialize_bool( stream, fourBits );
    if ( fourBits )
    {
        serialize_int( stream, difference, 14, 29 );
        if ( Stream::IsReading )
        {
            current = previous + difference;
        }
        previous = current;
        return true;
    }

    // [30,61] -> [0,31] (5 bits)
    bool fiveBits;
    if ( Stream::IsWriting )
    {
        fiveBits = difference <= 61;
    }
    serialize_bool( stream, fiveBits );
    if ( fiveBits )
    {
        serialize_int( stream, difference, 30, 61 );
        if ( Stream::IsReading )
        {
            current = previous + difference;
        }
        previous = current;
        return true;
    }

    // [62,125] -> [0,63] (6 bits)
    bool sixBits;
    if ( Stream::IsWriting )
    {
        sixBits = difference <= 125;
    }
    serialize_bool( stream, sixBits );
    if ( sixBits )
    {
        serialize_int( stream, difference, 62, 125 );
        if ( Stream::IsReading )
        {
            current = previous + difference;
        }
        previous = current;
        return true;
    }

    // [126,MaxObjects+1]
    serialize_int( stream, difference, 126, MaxObjects + 1 );
    if ( Stream::IsReading )
    {
        current = previous + difference;
    }
    previous = current;
    return true;
}

template <typename Stream>
bool serialize_scene_d( Stream & stream, Scene & scene )
{
    int previous_index = -1;

    if ( Stream::IsWriting )
    {
        for ( int i = 0; i < MaxObjects; ++i )
        {
            if ( !scene.objects[i].send )
            {
                continue;
            }
            write_object_index( stream, previous_index, i );
            write_object( stream, scene.objects[i] );
        }
        write_object_index( stream, previous_index, MaxObjects );
    }
    else
    {
        while ( true )
        {
            int index;
            read_object_index( stream, previous_index, index );
            if ( index == MaxObjects )
            {
                break;
            }
            read_object( stream, scene.objects[index] );
        }
    }
    return true;
}

But what about the worst case? Won't we spend more bits when indices are >= +126 apart than we would on an absolute index? Yes, but how many of these worst case indices can fit in an array of size 4096? Just 32. It's nothing to worry about.


Protocol IDs, CRC32 and Serialization Checks


We are nearly at the end of this article, and you can see by now that we are sending a completely unattributed binary stream. It’s essential that read and write match perfectly, which is of course why the serialize functions are so great, it’s hard to desync something when you unify read and write.


But accidents happen, and when they do, this system can seem like a house of cards. What if you somehow desync read and write? How can you debug it? What if somebody tries to connect to your latest server code with an old version of your client?


One technique to protect against this is to include a protocol id in your packet. For example, it could be a combination of a unique number for your game, plus the hash of your protocol version and a hash of your game data. Now if a packet comes in from an incompatible game version, it’s automatically discarded because the protocol ids don’t match:


[protocol id] (64bits)
(packet data)

The next level of protection is to pass a CRC32 over your packet and include that in the header. This lets you pick up corrupt packets (these do happen, remember that the IP checksum is just 16 bits…). Now your packet header looks like this:


[protocol id] (64bits)
[crc32] (32bits)
(packet data)

At this point you may be wincing. Wait. I have to take 8+4 = 12 bytes of overhead per-packet just to implement my own checksum and protocol id? Well actually, you don’t. You can take a leaf out of how IPv4 does their checksum, and make the protocol id a magical prefix.


This means you don’t actually send it, and rely on the fact that if the CRC32 is calculated as if the packet were prefixed by the protocol id, then the CRC32 will be incorrect if the sender does not have the same protocol id as the receiver, thus saving 8 bytes per-packet:


[protocol id] (64bits)   // not actually sent, but used to calc crc32
[crc32] (32bits)
(packet data)
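A sketch of that calculation follows. The bitwise CRC32 is shown only for self-containment; real code would use a table-driven or hardware-accelerated implementation, and would serialize the protocol id bytes in a defined byte order rather than copying host memory as done here:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <cstring>

// Bitwise CRC32 (IEEE polynomial, zlib-style chaining): pass the result
// of one call as the starting crc of the next to checksum a logical
// concatenation of buffers without actually concatenating them.
inline uint32_t crc32( const uint8_t * data, size_t length, uint32_t crc )
{
    crc = ~crc;
    for ( size_t i = 0; i < length; ++i )
    {
        crc ^= data[i];
        for ( int bit = 0; bit < 8; ++bit )
            crc = ( crc >> 1 ) ^ ( ( crc & 1 ) ? 0xEDB88320u : 0u );
    }
    return ~crc;
}

// CRC32 of the packet as if it were prefixed by the protocol id.
// The prefix is never sent: a sender with a different protocol id
// simply produces a checksum the receiver rejects.
inline uint32_t packet_crc32( uint64_t protocol_id,
                              const uint8_t * packet_data,
                              size_t packet_bytes )
{
    uint8_t prefix[8];
    memcpy( prefix, &protocol_id, 8 );   // host byte order; real code should
                                         // write these bytes deterministically
    const uint32_t crc = crc32( prefix, 8, 0 );
    return crc32( packet_data, packet_bytes, crc );
}
```

Two peers that disagree on the protocol id compute different checksums over identical packet bytes, so mismatched packets fail the CRC32 check and are discarded.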

One final technique, as much a check against programmer error on your part as against malicious senders (although it's redundant once you encrypt and sign your packets), is the serialization check. Somewhere mid-packet, either before or after a complicated serialization section, just write out a known 32 bit integer value, and check that it reads back with the same value on the other side. If the serialize check value is incorrect, abort read and discard the packet.


I like to do this between sections of my packet as I write them, so at least I know which part of my packet serialization has desynced read and write as I’m developing my protocol. Another cool trick I like to use is to always serialize a protocol check at the very end of the packet, to detect accidental packet truncation (which happens more often than you would think).
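The check itself is only a few lines on top of the machinery already shown. Here's a sketch with trivial word-granularity stand-in streams (the real streams are the bitpacked ones from earlier; the magic constant and the stream names are arbitrary):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Trivial uint32-granularity streams just to demonstrate the check.
struct CheckWriteStream
{
    enum { IsWriting = 1, IsReading = 0 };
    std::vector<uint32_t> words;
    bool SerializeU32( uint32_t & v ) { words.push_back( v ); return true; }
};

struct CheckReadStream
{
    enum { IsWriting = 0, IsReading = 1 };
    const std::vector<uint32_t> * words;
    size_t offset = 0;
    explicit CheckReadStream( const std::vector<uint32_t> & w ) : words( &w ) {}
    bool SerializeU32( uint32_t & v )
    {
        if ( offset >= words->size() )
            return false;                 // ran past end of buffer
        v = (*words)[offset++];
        return true;
    }
};

const uint32_t SerializeCheckValue = 0x12345678;   // arbitrary magic number

// On write, emit the magic; on read, read it back and fail the whole
// packet if it doesn't match -- read and write have desynced.
template <typename Stream>
bool serialize_check( Stream & stream )
{
    uint32_t value = SerializeCheckValue;
    if ( !stream.SerializeU32( value ) )
        return false;
    return value == SerializeCheckValue;
}
```

Sprinkle `serialize_check( stream )` between packet sections and the first failing check tells you which section desynced.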


Now the packet looks something like this:


[protocol id] (64bits)   // not actually sent, but used to calc crc32
[crc32] (32bits)
(packet data)
[end of packet serialize check] (32 bits)

This is a great packet structure to use during development.

译文

注意 :这篇译文对应的是下面的原作者原文旧版本

译文出处

译者:崔嘉艺(milan21) 审校:陈敬凤(nunu)

在这个系列文章中,我将完全从头开始构建一个专业级别的客户端/服务器游戏网络协议,只使用了C++编译器和一组UDP套接字。如果你正在寻找一个关于如何实现你自己的游戏网络协议方面的详细、实用实现,那么这个系列的文章对你来说就再适合不过了。

大家好,我是Glenn Fiedler,欢迎阅读《构建游戏网络协议》系列教程的第二篇文章。

在前面的文章里,我们讨论了在多人在线网络游戏里面读取和写入网络包的不同方法。我们很快就否决了通过XML、JSON这类文本格式来发送游戏状态的办法,因为它们在效率上确实存在比较大的问题,因此我们决定用自定义的二进制格式进行代替。

我们实现了一个位打包器(bitpacker),所以我们无需为了节省空间而手动把几个布尔值聚合成一个8位的值,也无需考虑大端小端问题:它每次读写一个完整的机器字(word),而不是按字节逐个处理。这使得位打包器既简单又快,也无需关心与平台有关的细节。

但是我们仍然遗留了以下这些问题需要解决:

1. 我们需要实现一个方法来判断整数值是否超出预期范围,如果超出了就要中止网络包的读取和解析,因为会有一些不怀好意的人给我们发送恶意网络包,希望让我们的程序崩溃或破坏内存。这个中止必须是自动化的,而且不能使用异常处理,因为异常处理太慢了,会拖累我们的程序。

2. 如果读取和写入是两套手写的独立函数,维护它们就是一场噩梦。我们希望只为数据包编写一次序列化代码,并且没有任何运行时的额外开销(主要指额外的分支、虚函数调用等等)。

我们应该如何实现上面的这些目标? 请继续阅读,我将向你展示如何用C++来实现这些功能。开发和完善这些技术花费了我不少时间,所以我希望这些内容对你来说是有帮助的,至少是一个很好的选择值得考虑是否要替换你目前采用的方案,或者可以与你在其他游戏看到的这个问题解决方案相结合,看是否能得到更好的解决方案。


统一的数据包序列化功能

让我们从我们的目标开始。这就是我们在本文结束的时候希望得到的东西:

struct PacketA
{
    int x,y,z;

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_bits( stream, x, 32 );
        serialize_bits( stream, y, 32 );
        serialize_bits( stream, z, 32 );
        return true;
    }
};

struct PacketB
{
    int numElements;
    int elements[MaxElements];

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_int( stream, numElements, 0, MaxElements );
        for ( int i = 0; i < numElements; ++i )
            serialize_bits( stream, elements[i], 32 );
        return true;
    }
};

struct PacketC
{
    bool x;
    short y;
    int z;

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_bits( stream, x, 8 );
        serialize_bits( stream, y, 16 );
        serialize_bits( stream, z, 32 );
        return true;
    }
};


看下上面的代码片段,可以看到每个数据包结构里面都只有一个单独的序列化函数,而不是互相独立的序列化读取和序列化写入函数。这非常的棒!它把你需要编写的序列化代码量减少了一半,你再也不用费力让手写的读取函数和写入函数保持严格一致了(读取是把数据解析进本地内存,写入是把本地数据写进消息体,手写两套函数非常容易出现不对称的错误)。

如果要让这项工作变得有效,诀窍在于让序列化函数以流类型为模板参数。在我的系统中有两个流类型:ReadStream类和WriteStream类。每个类都有相同的一套方法,但实际上它们没有任何继承关系。一个类负责从比特流读取值到变量中,另外一个类负责把变量的值写到流中。ReadStream和WriteStream只是上一篇文章中BitReader和BitWriter类的一个高层次封装。

当然也有其他方法可以用来代替。如果你不喜欢用模板的话,你可以使用一个纯虚的基类作为流的接口,然后分别实现读取和写入类来实现这个流接口。但是如果你这么做的话,就要在每个序列化调用的时候发生了一次虚函数调用。这种方法对我来说开销似乎比较大。
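作为对照,纯虚基类的方案大致是下面这个样子。注意这只是一个示意性的草图,接口和类名都是假设的,并非原文代码;为了能独立运行,这里用按 32 位整数粒度读写的简化流代替真正的比特流,重点在于每次序列化调用都要经过一次虚函数分发:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 示意用的纯虚流接口:读取类和写入类实现同一个接口,
// 代价是每次序列化调用都有一次虚函数调用。
class Stream
{
public:
    virtual ~Stream() {}
    virtual bool IsWriting() const = 0;
    virtual bool SerializeU32( uint32_t & value ) = 0;
};

class VirtualWriteStream : public Stream
{
public:
    std::vector<uint32_t> words;
    bool IsWriting() const override { return true; }
    bool SerializeU32( uint32_t & value ) override
    {
        words.push_back( value );
        return true;
    }
};

class VirtualReadStream : public Stream
{
public:
    const std::vector<uint32_t> * words;
    size_t offset = 0;
    explicit VirtualReadStream( const std::vector<uint32_t> & w ) : words( &w ) {}
    bool IsWriting() const override { return false; }
    bool SerializeU32( uint32_t & value ) override
    {
        if ( offset >= words->size() )
            return false;                  // 越界即中止读取
        value = (*words)[offset++];
        return true;
    }
};

// 同一个序列化函数通过运行时分发同时服务读和写。
inline bool serialize_xyz( Stream & stream, uint32_t & x, uint32_t & y )
{
    return stream.SerializeU32( x ) && stream.SerializeU32( y );
}
```

功能上与模板方案等价,但每个值的读写都多了一次间接调用,这正是作者嫌它开销大的原因。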

实现这个功能的另外一种方法是实现一个超级棒的流类型,可以通过配置而在运行时进入读取或者写入模式。这种方法会比虚函数方法快一些,但是仍然会在每次序列化调用的时候存在分支判断到底应该是读还是写,所以它不如硬编码读取和写入函数那么快。

我更喜欢模板方法,因为它可以让编译器为项目产生经过优化的读取/写入函数。你甚至可以把序列化代码也这样实现以便让编译器为特定的读取和写入优化一大堆东西:

struct RigidBody
{
    vec3f position;
    quat3f orientation;
    vec3f linear_velocity;
    vec3f angular_velocity;

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_vector( stream, position );
        serialize_quaternion( stream, orientation );
        bool at_rest = Stream::IsWriting ? ( linear_velocity.length() == 0 && angular_velocity.length() == 0 ) : 1;
        serialize_bool( stream, at_rest );
        if ( !at_rest )
        {
            serialize_vector( stream, linear_velocity );
            serialize_vector( stream, angular_velocity );
        }
        else if ( Stream::IsReading )
        {
            linear_velocity = vec3f(0,0,0);
            angular_velocity = vec3f(0,0,0);
        }
        return true;
    }
};


虽然这看起来很没有效率,但是实际上并不是!这个函数经过模板特化以后会根据流的类型优化了所有分支。这很整齐漂亮吧?





而ReadStream和WriteStream是这样的 :
class WriteStream
{
public:

    enum { IsWriting = 1 };
    enum { IsReading = 0 };

    // ...
};

class ReadStream
{
public:

    enum { IsWriting = 0 };
    enum { IsReading = 1 };

    // ...
};


边界检查和终止读取

现在我们通过编译器实现了生成优化过的序列化读取/写入函数,我们还需要一些方法来做序列化读取时候的自动错误检测以便让我们不受恶意网络包的影响。

我们要做的第一件事情是把允许的大小范围也传给序列化函数而不仅仅是所需的比特数量。试想一下如果有了最小和最大的范围,序列化函数就能自己算出所需的比特数量:

serialize_int(stream, numElements, 0, MaxElements );

这种做法开辟了一类新的方法,可以非常容易地支持带符号整数的序列化并且序列化函数可以检测从网络中读取的值并确保这个值一定在期望的范围内。如果这个值超出范围了,就立即中止序列化读取并丢弃这个数据包。

因为我们没有办法使用异常来处理这种中止(因为异常太慢了),所以上面的方式是我比较喜欢的处理方式。

在我的环境中,serialize_int其实并不是一个函数,它实际是一个如下面代码所示的宏:

#define serialize_int( stream, value, min, max )                     \
    do                                                               \
    {                                                                \
        assert( min < max );                                         \
        int32_t int32_value;                                         \
        if ( Stream::IsWriting )                                     \
        {                                                            \
            assert( value >= min );                                  \
            assert( value <= max );                                  \
            int32_value = (int32_t) value;                           \
        }                                                            \
        if ( !stream.SerializeInteger( int32_value, min, max ) )     \
            return false;                                            \
        if ( Stream::IsReading )                                     \
        {                                                            \
            value = int32_value;                                     \
            if ( value < min || value > max )                        \
                return false;                                        \
        }                                                            \
    } while (0)


这个宏让人觉得有点可怕的地方在于:它插入的代码会检测SerializeInteger的返回值,并在发生错误的时候返回false。这在效果上和异常处理很像——出错时会一路回溯到序列化调用栈的最顶端——但没有异常处理带来的性能消耗。序列化出错的分支极少被走到(序列化错误非常罕见),所以分支预测不会带来什么性能上的问题。

还有一种情况我们也需要中止序列化读取:如果流读取超出了结尾。这种情况其实也是非常罕见的,但是我们必须在每次序列化操作都进行这个检查,这是因为流读取超出结尾会造成的影响是未定义的(也就是说我们对于它能造成什么样子的结果完全是未知的,最糟糕的情况并不是代码崩溃,而是把我们的内容数据完全搞乱了,相信我,你会无比痛恨这件事情)。如果我们没有做这个检测,可能会出现程序无限循环的情况,因为读取的位置超出了缓冲区的结尾。虽然在读取的时候如果发现超出比特流结尾的时候返回0值是很常见的做法(如以前的文章提到的那样),但是返回0值也不能保证序列化函数在有循环的时候能够正确的中止。如果要确保程序是有良好定义的行为,那么这种缓冲溢出检测总是必须的。
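越界检测本身非常简单,大致如下(这是示意性的草图,成员变量名是假设的,并非原文代码):

```cpp
#include <cassert>

// BitReader 越界检测的示意:在读取 bits 个比特之前,
// 先确认不会越过缓冲区末尾。
struct BitReaderBounds
{
    int m_bitsRead = 0;
    int m_numBits  = 0;   // 缓冲区总比特数

    bool WouldReadPastEnd( int bits ) const
    {
        return m_bitsRead + bits > m_numBits;
    }
};
```

每次 ReadBits 之前都先调用它,一旦会越界就中止读取并丢弃数据包,保证行为始终是良好定义的。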

最后一点,在序列化写入的时候如果遇到范围检测失败或者写入的地址超出流的结尾的时候,我并没有采用中止这种做法。在写入数据的时候你可能会轻松很多,因为如果有任何事情出错了,那几乎肯定是你自己导致的错误。在序列化写入的时候我们只是对每个序列化写入做了断言来确保一切是符合预期的(在范围内、写入的地址没有超出流的结尾),其他的一切都任由你来发挥。



序列化浮点数和向量

这个比特流现在只序列化类型整数的值。如果我们要序列化一个类型为浮点数的值,我们该怎么做?

我们的做法虽然看上去有点投机取巧但实际上并不是。在内存中浮点数也是像整数那样保存成一个32位的值。你的计算机根本不知道内存中的这个32位的值到底是一个整数还是一个浮点数还是一个字符串的部分。它知道的就是这仅仅是一个32位的值。幸运的是,C++语言使得我们可以直接对这个基础属性进行控制(其他语言不行,因为底层被封装掉了,这也是C++被认为不好的地方之一,很多现代语言都禁止了这种做法)。

你可以通过一个联合体来访问看上去是整数的浮点数:

union FloatInt
{
float float_value;
uint32_t int_value;
};
FloatInt tmp;
tmp.float_value = 10.0f;
printf( "float value as an integer: %x\n", tmp.int_value );


你也可以通过别名的uint32_t指针来做到这一点,但是我遇到过这种做法在GCC -O2下出错(严格别名优化会破坏它),所以我更倾向于使用联合体。我的朋友们指出(很有可能是正确的),从浮点值拿到整数表示的唯一真正符合标准的做法,是把浮点数的指针转换成uint8_t指针,然后通过这个字节指针分别访问4个字节来重建整数值。虽然这对我来说似乎是一个非常愚蠢的做法。女士们,先生们……这毕竟是C++啊!(作者的意思是C++提供了很多接触底层的方法,我们可以尽情利用这一优势,只要能保证结果是正确的就可以了,哪怕使用一些取巧的办法也无所谓!)

与此同时,在过去的5年里,在使用联合体这个技巧方面我还没有遇到过什么问题。下面是我如何序列化一个未压缩的浮点值:

template <typename Stream>
bool serialize_float_internal( Stream & stream,
float & value )
{
union FloatInt
{
float float_value;
uint32_t int_value;
};
FloatInt tmp;
if ( Stream::IsWriting )
tmp.float_value = value;
bool result = stream.SerializeBits( tmp.int_value, 32 );
if ( Stream::IsReading )
value = tmp.float_value;
return result;
}


通过一个serialize_float宏来包装这个部分以方便在序列化读取的时候方便进行一致的错误检测:

#define serialize_float( stream, value) \
do \
{ \
if ( !protocol2::serialize_float_internal( stream, value )) \
return false; \
} while(0)


有些时候,你并不想把一个完整精度的浮点数进行传递。那么该如何压缩这个浮点值?第一步是将它的值限制在某个确定的范围内然后用一个整数表示方式来将它量化。

举个例子来说,如果你知道一个浮点类型的值是在区间[-10,+10],对于这个值来说可以接受的精确度是0.01,那么你可以把这个浮点数乘以100.0让它的值在区间[-1000,+1000]并在网络上将其作为一个整数进行序列化。而在接收的那一端,仅仅需要将它除以100.0来得到最初的浮点值。

下面是这个概念用序列化实现的版本:

template <typename Stream>
bool serialize_compressed_float_internal( Stream & stream,
                                          float & value,
                                          float min,
                                          float max,
                                          float res )
{
    const float delta = max - min;
    const float values = delta / res;
    const uint32_t maxIntegerValue = (uint32_t) ceil( values );
    const int bits = bits_required( 0, maxIntegerValue );
    uint32_t integerValue = 0;
    if ( Stream::IsWriting )
    {
        float normalizedValue =
            clamp( ( value - min ) / delta, 0.0f, 1.0f );
        integerValue = (uint32_t) floor( normalizedValue *
                                         maxIntegerValue + 0.5f );
    }
    if ( !stream.SerializeBits( integerValue, bits ) )
        return false;
    if ( Stream::IsReading )
    {
        const float normalizedValue =
            integerValue / float( maxIntegerValue );
        value = normalizedValue * delta + min;
    }
    return true;
}


一旦你实现了对浮点数的序列化,那么将方法拓展下通过网络序列化向量和四元数就非常容易了。我在我自己的项目中使用了这个超赞的针对向量数学的向量库(https://github.com/scoopr/vectorial)的一个修改版本,并且我对这些类型实现的序列化方法如下所示:

template <typename Stream>
bool serialize_vector_internal( Stream & stream,
                                vec3f & vector )
{
    float values[3];
    if ( Stream::IsWriting )
        vector.store( values );
    serialize_float( stream, values[0] );
    serialize_float( stream, values[1] );
    serialize_float( stream, values[2] );
    if ( Stream::IsReading )
        vector.load( values );
    return true;
}

template <typename Stream>
bool serialize_quaternion_internal( Stream & stream,
                                    quat4f & quaternion )
{
    float values[4];
    if ( Stream::IsWriting )
        quaternion.store( values );
    serialize_float( stream, values[0] );
    serialize_float( stream, values[1] );
    serialize_float( stream, values[2] );
    serialize_float( stream, values[3] );
    if ( Stream::IsReading )
        quaternion.load( values );
    return true;
}

#define serialize_vector( stream, value ) \
    do \
    { \
        if ( !serialize_vector_internal( stream, value ) ) \
            return false; \
    } \
    while(0)

#define serialize_quaternion( stream, value ) \
    do \
    { \
        if ( !serialize_quaternion_internal( stream, value ) ) \
            return false; \
    } \
    while(0)


如果你知道你的向量的取值会限制在某个范围内,你可以像下面这样对它进行压缩:

template <typename Stream>
bool serialize_compressed_vector_internal( Stream & stream,
                                           vec3f & vector,
                                           float min,
                                           float max,
                                           float res )
{
    float values[3];
    if ( Stream::IsWriting )
        vector.store( values );
    serialize_compressed_float( stream, values[0], min, max, res );
    serialize_compressed_float( stream, values[1], min, max, res );
    serialize_compressed_float( stream, values[2], min, max, res );
    if ( Stream::IsReading )
        vector.load( values );
    return true;
}


你如果想要在网络上压缩一个方向,不要把它视为四个取值范围在[-1,+1]的成员变量的结构。如果使用这个四元数的三个最小值来表示它效果会好的多,请看下这篇文章的示例代码(地址在https://www.patreon.com/gafferongames?ty=h)来得到一个这方面的实现。

四元数是简单的超复数。复数由实数加上虚数单位 i 组成,其中 i^2 = -1。相似地,四元数由实数加上三个虚数单位 i、j、k 组成,它们满足 i^2 = j^2 = k^2 = -1。每个四元数都是 1、i、j、k 的线性组合,即四元数一般可表示为 a + bi + cj + dk,其中 a、b、c、d 都是实数。

对于ijk本身的几何意义可以理解为一种旋转,其中i旋转代表X轴与Y轴相交平面中X轴正向向Y轴正向的旋转,j旋转代表Z轴与X轴相交平面中Z轴正向向X轴正向的旋转,k旋转代表Y轴与Z轴相交平面中Y轴正向向Z轴正向的旋转,-i-j-k分别代表ijk旋转的反向旋转。

序列化字符串和数组

如果你想序列化字符串并通过网络传输该怎么办?

在网络上发送字符串的时候用Null作为终止符是个好主意么?我不这么认为。我认为这么做只是在自找麻烦!我们应该把字符串作为带长度作为前缀的字符数组。所以,要通过网络发送字符串,我们必须解决如何有效的发送字符数组的问题。

观察到的第一个事情:为什么要费那么大精力把一个字节数组按比特打包到你的比特流里,只是为了让它们随机偏移[0,7]个比特?为什么不在序列化写入数组之前先按字节对齐,这样就可以直接用memcpy把字节数组拷贝进数据包?如果这么处理的话,数据包里面的字节数组数据就对齐得很整齐,数组的每个字节都对应着数据包里面的一个实际字节。对于每个要序列化的字节数组,你只损失了[0,7]个比特(取决于对齐情况),以我的观点来看这没什么好在意的。

如何将比特流按字节对齐?只需要在流的当前位置做些计算就可以了,找出还差写入多少个比特就能让当前比特流的比特数量被8整除,然后按照这个数字插入填充比特(比如当前比特流的比特数量是323,那么323+5才能被8整除,所以需要插入5个填充比特)。对于填充比特来说,填充的比特值都是0,这样当你序列化读取的时候你可以进行检测,如果检测的结果是正确的,那么就确实是在读取填充的部分,并且填充的部分确实是0。一直读取到下一个完整字节的比特起始位置(可以被8整除的位置)。如果检测的结果是在应该填充的地方发现了非0的比特值,那么就中止序列化读取并丢弃这个数据包。

下面是我用来将比特流按字节对齐的代码:

void BitWriter::WriteAlign()
{
const int remainderBits = m_bitsWritten % 8;
if ( remainderBits != 0 )
{
uint32_t zero = 0;
WriteBits( zero, 8 - remainderBits );
assert( ( m_bitsWritten % 8 ) == 0 );
}
}
bool BitReader::ReadAlign()
{
const int remainderBits = m_bitsRead % 8;
if ( remainderBits != 0 )
{
uint32_t value = ReadBits( 8 - remainderBits );
assert( m_bitsRead % 8 == 0 );
if ( value != 0 )
return false;
}
return true;
}
#define serialize_align( stream) \
do \
{ \
if ( !stream.SerializeAlign() ) \
return false; \
} while(0)


现在我们可以使用这个对齐操作来有效率的将字节数组写入比特流:因为我们已经将比特流按照字节对齐了,所以我们可以使用memcpy方法来做大部分的工作。唯一的问题在于比特读取器和比特写入器是按照双字进行工作的,所以需要一些特殊的代码来处理字节数组的头部和尾部,以确保头部的零散比特会被写入内存,并且在头部处理完毕以后,读取的位置会被正确设置到下一个字节。

void BitWriter::WriteBytes( const uint8_t * data, int bytes )
{
    assert( GetAlignBits() == 0 );
    assert( m_bitsWritten + bytes * 8 <= m_numBits );
    assert( ( m_bitsWritten % 32 ) == 0 ||
            ( m_bitsWritten % 32 ) == 8 ||
            ( m_bitsWritten % 32 ) == 16 ||
            ( m_bitsWritten % 32 ) == 24 );
    int headBytes = ( 4 - ( m_bitsWritten % 32 ) / 8 ) % 4;
    if ( headBytes > bytes )
        headBytes = bytes;
    for ( int i = 0; i < headBytes; ++i )
        WriteBits( data[i], 8 );
    if ( headBytes == bytes )
        return;
    assert( GetAlignBits() == 0 );
    int numWords = ( bytes - headBytes ) / 4;
    if ( numWords > 0 )
    {
        assert( ( m_bitsWritten % 32 ) == 0 );
        memcpy( &m_data[m_wordIndex], data + headBytes, numWords * 4 );
        m_bitsWritten += numWords * 32;
        m_wordIndex += numWords;
        m_scratch = 0;
    }
    assert( GetAlignBits() == 0 );
    int tailStart = headBytes + numWords * 4;
    int tailBytes = bytes - tailStart;
    assert( tailBytes >= 0 && tailBytes < 4 );
    for ( int i = 0; i < tailBytes; ++i )
        WriteBits( data[tailStart+i], 8 );
    assert( GetAlignBits() == 0 );
    assert( headBytes + numWords * 4 + tailBytes == bytes );
}

void BitReader::ReadBytes( uint8_t * data, int bytes )
{
    assert( GetAlignBits() == 0 );
    assert( m_bitsRead + bytes * 8 <= m_numBits );
    assert( ( m_bitsRead % 32 ) == 0 ||
            ( m_bitsRead % 32 ) == 8 ||
            ( m_bitsRead % 32 ) == 16 ||
            ( m_bitsRead % 32 ) == 24 );
    int headBytes = ( 4 - ( m_bitsRead % 32 ) / 8 ) % 4;
    if ( headBytes > bytes )
        headBytes = bytes;
    for ( int i = 0; i < headBytes; ++i )
        data[i] = (uint8_t) ReadBits( 8 );
    if ( headBytes == bytes )
        return;
    assert( GetAlignBits() == 0 );
    int numWords = ( bytes - headBytes ) / 4;
    if ( numWords > 0 )
    {
        assert( ( m_bitsRead % 32 ) == 0 );
        memcpy( data + headBytes, &m_data[m_wordIndex], numWords * 4 );
        m_bitsRead += numWords * 32;
        m_wordIndex += numWords;
        m_scratchBits = 0;
    }
    assert( GetAlignBits() == 0 );
    int tailStart = headBytes + numWords * 4;
    int tailBytes = bytes - tailStart;
    assert( tailBytes >= 0 && tailBytes < 4 );
    for ( int i = 0; i < tailBytes; ++i )
        data[tailStart+i] = (uint8_t) ReadBits( 8 );
    assert( GetAlignBits() == 0 );
    assert( headBytes + numWords * 4 + tailBytes == bytes );
}

template <typename Stream>
bool serialize_bytes_internal( Stream & stream,
                               uint8_t * data,
                               int bytes )
{
    return stream.SerializeBytes( data, bytes );
}

#define serialize_bytes( stream, data, bytes ) \
    do \
    { \
        if ( !serialize_bytes_internal( stream, data, bytes ) ) \
            return false; \
    } while(0)


现在,我们可以通过先序列化字符串长度然后序列化字符串数据的方法来序列化一个字符串:

template <typename Stream>
bool serialize_string_internal( Stream & stream,
                                char * string,
                                int buffer_size )
{
    uint32_t length;
    if ( Stream::IsWriting )
    {
        length = (uint32_t) strlen( string );
        assert( length < (uint32_t) buffer_size - 1 );
    }
    serialize_int( stream, length, 0, buffer_size - 1 );
    serialize_bytes( stream, (uint8_t*) string, length );
    if ( Stream::IsReading )
        string[length] = '\0';
    return true;
}

#define serialize_string( stream, string, buffer_size ) \
    do \
    { \
        if ( !serialize_string_internal( stream, \
                                         string, buffer_size ) ) \
            return false; \
    } while (0)


正如你看到的那样,可以从基本元素的序列化开始构建一个相当复杂的序列化体系。


序列化数组的子集

当实现一个游戏网络协议的时候,或早或晚总会需要序列化一个对象数组然后在网络上传递。比如说服务器也许需要把所有的物体发送给客户端,或者有时候需要发送一组事件或者消息。如果你要发送所有的物体到客户端,这是相当简单直观的,但是如果你只是想发送一个数组的一个子集怎么办?

最先想到也是最容易的办法是遍历数组的所有物体然后序列化一个bool数组,这个bool数组标记的是对应的物体是否通过网络发送。如果bool值为1那么后面会跟着物体的数据,否则就会被忽略然后下一个物体的bool值取决于流的下一个值。

template <typename Stream>
bool serialize_scene_a( Stream & stream, Scene & scene )
{
for ( int i = 0; i < MaxObjects; ++i )
{
serialize_bool( stream, scene.objects[i].send );
if ( !scene.objects[i].send )
{
if ( Stream::IsReading )
memset( &scene.objects[i], 0, sizeof( Object ) );
continue;
}
serialize_object( stream, scene.objects[i] );
}
return true;
}



但是如果物体的数组很大怎么办?举个例子,比如场景中有4000个物体。4000 / 8 = 500,光是标记物体是否发送的bool数组就要500个字节的开销,即使你只发送了一两个物体也是这样!这种方法……不是太好。所以我们是否能够找到一种办法,让额外的开销正比于发送的物体数目,而不是正比于数组中的物体总数?

我们可以找到这么一个方法,但是现在我们已经做了一些有意思的事情。我们在序列化写入的时候遍历一个物体的集合(数组里面的所有物体)但是序列化读取的时候遍历的是一个不同的物体集合(发送物体数组的子集)。在这一点上统一的序列化函数概念就不能维系了。对于这种情况最好是把读取和写入分解成单独的函数:

bool write_scene_b( protocol2::WriteStream & stream, Scene & scene )
{
int num_objects_sent = 0;
for ( int i = 0; i < MaxObjects; ++i )
{
if ( scene.objects[i].send )
num_objects_sent++;
}
write_int( stream, num_objects_sent, 0, MaxObjects );
for ( int i = 0; i < MaxObjects; ++i )
{
if ( !scene.objects[i].send )
continue;
write_int( stream, i, 0, MaxObjects - 1 );
write_object( stream, scene.objects[i] );
}
return true;
}
bool read_scene_b( protocol2::ReadStream & stream, Scene & scene )
{
memset( &scene, 0, sizeof( scene ) );
int num_objects_sent;
read_int( stream, num_objects_sent, 0, MaxObjects );
for ( int i = 0; i < num_objects_sent; ++i )
{
int index;
read_int( stream, index, 0, MaxObjects - 1 );
read_object( stream, scene.objects[index] );
}
return true;
}


此外,你可以用发生变化的对象集合来生成一个单独的数据结构,并且针对发生变化的对象集合实现序列化。但是对每个你期望能够序列化的数据结构都产生C++代码对应的数据结构体是一件非常痛苦的事情。最终你可能想要同时遍历几个数据结构然后高效的将一个动态数据结构写入比特流。这在写一些更高级的序列化方法比如增量编码的时候是一种非常平常的做法。只要你采用了这种做法,统一序列化这种做法就不再有什么意义。

我对此的建议是如果任何时候你想这么做,那么请不要担心,就把序列化读取和序列化写入分开好了。将序列化读取和序列化写入统一起来是一种非常简单的方式,但是这种方式带来的简单易用与序列化写入时动态生成数据结构的痛苦相比是不划算的。我的经验是复杂的序列化功能有时候可能会需要单独的序列化读取和序列化写入功能,但是如果可能的话,尽量让具体的序列化函数是统一读取和写入的(举个例子来说,实际的物体和事件无论何时序列化都尽量保持序列化读取和序列化写入是统一的)。

多说一点。上面的代码在一次序列化写入的时候对物体集合进行了两次遍历。一次遍历用来确定要发送的物体数目,第二次遍历用来对这些物体进行实际的序列化。我们是否能只用一次遍历就处理好?当然可以!你可以使用另外一个技巧:用一个哨兵值(sentinel value)来标记数组的结尾位置,而不是像前面那样预先序列化要发送的物体数目。使用这种方法,发送的时候只需要遍历整个数组一遍,当没有更多物体需要发送的时候,就把哨兵值序列化进数据包以表示数组结束了:

bool write_scene_c( protocol2::WriteStream & stream, Scene & scene )
{
for ( int i = 0; i < MaxObjects; ++i )
{
if ( !scene.objects[i].send )
continue;
write_int( stream, i, 0, MaxObjects );
write_object( stream, scene.objects[i] );
}
write_int( stream, MaxObjects, 0, MaxObjects );
return true;
}
bool read_scene_c( protocol2::ReadStream & stream, Scene & scene )
{
memset( &scene, 0, sizeof( scene ) );
while ( true )
{
int index; read_int( stream, index, 0, MaxObjects );
if ( index == MaxObjects )
break;
read_object( stream, scene.objects[index] );
}
return true;
}

这种做法非常简单,并且在要发送的物体集合只占全部物体很小比例的时候工作得很棒。但是如果有大量的物体需要发送呢?举个例子来说,整个场景中有4000个物体,有一半也就是2000个物体需要通过网络进行发送。每个物体需要一个序号,每个序号需要12比特……这就是说数据包里面有24000比特,也就是接近3000字节(!)的数据被序号占用了。

可以把序号的编码方式修改下来节省数据,序号不再是全局序号,而是相对上一个物体的相对序号。想下这个问题,我们从左到右遍历一个数组,所以数组中物体的序号从0开始并且逐步增大到MaxObjects – 1。从统计学的角度来说,要发送的物体有可能是挨着很近的,这样下一个序号可能就是+1或者+10再或者是+30这样的小数字,因为我们这里用的序号是相对上一个发送的物体的,所以数字从统计意义上来说都会比较小,所以平均来讲,相比较之前的解决方案你可能需要更少的比特来表示物体的序号。(其实最差情况下我们所需的比特位也只是和前一个方案相同而已,可以证明每个序号,后一方案都比前一方案的要小,那么每个序号花费的比特位无疑不会更多,但是这种方案的主要问题在于健壮性,需要确保关于数据集合的数据包中间都不能丢,一旦中间某个包被丢掉了,那么后面的解析就完全乱掉了,实现起来更加困难一些)

下面就是这么一种编码物体序号的方式,每个序号都是相对上一个物体序号而言的,不再是全局序号,从统计的角度来讲它们会消耗更少的比特位(但是如果非常大的集合,但是发送的数组所占的比例很小,那么两种方法的差异其实是比较小的):

template <typename Stream>
bool serialize_object_index_internal( Stream & stream,
int & previous,
int & current )
{
uint32_t difference;
if ( Stream::IsWriting )
{
assert( previous < current );
difference = current - previous;
assert( difference > 0 );
}
// +1 (1 bit)
bool plusOne;
if ( Stream::IsWriting )
plusOne = difference == 1;
serialize_bool( stream, plusOne );
if ( plusOne )
{
if ( Stream::IsReading )
current = previous + 1;
previous = current;
return true;
}
// [+2,5] -> [0,3] (2 bits)
bool twoBits;
if ( Stream::IsWriting )
twoBits = difference <= 5;
serialize_bool( stream, twoBits );
if ( twoBits )
{
serialize_int( stream, difference, 2, 5 );
if ( Stream::IsReading )
current = previous + difference;
previous = current;
return true;
}
// [6,13] -> [0,7] (3 bits)
bool threeBits;
if ( Stream::IsWriting )
threeBits = difference <= 13;
serialize_bool( stream, threeBits );
if ( threeBits )
{
serialize_int( stream, difference, 6, 13 );
if ( Stream::IsReading )
current = previous + difference;
previous = current;
return true;
}
// [14,29] -> [0,15] (4 bits)
bool fourBits;
if ( Stream::IsWriting )
fourBits = difference <= 29;
serialize_bool( stream, fourBits );
if ( fourBits )
{
serialize_int( stream, difference, 14, 29 );
if ( Stream::IsReading )
current = previous + difference;
previous = current;
return true;
}
//[30,61] -> [0,31] (5 bits)
bool fiveBits;
if ( Stream::IsWriting )
fiveBits = difference <= 61;
serialize_bool( stream, fiveBits );
if ( fiveBits )
{
serialize_int( stream, difference, 30, 61 );
if ( Stream::IsReading )
current = previous + difference;
previous = current;
return true;
}
// [62,125] -> [0,63] (6 bits)
bool sixBits;
if ( Stream::IsWriting )
sixBits = difference <= 125;
serialize_bool( stream, sixBits );
if ( sixBits )
{
serialize_int( stream, difference, 62, 125 );
if ( Stream::IsReading )
current = previous + difference;
previous = current;
return true;
}
// [126,MaxObjects+1]
serialize_int( stream, difference, 126, MaxObjects + 1 );
if ( Stream::IsReading )
current = previous + difference;
previous = current;
return true;
}
template <typename Stream>
bool serialize_scene_d( Stream & stream, Scene & scene )
{
int previous_index = -1;
if ( Stream::IsWriting )
{
for ( int i = 0; i < MaxObjects; ++i )
{
if ( !scene.objects[i].send )
continue;
write_object_index( stream, previous_index, i );
write_object( stream, scene.objects[i] );
}
write_object_index( stream, previous_index, MaxObjects );
}
else
{
while ( true )
{
int index;
read_object_index( stream, previous_index, index );
if ( index == MaxObjects )
break;
read_object( stream, scene.objects[index] );
}
}
return true;
}



通常情况下,这将节省大量的带宽,因为要发送的物体的序号往往倾向于聚在一起。在这种情况下,如果下一个物体也要被发送,那么它的序号就是+1,只需要一个比特。如果是+2到+5的情况,每个序号需要5个比特。平均下来,序号所占的数据大小可以降低2-3倍。但是要注意,序号间隔比较大的时候,单个序号的消耗会比不相关序号编码方案(每个序号固定占12比特)要大。这看上去非常糟糕,但实际上并不会这么差:试想一下,即使你遇到了“最差情况”(要发送的物体的序号间隔均匀,都相差128),在一个4000物体的大数组里面你实际才发送几个物体?只有32个而已,所以不用担心这个问题!



协议ID、CRC32与序列化检查


阅读到这里,你可能会有一个疑惑:“哦,整个体系看上去非常脆弱啊。这只是一个完全不带任何描述信息的二进制流,里面就是按协议紧凑排列的数据。你该怎么保证读取和写入的协议是一致的?如果有人发送一些包含随机内容的数据包给你的服务器,会不会在解析的时候把服务器弄崩溃掉?”

确实大部分游戏服务器就是这样工作的,但是我有个好消息告诉你和其他之前是这么做服务器的人,存在这样的技术可以减少或者几乎杜绝由于序列化层传过来的数据导致的崩溃可能性。

第一种技术是在你的数据包里面包含协议ID。典型的做法是,把头4个字节设定成一些比较罕见而且独特的值,比如0x12345678,反正是其他人不太会用到的值就好了。但是说真的,把你的协议ID和协议版本号一起散列得到一个散列值,放到每个数据包最前面32比特的位置,这种方法真的工作得很好。至少如果是其他应用程序的数据包发送到了你的端口(要记住,UDP的数据包可以从任何IP任何端口在任何时间发送过来),你可以通过这32比特判断出来这根本就不是你的应用程序的包,然后直接丢弃。



[protocol id] (32bits)
(packet data)


下一个级别的防护是对你的数据包整体做一个CRC32校验,并把校验码放到数据包的包头。这可以防止偶尔有损坏的数据包漏过底层校验被你拿来处理(这确实会发生:IP的校验和只有16位,有一堆错误是16位校验和检测不出来的)。现在你的数据包头看起来像下面这样:



[protocol id](32bits)

[crc32](32bits)

(packet data)


如果你按着这个顺序做下来的话,现在你可能会有点畏惧。请等一下,我需要为每个数据包花费8个额外的字节来实现我自己的校验和以及协议ID么?事实上,你可以不这么做。你可以学习下看看IPv4是如何进行校验的,并让协议ID变成一个魔术前缀(Magical Prefix)。也就是说你可以不发送这个协议ID,但是发送方和接收方提前确认过这个协议ID是什么,并在计算数据包CRC32值的时候装作这个数据包带上了这个协议ID的前缀来参与计算。这样如果发送方使用的协议ID与接收方不一致的时候,CRC32的校验就会失败,这将为每个数据包节省4个字节:



[protocol id] (32bits)  // not actually sent, but used to calc crc32

[crc32](32bits)

(packet data)



当然,CRC32只是防止有些随机的数据包误打误撞的情况,但是对于可以轻易修改或者构建恶意数据包的头4个字节以便修正CRC32值的那些恶意发送者来说它起不到什么防护作用。要防止那些恶意发送者,你需要使用一个保密性更好的密码哈希函数,同时还需要一个密钥,这个密钥最好是在客户端尝试登陆游戏服务器之前就通过HTTPS协议在客户端和服务器之间统一好(而且要确保每个客户端的密钥都不一样,只有服务器和对应的客户端才知道密钥是什么)

最后一项技术,也可能是最有效的阻止恶意发送者的技术了(虽然会导致数据包的加密和签名有很多冗余信息),这就是序列化检查(serialization check)。这个技术基本上来说是在包的中间,在一段复杂的序列化写入之前或者之后写上一个已知的32比特整数,并在另外一端序列化读取的时候用相同的值进行检测判断。如果序列化检查值是不正确的,那么就中止序列化读取并丢弃这个数据包。


我喜欢在我的数据包每个部分之间写入一些序列化检查值,这样我至少知道我的数据包那部分已经被成功的序列化读取和写入(有些问题无论你如何努力避免都很难完全避免的)。我喜欢使用的另外一个很酷的技巧是在数据包的结尾序列化一个协议检查值,这非常非常的有用,因为它能够帮我判断是否遇到了数据包截断(非常像上一篇文章最后提到的臭名昭著的大端截断和小端截断,在开发的时候也是很让人头疼的地方)。

所以现在网络包看起来应该是像这样:



[protocol id] (32bits)  // not actually sent, but used to calc crc32

[crc32](32bits)

(packet data)

[end of packet serialize check] (32 bits)


如果你愿意的话,你可以在发布版本中把这些序列化检查编译移除掉,特别是在已经有很完善的数据包加密和签名的情况下,那时就不再需要它们了。


下一篇预告:数据包的分包和重组

请继续阅读这个系列的下一篇文章,在这篇文章里我将向大家介绍如何拓展本章中实现的网络协议来实现数据包的分包和重组以确保你的网络包的大小在MTU限制以下。

MTU:最大传输单元(Maximum Transmission Unit),是指一种通信协议的某一层上面所能通过的最大数据包大小,以字节为单位。最大传输单元这个参数通常与通信接口有关,比如网络接口卡、串口等。因为协议数据单元的包头和包尾的长度是固定的,MTU越大,则一个协议数据单元承载的有效数据就越长,通信效率也越高,传送相同的用户数据所需的数据包个数也越少。但MTU也不是越大越好:MTU越大,传送一个数据包的延迟也越大,数据包中比特位发生错误的概率也越大。所以要权衡通信效率和传输延迟,选择合适的MTU。以以太网传送IPv4报文为例,MTU表示的长度包含IP包头的长度。如果IP层以上的协议层发送的数据报文的长度超过了MTU,则在发送者的IP层将对数据报文进行分片,在接收者的IP层对接收到的分片进行重组。

如果你觉得这篇文章有价值的话,请在patreon上支持我的写作,这样我会写的更快。你可以在BSD 3.0许可下访问到这篇文章里面的代码。非常感谢你的支持!

【版权声明】

原文作者未做权利声明,视为共享知识产权进入公共领域,自动获得授权。


原文旧版本


Hi, I’mGlenn Fiedler and welcome to the second article in Building a GameNetwork Protocol.

In the previous article we discussed different ways to read and write packets in multiplayer games. Wequickly shot down sending game state via text formats like XML and JSON becausethey’re really inefficient and decided to write own binary protocolinstead. We implemented a bitpacker so we don’t have to round boolsup to 8 bits, solved endianness issues, wrote words at a time instead of bytesand pretty much made the bitpacker as simple and as fast as possible withoutplatform specific tricks.

Where weleft off we still had the following problems to solve:

1. Weneed a way to check if integer values are outside the expected rangeand abort packet read because people will send malicious packets tryingto make us trash memory. The packet read abort must be automatic and notuse exceptions because they’re really slow.

2. Separateread and write functions are a maintainance nightmare if those functions arecoded manually. We’d like to write the serialization code for a packet once but not pay any runtime cost (in terms of additional branching, virtuals and soon) when doing so.

How can wedo this? Read on and I’ll show you how exactly I do it in C++. It’s taken awhile for me to develop and refine this technique so I hope you’ll find ituseful and at least a good alternative to consider vs. the way youcurrently do it or how you’ve seen it done in other game engines.

Unified Packet Serialize Function

Lets startwith the goal. Here’s where we want to end up:

struct PacketA

{

int x,y,z;

template <typename Stream> bool Serialize( Stream & stream )

{

serialize_bits( stream, x, 32 );

serialize_bits( stream, y, 32 );

serialize_bits( stream, z, 32 );

return true;

}

};

struct PacketB

{

int numElements;

int elements[MaxElements];

template <typename Stream> bool Serialize( Stream & stream )

{

serialize_int( stream, numElements, 0, MaxElements );

for ( int i = 0; i < numElements; ++i )

serialize_bits( buffer, elements[i], 32 );

return true;

}

};

struct PacketC

{

bool x;

short y;

int z;

template <typename Stream> bool Serialize( Stream & stream )

{

serialize_int( stream, x, 8 );

serialize_int( stream, y, 16 );

serialize_int( stream, z, 32 );

return true;

}

};

Noticethere is a single serialize function per-packet struct instead of separate readand write functions. This is great! It halves the amount of serialization codeand now you have put in some serious effort in order to desync read andwrite.

The trickto making this work efficiently is having thestream class templated in the serialize function. There are two stream types inmy system: ReadStream and WriteStream. Each class has the same set of methods,but otherwise are not related in any way. One class reads values in from a bitstream to variables, and the other writes variables values out to a bit stream.ReadStream and WriteStream are just wrappers on top of BitReader andBitWriter classes from the previous article.

There areof course alternatives to this approach. If you dislike templates you couldhave a pure virtual base stream interface and implement that interfacewith read and write stream classes. But now you’re taking a virtualfunction for each serialize call. Seems like an excessive amount of overhead tome.

Anotheroption is to have an uber-stream class that can be configured to act in read orwrite mode at runtime. This can be faster than the virtual functionmethod, but you still have to branch per-serialize call to decide if you shouldread or write so it’s not going to be as fast as hand-coded read and write.

I preferthe templated method because it lets the compiler do the work of generatingoptimized read/write functions for you. You can even code serializefunctions like this and let the compiler optimize out a bunch of stuff whenspecializing read and write:

structRigidBody

{

vec3f position;

quat3f orientation;

vec3f linear_velocity;

vec3f angular_velocity;

template <typename Stream> bool Serialize( Stream & stream )

{

serialize_vector( stream, position );

serialize_quaternion( stream, orientation );

bool at_rest = Stream::IsWriting ? velocity.length() == 0 : 1;

serialize_bool( stream, at_rest );

if ( !at_rest )

{

serialize_vector( stream, linear_velocity );

serialize_vector( stream, angular_velocity );

}

else if ( Stream::IsReading )

{

linear_velocity = vec3f(0,0,0);

angular_velocity = vec3f(0,0,0);

}

return true;

}

};

While thismay look inefficient, it’s actually not! The templatespecialization of this function optimizes out all of the branchesaccording to the stream type. Pretty neat huh?

Bounds Checking and Abort Read

Now thatwe’ve twisted the compiler’s arm to generate optimized read/writefunctions, we need some way to automate error checking on readso we’re not vulnerable to malicious packets.

The firststep is to pass in the range of the integer to the serialize functioninstead of just the number of bits required. Think about it. The serializefunction can work out the number of bits required from the min/max values:

serialize_int(stream, numElements, 0, MaxElements );

This opensup the interface to support easy serialization of signed integer quantities andthe serialize function can check the value read in from the networkand make sure it’s within the expected range. If the value is outsiderange, abort serialize read immediately and discard the packet.

Since wecan’t use exceptions to handle this abort (too slow), here’s how I like to doit.

In mysetup serialize_int is not actually a function, it’s a sneaky macro likethis:

#defineserialize_int( stream, value, min, max) \

do \

{ \

assert( min < max); \

int32_tint32_value; \

if ( Stream::IsWriting) \

{ \

assert( value >= min); \

assert( value <= max); \

int32_value = (int32_t)value; \

} \

if ( !stream.SerializeInteger( int32_value, min, max ) ) \

return false; \

if ( Stream::IsReading) \

{ \

value =int32_value; \

if ( value < min || value > max) \

return false; \

} \

} while (0)

The reasonI’m being a terrible person here is that I’m using the macro to insert codethat checks the result of SerializeInteger and returns false onerror. This gives you exception-like behavior in the sense that itunwinds the stack back to the top of the serialization callstack on error, butyou don’t pay anything like the cost of exceptions to do this. The branch tounwind is super uncommon (serialization errors are rare) so branchprediction should have no trouble at all.

Anothercase where we need to abort is if the stream reads past the end. This is also arare branch but it’s one we do have to check on each serialization operationbecause reading past the end is undefined. If we fail to do this check, weexpose ourselves to infinite loops as we read past the end of the buffer.While it’s common to return 0 values when reading past the end of a bit stream(as per-the previous article) there is no guarantee that reading zerovalues will always result in the serialize function terminating correctlyif it has loops. This overflow check is necessary for well defined behavior.

One finalpoint. On serialize write I don’t do any abort on range checksor write past the end of the stream. You can be a lot more relaxed on thewrite since if anything goes wrong it’s pretty much guaranteed to be yourfault. Just assert that everything is as expected (in range, not past theend of stream) for each serialize write and you’re good to go.

Serializing Floats and Vectors

The bitstream only serializes integer values. How can we serialize a float value?

Seemstrickly but it’s not actually. A floating point number stored inmemory is just a 32 bit value like any other. Your computer doesn’t knowif a 32 bit word in memory is an integer, a floating point value or partof a string. It’s just a 32 bit value. Luckily, the C++ language (unlikea few others) lets us work with this fundamental property.

You canaccess the integer value behind a floating point number with a union:

union FloatInt

{

float float_value;

uint32_t int_value;

};

FloatInt tmp;

tmp.float_value= 10.0f;

printf(“float value as an integer: %x\n”, tmp.int_value );

You canalso do it via an aliased uint32_t pointer, but I’ve experienced thisbreak with GCC -O2, so I prefer the union trick instead. Friends of minepoint out (likely correctly) that the only truly standard way to get the float as an integer is to cast a pointer to the float value touint8_t and reconstruct the integer value from the four bytevalues accessed individually through the byte pointer. Seems a prettydumb way to do it to me though. Ladies and gentlemen… C++!

Meanwhilein the past 5 years I’ve had no actual problems in the field with the uniontrick. Here’s how I serialize an uncompressed float value:

template <typename Stream>

boolserialize_float_internal( Stream & stream,

float & value )

{

union FloatInt

{

float float_value;

uint32_t int_value;

};

FloatInt tmp;

if ( Stream::IsWriting )

tmp.float_value = value;

bool result = stream.SerializeBits( tmp.int_value, 32 );

if ( Stream::IsReading )

value = tmp.float_value;

return result;

}

Wrap thiswith a serialize_float macro for convenient error checking on read:

#define serialize_float( stream, value) \

do \

{ \

if ( !protocol2::serialize_float_internal( stream, value )) \

return false; \

} while(0)

Sometimesyou don’t want to transmit a full precision float. How can you compress a floatvalue? The first step is to bound that value in some known rangethen quantize it down to an integer representation.

Forexample, if you know a floating point number in is range [-10,+10] and anacceptable resolution for that value is 0.01, then you can just multiply thatfloating point number by 100.0 to get it in the range [-1000,+1000] andserialize that as an integer over the network. On the other side, justdivide by 100.0 to get back to the floating point value.

Here is ageneralized version of this concept:

template <typename Stream>

boolserialize_compressed_float_internal( Stream & stream,

float & value,

float min,

float max,

float res )

{

const float delta = max - min;

const float values = delta / res;

const uint32_t maxIntegerValue = (uint32_t) ceil( values );

const int bits = bits_required( 0, maxIntegerValue );

uint32_t integerValue = 0;

if ( Stream::IsWriting )

{

float normalizedValue =

clamp( ( value - min ) / delta, 0.0f, 1.0f );

integerValue = (uint32_t) floor( normalizedValue

maxIntegerValue + 0.5f );

}

if ( !stream.SerializeBits( integerValue, bits ) )

return false;

if ( Stream::IsReading )

{

const float normalizedValue =

integerValue / float( maxIntegerValue );

value = normalizedValue delta + min;

}

return true;

}

Once youcan serialize float values it’s trivial extend to serialize vectorsand quaternions over the network. I use a modified version of the awesome vectorial library for vector math in my projects and I implement serialization for thosetypes like this:

template <typename Stream>

boolserialize_vector_internal( Stream & stream,

vec3f & vector )

{

float values[3];

if ( Stream::IsWriting )

vector.store( values );

serialize_float( stream, values[0] );

serialize_float( stream, values[1] );

serialize_float( stream, values[2] );

if ( Stream::IsReading )

vector.load( values );

return true;

}

template <typename Stream>

boolserialize_quaternion_internal( Stream & stream,

quat4f & quaternion )

{

float values[4];

if ( Stream::IsWriting )

quaternion.store( values );

serialize_float( stream, values[0] );

serialize_float( stream, values[1] );

serialize_float( stream, values[2] );

serialize_float( stream, values[3] );

if ( Stream::IsReading )

quaternion.load( values );

return true;

}

#defineserialize_vector( stream, value) \

do \

{ \

if ( !serialize_vector_internal( stream, value )) \

return false; \

} \

while(0)

#defineserialize_quaternion( stream, value) \

do \

{ \

if ( !serialize_quaternion_internal( stream, value ) ) \

return false; \

} \

while(0)

If youknow your vector is bounded in some range, you can compress it like this:

template <typename Stream>

boolserialize_compressed_vector_internal( Stream & stream,

vec3f & vector,

float min,

float max,

float res )

{

float values[3];

if ( Stream::IsWriting )

vector.store( values );

serialize_compressed_float( stream, values[0], min, max, res );

serialize_compressed_float( stream, values[1], min, max, res );

serialize_compressed_float( stream, values[2], min, max, res );

if ( Stream::IsReading )

vector.load( values );

return true;

}

If youwant to compress an orientation over the network, don’t just compress it as avector with 8.8.8.8 bounded in the range [-1,+1]. You can do much better if youuse the smallest three representation of the quaternion. See the sample code for this article for an implementation.

Serializing Strings and Arrays

What if you want to serialize a string over the network?

Is it a good idea to send a string over the network with null termination? I don't think so. You're just asking for trouble! Instead, treat the string as an array of bytes with the length prefixed. So, in order to send a string over the network, we have to work out how to efficiently send an array of bytes.

First observation: why waste effort bitpacking an array of bytes into your bit stream just so they end up randomly shifted by [0,7] bits? Why not just align to byte before writing the array, so the array data sits in the packet nicely aligned, each byte of the array corresponding to an actual byte in the packet? You lose only [0,7] bits per array of bytes serialized, depending on the alignment, but that's nothing to be too concerned about in my opinion.

How do you align the bit stream to byte? Just work out the current bit index in the stream and how many bits are left to write until the number of bits in the stream divides evenly into 8, then insert that number of padding bits. For bonus points, pad with zero bits, so that on read you can verify that yes, you are reading a byte align, and yes, it is indeed padded with zero bits up to the next whole byte bit index. If a non-zero bit is discovered in the padding, abort the serialize read and discard the packet.

Here's my code to align a bit stream to byte:

void BitWriter::WriteAlign()
{
    const int remainderBits = m_bitsWritten % 8;
    if ( remainderBits != 0 )
    {
        uint32_t zero = 0;
        WriteBits( zero, 8 - remainderBits );
        assert( ( m_bitsWritten % 8 ) == 0 );
    }
}

bool BitReader::ReadAlign()
{
    const int remainderBits = m_bitsRead % 8;
    if ( remainderBits != 0 )
    {
        uint32_t value = ReadBits( 8 - remainderBits );
        assert( m_bitsRead % 8 == 0 );
        if ( value != 0 )
            return false;
    }
    return true;
}

#define serialize_align( stream )                   \
    do                                              \
    {                                               \
        if ( !stream.SerializeAlign() )             \
            return false;                           \
    } while (0)

Now we can use this align operation to write an array of bytes into the bit stream efficiently: since we are aligned to byte, we can do most of the work with memcpy. The only wrinkle is that the bit reader and bit writer work at the word level, so it's necessary to have special code to handle the head and tail portions of the byte array: at the head, to make sure any previous scratch bits are flushed to memory, and at the tail, to make sure the scratch is properly set up for the bytes that follow the array.

void BitWriter::WriteBytes( const uint8_t * data, int bytes )
{
    assert( GetAlignBits() == 0 );
    assert( m_bitsWritten + bytes * 8 <= m_numBits );
    assert( ( m_bitsWritten % 32 ) == 0 ||
            ( m_bitsWritten % 32 ) == 8 ||
            ( m_bitsWritten % 32 ) == 16 ||
            ( m_bitsWritten % 32 ) == 24 );
    int headBytes = ( 4 - ( m_bitsWritten % 32 ) / 8 ) % 4;
    if ( headBytes > bytes )
        headBytes = bytes;
    for ( int i = 0; i < headBytes; ++i )
        WriteBits( data[i], 8 );
    if ( headBytes == bytes )
        return;
    assert( GetAlignBits() == 0 );
    int numWords = ( bytes - headBytes ) / 4;
    if ( numWords > 0 )
    {
        assert( ( m_bitsWritten % 32 ) == 0 );
        memcpy( &m_data[m_wordIndex], data + headBytes, numWords * 4 );
        m_bitsWritten += numWords * 32;
        m_wordIndex += numWords;
        m_scratch = 0;
    }
    assert( GetAlignBits() == 0 );
    int tailStart = headBytes + numWords * 4;
    int tailBytes = bytes - tailStart;
    assert( tailBytes >= 0 && tailBytes < 4 );
    for ( int i = 0; i < tailBytes; ++i )
        WriteBits( data[tailStart+i], 8 );
    assert( GetAlignBits() == 0 );
    assert( headBytes + numWords * 4 + tailBytes == bytes );
}

void BitReader::ReadBytes( uint8_t * data, int bytes )
{
    assert( GetAlignBits() == 0 );
    assert( m_bitsRead + bytes * 8 <= m_numBits );
    assert( ( m_bitsRead % 32 ) == 0 ||
            ( m_bitsRead % 32 ) == 8 ||
            ( m_bitsRead % 32 ) == 16 ||
            ( m_bitsRead % 32 ) == 24 );
    int headBytes = ( 4 - ( m_bitsRead % 32 ) / 8 ) % 4;
    if ( headBytes > bytes )
        headBytes = bytes;
    for ( int i = 0; i < headBytes; ++i )
        data[i] = ReadBits( 8 );
    if ( headBytes == bytes )
        return;
    assert( GetAlignBits() == 0 );
    int numWords = ( bytes - headBytes ) / 4;
    if ( numWords > 0 )
    {
        assert( ( m_bitsRead % 32 ) == 0 );
        memcpy( data + headBytes, &m_data[m_wordIndex], numWords * 4 );
        m_bitsRead += numWords * 32;
        m_wordIndex += numWords;
        m_scratchBits = 0;
    }
    assert( GetAlignBits() == 0 );
    int tailStart = headBytes + numWords * 4;
    int tailBytes = bytes - tailStart;
    assert( tailBytes >= 0 && tailBytes < 4 );
    for ( int i = 0; i < tailBytes; ++i )
        data[tailStart+i] = ReadBits( 8 );
    assert( GetAlignBits() == 0 );
    assert( headBytes + numWords * 4 + tailBytes == bytes );
}

template <typename Stream>
bool serialize_bytes_internal( Stream & stream, uint8_t * data, int bytes )
{
    return stream.SerializeBytes( data, bytes );
}

#define serialize_bytes( stream, data, bytes )                      \
    do                                                              \
    {                                                               \
        if ( !serialize_bytes_internal( stream, data, bytes ) )     \
            return false;                                           \
    } while (0)

Now we can serialize a string by serializing its length followed by the string data:

template <typename Stream>
bool serialize_string_internal( Stream & stream,
                                char * string,
                                int buffer_size )
{
    uint32_t length;
    if ( Stream::IsWriting )
    {
        length = strlen( string );
        assert( length < buffer_size - 1 );
    }
    serialize_int( stream, length, 0, buffer_size - 1 );
    serialize_bytes( stream, (uint8_t*)string, length );
    if ( Stream::IsReading )
        string[length] = '\0';
    return true;
}

#define serialize_string( stream, string, buffer_size )             \
    do                                                              \
    {                                                               \
        if ( !serialize_string_internal( stream,                    \
                                         string,                    \
                                         buffer_size ) )            \
            return false;                                           \
    } while (0)

As you can see, you can build up quite complicated serialization from basic primitives.

Serializing Array Subsets

When implementing a game network protocol, sooner or later you need to serialize an array of objects over the network. Perhaps the server needs to send all objects down to the client, or there's an array of events or messages to be sent. This is fairly straightforward if you are sending all objects in the array down to the client, but what if you want to send only a subset of the array?

The first and simplest approach is to iterate across all objects in the array and serialize a bool per-object indicating whether that object is to be sent. If the value of that bool is 1, then the object data follows; otherwise it's omitted, and the bool for the next object is up next in the stream.

template <typename Stream>
bool serialize_scene_a( Stream & stream, Scene & scene )
{
    for ( int i = 0; i < MaxObjects; ++i )
    {
        serialize_bool( stream, scene.objects[i].send );
        if ( !scene.objects[i].send )
        {
            if ( Stream::IsReading )
                memset( &scene.objects[i], 0, sizeof( Object ) );
            continue;
        }
        serialize_object( stream, scene.objects[i] );
    }
    return true;
}

But what if the array of objects is very large, like 4000 objects in the scene? 4000 / 8 = 500. Ruh roh. That's an overhead of 500 bytes, even if you only send one or two objects! That's... not good. Can we switch it around so we take overhead proportional to the number of objects sent instead of the total number of objects in the array?

We can, but now we've done something interesting. We're walking one set of objects in the serialize write (all objects in the array) and walking over a different set of objects in the serialize read (the subset of objects sent). At this point the unified serialize function concept breaks down. It's best to separate the read and write back into separate functions in cases like this:

bool write_scene_b( protocol2::WriteStream & stream, Scene & scene )
{
    int num_objects_sent = 0;
    for ( int i = 0; i < MaxObjects; ++i )
    {
        if ( scene.objects[i].send )
            num_objects_sent++;
    }
    write_int( stream, num_objects_sent, 0, MaxObjects );
    for ( int i = 0; i < MaxObjects; ++i )
    {
        if ( !scene.objects[i].send )
            continue;
        write_int( stream, i, 0, MaxObjects - 1 );
        write_object( stream, scene.objects[i] );
    }
    return true;
}

bool read_scene_b( protocol2::ReadStream & stream, Scene & scene )
{
    memset( &scene, 0, sizeof( scene ) );
    int num_objects_sent;
    read_int( stream, num_objects_sent, 0, MaxObjects );
    for ( int i = 0; i < num_objects_sent; ++i )
    {
        int index;
        read_int( stream, index, 0, MaxObjects - 1 );
        read_object( stream, scene.objects[index] );
    }
    return true;
}

Alternatively, you could generate a separate data structure holding the set of changed objects and implement a serialize for that array of changed objects. But having to generate a C++ data structure for each data structure you want serialized is a huge pain in the ass. Eventually you want to walk several data structures at the same time and effectively write out a dynamic data structure to the bit stream. This is a really common thing to do when writing more advanced serialization methods like delta encoding. As soon as you do it this way, unified serialize no longer makes sense.

My advice is that when you want to do this, don't worry, just separate read and write. Unifying read and write is simply not worth the hassle when dynamically generating a data structure on write. My rule of thumb is that complicated serialization probably justifies separate read and write functions, but if possible, try to keep the leaf nodes unified (eg. the actual objects, events, whatever is being serialized).

One more point. The code above walks over the set of objects twice on serialize write: once to determine the number of changed objects, and a second time to actually serialize the set of changed objects. Can we do it in one pass instead? Absolutely! You can use another trick: a sentinel value to indicate the end of the array, rather than serializing the number of objects up front. This way you can iterate over the array only once on send, and when there are no more objects to send, serialize the sentinel value to indicate the end of the array:

bool write_scene_c( protocol2::WriteStream & stream, Scene & scene )
{
    for ( int i = 0; i < MaxObjects; ++i )
    {
        if ( !scene.objects[i].send )
            continue;
        write_int( stream, i, 0, MaxObjects );
        write_object( stream, scene.objects[i] );
    }
    write_int( stream, MaxObjects, 0, MaxObjects );
    return true;
}

bool read_scene_c( protocol2::ReadStream & stream, Scene & scene )
{
    memset( &scene, 0, sizeof( scene ) );
    while ( true )
    {
        int index;
        read_int( stream, index, 0, MaxObjects );
        if ( index == MaxObjects )
            break;
        read_object( stream, scene.objects[index] );
    }
    return true;
}

This is pretty simple and it works great if the set of objects sent is a small percentage of total objects. But what if a large number of objects are sent, let's say half of the 4000 objects in the scene? That's 2000 object indices with each index costing 12 bits... that's 24000 bits, or 3000 bytes (almost 3k!), of your packet wasted on indexing objects.

You can reduce this by encoding each object index relative to the previous object index. Think about it: we're walking left to right along the array, so object indices start at 0 and go up to MaxObjects - 1. Statistically speaking, you're likely to have objects that are close to each other, and if the next index is +1, or even +10 or +30 from the previous one, then on average you'll need far fewer bits to represent that difference than you need to represent an absolute index.

Here's one way to encode the object index as an integer relative to the previous object index, while spending fewer bits on statistically more likely values (eg. small differences between successive object indices vs. large ones):

template <typename Stream>
bool serialize_object_index_internal( Stream & stream,
                                      int & previous,
                                      int & current )
{
    uint32_t difference;
    if ( Stream::IsWriting )
    {
        assert( previous < current );
        difference = current - previous;
        assert( difference > 0 );
    }

    // +1 (1 bit)
    bool plusOne;
    if ( Stream::IsWriting )
        plusOne = difference == 1;
    serialize_bool( stream, plusOne );
    if ( plusOne )
    {
        if ( Stream::IsReading )
            current = previous + 1;
        previous = current;
        return true;
    }

    // [+2,5] -> [0,3] (2 bits)
    bool twoBits;
    if ( Stream::IsWriting )
        twoBits = difference <= 5;
    serialize_bool( stream, twoBits );
    if ( twoBits )
    {
        serialize_int( stream, difference, 2, 5 );
        if ( Stream::IsReading )
            current = previous + difference;
        previous = current;
        return true;
    }

    // [6,13] -> [0,7] (3 bits)
    bool threeBits;
    if ( Stream::IsWriting )
        threeBits = difference <= 13;
    serialize_bool( stream, threeBits );
    if ( threeBits )
    {
        serialize_int( stream, difference, 6, 13 );
        if ( Stream::IsReading )
            current = previous + difference;
        previous = current;
        return true;
    }

    // [14,29] -> [0,15] (4 bits)
    bool fourBits;
    if ( Stream::IsWriting )
        fourBits = difference <= 29;
    serialize_bool( stream, fourBits );
    if ( fourBits )
    {
        serialize_int( stream, difference, 14, 29 );
        if ( Stream::IsReading )
            current = previous + difference;
        previous = current;
        return true;
    }

    // [30,61] -> [0,31] (5 bits)
    bool fiveBits;
    if ( Stream::IsWriting )
        fiveBits = difference <= 61;
    serialize_bool( stream, fiveBits );
    if ( fiveBits )
    {
        serialize_int( stream, difference, 30, 61 );
        if ( Stream::IsReading )
            current = previous + difference;
        previous = current;
        return true;
    }

    // [62,125] -> [0,63] (6 bits)
    bool sixBits;
    if ( Stream::IsWriting )
        sixBits = difference <= 125;
    serialize_bool( stream, sixBits );
    if ( sixBits )
    {
        serialize_int( stream, difference, 62, 125 );
        if ( Stream::IsReading )
            current = previous + difference;
        previous = current;
        return true;
    }

    // [126,MaxObjects+1]
    serialize_int( stream, difference, 126, MaxObjects + 1 );
    if ( Stream::IsReading )
        current = previous + difference;
    previous = current;
    return true;
}

template <typename Stream>
bool serialize_scene_d( Stream & stream, Scene & scene )
{
    int previous_index = -1;
    if ( Stream::IsWriting )
    {
        for ( int i = 0; i < MaxObjects; ++i )
        {
            if ( !scene.objects[i].send )
                continue;
            write_object_index( stream, previous_index, i );
            write_object( stream, scene.objects[i] );
        }
        write_object_index( stream, previous_index, MaxObjects );
    }
    else
    {
        while ( true )
        {
            int index;
            read_object_index( stream, previous_index, index );
            if ( index == MaxObjects )
                break;
            read_object( stream, scene.objects[index] );
        }
    }
    return true;
}

In the common case this saves a bunch of bandwidth because object indices tend to be clustered together. When the very next object is sent, that's just one bit for the +1 case, and 4 bits per index for differences of +2 to +5. On average this gives somewhere between a 2-3X reduction in indexing overhead. But notice that indices spaced far apart cost more per index than the non-relative encoding (12 bits per index). This seems bad, but it's not. Think about it: even if you hit the 'worst case' (object indices spaced evenly +128 apart), how many of those can you actually fit into an object array 4000 large? Just 32. No worries!

Protocol IDs, CRC32 and Serialization Checks

At this point you may wonder: wow, this whole thing seems really fragile. It's a totally unattributed binary stream. A house of cards. What if you somehow desync read and write? What if somebody just sent packets containing random bytes to your server? How long until you hit a sequence of bytes that crashes you out?

I have good news for you, and for the rest of the game industry, since most game servers basically work this way. There are techniques you can use to reduce or virtually eliminate the possibility of corrupt data getting past the serialization layer.

The first technique is to include a protocol id in your packet. Typically you set the first 4 bytes to some reasonably rare and unique value, maybe 0x12345678, because nobody else will ever think to use that. But seriously, put a hash of your protocol id and your protocol version number in the first 32 bits of each packet and you're doing pretty well. At least if a random packet gets sent to your port from some other application (remember UDP packets can come in from any IP/port combination at any time) you can trivially discard it:

[protocol id] (32 bits)
(packet data)
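One way to build that 32 bit value, as a sketch (the FNV-1a choice and the function names here are my own, not from this article): hash the protocol name together with the version number, so bumping the version automatically makes packets from old clients fail the check:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// FNV-1a over a string: small, fast, and good enough for a protocol id
// (this is a check against accidents, not a security measure).
uint32_t hash_string( const char * string )
{
    uint32_t hash = 0x811c9dc5u;               // FNV-1a 32 bit offset basis
    for ( const char * p = string; *p != '\0'; ++p )
    {
        hash ^= (uint8_t) *p;
        hash *= 0x01000193u;                   // FNV 32 bit prime
    }
    return hash;
}

uint32_t make_protocol_id( const char * protocolName, int version )
{
    char buffer[256];
    snprintf( buffer, sizeof( buffer ), "%s %d", protocolName, version );
    return hash_string( buffer );
}
```

The receiver computes the same id locally and discards any packet whose first 32 bits don't match.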

The next level of protection is to pass a CRC32 over your packet and include that in the header. This lets you pick up corrupt packets (these do happen; remember that the IP checksum is just 16 bits, and a bunch of stuff will not get picked up by a 16 bit checksum...). Now your packet header looks like this:

[protocol id] (32 bits)
[crc32] (32 bits)
(packet data)

At this point you may be wincing. Wait, I have to take 8 bytes of overhead per-packet just to implement my own checksum and protocol id? Well, actually you don't. You can take a leaf out of how IPv4 does its checksum and make the protocol id a magical prefix: you don't actually send it, but if both sender and receiver know the protocol id, and the CRC32 is calculated as if the packet were prefixed by the protocol id, then the CRC32 will be incorrect whenever the sender's protocol id doesn't match the receiver's, saving 4 bytes per-packet:

[protocol id] (32 bits)   // not actually sent, but used to calculate crc32
[crc32] (32 bits)
(packet data)
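As a sketch of the trick (the bitwise CRC32 and the function names are mine; production code would use a table driven or hardware CRC): run the CRC over the protocol id bytes first, then over the packet data, and only send the final value:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bitwise CRC32 (reflected polynomial 0xEDB88320, as used by zlib/IEEE 802.3).
uint32_t crc32_update( uint32_t crc, const uint8_t * data, size_t bytes )
{
    for ( size_t i = 0; i < bytes; ++i )
    {
        crc ^= data[i];
        for ( int bit = 0; bit < 8; ++bit )
            crc = ( crc >> 1 ) ^ ( 0xEDB88320u & ( 0u - ( crc & 1u ) ) );
    }
    return crc;
}

// CRC32 of the packet *as if* it were prefixed by the protocol id.
// The 4 prefix bytes never go over the wire, but a sender with a different
// protocol id produces a crc32 that fails to match on the receiver.
uint32_t packet_crc32( uint32_t protocolId, const uint8_t * packet, size_t bytes )
{
    uint8_t prefix[4];
    memcpy( prefix, &protocolId, 4 );
    uint32_t crc = 0xFFFFFFFFu;
    crc = crc32_update( crc, prefix, 4 );
    crc = crc32_update( crc, packet, bytes );
    return crc ^ 0xFFFFFFFFu;
}
```

Both sides compute packet_crc32 with their own protocol id; the receiver drops the packet if the result doesn't match the crc32 field in the header.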

Of course, CRC32 is only protection against random packet corruption, and no real protection against a malicious sender, who can easily modify or construct a malicious packet and then properly adjust the CRC32 in the first four bytes. To protect against this you need a more cryptographically secure hash function combined with a secret key, perhaps exchanged between client and server over HTTPS by the matchmaker prior to the client attempting to connect to the game server (a different key for each client, known only by the server and that particular client).

One final technique, as much a check against programmer error on your part as against malicious senders (although it's redundant once you encrypt and sign your packets), is the serialization check. Basically, somewhere mid-packet, either before or after a complicated serialization section, just write out a known 32 bit integer value, and check that it reads back in on the other side with the same value. If the serialize check value is incorrect, abort the read and discard the packet.

I like to do this between sections of my packet as I write them, so at least I know which part of my packet serialization has desynced read and write as I'm developing my protocol (it's going to happen no matter how hard you try to avoid it...). Another cool trick I like to use is to serialize a protocol check at the very end of the packet. This is super useful because it helps pick up packet truncations (like the infamous little endian vs. big endian truncation of the last word from the previous article).

So now the packet looks something like this:

[protocol id] (32 bits)   // not actually sent, but used to calculate crc32
[crc32] (32 bits)
(packet data)
[end of packet serialize check] (32 bits)

You can just compile these protocol checks out in your retail build if you like, especially if you have good encryption and a packet signature, as they should no longer be necessary.

Up next: Packet Fragmentation and Reassembly

Read on for the next article in this series, where I show you how to extend your network protocol to perform packet fragmentation and reassembly so you can keep your packet payload under MTU.

Please support my writing on Patreon, and I'll write new articles faster, plus you get access to example source code for this article under the BSD 3.0 licence. Thanks for your support!