Sending and receiving data

Main page -> FAQ -> Sending and receiving data

General

There are several ways to transfer data. No matter if it is a file, an image, textual or raw data, it is all the same. What is needed is a common language between the data sender and the data receiver, in order for each to understand the other. This language defines a format. Formatting data is a way to "shape" it, describing its beginning and its end, in order for the receiver to know where data starts and ends.

Many formats are of course possible, mainly depending upon the kind of data itself. Basically, we can consider that we have two kinds of data :

predictable data content
unpredictable data content

In the first case, we know what kind of content the data contains - ie the byte set that can be used is known (byte values from 32 to 127, for example). In the other case, the data content is not known - ie the byte set can be any possible byte value (from 0 to 255). This major difference will mostly determine the format of the data to be exchanged, defining then the protocol.

Concerning Predictable Data Content, as the content values are known by advance - a subset of the 256 possible byte values, it is possible to know which byte values cannot be part of the data. This hint let us the possibility to create a delimiter made of one or more of these "extra data" byte values. This delimiter, sent after data, may then help us to know where data ends. This mode where a delimiter is placed at the end of data is named LineMode.

Concerning Unpredictable Data Content, as the content values are never known by advance - byte set values starts at 0 and ends at 255, if we plan to send several data packet (for example several records) at once, then we have to tell the receiver the size of the data we're going to send for each packet. This mode to "format" data packet by prefixing it with its size is call PacketMode.

And if we plan to send at the same time some Predictable Data Content and some Unpredictable Data Content, we can then imagine a combination of both mode.

TWSocket has the ability to help you in implementing such a protocol. It is able to manage data using LineMode and PacketMode (which is Non-LineMode).

Line mode

Base-64

Base64 is an encoding process using a 64 letter alphabet where each letter representing 6 bits in the input stream. It is described in RFC 2045. All encoding and decoding procedures can be found in unit MimeUtil. The encoded data is about 33 percent larger and not human readable. Every character not used in the Base64 alphabet can be used as control character, including line end.

ASCII-hex

ASCII-hex is used in many protocols. Every character is converted into his hexadecimal equivalent and sent as such. For example the string '123' is sent as '313233'. The encoded data is twice as long and difficult human readable. Every character except 0..9, A..F can be used as control character, including line end.

Escaping

Escaping is very often used. Control characters including line end has to be chosen in a way they are as less as possible in the original data. The principle is to precede the control characters with an escape character and replace them by other characters. Very often a NULL character is escaped as well. Most of the time the data is only a little longer than original and good human readable.

Quoted-Printable

The Quoted-Printable encoding as specified in RFC 1521 is intended to represent data that largely consists of bytes that correspond to printable characters in the ASCII character set. It encodes the data in such a way that the resulting bytes are unlikely to be modified by mail transport. If the data being encoded are mostly ASCII text, the encoded form of the data remains largely recognizable by humans. Procedures to encode and decode Quoted-Printable can be found in unit MimeUtil.

Examples

Packet mode

Preceding each data packet with his length

This is a very common used technique. The first 1, 2 o 4 bytes of the data represent the length of the packet. Note that it is common habitude in communications to represent the length header in Big Endian format while Intel CPU use by design Little Endian format.

Less used but also a good technique is to represent the preceding length in hex format of 2, 4 or 8 bytes. Advantage is that this part of the header is human readable.

Sending and receiving data

Contents

General

Line mode

Base-64

ASCII-hex

Escaping

Quoted-Printable

Examples

Packet mode

Preceding each data packet with his length

Fixed length data

Mixed mode

Conclusion

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools