Difference between revisions of "Sending and receiving data"
m (→Base-64) |
|||
Line 1: | Line 1: | ||
− | == | + | == General == |
− | We use the term ''binary data'' not to make a difference to ''ASCII'' or human readable data, but for data that has an unpredictable content, meaning we cannot use | + | There are several ways to transfer data. No matter if it is a file an image ASCII or binary data it is all the same. What is needed is a certain format so that the receiver of the data knows he has a complete packet. |
+ | |||
+ | <!-- | ||
+ | We use the term ''binary data'' not to make a difference to ''ASCII'' or human readable data, but for data that has an unpredictable content, meaning we cannot use Line Mode just like that because whatever combination of characters you choose it ''could'' be in the data stream itself. | ||
== Solutions == | == Solutions == | ||
+ | --> | ||
− | Many solutions are of course possible. You can use | + | Many solutions are of course possible. You can use Line Mode or packet data. Using Line Mode has the advantage that TWSocket will concatenate packets for you but the disadvantage that it will hold all data in memory before it will fire OnDataAvailable. The latter is of course only important if packets are very large. If you choose for packet data then you have to concatenate data yourself but you have control on what is kept in memory or not. |
− | == | + | == Line Mode == |
− | + | === Base-64 === | |
− | ''Base64'' is an encoding process using a 64 letter alphabet where each letter representing 6 bits in the input stream. It is described in RFC [http://rfc.net/rfc2045.html#p24 2045]. | + | ''Base64'' is an encoding process using a 64 letter alphabet where each letter representing 6 bits in the input stream. It is described in RFC [http://rfc.net/rfc2045.html#p24 2045]. All encoding / decoding procedures can be found in [[TMimeDecode]] component. The encoded data is about 33 percent larger and not human readable. Every character not used in the Base64 alphabet can be used as control character, including line end. |
− | === | + | === ASCII-hex === |
− | '' | + | ''ASCII-hex'' is used in many protocols. Every character is converted into his hexadecimal equivalent and sent as such. For example the string '123' is sent as '313233'. The encoded data is twice as long and difficult human readable. Every character except 0..9, A..F can be used as control character, including line end. |
− | + | === Escaping === | |
− | Escaping is very often used. The principle is to | + | Escaping is very often used. Control characters including line end has to be chosen in a way they are as less as possible in the original data. The principle is to precede the control characters with an escape character and replace them by other characters. Very often a NULL character is escaped as well. The data is only a little longer than original and good human readable. |
− | Often used in many protocols is choosing for escape character 0x1B, and set high bit of the | + | ==== Examples ==== |
+ | <!-- | ||
+ | Often used in many protocols is choosing for escape character 0x1B, and set high bit of the forbidden characters. While the set / reset is only a bit manipulation it is very CPU friendly, but choosing for 0x1B is mostly a bad choice in binary data, unless it is predictable that this character is not often used in it. Another disadvantage is the bad reading of the escaped characters in a log. | ||
− | Less often used in communications but in some cases better is to use readable characters for the escape as | + | Less often used in communications but in some cases better is to use readable characters for the escape as well as for the replacements. For example you can use '\' as escape character, 'r' for 0x0D (carriage return) and 'n' for 0x0A (line feed), as is well known from the C programming language. |
− | When data is containing many unreadable characters mixed with readable characters, and it has to be logged then it could be better to do a kind of a mix of previous techniques. | + | When data is containing many unreadable characters mixed with readable characters, and it has to be logged then it could be better to do a kind of a mix of previous techniques. Forbidden characters as well as non printable characters can be displayed in hex-ASCII and preceded by an escape character. For example 'hello '#10 will look like: 'hello%20%0A' if we choose for '%' as escape character. |
+ | --> | ||
− | + | == Packet data == | |
− | === | + | === Preceding each data packet with his length === |
− | This is a very common used technique. The first 1, 2 o 4 bytes of the data represent the length of the packet. Note that it is habitude in communications to | + | This is a very common used technique. The first 1, 2 o 4 bytes of the data represent the length of the packet. Note that it is common habitude in communications to represent the length header in [[Big Endian]] format while Intel CPU use by design [[Little Endian]] format. |
− | Less used but still practice is to give the | + | Less used but still practice is to give the preceding length in hex format of 2, 4 or 8 bytes. Advantage is that it part of the header is human readable. |
− | + | === Fixed length data === | |
− | + | == Mixed mode == | |
== Conclusion == | == Conclusion == | ||
Difficult to explain something :) | Difficult to explain something :) |
Revision as of 10:39, 21 February 2006
General
There are several ways to transfer data. Whether it is a file, an image, ASCII or binary data makes no difference. What is needed is a format that lets the receiver know when it has received a complete packet.
Many solutions are possible. You can use Line Mode or packet data. Line Mode has the advantage that TWSocket concatenates the received packets for you, but the disadvantage that it holds all data in memory before it fires OnDataAvailable. The latter only matters if packets are very large. If you choose packet data, you have to concatenate the data yourself, but you control what is kept in memory and what is not.
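To make the idea of "concatenating packets until a line is complete" concrete, here is a minimal sketch in Python. It is not the TWSocket implementation and does not use the ICS API; the class name `LineAssembler` and the CR/LF delimiter are assumptions chosen for illustration only.

```python
from typing import List


class LineAssembler:
    """Hypothetical helper: collects received chunks and returns complete lines."""

    def __init__(self, delimiter: bytes = b"\r\n") -> None:
        self.delimiter = delimiter      # assumed line end, not taken from the article
        self.buffer = bytearray()       # everything received but not yet delivered

    def feed(self, chunk: bytes) -> List[bytes]:
        """Append a received chunk and return every line that is now complete."""
        self.buffer.extend(chunk)
        lines = []
        while True:
            index = self.buffer.find(self.delimiter)
            if index < 0:
                break                   # no complete line yet, keep buffering
            lines.append(bytes(self.buffer[:index]))
            del self.buffer[: index + len(self.delimiter)]
        return lines
```

Note that the buffer grows until the delimiter arrives, which is the memory cost mentioned above for very large packets.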
Line Mode
Base-64
Base64 is an encoding process using a 64 letter alphabet where each letter represents 6 bits of the input stream. It is described in RFC 2045. All encoding / decoding procedures can be found in the TMimeDecode component. The encoded data is about 33 percent larger and not human readable. Every character not used in the Base64 alphabet can be used as a control character, including the line end.
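As an illustration only, the following Python sketch sends arbitrary bytes as one Base64 line. It uses Python's standard `base64` module rather than TMimeDecode, and the function names and the choice of a line feed as line end are assumptions.

```python
import base64


def encode_line(payload: bytes) -> bytes:
    """Encode arbitrary bytes as one Base64 line terminated by a line feed."""
    # b64encode never emits a line feed, so it is safe as the terminator here.
    return base64.b64encode(payload) + b"\n"


def decode_line(line: bytes) -> bytes:
    """Strip the line end and decode the Base64 text back to the original bytes."""
    return base64.b64decode(line.rstrip(b"\r\n"))
```

For a 256 byte payload the encoded text is 344 characters long, which matches the overhead of roughly one third.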
ASCII-hex
ASCII-hex is used in many protocols. Every character is converted into its two-digit hexadecimal equivalent and sent as such. For example the string '123' is sent as '313233'. The encoded data is twice as long and hard for a human to read. Every character except 0..9 and A..F can be used as a control character, including the line end.
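The '123' example can be reproduced with a short Python sketch. The function names are hypothetical; only the built-in `bytes.hex` and `bytes.fromhex` methods are used.

```python
def to_ascii_hex(data: bytes) -> str:
    """Convert every byte into two uppercase hexadecimal digits."""
    return data.hex().upper()


def from_ascii_hex(text: str) -> bytes:
    """Convert pairs of hexadecimal digits back into the original bytes."""
    return bytes.fromhex(text)
```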
Escaping
Escaping is used very often. The control characters, including the line end, should be chosen so that they occur as rarely as possible in the original data. The principle is to replace each control character that occurs in the data by an escape character followed by a substitute character. Very often the NULL character is escaped as well. The escaped data is only slightly longer than the original and remains well readable for a human.
Examples
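The following Python sketch shows one possible escaping scheme. The concrete choices are assumptions made for illustration and are not prescribed by this article: the backslash is the escape character, and line feed, carriage return and NULL are replaced by 'n', 'r' and '0'. The escape character itself has to be escaped too, otherwise decoding would be ambiguous.

```python
ESCAPE = ord("\\")

# Hypothetical mapping: forbidden byte -> readable substitute byte.
REPLACEMENTS = {0x0A: ord("n"), 0x0D: ord("r"), 0x00: ord("0"), ESCAPE: ESCAPE}
REVERSE = {substitute: original for original, substitute in REPLACEMENTS.items()}


def escape(data: bytes) -> bytes:
    """Replace every forbidden byte by the escape byte plus its substitute."""
    out = bytearray()
    for byte in data:
        if byte in REPLACEMENTS:
            out.append(ESCAPE)
            out.append(REPLACEMENTS[byte])
        else:
            out.append(byte)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Reverse escape(); raises ValueError on a malformed escape sequence."""
    out = bytearray()
    stream = iter(data)
    for byte in stream:
        if byte != ESCAPE:
            out.append(byte)
            continue
        try:
            code = next(stream)
        except StopIteration:
            raise ValueError("dangling escape character") from None
        if code not in REVERSE:
            raise ValueError("unknown escape sequence")
        out.append(REVERSE[code])
    return bytes(out)
```

After escaping, the data contains no line feed, carriage return or NULL, so these bytes are free to be used as line end or other control characters.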
Packet data
Preceding each data packet with its length
This is a very commonly used technique. The first 1, 2 or 4 bytes of the data represent the length of the packet. Note that it is common practice in communications to represent the length header in Big Endian format, while Intel CPUs by design use Little Endian format.
Less common but still in use is to give the preceding length in hex format, as 2, 4 or 8 characters. The advantage is that this part of the header is human readable.
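A minimal Python sketch of a binary length header follows. The 4 byte header size, the names `pack_packet` and `PacketAssembler`, and the rule that the length counts only the payload are assumptions for illustration; this is not ICS code. The `">I"` format of the standard `struct` module means a 4 byte unsigned integer in Big Endian order.

```python
import struct
from typing import List

HEADER = struct.Struct(">I")  # 4 byte unsigned length, Big Endian


def pack_packet(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


class PacketAssembler:
    """Hypothetical helper: rebuilds packets from arbitrarily split chunks."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Append a received chunk and return every packet that is now complete."""
        self.buffer.extend(chunk)
        packets = []
        while len(self.buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self.buffer, 0)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break                   # payload not complete yet
            packets.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
        return packets
```

Because the receiver knows the length as soon as the header has arrived, it could also write the payload to disk piece by piece instead of buffering it; the sketch buffers only to stay short.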
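The hex variant of the length header can be sketched in Python as follows. The choice of 4 hex characters, uppercase digits and the function names are assumptions for illustration.

```python
from typing import Optional, Tuple

HEX_DIGITS = 4  # assumed header width: 4 hex characters, payload up to 65535 bytes


def pack_hex_packet(payload: bytes) -> bytes:
    """Prefix the payload with its length written as uppercase hex text."""
    if len(payload) > 16 ** HEX_DIGITS - 1:
        raise ValueError("payload too long for the header width")
    header = "{:0{width}X}".format(len(payload), width=HEX_DIGITS)
    return header.encode("ascii") + payload


def unpack_hex_packet(buffer: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (payload, remaining bytes), or None if the packet is incomplete."""
    if len(buffer) < HEX_DIGITS:
        return None
    length = int(buffer[:HEX_DIGITS].decode("ascii"), 16)
    end = HEX_DIGITS + length
    if len(buffer) < end:
        return None
    return buffer[HEX_DIGITS:end], buffer[end:]
```

In a log the packet `000Bhello world` shows its length (0x0B = 11) directly, which is the readability advantage described above.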
Fixed length data
Mixed mode
Conclusion
Difficult to explain something :)