Sending and receiving data
Revision as of 21:41, 20 February 2006
Terminology
We use the term binary data here not to distinguish it from ASCII or human-readable data, but to mean data with unpredictable content. Such data cannot simply be sent in LineMode, because whatever combination of characters you choose as the line end could also occur in the data stream itself.
Solutions
There are many possible solutions, so don't just pick one at random. Think carefully about what the data is and what has to be done with it. A 'log' is mentioned many times below. This is not necessarily a log file used to peek at the data, but development time can easily double (especially if third parties are involved) because of a protocol or format error that is not properly logged.
You can use LineMode or packet data, and you should think very carefully about which one you want. LineMode has the advantage that TWSocket concatenates packets for you, but the disadvantage that it holds all data in memory until it fires OnDataAvailable. If you send, for example, a gigabyte of data in LineMode, TWSocket will concatenate the whole gigabyte before OnDataAvailable is called, consuming valuable memory. If you choose packet data, you have to concatenate the data yourself, but you control what is kept in memory.
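The memory trade-off can be illustrated with a small, language-neutral sketch in Python. It is not TWSocket's implementation; the class name `LineAssembler` and the method `pending` are hypothetical. It only shows that a line-oriented receiver must buffer every byte until the line end arrives.

```python
class LineAssembler:
    """Hypothetical line-oriented receiver: buffers chunks until a line end."""

    def __init__(self, line_end=b"\r\n"):
        self._line_end = line_end
        self._buf = bytearray()

    def feed(self, chunk):
        """Add a received chunk and return all lines completed so far."""
        self._buf += chunk
        lines = []
        while True:
            idx = self._buf.find(self._line_end)
            if idx < 0:
                break  # no complete line yet, everything stays buffered
            lines.append(bytes(self._buf[:idx]))
            del self._buf[:idx + len(self._line_end)]
        return lines

    def pending(self):
        """Number of bytes held in memory while waiting for a line end."""
        return len(self._buf)
```

If a gigabyte is sent as a single line, `pending()` grows to a gigabyte before `feed` returns anything, which is the cost described above.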
LineMode
Base-64
Base64 is an easy solution because every byte is encoded into a small set of printable characters. If you choose, for example, #13#10 as the line end, the whole encoded stream is received in one chunk. Todo: Link to the components here. The advantage is that all encoding and decoding is built into ICS. The disadvantage is that the data is larger (by about one third) and unreadable in logs.
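As a rough illustration, here is a sketch in Python using the standard `base64` module rather than the ICS components. The function names `frame_base64` and `unframe_base64` are hypothetical, and #13#10 is assumed as the line end.

```python
import base64

LINE_END = b"\r\n"  # assumed line end (#13#10)


def frame_base64(payload: bytes) -> bytes:
    """Encode arbitrary bytes as Base64 and terminate them with the line end."""
    return base64.b64encode(payload) + LINE_END


def unframe_base64(line: bytes) -> bytes:
    """Strip the line end and decode the Base64 text back to the original bytes."""
    if not line.endswith(LINE_END):
        raise ValueError("incomplete line")
    return base64.b64decode(line[:-len(LINE_END)], validate=True)
```

Because the Base64 alphabet contains neither CR nor LF, the line end can only appear at the very end of the encoded line, even when the payload itself contains #13#10.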
Ascii-hex
Ascii-hex is used in many protocols. Every byte is converted into its two-digit hexadecimal equivalent and sent as such. For example, the string '123' is sent as '313233'. The disadvantage is that the data is twice as long and difficult to read in logs (depending on the length and contents).
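A minimal sketch of the conversion in Python; the function names are hypothetical and upper-case hex digits are an assumption, since protocols differ on that point.

```python
def to_ascii_hex(payload: bytes) -> bytes:
    """Convert each byte into two upper-case hexadecimal characters."""
    return payload.hex().upper().encode("ascii")


def from_ascii_hex(text: bytes) -> bytes:
    """Convert pairs of hexadecimal characters back into the original bytes."""
    return bytes.fromhex(text.decode("ascii"))


print(to_ascii_hex(b"123"))  # b'313233'
```

The output contains only the characters 0-9 and A-F, so any line end can be used safely.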
Escaping
Escaping is used very often. The principle is to precede each forbidden character with an escape character and to replace it with one or more other characters. Forbidden characters are the LineEnd character, the escape character itself, and in many cases also the NULL character. The advantage is that the data is only a little longer than the original and that the log is very readable. The latter can be important if the data contains many readable literals; in that case all unprintable characters are often chosen to be escaped as well.
Many protocols choose 0x1B as the escape character and set the high bit of the forbidden characters. Because setting and resetting a bit is simple bit manipulation, this is very CPU friendly. However, 0x1B is mostly a bad choice for binary data, unless it is predictable that this character does not occur often in it. Another disadvantage is that the escaped characters read badly in a log.
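The high-bit scheme might look like the following Python sketch. The set of forbidden bytes (NULL, LF, CR and the escape character itself) and the function names are assumptions for illustration, not taken from a particular protocol.

```python
ESC = 0x1B
# Assumed forbidden bytes: NULL, LF, CR and the escape character itself.
FORBIDDEN = frozenset({0x00, 0x0A, 0x0D, ESC})


def escape(payload: bytes) -> bytes:
    """Replace each forbidden byte with ESC followed by the byte with bit 7 set."""
    out = bytearray()
    for b in payload:
        if b in FORBIDDEN:
            out.append(ESC)
            out.append(b | 0x80)
        else:
            out.append(b)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Reverse escape(): clear bit 7 of the byte that follows each ESC."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == ESC:
            try:
                nxt = next(it)
            except StopIteration:
                raise ValueError("dangling escape character") from None
            out.append(nxt & 0x7F)
        else:
            out.append(b)
    return bytes(out)
```

With these assumptions the bytes 0x0D 0x0A become 0x1B 0x8D 0x1B 0x8A on the wire, which shows why the escaped characters are hard to recognise in a log.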
Less common in communications, but better in some cases, is to use readable characters for the escape character as well as for the replacements. For example, you can use '\' as the escape character, 'r' for 0x0D (carriage return) and 'n' for 0x0A (line feed), as is well known from the C programming language.
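A sketch of readable escaping in Python, following the C conventions `\r` for carriage return (0x0D), `\n` for line feed (0x0A), `\0` for NULL and `\\` for the backslash itself. The exact set of escaped bytes and the function names are assumptions.

```python
BACKSLASH = 0x5C
# Assumed mapping, following C conventions: byte value -> escaped form.
ESCAPES = {BACKSLASH: b"\\\\", 0x0D: b"\\r", 0x0A: b"\\n", 0x00: b"\\0"}
UNESCAPES = {ord("\\"): BACKSLASH, ord("r"): 0x0D, ord("n"): 0x0A, ord("0"): 0x00}


def escape_c(payload: bytes) -> bytes:
    """Replace forbidden bytes with a backslash plus a readable character."""
    return b"".join(ESCAPES.get(b, bytes([b])) for b in payload)


def unescape_c(data: bytes) -> bytes:
    """Reverse escape_c(); raise ValueError on an unknown or truncated escape."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == BACKSLASH:
            if i + 1 >= len(data) or data[i + 1] not in UNESCAPES:
                raise ValueError("bad escape sequence")
            out.append(UNESCAPES[data[i + 1]])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

A payload of 'abc' followed by CR LF then appears in the log as the seven readable characters `abc\r\n`.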
When the data contains many unreadable characters mixed with readable ones and has to be logged, a mix of the previous techniques may work better. Forbidden characters as well as non-printable characters can be written in ascii-hex, preceded by an escape character. For example, 'hello '#10 will look like 'hello%20%0A' if we choose '%' as the escape character.
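The mixed technique can be sketched in Python as follows. The rule used here (keep bytes 0x21 to 0x7E except '%', escape everything else including the space) is an assumption chosen so that the example above is reproduced; the function names are hypothetical.

```python
PERCENT = ord("%")


def percent_escape(payload: bytes) -> bytes:
    """Keep printable bytes (0x21..0x7E, except '%'); write the rest as %XX."""
    out = bytearray()
    for b in payload:
        if 0x21 <= b <= 0x7E and b != PERCENT:
            out.append(b)
        else:
            out += ("%%%02X" % b).encode("ascii")
    return bytes(out)


def percent_unescape(data: bytes) -> bytes:
    """Reverse percent_escape(); raise ValueError on a malformed escape."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == PERCENT:
            pair = data[i + 1:i + 3]
            if len(pair) != 2:
                raise ValueError("truncated escape sequence")
            out += bytes.fromhex(pair.decode("ascii"))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


print(percent_escape(b"hello \n"))  # b'hello%20%0A'
```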
Packet data
Preceding each data packet with its length
This is a very commonly used technique. The first 1, 2 or 4 bytes of the data represent the length of the packet. Note that it is customary in communications to use Big Endian format, while Intel CPUs use Little Endian format. Of course, if you design your own protocol you can use the endian format of your choice.
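A sketch of length-prefixed framing in Python, assuming a 2-byte unsigned Big Endian length. The names `pack_packet` and `PacketReader` are hypothetical, and the receiving side is a generic reassembly loop, not ICS code.

```python
import struct
from typing import List

# Assumed header: 2-byte unsigned length in Big Endian (network) byte order.
HEADER = struct.Struct(">H")


def pack_packet(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length")
    return HEADER.pack(len(payload)) + payload


class PacketReader:
    """Reassembles packets from chunks that arrive in arbitrary pieces."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a received chunk and return every packet that is now complete."""
        self._buf += chunk
        packets = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # header is complete but the payload is not
            packets.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return packets
```

Here the receiver decides how much to buffer: it never holds more than one incomplete packet, and a large transfer can be split into many small packets by the sender.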
Less common, but still in practical use, is to send the preceding length as 2, 4 or 8 hexadecimal characters. The advantage is that it is very readable in logs.
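As a sketch in Python, assuming a length field of 4 upper-case hexadecimal characters; the function names are hypothetical.

```python
HEX_DIGITS = 4  # assumed width of the length field, in hex characters


def pack_hex_packet(payload: bytes) -> bytes:
    """Prefix the payload with its length written as hexadecimal text."""
    if len(payload) >= 16 ** HEX_DIGITS:
        raise ValueError("payload too long for the length field")
    return f"{len(payload):0{HEX_DIGITS}X}".encode("ascii") + payload


def unpack_hex_packet(data: bytes):
    """Return (payload, remaining bytes), or None if the packet is incomplete."""
    if len(data) < HEX_DIGITS:
        return None
    length = int(data[:HEX_DIGITS].decode("ascii"), 16)
    end = HEX_DIGITS + length
    if len(data) < end:
        return None
    return data[HEX_DIGITS:end], data[end:]


print(pack_hex_packet(b"hello"))  # b'0005hello'
```

In a log the packet reads as `0005hello`, so the length can be checked by eye without a hex dump.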