Splitting binary data


First post (I am new to Ruby :-)). Can you help?

I am using eventmachine to read in TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element. I understand I
may need to escape the , but how would I do that for the following
message? I can split it by unpacking to hex and then splitting, but that
is inefficient for my needs as I use bindata to inspect the packet. Any
help is appreciated.



hroyd hroyd wrote in post #993957:


First post (i am new to ruby :-)). Can you help?

I am using eventmachine to read in TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element.

That means your pattern isn’t matching anywhere in the string:

str = 'abc'
p str.split('e')   #=> ["abc"]


Here’s what happens when the pattern matches:


str = "\xFF\xFF" +
"0xE2 0x82 0xAC" +
"\xFF\xFF" +
"0xE2 0x82 0xAC" +
"\xFF\xFF" +
"0xE2 0x82 0xAC" +
"\xFF\xFF" +
"0xE2 0x82 0xAC"

pattern = "\xFF\xFF"
p str.split(pattern)

["", "0xE2 0x82 0xAC", "0xE2 0x82 0xAC", "0xE2 0x82 0xAC", "0xE2 0x82 0xAC"]

Because your string starts with the delimiter, there is an empty
string to the left of the first delimiter, and split returns it as the
first element.
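
A minimal sketch of the leading empty string described above, and one way to drop it. The payloads "msg1" and "msg2" are made up for illustration, and String#b (Ruby 2.0 and later) is assumed so that both strings are handled as raw bytes.

```ruby
# Hypothetical sample data: a two-byte marker in front of each message.
marker = "\xFF\xFF".b                        # String#b returns an ASCII-8BIT copy
data   = (marker + "msg1" + marker + "msg2").b

parts = data.split(marker)
p parts                                      # first element is the empty string before the first marker

messages = parts.reject(&:empty?)            # drop the empty pieces
p messages
```

Note that reject(&:empty?) would also discard a genuinely empty message between two adjacent markers; if that case matters, parts.drop(1) removes only the leading element.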

Thanks for the reply, that works

I was trying to split on


but dropping the last \ was what I was missing


\n\x13\x00\x01 \x01\x01\x01\x01”,

Thanks for your help

"Iñaki Baz C." wrote in post #994264:

2011/4/20 7stud:

p str.split(pattern)

["", "a", "b", "c", "d"]

Note that this fails under Ruby 1.9:

p str.split(pattern)
ArgumentError: invalid byte sequence in UTF-8
from (irb):10:in `split'

I guess you missed this part:



hroyd hroyd wrote in post #994257:

Thanks for your help

Sure. Also, note that ruby lets you do this:

pattern = "\xFF" * 16
p pattern


…so that you don’t have to write that out by hand, and suffer the
inevitable typo.
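
As a small sketch of the repetition trick above, the marker can also be checked after it is built. String#b (Ruby 2.0 and later) and the pack-based alternative are assumptions added here for illustration; they are not from the original posts.

```ruby
# Build the 16-byte marker by repetition instead of typing it out.
marker = "\xFF".b * 16            # String#b gives an ASCII-8BIT copy

p marker.bytesize                 # number of bytes in the marker
p marker.encoding                 # encoding carried over from the one-byte string

# Equivalent construction from integer byte values.
packed = ([0xFF] * 16).pack("C*") # pack("C*") produces a binary string
p packed == marker
```

Checking bytesize is a cheap guard against a marker that is one byte too long or too short.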

2011/4/21 7stud:


Interesting. I also use 1.9.2, but I have found that it fails under
irb and not when I run the above code from a separate file.

"Iñaki Baz C." wrote in post #994342:

2011/4/21 7stud:


Interesting. I also use 1.9.2, but I have found that it fails under
irb and not when I run the above code from a separate file.

I never use irb like interfaces in any language anymore–they are

In Ruby 1.9, a String object knows its own encoding.
If a String contains byte sequences that are invalid for that encoding,
the String#split method raises an error.

Without the magic comment, it does not matter that a string literal
includes non-ASCII bytes.

example: OK!!

#! ruby-1.9.2

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF"
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]

However, when the magic comment declares the file encoding as UTF-8,
it does matter that a string literal includes non-ASCII bytes.

example: NG

#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:UTF-8>
p str.valid_encoding? #=> false

pattern = "\xFF\xFF"
p pattern.valid_encoding? #=> false
p str.split( pattern ) # ERROR OCCURS!!!

To avoid this problem, you must change the encoding of any string that
includes non-ASCII bytes to ASCII-8BIT.

example: avoiding the problem

#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"

# change the encoding of the string
str.force_encoding Encoding::ASCII_8BIT
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF".force_encoding Encoding::ASCII_8BIT
p pattern.valid_encoding? #=> true
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
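
The same idea can be wrapped in a small helper so the caller does not have to remember to re-tag both strings. This is only a sketch: split_binary is a hypothetical name, not part of Ruby or of any library mentioned in this thread, and it works on copies so the caller's strings keep their original encoding.

```ruby
# Hypothetical helper: split raw bytes on a raw-byte marker,
# whatever encoding the two strings were tagged with.
def split_binary(data, marker)
  bin_data   = data.dup.force_encoding(Encoding::ASCII_8BIT)
  bin_marker = marker.dup.force_encoding(Encoding::ASCII_8BIT)
  bin_data.split(bin_marker)
end

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p split_binary(str, "\xFF\xFF") #=> ["", "a", "b", "c", "d"]
```

Data received from a socket is normally already tagged ASCII-8BIT, so in that case only the marker literal needs this treatment.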

Kind regards,