Integer to byte string - Speed improvements

I’m writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

I have working code below. I’m happy with it. However, I thought I’d
pose the question to the list: how much faster can you make the below
code run?

The natural log of 2 (for base-2 logarithms)

Math::LOG2 = Math.log( 2 )

class Integer

Converts the integer into a string, where the value of each

character

is a byte in the bit representation of the integer.

Low-order bytes appear first in the string.

If the number of characters is not specified, the fewest number of

characters that can represent the integer is used.

def to_bytestring( num_chars=nil )
unless num_chars
bits_needed = Math.log( self ) / Math::LOG2
num_chars = ( bits_needed / 8.0 ).ceil
end
if pack_code = { 1=>‘C’, 2=>‘S’, 4=>‘L’ }[ num_chars ]
[ self ].pack( pack_code )
else
(0…(num_chars)).map{ |i|
(( self >> i*8 ) & 0xFF ).chr
}.join
end
end

Creates an integer from a byte string.

See Integer#to_bytestring for more information.

def self.from_bytestring( str )
val = 0
index = 0
str.each_byte{ |b|
val += b << 8 * index
index += 1
}
val
end

end

[
0x48,
0x6948,
0x646c726f57206f6c6c6548,
0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i| puts i, i.to_bytestring, ‘’ }

#=> 72
#=> H

#=> 26952
#=> Hi

#=> 121404708493354166158910792
#=> Hello World
#=>
3876042760240808297111079005855559646422331404814505579794973210389374306838850
#=> Bignum is a sweet aspect of Ruby!

I forgot to add - I’d also be interested in seeing a golf tournament on
the two methods. Of course whitespace and variable names could be
changed, but I’m interested in more terse, alternative methods of
calculating the same information.

On Thu, 31 Aug 2006 12:20:32 -0400, Phrogz [email protected] wrote:

I’m writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

s =
Marshal.dump(0xCC8080808080808080808080808080808080808080808080808080808080AA)
bytes = s[5…-2]
bytes.each_byte { |b| print "%02x " % b }
aa 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80
80
80 80 80 80 80 cc
(low byte first, as you mentioned)
or if you prefer:
rev_bytes = bytes.reverse
rev_bytes.each_byte { |b| print "%02x " % b }

As far as it being faster, dunno, probably. Here’s the benchmark:
irb(main):041:0> Process.times
{
s=Marshal.dump(0xCC8080808080808080808080808080808080808080808080808080808080AA)
}

=> #

Regards,
JJ

So, a couple bugs:

Phrogz wrote:

Low-order bytes appear first in the string.

Not necessarily

if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]

in Array.pack, ‘S’, and ‘L’, use system-endian order.
To use little endian order use ‘v’ and ‘V’, respectively.

Personally, I think it’s cleaner to just leave out the clause
altogether.
[…]

unless num_chars
  bits_needed = Math.log( self ) / Math::LOG2
  num_chars = ( bits_needed / 8.0 ).ceil
end

You’re ignoring a bunch of edge cases here.
0 and 1, for example.

For 0, Math.log(0) == -Infinity, which will throw an error when ‘ceil’
is called.
For 1, Math.log(1) == 0, so num_chars == 0, which isn’t quite right
either.

You’ll run into the same issue at other powers of 256, as well.

What you should actually do, instead of taking the ceiling, is round
down and add one.

You’re also ignoring negative integers. Math.log doesn’t like negative
numbers either, so you’ll need to take the absolute value. This
doesn’t solve the problem when you’re reading back in, of
differentiating between positive and negative integers that have the
same byte code.

Here’s my code. I ignore the positive/negative reading back in
problem.

#!/usr/bin/env ruby -w

The natural log of 256 (for base-256 logarithms)

Math::LOG256 = Math.log( 256 )

class Integer

Converts the integer into a string, where the value of each

character is a byte in the bit representation of the integer,

low order bytes first.

If the number of characters is not specified, the fewest number of

characters that can represent the integer is used.

def to_bytestring( num_octets=nil )

# find out how many octets are required to encode it
num_octets ||=
  (self == 0) ? 1 : ( Math.log(self.abs) / Math::LOG256 ).to_i + 1

# encode the low num_octets bytes of the integer.
shifted = (self << 8)
Array.new(num_octets).map do
  ((shifted >>= 8) & 0xFF).chr
end.join

end

Creates an integer from a byte string.

See Integer#to_bytestring for more information.

NOTE: reads in negative integers as positive

def self.from_bytestring( str )
str.reverse.split(//).inject(0) do |val, char|
(val << 8) | char[0]
end
end
end

[
-1,
0x0,
0x1,
255,
256,
-255,
-256,
0x48,
26952,
0x6948,
0x646c726f57206f6c6c6548,
0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i|
begin
p [i, i.to_bytestring, Integer.from_bytestring(i.to_bytestring)]
rescue StandardError => err
p err
end
}

On 8/31/06, Phrogz [email protected] wrote:

I’m writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

I have working code below. I’m happy with it. However, I thought I’d
pose the question to the list: how much faster can you make the below
code run?

The natural log of 2 (for base-2 logarithms)

Math::LOG2 = Math.log( 2 )

No need for these, I’m on a campaign to let integers be integers.

def to_bytestring( num_chars=nil )

replace these four lines:

unless num_chars
  bits_needed = Math.log( self ) / Math::LOG2

By the way, this won’t work for negative integers. Dealing with
negative numbers provides some interesting questions which I’ll ignore
for now.

  num_chars = ( bits_needed / 8.0 ).ceil
end

with

    raise RangeError.new("Can't convert negative integer to 

bytestring")
num_chars ||= self.size

if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]
  [ self ].pack( pack_code )
else

I might try getting rid of the multiplication

  (0...(num_chars)).map{ |i|
    (( self >> i*8 ) & 0xFF ).chr
  }.join
end

With something like:

    result = []
    i = self
    until i == 0
          result << (i & 0xFF).chr
           i >>=  8
   end

So following those ideas:
rick@frodo:/public/rubyscripts$ cat tobytes.rb
class Integer
def to_bytestring(num_chars=nil)
raise RangeError.new(“Can’t convert negative number to
bytestring”) if self < 0
num_chars ||= self.size
mask = 0xFF << (8 * num_chars - 1)
while (self & mask) == 0
mask >>= 8
num_chars -= 1
end

            result = []
            bits_left = self
            until bits_left == 0
                    result << (bits_left & 0xFF).chr
                    bits_left >>= 8
            end
            result.join
    end

    # Creates an integer from a byte string.
    # See Integer#to_bytestring for more information.
    def self.from_bytestring( str )
            val = 0
            index = 0
            str.each_byte{ |b|
                    val += b << 8 * index
                    index += 1
            }
            val
    end

end

[
0x48,
0x6948,
0x646c726f57206f6c6c6548,
0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i| puts i, i.to_bytestring, ‘’ }

rick@frodo:/public/rubyscripts$ ruby tobytes.rb
72
H

26952
Hi

121404708493354166158910792
Hello World

3876042760240808297111079005855559646422331404814505579794973210389374306838850
Bignum is a sweet aspect of Ruby!


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Phrogz wrote

I forgot to add - I’d also be interested in seeing a golf tournament on
the two methods. Of course whitespace and variable names could be
changed, but I’m interested in more terse, alternative methods of
calculating the same information.

v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack(‘H*’).reverse

terse enough? :slight_smile:

cheers

Simon

Simon Kröger wrote:

v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack(‘H*’).reverse

terse enough? :slight_smile:

Wow, that’s hot. Thanks :slight_smile:

Got any golf for the reverse? (Creating the number from the string.)

Phrogz wrote:

Simon Kröger wrote:

v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack(‘H*’).reverse

terse enough? :slight_smile:

Wow, that’s hot. Thanks :slight_smile:

Got any golf for the reverse? (Creating the number from the string.)

Well, just do the reverse:

s = ‘Bignum is a sweet aspect of Ruby!’
puts s.reverse.unpack(‘H*’).first.to_i(16)

cheers

Simon

Noah E. wrote:

To use little endian order use ‘v’ and ‘V’, respectively.

Ah, thanks.

You’re ignoring a bunch of edge cases here.
0 and 1, for example.
[…]
What you should actually do, instead of taking the ceiling, is round
down and add one.

Thanks for pointing that out.

You’re also ignoring negative integers.

On purpose, actually. I happen to know in my domain that it would be an
error to hit that case. I prefer to keep my code lean and leave out
various validity checks that also kill the program (albeit in a
slightly cleaner way).

The natural log of 256 (for base-256 logarithms)

Math::LOG256 = Math.log( 256 )

Ah, tricky! I’m unused to thinking of crazy-high bases. That’s a nice
shortcut, thanks.

def self.from_bytestring( str )
str.reverse.split(//).inject(0) do |val, char|
(val << 8) | char[0]
end

No solution is complete without an inject. :wink:
I wonder (haven’t tested yet): is splitting on “” any faster than
splitting on //?
Introducing a string.reverse also feels to me like it would slow things
down. Will need to run some benchmarks.

Thanks for the ideas and code!

“John J.” [email protected] writes:

As far as it being faster, dunno, probably. Here’s the benchmark:
irb(main):041:0> Process.times {
s=Marshal.dump(0xCC8080808080808080808080808080808080808080808080808080808080AA)
}

=> #

I at first thought to have missed something, but Process.times doesn’t
do what you think it does:

irb(main):005:0> Process.times { p “foo” }
=> #

The block doesn’t even get called.

 Returns a +Tms+ structure (see +Struct::Tms+ on page 388) that
 contains user and system CPU times for this process.
                                    ^^^^^^^^^^^^^^^^