How to write data in binary to a file?

Hi,

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb"). However, when I
write the data using ofile.write(1), Ruby writes the ASCII character "1"
(byte 0x31) to the file. I want it to write a single byte with the value
1. How can I do it? I have been googling a lot and could not find the answer.

Thanks

Frank

frank hi [email protected] wrote:

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb"). However, when I
write the data using ofile.write(1), Ruby writes the ASCII character "1"
(byte 0x31) to the file. I want it to write a single byte with the value
1. How can I do it? I have been googling a lot and could not find the answer.

You need to pack the data into a string before writing. The following
will write one unsigned byte:

buffer = [ 1 ].pack("C")
ofile.write(buffer)

See ri Array#pack for full documentation on things you can do with
pack (and String#unpack for the inverse).
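For instance, a minimal sketch of the pack/unpack round trip (the file name demo.bin is just an illustrative choice):

```ruby
# Pack three integers into a binary string, write it, and read it back.
# "C" = 8-bit unsigned byte, "v" = 16-bit unsigned little-endian.
data = [1, 255, 65535].pack("CCv")
File.open("demo.bin", "wb") { |f| f.write(data) }

raw = File.binread("demo.bin")   # four raw bytes: 0x01 0xFF 0xFF 0xFF
p raw.unpack("CCv")              # => [1, 255, 65535]

File.delete("demo.bin")          # clean up the scratch file
```

Note that pack returns an ASCII-8BIT (binary) string, so it can be written to a "wb" stream without any transcoding.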

There are also nearly-identical pack/unpack functions in Perl.
The "perlpacktut" manpage is a great read for this.

frank hi wrote in post #1021069:

Hi,

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb").

All that does is turn off newline conversions, so if you write \n to a
file, \n will be written to the file no matter what operating system the
program is running on. Well, and this:

===

And sets external encoding to ASCII-8BIT unless explicitly specified.

7stud – wrote in post #1021082:

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb").

All that does is turn off newline conversions, so if you write \n to a
file, \n will be written to the file no matter what operating system the
program is running on. Well, and this:

===

And sets external encoding to ASCII-8BIT unless explicitly specified.

Oh no… yet another inconsistency from ruby 1.9.

The documentation you quote is correct in what it says:

irb(main):001:0> f1 = File.open("/tmp/f1", "w")
=> #<File:/tmp/f1>
irb(main):002:0> f2 = File.open("/tmp/f2", "wb")
=> #<File:/tmp/f2>
irb(main):003:0> f1.external_encoding
=> nil
irb(main):004:0> f2.external_encoding
=> #<Encoding:ASCII-8BIT>

However, the behaviour this implies is unexpected:

irb(main):005:0> f2.write("groß")
=> 5

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

irb(main):006:0> "groß".encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT

But now we have a special case where it doesn't raise an exception :-(

Regards,

Brian.

(Tested with “ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]”)

Did you try with File#syswrite?

Phillip G. wrote in post #1021166:

On Sat, Sep 10, 2011 at 6:52 PM, Brian C. [email protected]
wrote:

irb(main):005:0> f2.write("groß")
=> 5

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

You obviously aren’t aware what Extended ASCII (aka ASCII 8bit) is.

I'm not? Then please explain it to me. What I know so far about ruby 1.9
encoding I have documented at
string19/string19.rb at master · candlerb/string19 · GitHub
There are some 200+ behaviours there, reverse-engineered with tests.

However, I’m quite happy to have gaps in my knowledge filled out.

Take a look at ISO 8859-1, and check what 11011111 encodes.

What has ISO-8859-1 got to do with this?

If the file had an external encoding of ISO-8859-1, then the character
“ß” (two bytes in UTF-8) would have been translated to the single byte
0xDF as it was written out.
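Brian's claim about ISO-8859-1 can be checked with a short sketch (the file name latin1.txt is just illustrative):

```ruby
# With an ISO-8859-1 external encoding, "ß" (two bytes in UTF-8) is
# transcoded to the single byte 0xDF on write.
File.open("latin1.txt", "w:ISO-8859-1") { |f| f.write("groß") }

bytes = File.binread("latin1.txt")
p bytes.bytesize      # => 4   (one byte per character)
p bytes.bytes.last    # => 223 (0xDF, the Latin-1 code for "ß")

File.delete("latin1.txt")
```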

But in this example, the external encoding is ASCII-8BIT, which is
ruby's encoding for "ASCII characters in the low 128 values, and unknown
binary in the high 128 values".

Ruby does not let you transcode UTF-8 to ASCII-8BIT, or back again, if
there are any high-value characters in it. You can confirm this easily:

s1 = "groß"
=> "groß"
s1.encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT
	from (irb):2:in `encode'
	from (irb):2
	from /usr/local/bin/irb192:12:in `<main>'
s2 = "gro\xDF".force_encoding("ASCII-8BIT")
=> "gro\xDF"
s2.encode("UTF-8")
Encoding::UndefinedConversionError: "\xDF" from ASCII-8BIT to UTF-8
	from (irb):4:in `encode'
	from (irb):4
	from /usr/local/bin/irb192:12:in `<main>'

Now, if you open a file for write and set an external encoding (the
default is nil) it means “transcode to this encoding”. But for some
reason, setting external encoding to ASCII-8BIT bypasses this rule.

On Sun, Sep 11, 2011 at 5:15 AM, Brian C. [email protected]
wrote:

I’m not? Then please explain it to me. What I know so far about ruby 1.9
encoding I have documented at
string19/string19.rb at master · candlerb/string19 · GitHub

irb(main):001:0> u = Encoding::BINARY
=> #<Encoding:ASCII-8BIT>
irb(main):002:0> u.names
=> ["ASCII-8BIT", "BINARY"]

In other words: binary and ASCII-8BIT are the same.

I find the behavior totally consistent: when writing a String with
any encoding to BINARY, the bytes are simply dumped as-is and no
conversion is done. So there cannot be an encoding error.

irb(main):003:0> s = "groß"
=> "groß"
irb(main):004:0> s.size
=> 4
irb(main):005:0> s.bytesize
=> 5
irb(main):006:0> s.encoding
=> #<Encoding:UTF-8>
irb(main):007:0> File.open("x", "wb") {|io| p io.external_encoding,
io.internal_encoding; io.write(s)}
#<Encoding:ASCII-8BIT>
nil
=> 5
irb(main):008:0> File.stat("x").size
=> 5

Now I can read the file with encoding UTF-8 properly:

irb(main):009:0> t = File.open("x", "r:UTF-8") {|io| io.read}
=> "groß"
irb(main):010:0> t.size
=> 4
irb(main):011:0> t.bytesize
=> 5
irb(main):012:0> t.encoding
=> #<Encoding:UTF-8>
irb(main):013:0> s == t
=> true

Now, if you open a file for write and set an external encoding (the
default is nil) it means “transcode to this encoding”. But for some
reason, setting external encoding to ASCII-8BIT bypasses this rule.

See above. The name ASCII-8BIT is probably not the best one to choose
for this but if you think about it as BINARY then everything fits
nicely together.

Kind regards

robert

On Sat, Sep 10, 2011 at 6:52 PM, Brian C. [email protected]
wrote:

irb(main):005:0> f2.write("groß")
=> 5

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

You obviously aren’t aware what Extended ASCII (aka ASCII 8bit) is.

Take a look at ISO 8859-1, and check what 11011111 encodes.


Phillip G.

phgaw.posterous.com | twitter.com/phgaw | gplus.to/phgaw

A method of solution is perfect if we can foresee from the start,
and even prove, that following that method we shall attain our aim.
– Leibniz

Robert K. wrote in post #1021429:

irb(main):002:0> u.names
=> ["ASCII-8BIT", "BINARY"]

In other words: binary and ASCII-8BIT are the same.

Indeed.

I find the behavior totally consistent

Consistent with what? If you set external_encoding to something other
than nil, you are telling ruby to transcode strings on output to the
given encoding - unless the given encoding is “ASCII-8BIT”/“BINARY”.

So it seems that “ASCII-8BIT” is handled as a special case. In fact,
looking at io.c, that encoding is handled as a special case all over the
place:

$ egrep -n "= rb_ascii8bit" io.c
232:#define NEED_WRITECONV(fptr) ((fptr->encs.enc != NULL &&
fptr->encs.enc != rb_ascii8bit_encoding()) ||
NEED_NEWLINE_DECORATOR_ON_WRITE(fptr) || (fptr->encs.ecflags &
(ECONV_DECORATOR_MASK|ECONV_STATEFUL_DECORATOR_MASK)))
774: if (!fptr->encs.enc || (fptr->encs.enc ==
rb_ascii8bit_encoding() && !fptr->encs.enc2)) {
924: else if (fptr->encs.enc != rb_ascii8bit_encoding())
3948: fptr->encs.enc = rb_ascii8bit_encoding();
4203: if (intern == NULL && ext != rb_ascii8bit_encoding())

On Mon, Sep 12, 2011 at 7:45 PM, Brian C. [email protected]
wrote:

Consistent with what? If you set external_encoding to something other
than nil, you are telling ruby to transcode strings on output to the
given encoding - unless the given encoding is “ASCII-8BIT”/“BINARY”.

So it seems that “ASCII-8BIT” is handled as a special case. In fact,
looking at io.c, that encoding is handled as a special case all over the
place:

The only special thing is that it is the “encoding” of a String’s raw
data, I’d say.

Cheers

robert

On Sep 12, 2011, at 7:20 AM, Robert K. wrote:

Now, if you open a file for write and set an external encoding (the
default is nil) it means “transcode to this encoding”. But for some
reason, setting external encoding to ASCII-8BIT bypasses this rule.

See above. The name ASCII-8BIT is probably not the best one to choose
for this but if you think about it as BINARY then everything fits
nicely together.

I think the anomaly that Brian pointed out was that the transcoding
during IO behaved differently than explicit transcoding:

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

irb(main):006:0> "groß".encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT

But now we have a special case where it doesn't raise an exception :-(

It makes sense to me that transcoding to 'binary' (or 'ASCII-8BIT')
would just copy bytes, but it doesn't make sense that implicit
transcoding via IO and explicit transcoding via #encode behave
differently.
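The asymmetry Gary describes is easy to reproduce side by side (a sketch; Tempfile is only used here to avoid littering the filesystem):

```ruby
require "tempfile"

s = "groß"   # UTF-8, contains a non-ASCII character

# Explicit transcoding raises:
begin
  s.encode("ASCII-8BIT")
rescue Encoding::UndefinedConversionError => e
  puts "explicit #encode: #{e.class}"
end

# Implicit "transcoding" on a binary-mode stream silently dumps the bytes:
Tempfile.create("bin") do |f|
  f.binmode       # external encoding is now ASCII-8BIT
  p f.write(s)    # => 5, no exception raised
end
```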

Gary W.