How to write data in binary to a file?

Hi,

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb"). However, when I
write the data using ofile.write(1), Ruby writes the ASCII character "1"
(byte 0x31) to the file. I want it to write a single byte with the value
1. How can I do it? I have been googling a lot and could not find the answer.

Thanks

Frank

frank hi [email protected] wrote:

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb"). However, when I
write the data using ofile.write(1), Ruby writes the ASCII character "1"
(byte 0x31) to the file. I want it to write a single byte with the value
1. How can I do it? I have been googling a lot and could not find the answer.

You need to pack the data into a string before writing. The following
will write one unsigned byte:

buffer = [ 1 ].pack("C")
ofile.write(buffer)

See ri Array#pack for full documentation on things you can do with
pack (and String#unpack for the inverse).
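For instance, a minimal sketch of the pack/unpack round trip (the file name demo.bin is just an illustrative choice):

```ruby
# Pack three integers into a binary string, write it, and read it back.
# "C" = 8-bit unsigned byte, "v" = 16-bit unsigned little-endian.
data = [1, 255, 65535].pack("CCv")
File.open("demo.bin", "wb") { |f| f.write(data) }

raw = File.binread("demo.bin")   # four raw bytes: 0x01 0xFF 0xFF 0xFF
p raw.unpack("CCv")              # => [1, 255, 65535]

File.delete("demo.bin")          # clean up the scratch file
```

Note that pack returns an ASCII-8BIT (binary) string, so it can be written to a "wb" stream without any transcoding.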

There are also nearly-identical pack/unpack functions in Perl.
The "perlpacktut" manpage is a great read for this.

frank hi wrote in post #1021069:

Hi,

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb").

All that does is turn off newline conversions, so if you write \n to a
file, \n will be written to the file no matter what operating system the
program is running on. Well, and this:

===

And sets external encoding to ASCII-8BIT unless explicitly specified.

7stud – wrote in post #1021082:

I want to write data to a file in binary mode. The file is opened
in binary mode with ofile = File.open("filename", "wb").

All that does is turn off newline conversions, so if you write \n to a
file, \n will be written to the file no matter what operating system the
program is running on. Well, and this:

===

And sets external encoding to ASCII-8BIT unless explicitly specified.

Oh no… yet another inconsistency from ruby 1.9.

The documentation you quote is correct in what it says:

irb(main):001:0> f1 = File.open("/tmp/f1", "w")
=> #<File:/tmp/f1>
irb(main):002:0> f2 = File.open("/tmp/f2", "wb")
=> #<File:/tmp/f2>
irb(main):003:0> f1.external_encoding
=> nil
irb(main):004:0> f2.external_encoding
=> #<Encoding:ASCII-8BIT>

However, the behaviour this implies is unexpected:

irb(main):005:0> f2.write("groß")
=> 5

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

irb(main):006:0> "groß".encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT

But now we have a special case where it doesn't raise an exception :-(

Regards,

Brian.

(Tested with “ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]”)

Did you try with File#syswrite?

Phillip G. wrote in post #1021166:

On Sat, Sep 10, 2011 at 6:52 PM, Brian C. [email protected]
wrote:

irb(main):005:0> f2.write("groß")
=> 5

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

You obviously aren’t aware what Extended ASCII (aka ASCII 8bit) is.

I'm not? Then please explain it to me. What I know so far about ruby 1.9
encoding I have documented at
string19/string19.rb at master · candlerb/string19 · GitHub
There are some 200+ behaviours there, reverse-engineered with tests.

However, I’m quite happy to have gaps in my knowledge filled out.

Take a look at ISO 8859-1, and check what 11011111 encodes.

What has ISO-8859-1 got to do with this?

If the file had an external encoding of ISO-8859-1, then the character
“ß” (two bytes in UTF-8) would have been translated to the single byte
0xDF as it was written out.
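Brian's claim about ISO-8859-1 can be checked with a short sketch (the file name latin1.txt is just illustrative):

```ruby
# With an ISO-8859-1 external encoding, "ß" (two bytes in UTF-8) is
# transcoded to the single byte 0xDF on write.
File.open("latin1.txt", "w:ISO-8859-1") { |f| f.write("groß") }

bytes = File.binread("latin1.txt")
p bytes.bytesize      # => 4   (one byte per character)
p bytes.bytes.last    # => 223 (0xDF, the Latin-1 code for "ß")

File.delete("latin1.txt")
```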

But in this example, the external encoding is ASCII-8BIT, which is
ruby's encoding for "ASCII characters in the low 128 values, and unknown
binary in the high 128 values".

Ruby does not let you transcode UTF-8 to ASCII-8BIT, or back again, if
there are any high-value characters in it. You can confirm this easily:

s1 = "groß"
=> "groß"
s1.encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT
	from (irb):2:in `encode'
	from (irb):2
	from /usr/local/bin/irb192:12:in `<main>'
s2 = "gro\xDF".force_encoding("ASCII-8BIT")
=> "gro\xDF"
s2.encode("UTF-8")
Encoding::UndefinedConversionError: "\xDF" from ASCII-8BIT to UTF-8
	from (irb):4:in `encode'
	from (irb):4
	from /usr/local/bin/irb192:12:in `<main>'

Now, if you open a file for write and set an external encoding (the
default is nil) it means “transcode to this encoding”. But for some
reason, setting external encoding to ASCII-8BIT bypasses this rule.

On Sun, Sep 11, 2011 at 5:15 AM, Brian C. [email protected]
wrote:

I’m not? Then please explain it to me. What I know so far about ruby 1.9
encoding I have documented at
string19/string19.rb at master · candlerb/string19 · GitHub

irb(main):001:0> u = Encoding::BINARY
=> #<Encoding:ASCII-8BIT>
irb(main):002:0> u.names
=> ["ASCII-8BIT", "BINARY"]

In other words: binary and ASCII-8BIT are the same.

I find the behavior totally consistent: when writing a String with
any encoding to BINARY, the bytes are simply dumped as-is and no
conversion is done. So there cannot be an encoding error.

irb(main):003:0> s = "groß"
=> "groß"
irb(main):004:0> s.size
=> 4
irb(main):005:0> s.bytesize
=> 5
irb(main):006:0> s.encoding
=> #<Encoding:UTF-8>
irb(main):007:0> File.open("x", "wb") {|io| p io.external_encoding,
io.internal_encoding; io.write(s)}
#<Encoding:ASCII-8BIT>
nil
=> 5
irb(main):008:0> File.stat("x").size
=> 5

Now I can read the file with encoding UTF-8 properly:

irb(main):009:0> t = File.open("x", "r:UTF-8") {|io| io.read}
=> "groß"
irb(main):010:0> t.size
=> 4
irb(main):011:0> t.bytesize
=> 5
irb(main):012:0> t.encoding
=> #<Encoding:UTF-8>
irb(main):013:0> s == t
=> true

Now, if you open a file for write and set an external encoding (the
default is nil) it means “transcode to this encoding”. But for some
reason, setting external encoding to ASCII-8BIT bypasses this rule.

See above. The name ASCII-8BIT is probably not the best one to choose
for this but if you think about it as BINARY then everything fits
nicely together.

Kind regards

robert

On Sat, Sep 10, 2011 at 6:52 PM, Brian C. [email protected]
wrote:

irb(main):005:0> f2.write("groß")
=> 5

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

You obviously aren’t aware what Extended ASCII (aka ASCII 8bit) is.

Take a look at ISO 8859-1, and check what 11011111 encodes.


Phillip G.

phgaw.posterous.com | twitter.com/phgaw | gplus.to/phgaw

A method of solution is perfect if we can foresee from the start,
and even prove, that following that method we shall attain our aim.
– Leibniz

Robert K. wrote in post #1021429:

irb(main):002:0> u.names
=> ["ASCII-8BIT", "BINARY"]

In other words: binary and ASCII-8BIT are the same.

Indeed.

I find the behavior totally consistent

Consistent with what? If you set external_encoding to something other
than nil, you are telling ruby to transcode strings on output to the
given encoding - unless the given encoding is “ASCII-8BIT”/“BINARY”.

So it seems that “ASCII-8BIT” is handled as a special case. In fact,
looking at io.c, that encoding is handled as a special case all over the
place:

$ egrep -n "= rb_ascii8bit" io.c
232:#define NEED_WRITECONV(fptr) ((fptr->encs.enc != NULL &&
fptr->encs.enc != rb_ascii8bit_encoding()) ||
NEED_NEWLINE_DECORATOR_ON_WRITE(fptr) || (fptr->encs.ecflags &
(ECONV_DECORATOR_MASK|ECONV_STATEFUL_DECORATOR_MASK)))
774: if (!fptr->encs.enc || (fptr->encs.enc ==
rb_ascii8bit_encoding() && !fptr->encs.enc2)) {
924: else if (fptr->encs.enc != rb_ascii8bit_encoding())
3948: fptr->encs.enc = rb_ascii8bit_encoding();
4203: if (intern == NULL && ext != rb_ascii8bit_encoding())

On Mon, Sep 12, 2011 at 7:45 PM, Brian C. [email protected]
wrote:

Consistent with what? If you set external_encoding to something other
than nil, you are telling ruby to transcode strings on output to the
given encoding - unless the given encoding is “ASCII-8BIT”/“BINARY”.

So it seems that “ASCII-8BIT” is handled as a special case. In fact,
looking at io.c, that encoding is handled as a special case all over the
place:

The only special thing is that it is the “encoding” of a String’s raw
data, I’d say.

Cheers

robert

On Sep 12, 2011, at 7:20 AM, Robert K. wrote:

Now, if you open a file for write and set an external encoding (the
default is nil) it means “transcode to this encoding”. But for some
reason, setting external encoding to ASCII-8BIT bypasses this rule.

See above. The name ASCII-8BIT is probably not the best one to choose
for this but if you think about it as BINARY then everything fits
nicely together.

I think the anomaly that Brian pointed out was that the transcoding
during IO behaved differently than explicit transcoding:

Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:

irb(main):006:0> "groß".encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT

But now we have a special case where it doesn't raise an exception :-(

It makes sense to me that transcoding to 'binary' (or 'ASCII-8BIT')
would just copy bytes, but it doesn't make sense that implicit
transcoding via IO and explicit transcoding via #encode behave
differently.
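The asymmetry Gary describes is easy to reproduce side by side (a sketch; Tempfile is only used here to avoid littering the filesystem):

```ruby
require "tempfile"

s = "groß"   # UTF-8, contains a non-ASCII character

# Explicit transcoding raises:
begin
  s.encode("ASCII-8BIT")
rescue Encoding::UndefinedConversionError => e
  puts "explicit #encode: #{e.class}"
end

# Implicit "transcoding" on a binary-mode stream silently dumps the bytes:
Tempfile.create("bin") do |f|
  f.binmode       # external encoding is now ASCII-8BIT
  p f.write(s)    # => 5, no exception raised
end
```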

Gary W.