I want to write the data into a file in binary mode. The file is opened
in binary mode by ofile = File.open("filename", "wb"). However, when I
write the data using ofile.write(1), Ruby writes the ASCII character "1"
(byte 0x31) to the file. I want it to write a single byte with the value 1.
How can I do that? I have been googling a lot and could not find the answer.
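What's happening here: IO#write calls to_s on a non-String argument, so the integer 1 is written out as the character "1". A minimal reproduction:

File.open("filename", "wb") { |f| f.write(1) }
File.binread("filename")   # => "1", i.e. the single byte 0x31, not 0x01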
You need to pack the data into a string before writing. The following
will write one unsigned byte:

buffer = [1].pack("C")
ofile.write(buffer)
See ri Array#pack for full documentation on things you can do with
pack (and String#unpack for the inverse).
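For instance, a quick round trip (a sketch, reusing "filename" from the question):

File.open("filename", "wb") { |f| f.write([1].pack("C")) }
File.binread("filename").unpack("C").first   # => 1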
There are also nearly-identical pack/unpack functions in Perl.
The "perlpacktut" manpage is a great read for this.
I want to write the data into a file in binary mode. The file is opened
in binary mode by ofile = File.open("filename", "wb").
All that does is turn off newline conversions, so if you write \n to a
file, \n will be written to the file no matter what operating system the
program is running on. Well, and this, quoting the documentation:

…

And sets external encoding to ASCII-8BIT unless explicitly specified.
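You can see that default directly (a sketch, Ruby 1.9):

f = File.open("x", "wb")
f.external_encoding   # => #<Encoding:ASCII-8BIT>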
Oh no… yet another inconsistency from ruby 1.9.
The documentation you quote is correct in what it says:
Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:
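irb(main):006:0> "groß".encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT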
You obviously aren't aware of what Extended ASCII (aka ASCII-8BIT) is.
I’m not? Then please explain it to me. What I know so far about ruby 1.9
encoding I have documented at
There are some 200+ behaviours there, reverse-engineered with tests.
However, I’m quite happy to have gaps in my knowledge filled out.
Take a look at ISO 8859-1, and check what 11011111 encodes.
What has ISO-8859-1 got to do with this?
If the file had an external encoding of ISO-8859-1, then the character
“ß” (two bytes in UTF-8) would have been translated to the single byte
0xDF as it was written out.
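A quick sketch of that translation:

"groß".encode("ISO-8859-1").bytes.to_a   # => [103, 114, 111, 223], and 223 == 0xDF == 0b11011111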
But in this example, the external encoding is ASCII-8BIT, which is
ruby’s encoding for “ASCII characters in the low 128 values, and unknown
binary in the high 128 values”
Ruby does not let you transcode UTF-8 to ASCII-8BIT, or back again, if
there are any high-value characters in it. You can confirm this easily:
s1 = "groß"
=> "groß"
s1.encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT
        from (irb):2:in `encode'
        from (irb):2
        from /usr/local/bin/irb192:12:in `<main>'
s2 = "gro\xDF".force_encoding("ASCII-8BIT")
=> "gro\xDF"
s2.encode("UTF-8")
Encoding::UndefinedConversionError: "\xDF" from ASCII-8BIT to UTF-8
        from (irb):4:in `encode'
        from (irb):4
        from /usr/local/bin/irb192:12:in `<main>'
Now, if you open a file for write and set an external encoding (the
default is nil) it means “transcode to this encoding”. But for some
reason, setting external encoding to ASCII-8BIT bypasses this rule.
In other words: binary and ASCII-8BIT are the same.
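And Ruby itself treats the two names as one encoding:

Encoding.find("BINARY")                                  # => #<Encoding:ASCII-8BIT>
Encoding.find("BINARY") == Encoding.find("ASCII-8BIT")   # => true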
I find the behavior totally consistent: when writing a String with any
encoding to BINARY, the bytes are simply dumped as-is and no conversion
is done, so there cannot be an encoding error.
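(The write step isn't shown; presumably something like this preceded it, with s holding the UTF-8 string:)

s = "groß"
File.open("x", "wb") { |io| io.write(s) }   # bytes dumped as-is, no transcoding, no error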
Now I can read the file back with encoding UTF-8 properly:

irb(main):009:0> t = File.open("x", "r:UTF-8") {|io| io.read}
=> "groß"
irb(main):010:0> t.size
=> 4
irb(main):011:0> t.bytesize
=> 5
irb(main):012:0> t.encoding
=> #<Encoding:UTF-8>
irb(main):013:0> s == t
=> true
See above. The name ASCII-8BIT is probably not the best one to choose
for this but if you think about it as BINARY then everything fits
nicely together.
In other words: binary and ASCII-8BIT are the same.
Indeed.
I find the behavior totally consistent
Consistent with what? If you set external_encoding to something other
than nil, you are telling ruby to transcode strings on output to the
given encoding - unless the given encoding is “ASCII-8BIT”/“BINARY”.
So it seems that “ASCII-8BIT” is handled as a special case. In fact,
looking at io.c, that encoding is handled as a special case all over the
place:
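The asymmetry is easy to see when writing the same string under two external encodings (a sketch):

File.open("y", "w:ISO-8859-1") { |f| f.write("groß") }   # transcoded: ß becomes the single byte 0xDF
File.open("z", "w:ASCII-8BIT") { |f| f.write("groß") }   # not transcoded: the 5 raw UTF-8 bytes
File.size("y")   # => 4
File.size("z")   # => 5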
The only special thing is that it is the “encoding” of a String’s raw
data, I’d say.
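That is, force_encoding just relabels the same raw bytes (sketch):

s = "groß"
raw = s.dup.force_encoding("ASCII-8BIT")   # same bytes, different label
raw.bytes.to_a == s.bytes.to_a             # => true
raw.size                                   # => 5, every byte counts as one character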
I think the anomaly that Brian pointed out was that the transcoding
during IO behaved differently than explicit transcoding:
Normally, transcoding a UTF-8 string (which contains non-ASCII
characters) to ASCII-8BIT would raise an exception:
irb(main):006:0> "groß".encode("ASCII-8BIT")
Encoding::UndefinedConversionError: U+00DF from UTF-8 to ASCII-8BIT
But now we have a special case where it doesn't raise an exception.
It makes sense to me that transcoding to 'binary' (or 'ASCII-8BIT')
would just copy bytes, but it doesn't make sense that implicit
transcoding via IO and explicit transcoding via #encode behave
differently.
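A side-by-side sketch of the two behaviours:

s = "groß"
s.encode("ASCII-8BIT")                              # explicit: raises Encoding::UndefinedConversionError
File.open("x", "w:ASCII-8BIT") { |f| f.write(s) }   # implicit via IO: succeeds, bytes copied verbatim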
Gary W.