Stefan L. wrote:
Ruby must choose between treating all external data as
text unless told otherwise or treating everything as binary
unless told otherwise, because there is no general way
to know if a file is binary or text.
Yes (and I wouldn’t want it to try to guess).
Given that Ruby is mostly used to work with text, it’s a
sensible decision to use text mode by default.
That’s where I disagree. There are tons of non-text applications:
images, compression, PDFs, Marshal, DRb… Furthermore, as the OP
demonstrated, there are plenty of use cases involving files
which are almost ASCII, but not quite. The default behaviour now is to
crash, rather than to treat these as streams of bytes.
I don’t want my programs to crash in these cases.
It has to default to some encoding.
That’s where I also disagree. It can default to a stream of bytes.
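To make "stream of bytes" concrete, here is a minimal sketch. The sample data and the temp file name are made up for the illustration; it only shows that the same bytes read in binary mode form a valid string, while tagging them as UTF-8 does not.

```ruby
require "tempfile"

# Hypothetical sample: mostly ASCII, plus one byte (0xE9) that is not valid UTF-8.
almost_ascii = "caf\xE9 menu\n".b

as_text, as_bytes = Tempfile.create("almost_ascii") do |f|
  f.binmode
  f.write(almost_ascii)
  f.flush

  [
    File.open(f.path, "r:UTF-8") { |io| io.read }, # tagged as UTF-8 text
    File.open(f.path, "rb")      { |io| io.read }  # plain stream of bytes
  ]
end

puts as_text.valid_encoding?   # false: the 0xE9 byte is not valid UTF-8
puts as_bytes.valid_encoding?  # true: any byte sequence is valid as binary
puts as_bytes.bytesize         # 10

# An ASCII-only pattern can still be matched against the binary string.
puts "found" if as_bytes =~ /menu/
```

Matching the same pattern against as_text is the kind of operation that can raise on the invalid byte sequence, which is the crash I am complaining about.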
File.open("…", :encoding => "ENV") # Follow the environment
This is the default.
That’s what I don’t want. Given this default, I must either:
(1) Force all my source to have the correct encoding flag set
everywhere. If I don’t test for this, my programs will fail in
unexpected ways. Tests for this are awkward; they’d have to set the
environment to a certain locale (e.g. UTF-8), pass in data which is not
valid in that locale, and check no exception is raised.
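A rough sketch of what such a test might look like. The names count_matching_lines, with_default_external and survives_invalid_input? are hypothetical, and forcing Encoding.default_external in-process is only an approximation of starting Ruby under a UTF-8 locale.

```ruby
require "tempfile"

# Hypothetical code under test: counts lines matching a pattern.
# It opens the file in binary mode, so the environment should not matter.
def count_matching_lines(path, pattern)
  File.open(path, "rb") { |io| io.each_line.count { |line| line =~ pattern } }
end

# Runs the block with Encoding.default_external forced to the given encoding,
# then restores the previous value.
def with_default_external(encoding)
  saved = Encoding.default_external
  Encoding.default_external = encoding
  yield
ensure
  Encoding.default_external = saved
end

# Feeds data that is not valid UTF-8 and checks that no exception escapes.
def survives_invalid_input?
  Tempfile.create("not_utf8") do |f|
    f.binmode
    f.write("line one\nline \xFF two\n".b)
    f.flush
    with_default_external(Encoding::UTF_8) do
      count_matching_lines(f.path, /two/) == 1
    end
  end
rescue ArgumentError, EncodingError
  false
end

puts survives_invalid_input?
```

Every file-reading code path would need a check like this, which is why I call it awkward.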
(2) Use a wrapper script either to call Ruby with the correct
command-line flags, or to sanitise the environment.
Encoding.default_external=
I guess I can use that at the top of everything in the bin/ directory. It
may be sufficient, but it’s annoying to have to remember that too.
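Something like the following preamble, as a sketch only. Choosing Encoding::BINARY is my assumption about what "stream of bytes by default" would mean here; the temp file is just there to show the effect on a later read.

```ruby
#!/usr/bin/env ruby
# Hypothetical preamble for a script in bin/: make files opened without an
# explicit encoding come back as binary strings, whatever the locale says.
Encoding.default_external = Encoding::BINARY

require "tempfile"

# Demonstration: a plain File.read now returns a binary-tagged string.
sample_encoding = Tempfile.create("sample") do |f|
  f.binmode
  f.write("abc\xFF".b)
  f.flush
  File.read(f.path).encoding
end

puts sample_encoding == Encoding::BINARY
```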