Strangeness when reading PDF file

luislavena · June 29, 2011, 7:40am

Hello all.

I am messing around with reading files in JRuby as part of creating a
messaging system in TorqueBox, but I am coming across some weirdness on
the command line which may also be messing up my application.

Given I have a file called “test.pdf” and I run the following code:

file = File.open("test.pdf", "r")
the_pdf = file.read
the_pdf

This outputs the PDF file contents to the command line.

Here is the output on plain-Jane MRI 1.8.7 IRB:

-snipped a lot of output-
<</Size 228/Root 226 0 R/Info 227 0 R/ID
[]>>\nstartxref\n2084449\n%%EOF\n"
irb(main):003:0>

And here is the result on JRuby (1.8 mode):

-snipped a lot of output-
<</Size 228/Root 226 0 R/Info 227 0 R/ID
[]>>\nstartxref\n2084449\n%%EOF\n"

jruby-1.6.2 :010 >
?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c

As you can see, on JRuby there are garbage characters at my command
prompt (the ?1;2c stuff). This only happens on JRuby, from what I
can tell.

Does anyone have any insight as to the cause of this?

cheers

Jeff

jeffrey25 · June 29, 2011, 6:00pm

Hi…

Jeffrey J. writes:

[…]

Given I have a file called “test.pdf” and I run the following code:

file = File.open("test.pdf", "r")
the_pdf = file.read
the_pdf

This outputs the PDF file contents to the command line.

[…]

irb(main):003:0>

[…]

jruby-1.6.2 :010 >
?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c?1;2c

As you can see, on JRuby there are garbage characters at my command
prompt (the ?1;2c stuff). This only happens on JRuby, from what I
can tell.

Does anyone have any insight as to the cause of this?

Wouldn’t it depend on the contents of the binary PDF file? Could there
be control characters in there to which your shell or irb session might
have an adverse reaction? Sorta like doing this?

$ cat test.pdf
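
One way to check that guess: "?1;2c" looks like the tail of the reply
an xterm/VT100-style terminal sends when it is asked to identify itself
(ESC [ c, ESC [ 0 c or ESC Z, or the 8-bit forms 0x9B c and 0x9A). The
sketch below just counts such request sequences in the raw bytes. The
names DA_REQUEST and terminal_queries are made up for illustration, and
it is an assumption that this is what your terminal is reacting to.

```ruby
# Byte sequences that ask a VT100-style terminal to identify itself:
# ESC [ c / ESC [ 0 c (Device Attributes), ESC Z (DECID), and their
# 8-bit equivalents 0x9B c and 0x9A. Hypothetical name, for illustration.
DA_REQUEST = /(?:\e\[|\x9B)0?c|\eZ|\x9A/n

# Count how many such requests appear in a string of raw bytes.
def terminal_queries(data)
  bytes = data.dup
  # Treat the data as plain bytes on Rubies that track encodings (1.9+).
  bytes.force_encoding("BINARY") if bytes.respond_to?(:force_encoding)
  bytes.scan(DA_REQUEST).length
end
```

For example, terminal_queries(File.open("test.pdf", "rb") { |f| f.read })
returning a non-zero count would at least make the control-character
theory plausible. A zero count does not rule out other sequences.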

Maybe try base64-encoding it to ascii before comparing?
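
Something along these lines, as a rough sketch (base64_of_file is just
a name I made up). It reads the file in binary mode and uses
Array#pack("m"), which is the same encoding Base64.encode64 produces, so
only printable ASCII ever reaches the terminal:

```ruby
# Hypothetical helper: return the base64 text of a file's raw bytes.
def base64_of_file(path)
  # "rb" reads the bytes untouched (no newline or encoding translation).
  data = File.open(path, "rb") { |f| f.read }
  # pack("m") base64-encodes, wrapping the output at 60 characters per line.
  [data].pack("m")
end
```

Run puts base64_of_file("test.pdf") under MRI and under JRuby, save
each output to a file and diff them. If they match, both interpreters
read the same bytes and the difference is only in how the prompt
displays them.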

Jim