Hi there
I am trying to read a binary file using Java’s “FileInputStream” to
later store it in HBase.
My problem is the byte-array conversion needed to call the read-method:
inFile = File.new("/home/roger/Downloads/test.jpg")
inputStream = FileInputStream.new(inFile)
length = inFile.length()
buffer = ""
inputStream.read(buffer)
Any ideas?
You may find the entire code in the attachment.
Thanks
Roger
Would you mind clarifying your goal a little? Is it your intention to
read the bytes in a “streaming” fashion, or is it OK to read the entire
file into memory as a byte[]?
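To make the two options concrete, here is a minimal sketch in plain Ruby (it also runs under JRuby). The helper names read_whole and each_chunk and the 4096-byte default chunk size are illustrative assumptions, not anything taken from the original script.

```ruby
# Option 1: read the entire file into memory as one binary String.
def read_whole(path)
  File.binread(path)
end

# Option 2: stream the file in fixed-size chunks, yielding each chunk.
# IO#read(n) returns nil at end of file, which ends the loop.
def each_chunk(path, chunk_size = 4096)
  File.open(path, "rb") do |io|
    while (chunk = io.read(chunk_size))
      yield chunk
    end
  end
end
```

For small files the first form is usually the simpler choice; the second keeps memory use bounded by the chunk size.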
Strange: based on your code, it should warn you that File is already
defined by Ruby: "warning: already initialized constant File".
You need to perform at least a :remove_const on Object prior to any
Java class import, like this:
Object.send(:remove_const,:File) # put this as 1st line
Next error: NameError: no method 'read' for arguments
(org.jruby.RubyString) on Java::JavaIo::FileInputStream
Explanation: your buffer is actually a Ruby string, not a byte[].
Possible fix (size the array to the file length, because read fills at
most as many bytes as the array can hold):
buffer = Java::byte[length].new
I did not check the rest of HBase related code. Good luck!
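The byte[] point above can be sketched as follows. This is only an illustration under stated assumptions: read_fully is a hypothetical helper name, and the branch for non-JRuby interpreters exists solely so the snippet can be run outside JRuby. Referring to java.io.File by its full name avoids clashing with Ruby's own File constant.

```ruby
# Read a whole file and return its contents as a binary Ruby String.
# (read_fully is a hypothetical name chosen for this sketch.)
def read_fully(path)
  unless defined?(JRUBY_VERSION)
    # Not on JRuby: no Java classes are available, so use plain Ruby.
    return File.binread(path)
  end

  file   = java.io.File.new(path)
  buffer = Java::byte[file.length].new      # Java byte[] sized to the file
  stream = java.io.FileInputStream.new(file)
  begin
    offset = 0
    # read may deliver fewer bytes than requested, so loop until the
    # buffer is full or end of file (-1) is reported.
    while offset < buffer.length
      count = stream.read(buffer, offset, buffer.length - offset)
      break if count < 0
      offset += count
    end
  ensure
    stream.close
  end
  String.from_java_bytes(buffer)            # back to a Ruby String
end
```

If the caller needs the raw byte[] (for example to hand to a Java API), the conversion on the last line can be dropped and buffer returned directly.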
Hey Ariel
It’s the latter case. These PDFs are rather small, so I just want to
read them into memory and then pass this “stream” (say, a byte[]) to
another function (the put method of HBase).
Then please remove the following line from your code:
java_import "java.io.File"
Good luck!
Following the Ruby documentation, I now just use Ruby’s own “File” class,
since my PDFs (here JPGs) are rather small. This seems to work so far,
but might be slower than a stream-based approach. It did well importing 5
files on my test box; we’ll see how it does when running on the real
site with millions of files. Even without my “puts”, the HBase shell
prints a lot of messages on screen…
java_import "org.apache.hadoop.hbase.util.Bytes"
java_import "org.apache.hadoop.hbase.client.HTable"
java_import "org.apache.hadoop.hbase.client.Put"

# Convert each argument to a Java byte[]
def jbytes(*args)
  args.map { |arg| arg.to_s.to_java_bytes }
end

files = Dir.glob("/home/roger/Downloads/*.jpg")
files.each do |x|
  puts "File #{x}"
  buffer = File.binread(x)  # whole file as a binary string; no handle left open
  table = HTable.new(@hbase.configuration, "rb_test")
  p = Put.new(*jbytes(File.basename(x)))  # row key: the file name
  p.add(*jbytes("inhalt", "", buffer))    # family "inhalt", empty qualifier
  table.put(p)
  table.close()
end