Re: Re: which xml parser to use with jruby?

---- Greg D. [email protected] wrote:

---- Jay McGaffigan [email protected] wrote:

correct me if I am wrong but couldn’t you use Nokogiri? Under the
hood it uses Java libraries to do xml parsing (you might have to
install the beta version)

Jay

Is it fully supported with 1.5.3? I thought nokogiri has native extensions, or
is that not an issue anymore? I didn’t even think of it. Googling jruby and
nokogiri, I see that some people had pains in earlier versions of jruby. I guess
it is worth a shot unless someone knows of any issues.

Okay, I see the pure java nokogiri is in beta and there are still some
things being worked out.

From github:

Please note. Although rake test reports very few problems, pure Java
Nokogiri still has weird behaviors in some areas. For example, handling
spaces is not the same as cRuby version.

The “weird behaviors in some areas” worries me.

Is anyone having success using it? Is it close to prime time?

Any other ideas for XML parsing? These XML docs are configuration
files for a java swing system. Many, many of them. Got to love java
developers and their love for XML config files. This would be so much
easier in YAML with ruby or java objects. Oh, how I keep pushing for
app config files to be in YAML, but sometimes it is like talking to a
wall with java developers. The funny thing is, I’m a java developer by
day and a ruby enthusiast by night.

Thanks,

GregD
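
One option that avoids native extensions altogether is REXML, which ships with Ruby and is pure Ruby. The following is only a minimal sketch: the real Swing config files are not shown in this thread, so the `config`/`panel` layout, the attribute names, and the `load_panels` helper are all hypothetical.

```ruby
require 'rexml/document'

# Hypothetical config layout, invented for illustration only.
CONFIG_XML = <<XML
<config>
  <panel name="main" width="800" height="600"/>
  <panel name="sidebar" width="200" height="600"/>
</config>
XML

# Returns a Hash mapping each panel name to its width and height.
def load_panels(xml)
  doc = REXML::Document.new(xml)
  panels = {}
  doc.elements.each('config/panel') do |panel|
    panels[panel.attributes['name']] = {
      'width'  => panel.attributes['width'].to_i,
      'height' => panel.attributes['height'].to_i
    }
  end
  panels
end

panels = load_panels(CONFIG_XML)
puts panels['main']['width']
```

REXML is the slowest parser in the benchmarks quoted later in this thread, so this is probably only reasonable when the files are small or parsed once at startup.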

Hi,

On Tue, Oct 19, 2010 at 9:42 AM, Greg D. [email protected] wrote:

Okay, I see the pure java nokogiri is in beta and there are still some things
being worked out.

From github:

Please note. Although rake test reports very few problems, pure Java Nokogiri
still has weird behaviors in some areas. For example, handling spaces is not the
same as cRuby version.

The “weird behaviors in some areas” worries me.

Sorry for making you worry. I wrote that page. I meant that you will
encounter weird behaviors only rarely. As far as I remember, pure Java
Nokogiri failed just two of Nokogiri’s test cases (about 1000 tests
and more than 2000 assertions) when beta 2 was released. So, if you do
typical XML processing, pure Java Nokogiri should work well.

-Yoko

We are using the nokogiri pure java in production… (or it will be
when our customer “accepts” our app into their labs)

Jay

On Tue, Oct 19, 2010 at 11:42 AM, Brad Robertson

Is this to say then that you’re confident it would work in a production
environment? I use nokogiri as my xml mini backend parser for
ActiveResource in a Rails 2.3.5 app. That’s it.

This has been my only reason for not upgrading from 1.4 to 1.5 because I
get all of these warnings with nokogiri 1.4.1 FFI version under JRuby
1.5.

On Tue, Oct 19, 2010 at 11:59 AM, Jay McGaffigan [email protected]
wrote:

We are using the nokogiri pure java in production… (or it will be
when our customer “accepts” our app into their labs)

Thanks for reporting this.
-Yoko

On Tue, Oct 19, 2010 at 11:42 AM, Brad Robertson
[email protected] wrote:

Is this to say then that you’re confident it would work in a production
environment? I use nokogiri as my xml mini backend parser for ActiveResource in a
Rails 2.3.5 app. That’s it.

Unfortunately, I don’t have a production environment to test pure Java
Nokogiri in. I appreciate users’ feedback.

-Yoko

At 11:42 AM -0400 10/19/10, Brad Robertson wrote:

Is this to say then that you’re confident it would work in a production
environment? I use nokogiri as my xml mini backend parser for ActiveResource in a
Rails 2.3.5 app. That’s it.

FYI: If you are using Rails in JRuby you can also use the JDom xml_mini
backend I wrote:

rails/activesupport/lib/active_support/xml_mini/jdom.rb at main · rails/rails · GitHub

It was about 50% as fast as using the libxml backend when I benchmarked
it 18 months ago.

At 5:24 PM -0400 10/19/10, Brad Robertson wrote:

Good to know! I’ll give that a shot.

By “50% as fast” do you mean 50% faster… or 50% slower?

The JDom backend took almost twice as much time as the LibXML backend (in
a benchmark I wrote).

For more info see: GitHub - stepheneb/rails_hash_from_xml: Benchmarking ActiveSupport: Hash.from_xml with a 1.7 MB xml file.

The data in the README is from about 18 months ago (which means in the
JRuby tests I was using the older FFI version of Nokogiri):

Here’s a section with the relevant info:

$ ruby -I$RAILS_SOURCE/activesupport/lib bench_hash_from_xml.rb

               user     system      total        real
REXML     10.960000   0.270000  11.230000 ( 11.464308)
Nokogiri   1.230000   0.020000   1.250000 (  1.256476)
LibXML     0.430000   0.000000   0.430000 (  0.434689)

Using: JRuby 1.5.0.dev (ruby 1.8.7 patchlevel 174) (2009-12-31 5cc49c5)
and testing on Java 1.6 and 1.7 running in server mode and reporting the
final measurements after running the whole benchmark twice:

$ jruby --server -I$RAILS_SOURCE/activesupport/lib bench_hash_from_xml.rb 2

# OpenJDK Server VM 1.7.0-internal:

               user     system      total        real
REXML      3.448000   0.000000   3.448000 (  3.448000)
JDOM       0.761000   0.000000   0.761000 (  0.761000)
Nokogiri   6.083000   0.000000   6.083000 (  6.082000)

# Java HotSpot(TM) 64-Bit Server VM 1.6.0_17:

               user     system      total        real
REXML      4.543000   0.000000   4.543000 (  4.543000)
JDOM       1.033000   0.000000   1.033000 (  1.033000)
Nokogiri   6.788000   0.000000   6.788000 (  6.788000)
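
To make the "50% as fast" wording concrete, the ratios can be worked out from the "real" column of the tables above. This is just arithmetic on the quoted numbers; the constant names and the `slowdown` helper are made up for the sketch, and note that the LibXML figure comes from the MRI run while the JDOM figures come from JRuby.

```ruby
# "real" seconds copied from the benchmark tables above.
LIBXML_MRI     = 0.434689
JDOM_JAVA7     = 0.761
JDOM_JAVA6     = 1.033
NOKOGIRI_JAVA7 = 6.082
NOKOGIRI_JAVA6 = 6.788

# How many times longer `time` is than `baseline`, rounded to two decimals.
def slowdown(time, baseline)
  (time / baseline).round(2)
end

jdom7_vs_libxml = slowdown(JDOM_JAVA7, LIBXML_MRI)
jdom6_vs_libxml = slowdown(JDOM_JAVA6, LIBXML_MRI)
ffi7_vs_jdom    = slowdown(NOKOGIRI_JAVA7, JDOM_JAVA7)
ffi6_vs_jdom    = slowdown(NOKOGIRI_JAVA6, JDOM_JAVA6)

puts "JDOM vs LibXML (Java 1.7): #{jdom7_vs_libxml}x the time"
puts "JDOM vs LibXML (Java 1.6): #{jdom6_vs_libxml}x the time"
puts "FFI Nokogiri vs JDOM (Java 1.7): #{ffi7_vs_jdom}x the time"
puts "FFI Nokogiri vs JDOM (Java 1.6): #{ffi6_vs_jdom}x the time"
```

So "50% as fast" here means JDOM needed roughly twice the wall-clock time of LibXML, while the FFI Nokogiri of that era took several times longer than JDOM under JRuby.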

Good to know! I’ll give that a shot.

By “50% as fast” do you mean 50% faster… or 50% slower?

Give the pure Java one a go; it was a good order of magnitude
faster than the FFI version when I played around with it over the
weekend.

gem install nokogiri --pre

On Tue, Oct 19, 2010 at 4:42 PM, Brad Robertson

On a side note, is there a way to use the pure java version of nokogiri
in Java?

I’ve been looking for a clean java xml library but most of the java
libraries have a lot of boilerplate code to perform a simple xpath
query.
Jsoup.org is very close to the most user friendly java html parsing
library but it only supports CSS selectors.

On Mon, Nov 22, 2010 at 12:37 PM, Tommy C. [email protected]
wrote:

On a side note, is there a way to use the pure java version of nokogiri in
Java?

You can do that using JRuby’s embedding API, ScriptingContainer. For
example:

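import java.util.Arrays;
import org.jruby.embed.ScriptingContainer;

ScriptingContainer container = new ScriptingContainer();
// setLoadPaths takes a List<String>, not a single String
container.setLoadPaths(Arrays.asList("set list of paths here to load gems"));
// or,
//container.runScriptlet("ENV['GEM_PATH']='set path to gem home'");
container.runScriptlet("require 'rubygems'; require 'nokogiri'");
// runScriptlet returns the result of evaluating the script
Object ret = container.runScriptlet("whatever ruby code that uses Nokogiri here");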

Resources are:
http://kenai.com/projects/jruby/pages/RedBridge
http://jruby.org/apidocs/ - see org.jruby.embed package

Hope this helps,
-Yoko
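
For the "whatever ruby code that uses Nokogiri here" placeholder above, a minimal sketch of a script that runs a one-line XPath query might look like the following. The sample XML, the `attribute_values` helper, and the REXML fallback are assumptions of this sketch, not part of the original reply; the fallback is there only so the script still runs where the nokogiri gem is not installed.

```ruby
# Prefer Nokogiri when the gem is available; otherwise fall back to REXML,
# which ships with Ruby. The fallback is an assumption of this sketch.
begin
  require 'rubygems'
  require 'nokogiri'
  USE_NOKOGIRI = true
rescue LoadError
  require 'rexml/document'
  USE_NOKOGIRI = false
end

# Returns the string values of all attributes matched by an XPath expression.
def attribute_values(xml, xpath)
  if USE_NOKOGIRI
    Nokogiri::XML(xml).xpath(xpath).map { |attr| attr.value }
  else
    REXML::XPath.match(REXML::Document.new(xml), xpath).map { |attr| attr.value }
  end
end

# Hypothetical document, invented for illustration only.
xml = '<config><panel name="main"/><panel name="sidebar"/></config>'
names = attribute_values(xml, '//panel/@name')
puts names.join(', ')
```

Saved to a file or passed as a string to runScriptlet, a script like this keeps the XPath boilerplate on the Ruby side, which seems to be what the question was after.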