Nokogiri 1.5.0 Released

Hello JRubyists,

Nokogiri 1.5.0 (“Y U NO RELEASE?” Edition) is out!
http://groups.google.com/group/nokogiri-talk/browse_thread/thread/4bab7b9b72b40e5c

This is a pivotal release for JRuby users: there is no libxml/libxslt behind
Nokogiri anymore. When you use Nokogiri on JRuby, you don’t need any C
libraries. Instead, Xerces/NekoHTML and a few more pure Java APIs
are used. Installing the nokogiri gem is all you need to do, because the
Java APIs are included in the gem package. Right after the gem
installation, nokogiri will work on various platforms.

A few methods have been added just for JRuby users:
- You can wrap an org.w3c.dom.Document to build a Nokogiri::XML::Document
- You can get an org.w3c.dom.Document from a Nokogiri::XML::Document

See the “Java Integration” section of
https://github.com/tenderlove/nokogiri/wiki/Pure-Java-Nokogiri-for-JRuby
for details.
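
As a rough illustration of the two JRuby-only methods listed above, here is a minimal sketch. It assumes the methods are `Nokogiri::XML::Document.wrap` and `Nokogiri::XML::Document#to_java`; the wiki's “Java Integration” section is the authoritative place to check the names. The helper `dom_round_trip` is hypothetical and exists only for illustration, and it does real work only under JRuby.

```ruby
# Hypothetical helper: round-trips XML through a Java DOM document.
# It relies on Java classes, so it refuses to run outside JRuby.
def dom_round_trip(xml)
  raise NotImplementedError, 'JRuby only' unless RUBY_PLATFORM == 'java'

  require 'java'
  require 'nokogiri'

  # Build an org.w3c.dom.Document with the JDK's own DOM parser.
  factory = javax.xml.parsers.DocumentBuilderFactory.newInstance
  factory.setNamespaceAware(true)
  builder  = factory.newDocumentBuilder
  source   = org.xml.sax.InputSource.new(java.io.StringReader.new(xml))
  java_doc = builder.parse(source)

  # Wrap the Java DOM document in a Nokogiri document (assumed API) ...
  noko_doc = Nokogiri::XML::Document.wrap(java_doc)

  # ... and get an org.w3c.dom.Document back out of it (assumed API).
  [noko_doc, noko_doc.to_java]
end
```

Under JRuby, something like `noko_doc, java_doc = dom_round_trip('<root><p>hi</p></root>')` would give a Nokogiri document for CSS/XPath queries and a Java DOM document to hand to Java libraries.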

Here’s a link for those who are worrying about performance,

Although pure Java Nokogiri is slow at parsing big XML documents, it is
4 times faster than the FFI version.

Give it a try. Give us your feedback.
-Yoko

On 07/02/2011 08:25 PM, Yoko H. wrote:

Give it a try. Give us your feedback.

I tried Java Nokogiri recently and premailer didn’t work. Are there
limitations on what Java Nokogiri can support? Or will it be able to do
anything regular Nokogiri can do?

Hi,

On Sun, Jul 3, 2011 at 12:40 AM, consiliens wrote:

On 07/02/2011 08:25 PM, Yoko H. wrote:

Give it a try. Give us your feedback.

I tried Java Nokogiri recently and premailer didn’t work. Are there
limitations on what Java Nokogiri can support? Or will it be able to do
anything regular Nokogiri can do?

Did you use pure Java Nokogiri in an environment where a custom
class loader is involved? If so, you need to move the Java archives to
a directory that the custom class loader sees, for example
WEB-INF/lib.
If you have an error message, that will help us figure out the problem.

-Yoko
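
The step described above, moving the Java archives to a directory the custom class loader sees, could be scripted roughly like this. It is a minimal sketch: the helper name `copy_jars` is hypothetical, and where the jars actually live inside the installed gem is an assumption to verify on your own installation.

```ruby
require 'fileutils'

# Hypothetical helper: copies every .jar found under src_dir into dest_dir
# (for example a web app's WEB-INF/lib) and returns the copied file names.
def copy_jars(src_dir, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  jars = Dir.glob(File.join(src_dir, '**', '*.jar')).sort
  jars.each { |jar| FileUtils.cp(jar, dest_dir) }
  jars.map { |jar| File.basename(jar) }
end
```

A call might look like `copy_jars(Gem::Specification.find_by_name('nokogiri').gem_dir, 'WEB-INF/lib')`, assuming the jars ship somewhere under the gem's directory.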

Which versions of the libraries are these? Which versions of Xerces is
Nokogiri compatible with? Servlet containers usually come with some
XML libraries, and xercesImpl is quite common. Is it possible not to
require those jars if Xerces is already in the parent classloader?

- Kristian

On 07/02/2011 11:25 PM, Yoko H. wrote:

Did you use pure Java Nokogiri in an environment where a custom
class loader is involved? If so, you need to move the Java archives to
a directory that the custom class loader sees, for example
WEB-INF/lib.
If you have an error message, that will help us figure out the problem.

-Yoko

There’s no error message and no custom class loader. I don’t have
hpricot installed so premailer is forced to use nokogiri.

nokogiri (1.5.0 java)
jruby 1.6.2
Java 1.6.0_26

git clone https://github.com/alexdunae/premailer

premailer/test/files$ jruby -S premailer base.html

Nothing is displayed using Java nokogiri.

Running the same command with MRI and regular nokogiri prints the
expected text to standard output.

Hi,

On Sun, Jul 3, 2011 at 2:50 AM, consiliens wrote:

There’s no error message and no custom class loader. I don’t have hpricot
Nothing is displayed using Java nokogiri.

Running the same command with MRI and regular nokogiri prints the expected
text to standard output.

OK, I reproduced this. Please file the bug at
https://github.com/tenderlove/nokogiri/issues?state=open
If possible, could you add some simple code that reproduces it? That will help.

Thanks,
-Yoko

Hi Yoko

Thanks for the update!

Give it a try. Give us your feedback.

I’ve ported one of our bigger nokogiri scripts to jruby/nokogiri 1.5.0
and found the following bug:

raise out.inspect unless out == "a b:c"
This should return a b:c, but it actually returns: a
c. It works fine with JRuby w/ Nokogiri 1.4.7 and with Ruby w/
Nokogiri 1.5.0

I’m using JRuby 1.6.1 here!

Cheers!
Reto S.

Hi Kristian,

On Sun, Jul 3, 2011 at 2:00 AM, kristian wrote:

Which versions of the libraries are these? Which versions of Xerces is
Nokogiri compatible with? Servlet containers usually come with some
XML libraries, and xercesImpl is quite common. Is it possible not to
require those jars if Xerces is already in the parent classloader?

I added information about the Java APIs to
https://github.com/tenderlove/nokogiri/wiki/Pure-Java-Nokogiri-for-JRuby
Thanks for asking about this.

It is possible not to require xercesImpl.jar. Please see the “Google App
Engine” section of the wiki above. This is a hack for an old version
of Nokogiri.

However, using the Xerces of the parent classloader is not a good idea. Since
a single web application, for example a single war, should be
portable, everything needed to run the app should be in the war, including
xercesImpl.jar. It is better to let the custom classloader load
everything for the web app.

There’s one more reason. Pure Java Nokogiri uses NekoHTML to parse
HTML files. NekoHTML must be loaded before Xerces to make it work.
In light of this, you should not use the parent classloader.

-Yoko
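
One way to picture the load-order point above is a classpath in which the NekoHTML jar is listed before the Xerces jar. The sketch below assumes that "loaded before" can be approximated by classpath order, which may not cover every class loader setup. The helper `order_jars` and the jar file names other than xercesImpl.jar are hypothetical.

```ruby
# Hypothetical helper: orders jar names so that NekoHTML comes first,
# Xerces second, and everything else afterwards in its original order.
def order_jars(jars)
  rank = lambda do |jar|
    name = File.basename(jar).downcase
    if name.include?('nekohtml') then 0
    elsif name.include?('xerces') then 1
    else 2
    end
  end
  # sort_by is not guaranteed to be stable, so the original index
  # is used as a tie-breaker to keep the remaining jars in order.
  jars.each_with_index.sort_by { |jar, i| [rank.call(jar), i] }.map(&:first)
end
```

The result can be joined into a classpath string with `order_jars(jars).join(File::PATH_SEPARATOR)`.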

Hi Reto

2011/7/4 Reto S.:

require 'nokogiri'

input = ‘

doc = Nokogiri::XML(input, nil, 'UTF-8')
doc.css("p").first.replace("a b:c")
out = doc.children.first.to_s

raise out.inspect unless out == "a b:c"

This should return a b:c, but it actually returns: a
c. It works fine with JRuby w/ Nokogiri 1.4.7 and with Ruby w/ Nokogiri
1.5.0

This was caused by incorrect fragment processing. I fixed the bug in rev.
798d047.

Thanks for using pure Java Nokogiri.
-Yoko

Hi Yoko

On 04.07.2011 at 20:16, Yoko H. wrote:

This should return a b:c, but it actually returns: a
c. It works fine with JRuby w/ Nokogiri 1.4.7 and with Ruby w/ Nokogiri
1.5.0

This was caused by incorrect fragment processing. I fixed the bug in rev. 798d047.

Thanks for the quick update. It is better now, but our script still
stumbles over the same bug in a different variation:

input = “

xyz

doc = Nokogiri::XML(input, nil, 'UTF-8')

p = doc.css("p").first
p.replace("A:B")

puts doc.to_s

This should print out
A:B

But it produces:
A:B

Thank you!

Cheers,
Reto

On 07/04/2011 11:27 AM, Yoko H. wrote:

OK, I reproduced this. Please file the bug at
https://github.com/tenderlove/nokogiri/issues?state=open
If possible, could you add some simple code that reproduces it? That will help.

Thanks,
-Yoko

I filed #485
https://github.com/tenderlove/nokogiri/issues/485

To make it easy to reproduce I’ve used base.html which is included as a
sample file in the premailer repository. I couldn’t get any files to
work properly with premailer using Java Nokogiri.

Hello,

I opened the ticket, https://github.com/tenderlove/nokogiri/issues/490

This comes from a difference between the libxml and Xerces parsers. Xerces
won’t parse a string that is not well-formed, but libxml does.

Please follow the issue.
-Yoko
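
Given the parser difference described above, one defensive option is to check that a string is well-formed before relying on it. This is a minimal sketch, assuming Nokogiri's strict parse option raises `Nokogiri::XML::SyntaxError` on malformed input; that behaviour has not been verified here against the pure Java build. The helper `well_formed?` is hypothetical.

```ruby
# Hypothetical helper: true if xml parses in strict mode, false otherwise.
def well_formed?(xml)
  # Required lazily so the definition itself loads without the gem.
  require 'nokogiri'
  begin
    # Strict mode turns off error recovery, so malformed input raises.
    Nokogiri::XML(xml) { |config| config.strict }
    true
  rescue Nokogiri::XML::SyntaxError
    false
  end
end
```

For example, `well_formed?('<a><b></a>')` is expected to be false because of the mismatched tags, while `well_formed?('<a><b/></a>')` is expected to be true.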

2011/7/5 Reto S.:

Thanks for doing this.

I opened two bugs on GitHub that are showstoppers for me: #493 and
#492. They are easy to reproduce. For some reason I am not able to add
the label pure-java to them as requested.
Regards,
Erik

Erik,

On Tue, Jul 12, 2011 at 10:20 AM, Erik B. wrote:

Thanks for doing this.

I opened two bugs on GitHub that are showstoppers for me: #493 and
#492. They are easy to reproduce. For some reason I am not able to add
the label pure-java to them as requested.
Regards,
Erik

Thank you! I’ll have a look.
-Yoko