Bug Report: Segmentation Fault when indexing with a specific set of FieldInfos

I’m submitting this through the mailing list because Trac won’t let me
use its bug report form… Is there a more appropriate way to
submit bugs when Trac doesn't work?

This is the Trac error message:

500 Internal Server Error (Submission rejected as potential spam (IP
127.0.0.1 blacklisted by bsb.empty.us, sc.surbl.org, Maximum number of
posts per hour for this IP exceeded))

And this is the bug description:

I’m indexing e-mail messages, and using a specific FieldInfos
configuration for this. Unfortunately, with this configuration, Ferret
segfaults on certain (spammy) messages.

I've tested this in several places. In my local development
environment, it works just fine. The segfaults happen on the remote
EC2 servers used by the project. I managed to isolate a test case
that both makes the defect easier to see and shows that this is a problem
with Ferret itself, not with the code layered on top of it.

Here’s the information on each environment I ran this with:

My local environment:

  • Linux 2.6.23-gentoo-r6 x86_64 AMD Athlon™ 64 Processor 3500+
    AuthenticAMD GNU/Linux

  • ruby 1.8.6 (2008-03-03 patchlevel 114) [x86_64-linux]

  • ferret (0.11.6)

  • Results: Test code runs without error.

Remote Server 1:

  • Linux 2.6.16-xenU SMP i686 GNU/Linux

  • ruby 1.8.6 (2007-09-23 patchlevel 110) [i686-linux] (compiled from
    source)

  • ferret (0.11.6)

Results:

/home/sonian/lib/ruby/gems/1.8/gems/ferret-0.11.6/lib/ferret/index.rb:298:
[BUG] Segmentation fault
ruby 1.8.6 (2007-09-23) [i686-linux]

Aborted

Remote Server 2:

  • Linux 2.6.18-xenU-ec2-v1.0 SMP i686 GNU/Linux

  • ruby 1.8.6 (2008-03-03 patchlevel 114) [i486-linux] (installed
    through apt-get)

  • ferret (0.11.6)

Results:

*** stack smashing detected ***: ruby terminated
======= Backtrace: =========
/lib/libc.so.6(__fortify_fail+0x4b)[0xb7d8f81b]
/lib/libc.so.6(__fortify_fail+0x0)[0xb7d8f7d0]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so[0xb7b6bb74]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so[0xb7b13a61]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so(mb_lcf_next+0x23)[0xb7b11d13]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so[0xb7b11659]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so[0xb7b11e9e]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so(dw_invert_field+0x134)[0xb7b40ab4]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so(dw_add_doc+0xa8)[0xb7b40ff8]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so(iw_add_doc+0x3a)[0xb7b4116a]
/var/lib/gems/1.8/gems/ferret-0.11.6/lib/ferret_ext.so[0xb7b617ce]
/usr/lib/libruby1.8.so.1.8[0xb7e88592]
/usr/lib/libruby1.8.so.1.8[0xb7e90bbf]
/usr/lib/libruby1.8.so.1.8[0xb7e90e78]
/usr/lib/libruby1.8.so.1.8[0xb7e96dcf]
/usr/lib/libruby1.8.so.1.8[0xb7e9b9b6]
/usr/lib/libruby1.8.so.1.8[0xb7e971d5]
/usr/lib/libruby1.8.so.1.8[0xb7e9b9b6]
/usr/lib/libruby1.8.so.1.8[0xb7e971d5]
/usr/lib/libruby1.8.so.1.8[0xb7e99d73]
/usr/lib/libruby1.8.so.1.8[0xb7e90b0e]
/usr/lib/libruby1.8.so.1.8[0xb7e90e78]
/usr/lib/libruby1.8.so.1.8[0xb7e96f0b]
/usr/lib/libruby1.8.so.1.8[0xb7e9a181]
/usr/lib/libruby1.8.so.1.8[0xb7e99b38]
/usr/lib/libruby1.8.so.1.8[0xb7e90b0e]
/usr/lib/libruby1.8.so.1.8[0xb7e90e78]
/usr/lib/libruby1.8.so.1.8[0xb7e96dcf]
/usr/lib/libruby1.8.so.1.8[0xb7e9a181]
/usr/lib/libruby1.8.so.1.8[0xb7e90b0e]
/usr/lib/libruby1.8.so.1.8[0xb7e90e78]
/usr/lib/libruby1.8.so.1.8[0xb7e96dcf]
/usr/lib/libruby1.8.so.1.8[0xb7e9e857]
/usr/lib/libruby1.8.so.1.8(ruby_exec+0x22)[0xb7e9e8a2]
/usr/lib/libruby1.8.so.1.8(ruby_run+0x2f)[0xb7e9e8df]
ruby[0x80486bd]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7ccc450]
ruby[0x8048601]

I've narrowed this down a bit. There are actually a couple of different
segfaults that can be triggered by text containing long URLs when using the
StandardTokenizer. Here's a URL that triggers the segfault mentioned
above:

http://s4hyear.com/Giorgio/guernsey/anytime/confrontation/Vivi/medias/ya/microfinance/quilting/mapping/GGF/dye/formally/placement/dramatic/isof/auto/assemblies/even/tow/7sur7/spends/bothered/coffee/uncontaminated/recommendations/genossen/space/Weinstein/Python/acknowledging/transcends.jpg

A different segfault happens if you modify the URL slightly (here
"Python" becomes "Python1234" and the URL is cut off before the file name):

http://s4hyear.com/Giorgio/guernsey/anytime/confrontation/Vivi/medias/ya/microfinance/quilting/mapping/GGF/dye/formally/placement/dramatic/isof/auto/assemblies/even/tow/7sur7/spends/bothered/coffee/uncontaminated/recommendations/genossen/space/Weinstein/Python1234/acknowledgi
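Until the tokenizer is fixed, one possible stopgap is to screen message
text for overlong whitespace-delimited tokens, such as the URLs above, and
shorten them before the text reaches Ferret. This is only a sketch under
stated assumptions: the 255-byte cutoff is a guess, not a confirmed Ferret
limit, the helper names are hypothetical, and whether it avoids the crash
depends on the real cause, which this report does not establish. It needs
Ruby 2.1+ for String#byteslice and String#scrub (under Ruby 1.8.6,
String#length and String#[] already work on bytes).

```ruby
# Hypothetical stopgap. MAX_TOKEN_BYTES is an assumed value,
# not a documented Ferret constant.
MAX_TOKEN_BYTES = 255

# Returns the whitespace-delimited tokens in +text+ that are longer
# than +limit+ bytes, e.g. to log which messages would be affected.
def overlong_tokens(text, limit = MAX_TOKEN_BYTES)
  text.scan(/\S+/).select { |tok| tok.bytesize > limit }
end

# Returns a copy of +text+ in which every overlong token is cut to at
# most +limit+ bytes. scrub("") drops a multibyte character that the
# byte-level cut may have split in half.
def truncate_overlong_tokens(text, limit = MAX_TOKEN_BYTES)
  text.gsub(/\S+/) do |tok|
    tok.bytesize > limit ? tok.byteslice(0, limit).scrub("") : tok
  end
end
```

The idea is to call truncate_overlong_tokens on the message body before
adding the document to the index. Cutting on whitespace is deliberately
conservative, since any token the StandardTokenizer produces lies inside a
whitespace-delimited run.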

I'm looking at the standard tokenizer code, but it isn't easy to
understand, and I haven't written C in years, so my debugging skills are
rusty. Hopefully this will help someone track it down.

Ian

Jens K. wrote:

I can confirm that:

*** stack smashing detected ***: ruby terminated

Environment:

Ubuntu 7.10,
2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]

Jens


On Fri, Apr 04, 2008 at 10:27:10AM -0300, Bira wrote:



Bira
http://compexplicita.wordpress.com
http://compexplicita.tumblr.com


Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk


Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[email protected] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold