SEGV by regexp and thread

Doing the following causes a SEGV.

% cat z.rb
def m(s)
  /\Azzzzzz(\S)(\S)(\S)(\S*)\z/ =~ s
  [$1.class, $2.class, $3.class, $4.class]
end
s1 = "z" * 10000
s1.force_encoding("EUC-JP")
s2 = "a"
Thread.new { loop { p m(s1) } }
loop { m(s2) }
% ./ruby z.rb
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
z.rb:2: [BUG] Segmentation fault
ruby 1.9.0 (2008-04-01 revision 15885) [i686-linux]

-- control frame ----------
c:0007 p:0004 s:0012 b:0012 l:000011 d:000011 METHOD z.rb:2
c:0006 p:0011 s:0008 b:0007 l:000884 d:000006 BLOCK z.rb:8
c:0005 p:---- s:0008 b:0008 l:000007 d:000007 FINISH
c:0004 p:---- s:0006 b:0006 l:000005 d:000005 CFUNC :loop
c:0003 p:0007 s:0003 b:0003 l:000884 d:000002 BLOCK z.rb:8
c:0002 p:---- s:0004 b:0004 l:000003 d:000003 FINISH
c:0001 p:---- s:0002 b:0002 l:000001 d:000001 TOP

DBG> : "z.rb:8:in `block (2 levels) in <main>'"
DBG> : "z.rb:8:in `loop'"
DBG> : "z.rb:8:in `block in <main>'"
-- backtrace of native function call (Use addr2line) --
0x811195e
0x8137c77
0x8137caf
0x80d5c99
0xb7f8a440
0x80c7fbc
0x80afdc4
0x80b2715
0x80b273d
0x810e408
0x810f543
0x810848e
0x8108532
0x805c6db
0x805c84b
0x805cbb5
0x805c888
0x8107852
0x810f36a
0x810eeb4
0x810b66f
0x810f543
0x810848e
0x81085ed
0x81127cb
0x8111d0e
0xb7f6e240
0xb7ea249e

zsh: abort (core dumped) ./ruby z.rb
% gdb ruby core.10383
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...Using host libthread_db
library "/lib/tls/i686/cmov/libthread_db.so.1".

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/tls/i686/cmov/libpthread.so.0…done.
Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0
Reading symbols from /lib/tls/i686/cmov/librt.so.1…done.
Loaded symbols for /lib/tls/i686/cmov/librt.so.1
Reading symbols from /lib/tls/i686/cmov/libdl.so.2…done.
Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
Reading symbols from /lib/tls/i686/cmov/libcrypt.so.1…done.
Loaded symbols for /lib/tls/i686/cmov/libcrypt.so.1
Reading symbols from /lib/tls/i686/cmov/libm.so.6…done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/tls/i686/cmov/libc.so.6…done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2…Reading symbols from
/usr/lib/debug/lib/ld-2.3.6.so…done.
done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from
/home/akr/ruby/yarvo0/lib/ruby/1.9.0/i686-linux/enc/euc_jp.so…done.
Loaded symbols for
/home/akr/ruby/yarvo0/lib/ruby/1.9.0/i686-linux/enc/euc_jp.so
Reading symbols from /lib/libgcc_s.so.1…done.
Loaded symbols for /lib/libgcc_s.so.1
Core was generated by `./ruby z.rb'.
Program terminated with signal 6, Aborted.
#0 0xb7f8a410 in ?? ()
(gdb) bt
#0 0xb7f8a410 in ?? ()
#1 0xb7c988ac in ?? ()
#2 0x00000006 in ?? ()
#3 0x00002891 in ?? ()
#4 0xb7dff811 in raise () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7e00fb9 in abort () from /lib/tls/i686/cmov/libc.so.6
#6 0x08137cb4 in rb_bug (fmt=0x816d5c7 "Segmentation fault") at
error.c:226
#7 0x080d5c99 in sigsegv (sig=11) at signal.c:546
#8 0xb7f8a440 in ?? ()
#9 0x0000000b in ?? ()
#10 0xb7c98a3c in ?? ()
#11 0xb7c98abc in ?? ()
#12 0x0000000b in ?? ()
#13 0x00000000 in ?? ()
(gdb) run z.rb
Starting program: /home/akr/ruby/yarvo0/ruby/ruby z.rb
[Thread debugging using libthread_db enabled]
[New Thread -1209955456 (LWP 10398)]
[New Thread -1210692688 (LWP 10401)]
[New Thread -1211249744 (LWP 10402)]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]
[String, String, String, String]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1211249744 (LWP 10402)]
0x00000003 in ?? ()
(gdb) bt
#0 0x00000003 in ?? ()
#1 0x080be257 in onigenc_mbclen_approximate (p=0x8207a50 'z' <repeats
200 times>..., e=0x820a160 "", enc=0xb7d947a4)
at regenc.c:56
#2 0x080c7fbc in onig_search (reg=0x823d588, str=0x8207a50 'z' <repeats
200 times>..., end=0x820a160 "",
start=0x8207a50 'z' <repeats 200 times>..., range=0x8207a51 'z'
<repeats 200 times>..., region=0x81a0f50, option=0)
at regexec.c:3610
#3 0x080afdc4 in rb_reg_search (re=3084291360, str=3084288840, pos=0,
reverse=0) at re.c:1271
#4 0x080b2715 in reg_match_pos (re=3084291360, strp=0xb7cdafa4, pos=0)
at re.c:2475
#5 0x080b273d in rb_reg_match (re=3084291360, str=3084288840) at
re.c:2522
#6 0x0810e408 in vm_eval (th=0x8214448, initial=0) at insns.def:2020
#7 0x0810f543 in vm_eval_body (th=0x8214448) at vm.c:1147
#8 0x0810848e in invoke_block (th=0x8214448, block=0xb7d5cf74,
self=3084465740, argc=0, argv=0x0, blockptr=0x0) at vm.c:572
#9 0x08108532 in vm_yield (th=0x8214448, argc=0, argv=0x0) at vm.c:589
#10 0x0805c6db in rb_yield_0 (argc=0, argv=0x0) at eval.c:910
#11 0x0805c84b in loop_i () at eval.c:970
#12 0x0805cbb5 in rb_rescue2 (b_proc=0x805c831 <loop_i>, data1=0,
r_proc=0, data2=0) at eval.c:1108
#13 0x0805c888 in rb_f_loop () at eval.c:992
#14 0x08107852 in call_cfunc (func=0x805c84d <rb_f_loop>,
recv=3084465740, len=0, argc=0, argv=0xb7cdd018)
at vm_insnhelper.c:285
#15 0x0810f36a in vm_call_cfunc (th=0x8214448, reg_cfp=0xb7d5cf60,
num=0, id=4064, recv=3084465740, klass=3084473920,
flag=8, mn=0xb7d8da94, blockptr=0xb7d5cf74) at vm_insnhelper.c:372
#16 0x0810eeb4 in vm_call_method (th=0x8214448, cfp=0xb7d5cf60, num=0,
blockptr=0xb7d5cf74, flag=8, id=4064, mn=0xb7d8da6c,
recv=3084465740, klass=3084465720) at vm_insnhelper.c:504
#17 0x0810b66f in vm_eval (th=0x8214448, initial=0) at insns.def:1056
#18 0x0810f543 in vm_eval_body (th=0x8214448) at vm.c:1147
#19 0x0810848e in invoke_block (th=0x8214448, block=0x8206c68,
self=3084465740, argc=0, argv=0x8207008, blockptr=0x0)
at vm.c:572
#20 0x081085ed in vm_invoke_proc (th=0x8214448, proc=0x8206c68,
self=3084465740, argc=0, argv=0x8207008, blockptr=0x0)
at vm.c:606
#21 0x081127cb in thread_start_func_2 (th=0x8214448,
stack_start=0xb7cdc450) at thread.c:314
#22 0x08111d0e in thread_start_func_1 (th_ptr=0x8214448) at
thread_pthread.c:175
#23 0xb7fb0240 in start_thread () from
/lib/tls/i686/cmov/libpthread.so.0
#24 0xb7ee449e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb) up
#1 0x080be257 in onigenc_mbclen_approximate (p=0x8207a50 'z' <repeats
200 times>..., e=0x820a160 "", enc=0xb7d947a4)
at regenc.c:56
56 int ret = ONIGENC_PRECISE_MBC_ENC_LEN(enc,p,e);
(gdb) p *enc
$1 = {precise_mbc_enc_len = 0x3, name = 0xb7d94790 "\003\b", max_enc_len
= 135983448, min_enc_len = 135983464,
is_mbc_newline = 0, mbc_to_code = 0x410407, code_to_mbclen = 0,
code_to_mbc = 0x436c694e, mbc_case_fold = 0x7373616c,
apply_all_case_fold = 0, get_case_fold_codes_by_str = 0x11f,
property_name_to_ctype = 0, is_code_ctype = 0,
get_ctype_code_range = 0xb7d947e0, left_adjust_char_head = 0,
is_allowed_reverse_match = 0x1f, auxiliary_data = 0x0,
ruby_encoding_index = -1210496316}
(gdb) up
#2 0x080c7fbc in onig_search (reg=0x823d588, str=0x8207a50 'z' <repeats
200 times>..., end=0x820a160 "",
start=0x8207a50 'z' <repeats 200 times>..., range=0x8207a51 'z'
<repeats 200 times>..., region=0x81a0f50, option=0)
at regexec.c:3610
3610 s += enclen(reg->enc, s, end);
(gdb) up
#3 0x080afdc4 in rb_reg_search (re=3084291360, str=3084288840, pos=0,
reverse=0) at re.c:1271
1271 result = onig_search(RREGEXP(re)->ptr,
(gdb) l
1266 rb_reg_prepare_re(re, str, 1);
1267
1268 if (!reverse) {
1269 range += RSTRING_LEN(str);
1270 }
1271 result = onig_search(RREGEXP(re)->ptr,
1272 (UChar*)(RSTRING_PTR(str)),
1273 ((UChar*)(RSTRING_PTR(str)) + RSTRING_LEN(str)),
1274 ((UChar*)(RSTRING_PTR(str)) + pos),
1275 ((UChar*)range),
(gdb) rp re
T_REGEXP: "\Azzzzzz(\S)(\S)(\S)(\S*)\z" len:27 (literal)
encoding:2 $2 = (struct RRegexp *) 0xb7d68d20
(gdb) p *$2
$3 = {basic = {flags = 4259848, klass = 3084404360}, ptr = 0x81f8ab0,
len = 27,
str = 0x8203e98 "\Azzzzzz(\S)(\S)(\S)(\S*)\z"}
(gdb) p *$2->ptr
$4 = {p = 0x8203da8 "#\a\006", used = 180, alloc = 216, state = 0,
num_mem = 4, num_repeat = 0, num_null_check = 0,
num_comb_exp_check = -1210497116, num_call = 0, capture_history = 0,
bt_mem_start = 0, bt_mem_end = 0,
stack_pop_level = 0, repeat_range_alloc = 0, repeat_range = 0x0, enc =
0x81ac940, options = 0, syntax = 0x8193320,
case_fold_flag = 1073741824, name_table = 0x0, optimize = 2,
threshold_len = 6, anchor = 9, anchor_dmin = 9,
anchor_dmax = 4294967295, sub_anchor = 0, exact = 0x81fb2a8 "zzzzzz",
exact_end = 0x81fb2ae "",
map = '\006' <repeats 122 times>, "\001", '\006' <repeats 133 times>,
int_map = 0x0, int_map_backward = 0x0, dmin = 0,
dmax = 0, chain = 0x0}
(gdb) p *$2->ptr->enc
$5 = {precise_mbc_enc_len = 0x8151b84 <us_ascii_mbc_enc_enc_len>, name =
0x81ac930 "US-ASCII", max_enc_len = 1,
min_enc_len = 1, is_mbc_newline = 0x80beb97
<onigenc_is_mbc_newline_0x0a>,
mbc_to_code = 0x80bebfc <onigenc_single_byte_mbc_to_code>,
code_to_mbclen = 0x80bec0a <onigenc_single_byte_code_to_mbclen>,
code_to_mbc = 0x80bec14 <onigenc_single_byte_code_to_mbc>,
mbc_case_fold = 0x80bebc4 <onigenc_ascii_mbc_case_fold>,
apply_all_case_fold = 0x80be5f9 <onigenc_ascii_apply_all_case_fold>,
get_case_fold_codes_by_str = 0x80be6b0
<onigenc_ascii_get_case_fold_codes_by_str>,
property_name_to_ctype = 0x80bef53
<onigenc_minimum_property_name_to_ctype>,
is_code_ctype = 0x80bec44 <onigenc_ascii_is_code_ctype>,
get_ctype_code_range = 0x80beb8d
<onigenc_not_support_get_ctype_code_range>,
left_adjust_char_head = 0x80bec28
<onigenc_single_byte_left_adjust_char_head>,
is_allowed_reverse_match = 0x80bec30
<onigenc_always_true_is_allowed_reverse_match>, auxiliary_data =
0xb7d92d3c,
ruby_encoding_index = 2}
(gdb)

This is Yukihiro Matsumoto.

In message "Re: [ruby-dev:34223] SEGV by regexp and thread"
on Wed, 2 Apr 2008 20:38:24 +0900, Tanaka A. writes:

|Doing the following causes a SEGV.

RUBY_VM_CHECK_INTS() is called partway through onig_search(), and a
context switch happens there. rb_reg_prepare_re() is then called on the
same regexp object, and the regex_t that is still in use gets freed.
That appears to be what causes this behavior.
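
The sequence described above can be modelled deterministically in plain Ruby. This is only a toy sketch: CompiledPattern, SharedRegexp and simulate_race are hypothetical names, Fiber.yield stands in for the context switch inside RUBY_VM_CHECK_INTS(), and a "freed" flag stands in for the real free() of the regex_t. It assumes, as the report suggests, that the compiled pattern is replaced when the same regexp is next prepared for a string in a different encoding.

```ruby
# Toy model of the race. None of these names are real Ruby internals.
class CompiledPattern
  attr_reader :encoding

  def initialize(encoding)
    @encoding = encoding
    @freed = false
  end

  def free!
    @freed = true
  end

  def freed?
    @freed
  end
end

class SharedRegexp
  def initialize
    @compiled = nil
  end

  # Stands in for rb_reg_prepare_re(): if the target encoding differs,
  # the old compiled pattern is freed and a new one is built.
  def prepare(encoding)
    return @compiled if @compiled && @compiled.encoding == encoding
    @compiled.free! if @compiled
    @compiled = CompiledPattern.new(encoding)
  end

  # Stands in for rb_reg_search() calling onig_search().
  def search(encoding)
    compiled = prepare(encoding)
    Fiber.yield # simulated interrupt check that switches to another thread
    compiled.freed? ? :use_after_free : :ok
  end
end

def simulate_race
  re = SharedRegexp.new
  long_search = Fiber.new { re.search("EUC-JP") }
  long_search.resume     # "thread 1" starts matching s1 and is switched out
  re.prepare("US-ASCII") # "thread 2" matches s2 with the same regexp
  long_search.resume     # "thread 1" resumes holding a stale pattern
end

p simulate_race
```

In this model the first search resumes with a compiled pattern that has already been released, which corresponds to the garbage `*enc` contents seen in the gdb session.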

Since we sometimes want to do an interrupt check even inside functions
that are not reentrant, I think RUBY_VM_CHECK_INTS() must not perform a
context switch.

In fact, if rb_thread_schedule() is removed from
rb_thread_execute_interrupts(), this problem does not occur.

However, removing it means bootstraptest/test_thread.rb no longer runs
properly. I don't know how to reconcile the two.

This is Sasada.

Yukihiro M. wrote:

> In fact, if rb_thread_schedule() is removed from
> rb_thread_execute_interrupts(), this problem does not occur.
>
> However, removing it means bootstraptest/test_thread.rb no longer runs
> properly. I don't know how to reconcile the two.

Should we separate the check for whether scheduling is allowed from the
interrupt check? Were they separate in 1.8?

Even if we do that, the same problem seems likely to occur if processing
is forcibly handed over to another thread inside an interrupt handler.

This is Yukihiro Matsumoto.

I have caught a cold.

In message "Re: [ruby-dev:34388] Re: SEGV by regexp and thread"
on Mon, 14 Apr 2008 08:51:33 +0900, SASADA Koichi writes:

|Yukihiro M. wrote:
|> RUBY_VM_CHECK_INTS() is called partway through onig_search(), and a
|> context switch happens there. rb_reg_prepare_re() is then called on the
|> same regexp object, and the regex_t that is still in use gets freed

|Should we separate the check for whether scheduling is allowed from the
|interrupt check? Were they separate in 1.8?

Hmm, I don't think they were separate. Which would mean the same problem
occurs in 1.8 as well, I suppose.

|Even if we do that, the same problem seems likely to occur if processing
|is forcibly handed over to another thread inside an interrupt handler.

There are several possible strategies:

  • Make it reentrant.

    For this particular spot that is not impossible, though it looks
    quite difficult.

  • Allow exceptions and interrupts. Disallow context switches and the
    execution of user-defined signal handlers.

    Having execution aborted is fine, so one option is to provide a
    separate interrupt check for use in non-reentrant code.

  • Something else.

    Is there some other approach?
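
As a rough illustration of the second strategy, here is a minimal Ruby sketch of two interrupt checks: a full one that may switch context, and a restricted one for non-reentrant code that may raise but never switches. InterruptState and all of its methods are hypothetical names for this model and are not Ruby internals. A counter stands in for rb_thread_schedule(), and user-defined signal handlers are not modelled.

```ruby
# Hypothetical model of split interrupt checks. Not Ruby internals.
class InterruptState
  attr_reader :switches

  def initialize
    @pending_exception = nil
    @switch_requested = false
    @switches = 0
  end

  # Another thread (or the timer) asks for a context switch.
  def request_switch
    @switch_requested = true
  end

  # Another thread asks for an exception to be raised in this one.
  def request_exception(error)
    @pending_exception = error
  end

  # Full check: may raise, and may switch context.
  def check_ints
    check_ints_no_switch
    return unless @switch_requested
    @switch_requested = false
    @switches += 1 # stands in for rb_thread_schedule()
  end

  # Restricted check for non-reentrant code: may raise, never switches.
  # A requested switch simply stays pending until the next full check.
  def check_ints_no_switch
    error, @pending_exception = @pending_exception, nil
    raise error if error
  end
end

state = InterruptState.new
state.request_switch
state.check_ints_no_switch # inside the search: no switch happens
switches_during_search = state.switches
state.check_ints           # after the search: the deferred switch runs
switches_after_search = state.switches
```

With this split, a long-running search can still be aborted by an exception, while the context switch that would let another thread re-prepare the same regexp is postponed until the search has returned.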

What should we do?