Multibyte string/regex literal with escape sequence

Sorry for jumping into the middle of the thread.

Given how US-ASCII is defined across the Internet in general, it is better to
keep it clean 7-bit. In particular, I think String#validate really has to
return false if any 8-bit garbage gets in.
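Martin's request for String#validate can be sketched at the byte level. `validate_as` is a hypothetical helper (String#validate was only a proposal in this thread). It assumes that US-ASCII means strictly bytes 0x00-0x7F and that ASCII-8BIT accepts any byte.

```ruby
# Hypothetical helper modeling the validation semantics discussed above.
# US-ASCII: every byte must be 7-bit, so any 8-bit byte ("garbage") fails.
# ASCII-8BIT: ASCII syntax plus arbitrary 8-bit payload, so every byte passes.
def validate_as(str, encoding)
  case encoding
  when :us_ascii
    str.bytes.all? { |b| b < 0x80 }
  when :ascii_8bit
    true
  else
    raise ArgumentError, "unsupported encoding: #{encoding.inspect}"
  end
end

clean = "hello"
dirty = "hello\xFF".b   # one stray 8-bit byte, as an ASCII-8BIT string

p validate_as(clean, :us_ascii)    # true
p validate_as(dirty, :us_ascii)    # false
p validate_as(dirty, :ascii_8bit)  # true
```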

ASCII-8BIT is for cases where the syntax is determined by ASCII, but some 8-bit
data comes along with it as a payload.
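The ASCII-8BIT case (ASCII-determined syntax carrying an 8-bit payload) might look like this in present-day Ruby. The `LEN n` header format is invented for illustration. `String#b`, which returns an ASCII-8BIT copy, was added in a later Ruby version.

```ruby
# A binary message: an ASCII header followed by an arbitrary 8-bit payload.
# The "LEN n" header format is illustrative only.
message = "LEN 3\r\n\r\n\xFF\x00\xFE".b

# The ASCII part drives the parsing; the payload bytes are carried along as-is.
header, payload = message.split("\r\n\r\n", 2)
length = Integer(header[/\ALEN (\d+)\z/, 1])

p header                                    # "LEN 3"
p payload.bytes                             # [255, 0, 254]
p payload.bytesize == length                # true
p payload.encoding == Encoding::ASCII_8BIT  # true
```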

I am not sure how far the distinction between the two is really needed, but at
least for things like validate I would like to have both.

Best regards, Martin.

At 13:02 07/10/13, Tanaka A. wrote:

...31715], the feature "a string consisting only of 7-bit ASCII is US-ASCII" now [...]
[...] if code explicitly calls the encoding method and uses the result, that is
not necessarily the case, so I expect we will run into fewer problems if we
settle on ASCII-8BIT.

Once US-ASCII is defined, the possibility may also come up of making (the
all-7-bit cases) US-ASCII instead of ASCII-8BIT. Well, even if we did that,
something like "\x80" would still be ASCII-8BIT.
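The rule suggested in the quote can be written as a function of a literal's bytes. `literal_encoding` is hypothetical. It only models the idea discussed here (all-7-bit literals become US-ASCII, while anything containing an 8-bit byte such as "\x80" stays ASCII-8BIT) and does not describe how Ruby eventually behaved.

```ruby
# Hypothetical rule: choose a literal's encoding from its bytes alone.
def literal_encoding(bytes)
  if bytes.all? { |b| b < 0x80 }
    Encoding::US_ASCII    # every byte is 7-bit
  else
    Encoding::ASCII_8BIT  # contains an 8-bit byte, e.g. from "\x80"
  end
end

puts literal_encoding("hoge".bytes).name  # US-ASCII
puts literal_encoding("\x80".bytes).name  # ASCII-8BIT
```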

[田中 哲][たなか あきら][Tanaka A.]

#-#-# Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:[email protected]

This is Naruse.

Tanaka A. wrote:

In article [email protected],
“NARUSE, Yui” [email protected] writes:

I see. Dragged along by its former name "US-ASCII", I had thought of what is now
called "ASCII-8BIT" as mainly 0x00-0x7F, with the 8-bit part as an extra. It
should rather be understood as ASCII-compatible binary.

So US-ASCII would be defined separately at some point.

Lately I have come to think that a policy where String#encoding basically never
returns US-ASCII might also be an option.

In that case, the [ruby-dev:31715] feature "a string consisting only of 7-bit
ASCII is US-ASCII", which seems to have been implemented under the "US-ASCII"
interpretation, now feels somewhat out of place. What do you think?

I think this is a good thing, because the meaning of a string is then
determined locally.

Say a source file has been written in ASCII up to a certain point, and then one
day you write a UTF-8 string literal in it. Having every other string literal
in that file become UTF-8 as well feels a bit off to me.

So you mean you had been writing without a magic comment, then wanted to use
characters from the Unicode range, and so added
coding: utf-8
Right? True, the other literals get dragged along with it.
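Tanaka's locality argument can be modeled with two hypothetical rules for a 7-bit literal such as "hoge". Both function names are invented. The encoding of a file without a magic comment is assumed to be US-ASCII purely for the sake of the comparison.

```ruby
# Hypothetical rule 1: every literal takes the source file's encoding.
def follow_source(_bytes, source_encoding)
  source_encoding
end

# Hypothetical rule 2: a literal with only 7-bit bytes gets a fixed encoding
# (ASCII-8BIT, as argued in this thread), so the rest of the file cannot
# change it; other literals still follow the source encoding.
def fixed_for_7bit(bytes, source_encoding)
  bytes.all? { |b| b < 0x80 } ? Encoding::ASCII_8BIT : source_encoding
end

hoge   = "hoge".bytes
before = Encoding::US_ASCII  # assumed: plain-ASCII file, no magic comment
after  = Encoding::UTF_8     # after adding "coding: utf-8"

# Rule 1: the untouched literal changes encoding when the comment is added.
p follow_source(hoge, before) == follow_source(hoge, after)    # false
# Rule 2: its encoding is decided locally and stays the same.
p fixed_for_7bit(hoge, before) == fixed_for_7bit(hoge, after)  # true
```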

Some argue that things should still work without problems in that case, but if
code explicitly calls the encoding method and uses the result, that is not
necessarily so. I therefore expect that settling on ASCII-8BIT will cause fewer
problems.
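The concern about code that calls `encoding` explicitly can be illustrated in today's Ruby. Two 7-bit strings with the same bytes but different encodings still compare equal with `==`, yet code that branches on `encoding` tells them apart. `kind_of_text` is a hypothetical stand-in for such code.

```ruby
utf8   = "hoge".dup.force_encoding(Encoding::UTF_8)
binary = "hoge".b  # same bytes, labeled ASCII-8BIT

# Hypothetical code that inspects the encoding explicitly.
def kind_of_text(str)
  str.encoding == Encoding::ASCII_8BIT ? :binary : :text
end

# Content comparison: equal, because both strings are ASCII-only.
p utf8 == binary        # true

# Encoding-based branching: the two strings are treated differently.
p kind_of_text(utf8)    # :text
p kind_of_text(binary)  # :binary
```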

Behind my argument here is the fact that, even now, a 7-bit string whose
encoding is not ASCII-8BIT can exist. The quickest way to make one is as
follows.
"hoge".force_encoding("UTF-8")
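The one-liner still works in current Ruby. This sketch starts from an ASCII-8BIT copy made with `String#b` (added in a later Ruby version), so the relabeling is visible and no string literal is mutated.

```ruby
# Start from an ASCII-8BIT copy, then relabel it as UTF-8.
s = "hoge".b.force_encoding("UTF-8")

p s.encoding == Encoding::UTF_8  # true: the string is not ASCII-8BIT
p s.bytes.all? { |b| b < 0x80 }  # true: it still holds only 7-bit bytes
```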

Once US-ASCII is defined, the possibility may also come up of making (the
all-7-bit cases) US-ASCII instead of ASCII-8BIT. Well, even if we did that,
something like "\x80" would still be ASCII-8BIT.

As I see it, there are two possible policies, and what you describe sounds like one of them.

# -*- coding: UTF-8 -*-

str = "hoge"
puts str.encoding #=> US-ASCII
str2 = str.force_encoding('EUC-JP')
puts str2.encoding #=> US-ASCII

Would that policy require carrying it through as far as this?
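Here is a toy model of the first policy. It assumes that `encoding` reports US-ASCII whenever every byte is 7-bit, whatever label `force_encoding` requested. `LabeledString` is hypothetical and only mirrors the expected output above. Ruby's String does not behave this way.

```ruby
# Hypothetical model of the first policy: 7-bit content always reports US-ASCII.
class LabeledString
  def initialize(bytes, label)
    @bytes = bytes.dup.freeze
    @label = label
  end

  # Returns a new LabeledString with the requested label
  # (unlike Ruby's String#force_encoding, which mutates the receiver).
  def force_encoding(label)
    LabeledString.new(@bytes, label)
  end

  # The requested label is only reported when some byte is 8-bit.
  def encoding
    @bytes.all? { |b| b < 0x80 } ? "US-ASCII" : @label
  end
end

str  = LabeledString.new("hoge".bytes, "UTF-8")
str2 = str.force_encoding("EUC-JP")
puts str.encoding   # US-ASCII
puts str2.encoding  # US-ASCII
```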

The other policy would be to stop trying to learn whether a string is 7-bit
from its encoding. Is that the direction you are leaning toward now,
Matsumoto-san? It would also remove the worry about how JRuby should handle
this, so it might kill two birds with one stone.

# -*- coding: UTF-8 -*-

str = "hoge"
puts str.encoding #=> UTF-8
str2 = str.force_encoding('EUC-JP')
puts str2.encoding #=> EUC-JP
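Present-day Ruby behaves like this second example. Note that `String#force_encoding` relabels and returns its receiver, so `str` and `str2` are the same object. The `.dup` only guards against frozen string literals. UTF-8 has been the default source encoding since Ruby 2.0, so the magic comment is kept here only to match the example.

```ruby
# -*- coding: UTF-8 -*-
str = "hoge".dup
puts str.encoding   # UTF-8: a 7-bit literal keeps the source encoding

str2 = str.force_encoding("EUC-JP")
puts str2.encoding  # EUC-JP
p str2.equal?(str)  # true: force_encoding returns its (relabeled) receiver
```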

This is Yukihiro Matsumoto.

In message “Re: [ruby-dev:32074] Re: multibyte string/regex literal with
escape sequence”
on Tue, 16 Oct 2007 00:35:44 +0900, “NARUSE, Yui”
[email protected] writes:

|The other policy would be to stop trying to learn whether a string is 7-bit
|from its encoding. Is that the direction you are leaning toward now,
|Matsumoto-san? It would also remove the worry about how JRuby should handle
|this, so it might kill two birds with one stone.

First of all, I am in principle for this policy. If we followed it strictly,
there would be no need to make a string's encoding ASCII-8BIT just because it
contains only characters in the ASCII range, but

|> Say a source file has been written in ASCII up to a certain point, and then
|> one day you write a UTF-8 string literal in it. Having every other string
|> literal in that file become UTF-8 as well feels a bit off to me.

there is also this point, so I accept it in the spirit of "well, (in C Ruby)
it's fine to have it." Besides, having it does not change how the following
behaves:

-- coding: UTF-8 --

str = "e$B$“e(Bhoge”
str2 = str[1…-1] # => “hoge”
puts str2.encoding #=> UTF-8

(In other words, a string whose content is only ASCII is not always
ASCII-8BIT.)

                            Yukihiro Matsumoto /:|)

In article [email protected],
“NARUSE, Yui” [email protected] writes:

Behind my argument here is the fact that, even now, a 7-bit string whose
encoding is not ASCII-8BIT can exist. The quickest way to make one is as
follows.
"hoge".force_encoding("UTF-8")

Yes. Even if US-ASCII is introduced, a US-ASCII string may well contain 8-bit
bytes. So the fact that str.encoding is "US-ASCII" does not guarantee that the
string is 7-bit.

Using US-ASCII to confirm that a string is 7-bit should therefore mean
validating it as US-ASCII and checking the result.
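The point that the label alone guarantees nothing can be checked in today's Ruby. A string can be labeled US-ASCII while holding an 8-bit byte, and only `String#valid_encoding?` reveals it.

```ruby
# Label an ASCII-8BIT copy containing 0x80 as US-ASCII.
s = "ab\x80".b.force_encoding(Encoding::US_ASCII)

p s.encoding == Encoding::US_ASCII  # true: the label says US-ASCII
p s.valid_encoding?                 # false: byte 0x80 is not valid US-ASCII
p "abc".dup.force_encoding(Encoding::US_ASCII).valid_encoding?  # true
```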

The other policy would be to stop trying to learn whether a string is 7-bit
from its encoding. Is that the direction you are leaning toward now,
Matsumoto-san? It would also remove the worry about how JRuby should handle
this, so it might kill two birds with one stone.

Well, if you want this settled in that direction, I think it would be better to
propose a name and a specification for a method that returns that information.
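Later Ruby versions provide such a method. `String#ascii_only?` reports whether a string in an ASCII-compatible encoding contains only ASCII characters, regardless of its encoding label. `seven_bit?` below is a hypothetical byte-level equivalent included for comparison.

```ruby
# Hypothetical helper: the byte-level definition of "7-bit".
def seven_bit?(str)
  str.bytes.all? { |b| b < 0x80 }
end

samples = [
  "hoge".dup.force_encoding("EUC-JP"),  # 7-bit, labeled EUC-JP
  "hoge".b,                             # 7-bit, labeled ASCII-8BIT
  "\u3042hoge",                         # contains a non-ASCII character
]

samples.each do |s|
  # The answer depends on the content, not on s.encoding.
  puts "#{s.encoding}: ascii_only?=#{s.ascii_only?} seven_bit?=#{seven_bit?(s)}"
end
```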