New replica encodings and encoding aliases

This is Naruse.

While looking over my local environment, I set up a number of replicas
and aliases. The encodings that can currently be specified are listed
below. For the ISO 8859 family, I have for now made the related encodings
replicas of whichever one looked closest, so please tell me if anything
is wrong.
(I have a feeling I got Windows-1251 wrong...)
“ASCII-8BIT” is original encoding
“EUC-JP” is original encoding
“ISO-8859-1” is original encoding
“ISO-8859-10” is original encoding
“ISO-8859-11” is original encoding
“ISO-8859-13” is original encoding
“ISO-8859-14” is original encoding
“ISO-8859-15” is original encoding
“ISO-8859-16” is original encoding
“ISO-8859-2” is original encoding
“ISO-8859-3” is original encoding
“ISO-8859-4” is original encoding
“ISO-8859-5” is original encoding
“ISO-8859-6” is original encoding
“ISO-8859-7” is original encoding
“ISO-8859-8” is original encoding
“ISO-8859-9” is original encoding
“Shift_JIS” is original encoding
“US-ASCII” is original encoding
“UTF-8” is original encoding
“UTF-16BE” is original encoding
“UTF-16LE” is original encoding
“UTF-32BE” is original encoding
“UTF-32LE” is original encoding
“Windows-1255” is replica of “ISO-8859-8”
“Windows-1256” is replica of “ISO-8859-6”
“Windows-1257” is replica of “ISO-8859-4”
“Windows-874” is replica of “ISO-8859-11”
“CP51932” is replica of “EUC-JP”
“eucJP-ms” is replica of “EUC-JP”
“Windows-31J” is replica of “Shift_JIS”
“Windows-1251” is replica of “ISO-8859-5”
“Windows-1250” is replica of “ISO-8859-2”
“Windows-1252” is replica of “ISO-8859-1”
“Windows-1253” is replica of “ISO-8859-7”
“Windows-1254” is replica of “ISO-8859-9”
“CP932” is alias of “Windows-31J”
“ISO8859-6” is alias of “ISO-8859-6”
“ISO8859-14” is alias of “ISO-8859-14”
“CP1252” is alias of “Windows-1252”
“SJIS” is alias of “Shift_JIS”
“CP1253” is alias of “Windows-1253”
“ISO8859-7” is alias of “ISO-8859-7”
“ISO8859-15” is alias of “ISO-8859-15”
“BINARY” is alias of “ASCII-8BIT”
“646” is alias of “US-ASCII”
“CP1254” is alias of “Windows-1254”
“ISO8859-8” is alias of “ISO-8859-8”
“ISO8859-16” is alias of “ISO-8859-16”
“ISO8859-9” is alias of “ISO-8859-9”
“CP1255” is alias of “Windows-1255”
“eucJP” is alias of “EUC-JP”
“ANSI_X3.4-1986” is alias of “US-ASCII”
“CP1256” is alias of “Windows-1256”
“CP1257” is alias of “Windows-1257”
“ASCII” is alias of “US-ASCII”
“ISO8859-1” is alias of “ISO-8859-1”
“csWindows31J” is alias of “Windows-31J”
“ISO8859-2” is alias of “ISO-8859-2”
“ISO8859-10” is alias of “ISO-8859-10”
“ISO8859-3” is alias of “ISO-8859-3”
“CP874” is alias of “ISO-8859-11”
“ISO8859-11” is alias of “ISO-8859-11”
“euc-jp-ms” is alias of “EUC-JP”
“ISO8859-4” is alias of “ISO-8859-4”
“CP1250” is alias of “Windows-1250”
“CP1251” is alias of “Windows-1251”
“ISO8859-5” is alias of “ISO-8859-5”
“ISO8859-13” is alias of “ISO-8859-13”

Of the encodings that are not yet registered, the ones Oniguruma supports are as follows.
bg_BG.CP1251 Bulgarian Cyrillic character set (CP1251), CODESET=CP1251
ko_KR.eucKR 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080 CODESET=eucKR
ru_RU.KOI8-R Russian CODESET=KOI8-R
zh_CN.GB18030 CODESET=GB18030
zh_CN.eucCN 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080 CODESET=eucCN
zh_TW.BIG5 CODESET=BIG5
zh_TW.eucTW CODESET=eucTW

I could not tell whether KOI8-U is a replica of KOI8 or something
separate. I wonder whether it makes any difference.
uk_UA.KOI8-U Ukrainian CODESET=KOI8-U character set

There were also the following encodings. Do we actually need to support
these? It does not look like it.
ru_RU.CP866 Russian Alternative charset, CODESET=CP866
kk_KZ.PT154 PT154 character set,CODESET=PT154
hy_AM.ARMSCII-8 Armenian ARMSCII-8 character set, CODESET=ARMSCII-8

By the way, rb_locale_encoding currently returns ASCII-8BIT when
locale_charmap is nil, but in POSIX terms nil = C, so how about
returning US-ASCII instead? Also, shall we return a dummy encoding
when an unregistered name comes in?

Index: encoding.c

--- encoding.c (revision 15025)
+++ encoding.c (working copy)
@@ -872,13 +872,14 @@ rb_locale_encoding(void)
 {
     VALUE charmap = rb_locale_charmap(rb_cEncoding);
     int idx;
+    char *name;
 
-    if (NIL_P(charmap))
-        return rb_ascii8bit_encoding();
+    if (NIL_P(charmap)) name = "US-ASCII";
+    else name = StringValueCStr(charmap);
 
-    idx = rb_enc_find_index(StringValueCStr(charmap));
+    idx = rb_enc_find_index(name);
     if (idx < 0)
-        return rb_ascii8bit_encoding();
+        idx = rb_define_dummy_encoding(name);
 
     return rb_enc_from_index(idx);
 }

This is Yukihiro Matsumoto.

In message "Re: [ruby-dev:33078] NEW REPLICA ENCODINGS AND ENCODING ALIASES"
    on Sun, 13 Jan 2008 23:45:43 +0900, "NARUSE, Yui" [email protected] writes:

|By the way, rb_locale_encoding currently returns ASCII-8BIT when
|locale_charmap is nil, but in POSIX terms nil = C, so how about
|returning US-ASCII instead?

I think that would be fine.

|Also, shall we return a dummy encoding when an unregistered name comes in?

Encodings are not subject to GC, so I am reluctant to create them
indiscriminately.

This is Naruse.

Yukihiro M. wrote:

|By the way, rb_locale_encoding currently returns ASCII-8BIT when
|locale_charmap is nil, but in POSIX terms nil = C, so how about
|returning US-ASCII instead?

I think that would be fine.

Committed in r15039.

|Also, shall we return a dummy encoding when an unregistered name comes in?

Encodings are not subject to GC, so I am reluctant to create them
indiscriminately.

At most one dummy would be created here, so I suspect the impact would
be small, but for now I will handle this by increasing the number of
registered names instead.
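
For reference, the behaviour the thread settles on (a nil charmap is treated as the C locale and gives US-ASCII, and an unregistered name does not create a dummy encoding) might be modelled in Ruby roughly as follows. This is only a sketch: `locale_encoding_for` is a hypothetical helper, not a function in encoding.c, and the ASCII-8BIT fallback for unknown names is an assumption based on that path of the original code being left as it was.

```ruby
# Hypothetical helper (not part of Ruby or encoding.c) that mirrors the
# lookup discussed in this thread.
def locale_encoding_for(charmap)
  # A nil charmap is treated like the POSIX "C" locale.
  name = charmap.nil? ? "US-ASCII" : charmap
  Encoding.find(name)
rescue ArgumentError
  # Assumption: unregistered names keep falling back to ASCII-8BIT
  # rather than defining a new dummy encoding.
  Encoding::ASCII_8BIT
end

p locale_encoding_for(nil)
p locale_encoding_for("UTF-8")
p locale_encoding_for("no-such-charmap")
```

Registering more names, as proposed above, would simply make `Encoding.find` succeed for more charmaps, so the rescue branch would be reached less often.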

Hello, Naruse-san.

Sorry for the late reply.

At 23:45 08/01/13, NARUSE, Yui wrote:

This is Naruse.

While looking over my local environment, I set up a number of replicas
and aliases. The encodings that can currently be specified are listed
below. For the ISO 8859 family, I have for now made the related encodings
replicas of whichever one looked closest, so please tell me if anything
is wrong.
(I have a feeling I got Windows-1251 wrong...)

windows-125x is in most cases similar to iso-8859-yy, but the range
0x80 to 0x9F differs. In the ISO encodings that range holds control
characters, if anything, whereas in the Windows encodings at least part
of it holds "ordinary" characters. Methods such as string[i] are not
affected, but I think differences will show up in regular expressions
and the like.
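
The difference in the 0x80 to 0x9F range can be seen with a single byte. The following is a minimal sketch, assuming a Ruby in which both Windows-1252 and ISO-8859-1 are available with transcoding tables to UTF-8; the variable names are only illustrative.

```ruby
# One byte with the value 0x80, tagged with two different encodings.
byte = [0x80].pack("C")

as_windows = byte.dup.force_encoding("Windows-1252")
as_iso     = byte.dup.force_encoding("ISO-8859-1")

# Transcoding to UTF-8 shows which character each encoding assigns to 0x80.
windows_char = as_windows.encode("UTF-8")
iso_char     = as_iso.encode("UTF-8")

printf("Windows-1252 0x80 -> U+%04X\n", windows_char.ord)
printf("ISO-8859-1   0x80 -> U+%04X\n", iso_char.ord)

# Indexing is unaffected: both are single-byte encodings, so string[0]
# returns the same one byte in either case.
puts as_windows[0].bytes == as_iso[0].bytes
```

In Windows-1252 the byte maps to the euro sign (U+20AC), while in ISO-8859-1 it maps to the C1 control character U+0080, which is why character-class matching can differ even though indexing does not.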

By the way, how can the list below be produced?

Best regards,    Martin.

“ISO-8859-3” is original encoding
“UTF-16LE” is original encoding
“Windows-1250” is replica of “ISO-8859-2”
“ISO8859-15” is alias of “ISO-8859-15”
“CP1257” is alias of “Windows-1257”
“CP1250” is alias of “Windows-1250”
zh_TW.BIG5 CODESET=BIG5
hy_AM.ARMSCII-8 Armenian ARMSCII-8 character set, CODESET=ARMSCII-8
+++ encoding.c (working copy)

NARUSE, Yui [email protected]
DBDB A476 FDBD 9450 02CD 0EFC BCE3 C388 472E C1EA

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:[email protected]

This is Naruse.

Martin D. wrote:

While looking over my local environment, I set up a number of replicas
and aliases. The encodings that can currently be specified are listed
below. For the ISO 8859 family, I have for now made the related encodings
replicas of whichever one looked closest, so please tell me if anything
is wrong.
(I have a feeling I got Windows-1251 wrong...)

windows-125x is in most cases similar to iso-8859-yy, but the range
0x80 to 0x9F differs. In the ISO encodings that range holds control
characters, if anything, whereas in the Windows encodings at least part
of it holds "ordinary" characters. Methods such as string[i] are not
affected, but I think differences will show up in regular expressions
and the like.

Ah, I see.
Let us treat the differences between Windows-125x and ISO-8859-Y as a
known bug. I will fix it at some point.

By the way, how can the list below be produced?

I print it with a script like the following.
(Encoding.name_list | Encoding.list.map { |x| x.to_s }).each do |name|
  enc = Encoding.find(name)
  if enc.dummy?
    puts '%s is a dummy encoding.' % name
  elsif name != enc.to_s
    # The name resolves to an encoding with a different canonical name.
    puts '%s is an alias of %s.' % [name, enc.to_s]
  elsif enc.base_encoding
    # A replica reports the encoding it was copied from as its base.
    puts '%s is a replica of %s.' % [name, enc.base_encoding.to_s]
  else
    puts '%s is an original encoding.' % name
  end
end
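
As a possible variation for the alias part of this listing only, the aliases can also be grouped under their canonical names. This is a sketch that assumes `Encoding.aliases` is available in the Ruby being used; `aliases_by_name` is just an illustrative variable, and the sketch says nothing about replicas or dummy encodings.

```ruby
# Group every alias under the canonical encoding name it resolves to.
# Encoding.aliases returns a Hash of alias name => canonical name.
aliases_by_name = Hash.new { |hash, key| hash[key] = [] }
Encoding.aliases.each do |alias_name, canonical_name|
  aliases_by_name[canonical_name] << alias_name
end

aliases_by_name.sort.each do |canonical_name, alias_names|
  puts '%s has aliases: %s' % [canonical_name, alias_names.sort.join(', ')]
end
```

Grouping by canonical name makes it easier to check, for example, that CP932 and csWindows31J both end up under Windows-31J.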