Any converter installed in the system can be used through the iconv command, which uses the iconv library. The iconv command acts as a filter for converting from one code set to another. For example, the following command filters data from PC Code (IBM-850) to ISO8859-1:
The iconv command converts the encoding of characters read from either standard input or the specified file and then writes the results to standard output.
Understanding libiconv
The iconv application programming interface (API) consists of the following subroutines that accomplish conversion:
iconv_open
Performs the initialization required to convert characters from the code set specified by the FromCode parameter to the code set specified by the ToCode parameter. The strings specified are dependent on the converters installed in the system. If initialization is successful, the converter descriptor, iconv_t, is returned in its initial state.
iconv
Invokes the converter function using the descriptor obtained from the iconv_open subroutine. The inbuf parameter points to the first character in the input buffer, and the inbytesleft parameter indicates the number of bytes to the end of the buffer being converted. The outbuf parameter points to the first available byte in the output buffer, and the outbytesleft parameter indicates the number of available bytes to the end of the buffer.
For state-dependent encodings, the subroutine is placed in its initial state by a call for which the inbuf value is a null pointer. Subsequent calls with an inbuf parameter that is not a null pointer cause the internal state of the function to be altered as necessary.
iconv_close
Closes the converter descriptor obtained from the iconv_open subroutine.
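The open, convert, and close calls described above fit together roughly as follows. This is a minimal sketch rather than reference usage: the helper name convert_buffer and its return convention are assumptions made for illustration, and the code set names passed in must match converters that are installed on the system.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes at in from code set `from`
 * to code set `to`, writing at most outcap bytes to out.
 * Returns the number of bytes written, or -1 on any failure.
 */
long convert_buffer(const char *to, const char *from,
                    const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);   /* ToCode first, then FromCode */
    if (cd == (iconv_t)-1)
        return -1;                       /* no such converter available */

    char *ip = (char *)in;               /* iconv() takes char **, not const */
    char *op = out;
    size_t ileft = inlen, oleft = outcap;
    long written = -1;

    /* Convert the input; iconv() advances ip and op and decrements the counts. */
    if (iconv(cd, &ip, &ileft, &op, &oleft) != (size_t)-1) {
        /* A null inbuf returns a state-dependent converter to its initial
         * state, writing any closing shift sequence to the output. */
        if (iconv(cd, NULL, NULL, &op, &oleft) != (size_t)-1)
            written = (long)(outcap - oleft);
    }

    iconv_close(cd);
    return written;
}
```

A caller would pass the ToCode name first, for example convert_buffer("ISO8859-1", "IBM-850", src, srclen, dst, sizeof dst), assuming that converter pair is installed.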
| Method to choose | Same code set, 7-bit only protocol | Same code set, 8-bit protocol | Different code set (or receiver's code set is unknown), 7-bit only protocol | Different code set (or receiver's code set is unknown), 8-bit protocol |
|---|---|---|---|---|
| as is | Not valid | Best choice | Not valid | Not valid if remote code set is unknown |
| fold7 | OK | OK | Best choice | OK |
| fold8 | Not valid | OK | Not valid | Best choice |
| uucode | Best choice | OK | Not valid | Not valid |
If the sender uses the same code set as the receiver, the following possibilities exist:
When the protocol allows 8-bit data, the data can be sent without conversion.
When the protocol allows only 7-bit data, the 8-bit code points must be mapped to 7-bit values. Use the iconv interface and one of the following methods:
uucode
Provides the same mapping as the uuencode and uudecode commands. This is the recommended method. For more information, see Interchange Converters: uucode.
7-bit
Converts internal code sets using 7-bit data. This method passes ASCII without any change. For more information, see Interchange Converters: 7-bit.
If the sender uses a code set different from the receiver's, there are two possibilities:
When the protocol allows only 7-bit data, use the fold7 method.
When the protocol allows 8-bit data and you know the receiver's code set, use the iconv interface to convert the data. If you do not know the receiver's code set, use the following method:
8-bit
Converts internal code sets to standard interchange formats. The 8-bit data is transmitted and the information is preserved so that the receiver can reconstruct the data in its code set. For more information, see Interchange Converters: 8-bit.
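The selection rules above can be summarized as a small decision function. This is only a sketch of the guidance in this section: the enum and function names are hypothetical and are not part of the iconv library.

```c
#include <stdbool.h>

/* Hypothetical labels for the choices described in this section. */
enum interchange_method {
    SEND_AS_IS,      /* same code set, 8-bit protocol: no conversion */
    USE_UUCODE,      /* same code set, 7-bit only protocol */
    USE_FOLD7,       /* different code set, 7-bit only protocol */
    USE_FOLD8,       /* different and unknown code set, 8-bit protocol */
    CONVERT_DIRECT   /* different but known code set, 8-bit protocol */
};

/* Pick the recommended method for one sender/receiver combination. */
enum interchange_method choose_method(bool same_code_set,
                                      bool protocol_is_8bit,
                                      bool receiver_code_set_known)
{
    if (same_code_set)
        return protocol_is_8bit ? SEND_AS_IS : USE_UUCODE;
    if (!protocol_is_8bit)
        return USE_FOLD7;
    return receiver_code_set_known ? CONVERT_DIRECT : USE_FOLD8;
}
```

The function returns only the recommended choice for each case; other methods can still be valid, such as fold7 over an 8-bit protocol.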
Using the iconv_open Subroutine
The following examples illustrate how to use the iconv_open subroutine in different situations:
When the sender and receiver use the same code sets, and if the protocol allows 8-bit data, you can send data without converting it. If the protocol allows only 7-bit data, do the following:
Sender:
cd = iconv_open("uucode", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "uucode");
When the sender and receiver use different code sets, and if the protocol allows 8-bit data and the receiver's code set is unknown, do the following:
Sender:
cd = iconv_open("fold8", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "fold8");
If the protocol allows only 7-bit data, do the following:
Sender:
cd = iconv_open("fold7", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "fold7");
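The sender and receiver calls above differ only in the order of the two arguments. The following sketch wraps each direction and checks the returned descriptor. The helper names open_sender and open_receiver are assumptions for illustration, and the interchange name passed in (for example "fold7") must correspond to a converter that is installed.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a descriptor that converts from the current
 * locale's code set to the named interchange encoding. Call
 * setlocale(LC_ALL, "") beforehand so that nl_langinfo(CODESET) reflects
 * the user's locale rather than the default "C" locale.
 */
iconv_t open_sender(const char *interchange)
{
    const char *local = nl_langinfo(CODESET);
    iconv_t cd = iconv_open(interchange, local);   /* to, then from */
    if (cd == (iconv_t)-1)
        fprintf(stderr, "no converter from %s to %s\n", local, interchange);
    return cd;
}

/* The receiver mirrors the sender: interchange encoding in, local code set out. */
iconv_t open_receiver(const char *interchange)
{
    const char *local = nl_langinfo(CODESET);
    iconv_t cd = iconv_open(local, interchange);
    if (cd == (iconv_t)-1)
        fprintf(stderr, "no converter from %s to %s\n", interchange, local);
    return cd;
}
```

Both helpers return (iconv_t)-1 when no matching converter is found, so the caller can fall back to another method.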
The iconv_open subroutine uses the LOCPATH environment variable to search for a converter whose name is in the following form:
iconv/FromCodeSet_ToCodeSet
The FromCodeSet string represents the sender's code set, and the ToCodeSet string represents the receiver's code set. The underscore character separates the two strings.
Note:
All setuid and setgid programs ignore the LOCPATH environment variable.
Because the iconv converter is a loadable object module, a different object is required when running in the 64-bit environment. In the 64-bit environment, the iconv_open subroutine uses the LOCPATH environment variable to search for a converter whose name is in the following form:
iconv/FromCodeSet_ToCodeSet__64
The iconv library automatically chooses whether to load the standard converter object or the 64-bit converter object. If the iconv_open subroutine does not find the converter, it uses the from,to pair to search for a file that defines a table-driven conversion. The file contains a conversion table created by the genxlt command.
The iconvTable converter uses the LOCPATH environment variable to search for a file whose name is in the following form:
iconvTable/FromCodeSet_ToCodeSet
If the converter is found, it performs a load operation and is initialized. The converter descriptor, iconv_t, is returned in its initial state.
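The naming forms described above can be illustrated with a small formatting helper. This sketch only builds the relative names; it does not reproduce how the iconv library itself walks the LOCPATH directories, and the function name converter_name is an assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the relative converter name, for example
 * "iconv/IBM-850_ISO8859-1". subdir is "iconv" or "iconvTable"; a nonzero
 * want64 appends the "__64" suffix used in the 64-bit environment.
 * Returns 0 on success, -1 if a name contains '/' or does not fit in buf.
 */
int converter_name(char *buf, size_t size, const char *subdir,
                   const char *from, const char *to, int want64)
{
    /* Code set names are not paths, so reject embedded separators. */
    if (strchr(from, '/') != NULL || strchr(to, '/') != NULL)
        return -1;

    int n = snprintf(buf, size, "%s/%s_%s%s", subdir, from, to,
                     want64 ? "__64" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For the pair IBM-850 and ISO8859-1, the candidates in search order would be the converter program name under iconv (with the __64 suffix in the 64-bit environment) followed by the table name under iconvTable.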
Converter Programs versus Tables
Converter programs are executable functions that convert data according to a set of rules. Converter tables are single-byte conversion tables that perform stateless conversions. Programs and tables are in separate directories, as follows:
/usr/lib/nls/loc/iconv
Converter programs
/usr/lib/nls/loc/iconvTable
Converter tables
After a converter program is compiled and linked with the libiconv.a library, the program is placed in the /usr/lib/nls/loc/iconv directory.
To build a table converter, build a source converter table file. Use the genxlt command to compile translation tables into a format understood by the table converter. The output file is then placed in the /usr/lib/nls/loc/iconvTable directory.
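To make the idea of a stateless single-byte table concrete, the following sketch applies a 256-entry lookup table to a buffer. The table contents and the substitute byte 0x1A are invented for illustration; real tables are produced by the genxlt command and define their own mappings.

```c
#include <stddef.h>

#define SUBSTITUTE 0x1A   /* assumed substitute byte, for illustration only */

/* Fill a demo table: ASCII maps to itself, every other byte to SUBSTITUTE. */
void build_demo_table(unsigned char table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)(i < 0x80 ? i : SUBSTITUTE);
}

/* Stateless single-byte conversion: exactly one table lookup per input byte. */
void table_convert(const unsigned char table[256],
                   const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[in[i]];
}
```

Because each output byte depends only on the corresponding input byte, no state needs to be carried between calls. Converter programs can instead follow arbitrary conversion rules.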
Unicode and Universal Converters
Unicode (or UCS-2) conversion tables are found in:
$LOCPATH/uconvTable/*CodeSet*
The $LOCPATH/uconv/UCSTBL converter program is used to perform the conversion to and from UCS-2 using the iconv utilities.
A universal converter program is provided that can convert between any two code sets whose conversions to and from UCS-2 are defined. Given the following uconv tables:
X -> UCS-2
UCS-2 -> Y
a universal conversion can be defined that maps the following:
X -> UCS-2 -> Y
by using the $LOCPATH/iconv/Universal_UCS_Conv converter.
Universal UCS Converter
UCS-2 is a universal 16-bit encoding that can be used as an interchange medium to provide conversion capability between virtually any code sets. The conversion can be accomplished using the Universal UCS Converter, which converts between any two code sets XXX and YYY as follows:
XXX <-> UCS-2 <-> YYY
The XXX and YYY conversions must be included in the supported List of UCS-2 Interchange Converters, and must be installed on the system.
The universal converter is installed as the file /usr/lib/nls/loc/iconv/Universal_UCS_Conv.
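The two-step mapping through UCS-2 can be imitated at the application level with two ordinary iconv conversions. The sketch below is an illustration of the idea only and is not how Universal_UCS_Conv is implemented. The helper names and the fixed 512-byte scratch buffer are assumptions, and characters that UCS-2 cannot represent cause the conversion to fail.

```c
#include <iconv.h>
#include <stddef.h>

/* One-shot conversion step; returns bytes written or -1 on failure. */
static long convert_step(const char *to, const char *from,
                         const char *in, size_t inlen,
                         char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    char *ip = (char *)in, *op = out;
    size_t ileft = inlen, oleft = outcap;
    size_t r = iconv(cd, &ip, &ileft, &op, &oleft);
    iconv_close(cd);
    return r == (size_t)-1 ? -1 : (long)(outcap - oleft);
}

/*
 * Hypothetical illustration of X -> UCS-2 -> Y: convert the input to
 * UCS-2 in a scratch buffer, then convert that buffer to the target.
 */
long convert_via_ucs2(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    char pivot[512];   /* small fixed scratch buffer for the UCS-2 form */
    long n = convert_step("UCS-2", from, in, inlen, pivot, sizeof pivot);
    if (n < 0)
        return -1;
    return convert_step(to, "UCS-2", pivot, (size_t)n, out, outcap);
}
```

The benefit of the pivot is that each code set needs only one pair of tables, to and from UCS-2, instead of a dedicated converter for every pair of code sets.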
The conversion between multibyte and wide character code depends on the current locale setting. Do not exchange wide character codes between two processes, unless you have knowledge that each locale that might be used handles wide character codes in a consistent fashion. Most locales for this operating system use the Unicode character value as a wide character code, except locales based on IBM-eucTW codesets.
Using Converters
The iconv interface is the following set of subroutines, used to open, perform, and close conversions:
iconv_open
iconv
iconv_close
The following example shows how you can use these subroutines to create a code set conversion filter that accepts the ToCode and FromCode parameters as input arguments:
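#include <errno.h>
#include <iconv.h>
#include <locale.h>
#include <nl_types.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*
 * The includes, macros, and opening of main() are missing from
 * this listing; they are reconstructed here as assumptions.
 */
#define ICONV_INVAL() (r == (size_t)-1 && err == EILSEQ) /* invalid input */
#define ICONV_OVER()  (r == (size_t)-1 && err == E2BIG)  /* output buffer full */
#define ICONV_TRUNC() (r == (size_t)-1 && err == EINVAL) /* input ends mid-character */
#define USAGE  1  /* message numbers in the catalog */
#define ERROR  2
#define INCOMP 3
static char ibuf[BUFSIZ], obuf[BUFSIZ];

int main(int argc, char **argv)
{
    size_t ileft = 0, oleft, r;
    int err;
    char *ip, *op;
    nl_catd catd;
    iconv_t cd;

    setlocale(LC_ALL, "");
    catd = catopen(argv[0], 0);
    if (argc != 3) {
        fputs(catgets(catd, NL_SETD, USAGE, "usage: conv fromcode tocode\n"), stderr);
        exit(1);
    }
    cd = iconv_open(argv[2], argv[1]);   /* ToCode first, then FromCode */
    if (cd == (iconv_t)-1) {
        perror("iconv_open");
        exit(1);
    }
    while (!feof(stdin) && !ferror(stdin)) {
        /*
         * After the next operation, ibuf will
         * contain new data plus any truncated
         * data left from the previous read.
         */
        ileft += fread(ibuf + ileft, 1, BUFSIZ - ileft, stdin);
        do {
            ip = ibuf;
            op = obuf;
            oleft = BUFSIZ;
            r = iconv(cd, &ip, &ileft, &op, &oleft);
            err = errno;   /* save before other calls can change errno */
            if (ICONV_INVAL()) {
                fputs(catgets(catd, NL_SETD, ERROR, "invalid input\n"), stderr);
                exit(2);
            }
            fwrite(obuf, 1, BUFSIZ - oleft, stdout);
            if (ICONV_TRUNC() || ICONV_OVER())
                /*
                 * Data remaining in buffer: move it to the
                 * beginning (the regions may overlap).
                 */
                memmove(ibuf, ip, ileft);
            /*
             * Loop until all characters in the input
             * buffer have been converted.
             */
        } while (ICONV_OVER());
    }
    if (ileft != 0) {
        /*
         * This can only happen if the last call
         * to iconv() returned ICONV_TRUNC, meaning
         * the last data in the input stream was
         * incomplete.
         */
        fputs(catgets(catd, NL_SETD, INCOMP, "input incomplete\n"), stderr);
        exit(3);
    }
    iconv_close(cd);
    exit(0);
}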
Naming Converters
Code set names are in the form CodesetRegistry-CodesetEncoding where:
CodesetRegistry
Identifies the registration authority for the encoding. The CodesetRegistry must be made of characters from the portable code set (usually A-Z and 0-9).
CodesetEncoding
Identifies the coded character set defined by the registered authority.
The from,to variable used by the iconv command and iconv_open subroutine identifies a file whose name should be in the form /usr/lib/nls/loc/iconv/%f_%t or /usr/lib/nls/loc/iconvTable/%f_%t, where:
%f
Represents the FromCode set name
%t
Represents the ToCode set name
List of Converters
Converters change data from one code set to another. The sets of converters supported with the iconv library are listed in the following sections. All converters shipped with the BOS Runtime Environment are located in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.
These directories also contain private converters; that is, they are used by other converters. However, users and programs should only depend on the converters in the following lists.
Any converter shipped with the BOS Runtime Environment and not listed here should be considered private and subject to change or deletion. Converters supplied by other products can be placed in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.
Programmers are encouraged to use registered code set names or code set names associated with an application. The X Consortium maintains a registry of code set names for reference. See Code Sets for National Language Support for more information about code sets.
PC, ISO, and EBCDIC Code Set Converters
These converters provide conversion between PC, ISO, and EBCDIC single-byte stateless code sets. The following types of conversions are supported: PC to/from ISO, PC to/from EBCDIC, and ISO to/from EBCDIC.
Conversion is provided between compatible code sets such as Latin-1 to Latin-1 and Greek to Greek. However, conversion between different EBCDIC national code sets is not supported. For information about converting between incompatible character sets, refer to Interchange Converters: 7-bit and Interchange Converters: 8-bit.
Conversion tables in the iconvTable directory are created by the genxlt command.
Compatible Code Set Names
The following table lists code set names that are compatible. Each line defines to/from strings that may be used when requesting a converter.
Note:
The PC and ISO code sets are ASCII-based.
Code Set Compatibility
| Character Set | Languages | PC | ISO | EBCDIC |
|---|---|---|---|---|
| Latin-1 | U.S. English, Portuguese, Canadian French | N/A | ISO8859-1 | IBM-037 |
| Latin-1 | Danish, Norwegian | N/A | ISO8859-1 | IBM-277 |
| Latin-1 | Finnish, Swedish | N/A | ISO8859-1 | IBM-278 |
| Latin-1 | Italian | N/A | ISO8859-1 | IBM-280 |
| Latin-1 | Japanese | N/A | ISO8859-1 | IBM-281 |
| Latin-1 | Spanish | N/A | ISO8859-1 | IBM-284 |
| Latin-1 | U.K. English | N/A | ISO8859-1 | IBM-285 |
| Latin-1 | German | N/A | ISO8859-1 | IBM-273 |
| Latin-1 | French | N/A | ISO8859-1 | IBM-297 |
| Latin-1 | Belgian, Swiss German | N/A | ISO8859-1 | IBM-500 |
| Latin-2 | Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene | IBM-852 | ISO8859-2 | IBM-870 |
| Cyrillic | Bulgarian, Macedonian, Serbian Cyrillic, Russian | IBM-855 | ISO8859-5 | IBM-880, IBM-1025 |
| Cyrillic | Russian | IBM-866 | ISO8859-5 | IBM-1025 |
| Hebrew | Hebrew | IBM-856, IBM-862 | ISO8859-8 | IBM-424, IBM-803 |
| Turkish | Turkish | IBM-857 | ISO8859-9 | IBM-1026 |
| Arabic | Arabic | IBM-864, IBM-1046 | ISO8859-6 | IBM-420 |
| Greek | Greek | IBM-869 | ISO8859-7 | IBM-875 |
| Baltic | Lithuanian, Latvian, Estonian | IBM-921, IBM-922 | ISO8859-4 | IBM-1112, IBM-1122 |
Note:
A character that exists in the source code set but does not exist in the target code set is converted to a converter-defined substitute character.
Files
The following table describes the iconvTable converters found in the /usr/lib/nls/loc/iconvTable directory:
iconvTable Converters
| Converter Table | Description | Language |
|---|---|---|
| IBM-037_IBM-850 | IBM-037 to IBM-850 | U.S. English, Portuguese, Canadian French |
| IBM-273_IBM-850 | IBM-273 to IBM-850 | German |
| IBM-277_IBM-850 | IBM-277 to IBM-850 | Danish, Norwegian |
| IBM-278_IBM-850 | IBM-278 to IBM-850 | Finnish, Swedish |
| IBM-280_IBM-850 | IBM-280 to IBM-850 | Italian |
| IBM-281_IBM-850 | IBM-281 to IBM-850 | Japanese-Latin |
| IBM-284_IBM-850 | IBM-284 to IBM-850 | Spanish |
| IBM-285_IBM-850 | IBM-285 to IBM-850 | U.K. English |
| IBM-297_IBM-850 | IBM-297 to IBM-850 | French |
| IBM-420_IBM-1046 | IBM-420 to IBM-1046 | Arabic |
| IBM-424_IBM-856 | IBM-424 to IBM-856 | Hebrew |
| IBM-424_IBM-862 | IBM-424 to IBM-862 | Hebrew |
| IBM-500_IBM-850 | IBM-500 to IBM-850 | Belgian, Swiss German |
| IBM-803_IBM-856 | IBM-803 to IBM-856 | Hebrew |
| IBM-803_IBM-862 | IBM-803 to IBM-862 | Hebrew |
| IBM-850_IBM-037 | IBM-850 to IBM-037 | U.S. English, Portuguese, Canadian French |
| IBM-850_IBM-273 | IBM-850 to IBM-273 | German |
| IBM-850_IBM-277 | IBM-850 to IBM-277 | Danish, Norwegian |
| IBM-850_IBM-278 | IBM-850 to IBM-278 | Finnish, Swedish |
| IBM-850_IBM-280 | IBM-850 to IBM-280 | Italian |
| IBM-850_IBM-281 | IBM-850 to IBM-281 | Japanese-Latin |
| IBM-850_IBM-284 | IBM-850 to IBM-284 | Spanish |
| IBM-850_IBM-285 | IBM-850 to IBM-285 | U.K. English |
| IBM-850_IBM-297 | IBM-850 to IBM-297 | French |
| IBM-850_IBM-500 | IBM-850 to IBM-500 | Belgian, Swiss German |
| IBM-856_IBM-424 | IBM-856 to IBM-424 | Hebrew |
| IBM-856_IBM-803 | IBM-856 to IBM-803 | Hebrew |
| IBM-856_IBM-862 | IBM-856 to IBM-862 | Hebrew |
| IBM-862_IBM-424 | IBM-862 to IBM-424 | Hebrew |
| IBM-862_IBM-803 | IBM-862 to IBM-803 | Hebrew |
| IBM-862_IBM-856 | IBM-862 to IBM-856 | Hebrew |
| IBM-864_IBM-1046 | IBM-864 to IBM-1046 | Arabic |
| IBM-921_IBM-1112 | IBM-921 to IBM-1112 | Lithuanian, Latvian |
| IBM-922_IBM-1122 | IBM-922 to IBM-1122 | Estonian |
| IBM-1112_IBM-921 | IBM-1112 to IBM-921 | Lithuanian, Latvian |
| IBM-1122_IBM-922 | IBM-1122 to IBM-922 | Estonian |
| IBM-1046_IBM-420 | IBM-1046 to IBM-420 | Arabic |
| IBM-1046_IBM-864 | IBM-1046 to IBM-864 | Arabic |
| IBM-037_ISO8859-1 | IBM-037 to ISO8859-1 | U.S. English, Portuguese, Canadian French |
| IBM-273_ISO8859-1 | IBM-273 to ISO8859-1 | German |
| IBM-277_ISO8859-1 | IBM-277 to ISO8859-1 | Danish, Norwegian |
| IBM-278_ISO8859-1 | IBM-278 to ISO8859-1 | Finnish, Swedish |
| IBM-280_ISO8859-1 | IBM-280 to ISO8859-1 | Italian |
| IBM-281_ISO8859-1 | IBM-281 to ISO8859-1 | Japanese-Latin |
| IBM-284_ISO8859-1 | IBM-284 to ISO8859-1 | Spanish |
| IBM-285_ISO8859-1 | IBM-285 to ISO8859-1 | U.K. English |
| IBM-297_ISO8859-1 | IBM-297 to ISO8859-1 | French |
| IBM-420_ISO8859-6 | IBM-420 to ISO8859-6 | Arabic |
| IBM-424_ISO8859-8 | IBM-424 to ISO8859-8 | Hebrew |
| IBM-500_ISO8859-1 | IBM-500 to ISO8859-1 | Belgian, Swiss German |
| IBM-803_ISO8859-8 | IBM-803 to ISO8859-8 | Hebrew |
| IBM-852_ISO8859-2 | IBM-852 to ISO8859-2 | Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene |
The following list describes the Multibyte Code Set converters that are found in the /usr/lib/nls/loc/iconv directory.
Converter
Description
IBM-eucJP_IBM-932
IBM-eucJP to IBM-932
IBM-eucJP_IBM-943
IBM-eucJP to IBM-943
IBM-eucJP_IBM-930
IBM-eucJP to IBM-930
IBM-eucCN_IBM-936(PC5550)
IBM-eucCN to IBM-936(PC5550)
IBM-eucCN_IBM-935
IBM-eucCN to IBM-935
IBM-eucJP_IBM-939
IBM-eucJP to IBM-939
IBM-eucCN_IBM-1381
IBM-eucCN to IBM-1381
IBM-943_IBM-932
IBM-943 to IBM-932
IBM-932_IBM-943
IBM-932 to IBM-943
IBM-930_IBM-932
IBM-930 to IBM-932
IBM-930_IBM-943
IBM-930 to IBM-943
IBM-930_IBM-eucJP
IBM-930 to IBM-eucJP
IBM-932_IBM-eucJP
IBM-932 to IBM-eucJP
IBM-932_IBM-930
IBM-932 to IBM-930
IBM-943_IBM-eucJP
IBM-943 to IBM-eucJP
IBM-943_IBM-930
IBM-943 to IBM-930
IBM-936(PC5550)_IBM-935
IBM-936(PC5550) to IBM-935
IBM-936_IBM-935
IBM-936 to IBM-935
IBM-932_IBM-939
IBM-932 to IBM-939
IBM-939_IBM-932
IBM-939 to IBM-932
IBM-943_IBM-939
IBM-943 to IBM-939
IBM-939_IBM-943
IBM-939 to IBM-943
IBM-935_IBM-936(PC5550)
IBM-935 to IBM-936(PC5550)
IBM-935_IBM-936
IBM-935 to IBM-936
IBM-1381_IBM-935
IBM-1381 to IBM-935
IBM-935_IBM-1381
IBM-935 to IBM-1381
IBM-935_IBM-eucCN
IBM-935 to IBM-eucCN
IBM-936(PC5550)_IBM-eucCN
IBM-936(PC5550) to IBM-eucCN
IBM-eucTW_IBM-eucCN
IBM-eucTW to IBM-eucCN
big5_IBM-eucCN
big5 to IBM-eucCN
IBM-1381_IBM-eucCN
IBM-1381 to IBM-eucCN
IBM-939_IBM-eucJP
IBM-939 to IBM-eucJP
IBM-eucKR_IBM-934
IBM-eucKR to IBM-934
IBM-934_IBM-eucKR
IBM-934 to IBM-eucKR
IBM-eucKR_IBM-933
IBM-eucKR to IBM-933
IBM-933_IBM-eucKR
IBM-933 to IBM-eucKR
IBM-eucTW_IBM-937
IBM-eucTW to IBM-937
IBM-938_IBM-937
IBM-938 to IBM-937
big-5_IBM-937
big-5 to IBM-937
IBM-eucCN_IBM-eucTW
IBM-eucCN to IBM-eucTW
IBM-937_IBM-eucTW
IBM-937 to IBM-eucTW
IBM-937_IBM-938
IBM-937 to IBM-938
IBM-eucTW_IBM-938
IBM-eucTW to IBM-938
IBM-eucCN_big5
IBM-eucCN to big5
IBM-eucTW_big-5
IBM-eucTW to big-5
IBM-937_big-5
IBM-937 to big-5
CNS11643.1992-3_IBM-eucTW
CNS11643.1992-3 to IBM-eucTW
CNS11643.1992-3-GL_IBM-eucTW
CNS11643.1992-3-GL to IBM-eucTW
CNS11643.1992-3-GR_IBM-eucTW
CNS11643.1992-3-GR to IBM-eucTW
CNS11643.1992-4_IBM-eucTW
CNS11643.1992-4 to IBM-eucTW
CNS11643.1992-4-GL_IBM-eucTW
CNS11643.1992-4-GL to IBM-eucTW
CNS11643.1992-4-GR_IBM-eucTW
CNS11643.1992-4-GR to IBM-eucTW
IBM-eucTW_CNS11643.1992-3
IBM-eucTW to CNS11643.1992-3
IBM-eucTW_CNS11643.1992-3-GL
IBM-eucTW to CNS11643.1992-3-GL
IBM-eucTW_CNS11643.1992-3-GR
IBM-eucTW to CNS11643.1992-3-GR
IBM-eucTW_CNS11643.1992-4
IBM-eucTW to CNS11643.1992-4
IBM-eucTW_CNS11643.1992-4-GL
IBM-eucTW to CNS11643.1992-4-GL
IBM-eucTW_CNS11643.1992-4-GR
IBM-eucTW to CNS11643.1992-4-GR
IBM-eucCN_GB2312.1980-1
IBM-eucCN to GB2312.1980-1
IBM-eucCN_GB2312.1980-1-GL
IBM-eucCN to GB2312.1980-1-GL
IBM-eucCN_GB2312.1980-1-GR
IBM-eucCN to GB2312.1980-1-GR
IBM-937_csic
IBM-937 to csic
csic_IBM-937
csic to IBM-937
IBM-938_csic
IBM-938 to csic
csic_IBM-938
csic to IBM-938
IBM-eucTW_ccdc
IBM-eucTW to ccdc
ccdc_IBM-eucTW
ccdc to IBM-eucTW
IBM-eucTW_cns
IBM-eucTW to cns
cns_IBM-eucTW
cns to IBM-eucTW
IBM-eucTW_csic
IBM-eucTW to csic
csic_IBM-eucTW
csic to IBM-eucTW
IBM-eucTW_sops
IBM-eucTW to sops
sops_IBM-eucTW
sops to IBM-eucTW
IBM-eucTW_tca
IBM-eucTW to tca
tca_IBM-eucTW
tca to IBM-eucTW
big5_cns
big5 to cns
cns_big5
cns to big5
big5_csic
big5 to csic
csic_big5
csic to big5
big5_ttc
big5 to ttc
ttc_big5
ttc to big5
big5_ttcmin
big5 to ttcmin
ttcmin_big5
ttcmin to big5
big5_unicode
big5 to unicode
unicode_big5
unicode to big5
big5_wang
big5 to wang
wang_big5
wang to big5
ccdc_csic
ccdc to csic
csic_ccdc
csic to ccdc
csic_sops
csic to sops
sops_csic
sops to csic
CNS11643.1986-1_big5
CNS11643.1986-1 to big5
big5_CNS11643.1986-1
big5 to CNS11643.1986-1
CNS11643.1986-1-GR_big5
CNS11643.1986-1-GR to big5
big5_CNS11643.1986-1-GR
big5 to CNS11643.1986-1-GR
CNS11643.1986-2_big5
CNS11643.1986-2 to big5
big5_CNS11643.1986-2
big5 to CNS11643.1986-2
CNS11643.1986-2-GR_big5
CNS11643.1986-2-GR to big5
big5_CNS11643.1986-2-GR
big5 to CNS11643.1986-2-GR
CNS11643.CT-GR_big5
CNS11643.CT-GR to big5
big5_CNS11643.CT-GR
big5 to CNS11643.CT-GR
IBM-sbdTW-GR_big5
IBM-sbdTW-GR to big5
big5_IBM-sbdTW-GR
big5 to IBM-sbdTW-GR
IBM-sbdTW.CT-GR_big5
IBM-sbdTW.CT-GR to big5
big5_IBM-sbdTW.CT-GR
big5 to IBM-sbdTW.CT-GR
IBM-sbdTW_big5
IBM-sbdTW to big5
big5_IBM-sbdTW
big5 to IBM-sbdTW
IBM-udcTW-GR_big5
IBM-udcTW-GR to big5
big5_IBM-udcTW-GR
big5 to IBM-udcTW-GR
IBM-udcTW.CT-GR_big5
IBM-udcTW.CT-GR to big5
big5_IBM-udcTW.CT-GR
big5 to IBM-udcTW.CT-GR
ISO8859-1_big5
ISO8859-1 to big5
big5_ISO8859-1
big5 to ISO8859-1
big5_ASCII-GR
big5 to ASCII-GR
ASCII-GR_big5
ASCII-GR to big5
GBK_big5
GBK to big5
big5_GBK
big5 to GBK
GBK_IBM-eucTW
GBK to IBM-eucTW
IBM-eucTW_GBK
IBM-eucTW to GBK
CNS11643.1986-1_GBK
CNS11643.1986-1 to GBK
GBK_CNS11643.1986-1
GBK to CNS11643.1986-1
CNS11643.1986-2_GBK
CNS11643.1986-2 to GBK
GBK_CNS11643.1986-2
GBK to CNS11643.1986-2
CNS11643.1986-1-GR_GBK
CNS11643.1986-1-GR to GBK
GBK_CNS11643.1986-1-GR
GBK to CNS11643.1986-1-GR
CNS11643.1986-2-GR_GBK
CNS11643.1986-2-GR to GBK
GBK_CNS11643.1986-2-GR
GBK to CNS11643.1986-2-GR
CNS11643.1986-1-GL_GBK
CNS11643.1986-1-GL to GBK
GBK_CNS11643.1986-1-GL
GBK to CNS11643.1986-1-GL
CNS11643.1986-2-GL_GBK
CNS11643.1986-2-GL to GBK
GBK_CNS11643.1986-2-GL
GBK to CNS11643.1986-2-GL
CNS11643.CT-GR_GBK
CNS11643.CT-GR to GBK
GBK_CNS11643.CT-GR
GBK to CNS11643.CT-GR
GB2312.1980.CT-GR_GBK
GB2312.1980.CT-GR to GBK
GBK_GB2312.1980.CT-GR
GBK to GB2312.1980.CT-GR
GB2312.1980-0_GBK
GB2312.1980-0 to GBK
GBK_GB2312.1980-0
GBK to GB2312.1980-0
GB2312.1980-0-GR_GBK
GB2312.1980-0-GR to GBK
GBK_GB2312.1980-0-GR
GBK to GB2312.1980-0-GR
GB2312.1980-0-GL_GBK
GB2312.1980-0-GL to GBK
GBK_GB2312.1980-0-GL
GBK to GB2312.1980-0-GL
ASCII-GR_GBK
ASCII-GR to GBK
GBK_ASCII-GR
GBK to ASCII-GR
ISO8859-1_GBK
ISO8859-1 to GBK
GBK_ISO8859-1
GBK to ISO8859-1
IBM-eucCN_GBK
IBM-eucCN to GBK
GBK_IBM-eucCN
GBK to IBM-eucCN
Interchange Converters: 7-bit
This converter provides conversion between internal code and 7-bit standard interchange formats (fold7). The fold7 name identifies encodings that can be used to pass text data through 7-bit mail protocols. The encodings are based on ISO2022. For more information about fold7, see Understanding libiconv.
The fold7 converters convert characters from a code set to a canonical 7-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:
IBM-850 <-> ISO8859-1
Common Latin characters
IBM-932 <-> IBM-eucJP
Common Japanese characters
The following escape sequences designate standard code sets:
Escape Sequence
Standard Code Set
01/11 02/04 04/00
GL JIS X0208.1978-0.
01/11 02/04 02/08 04/01
GL left half of GB2312.1980-0.
01/11 02/08 04/02
GL 7-bit ASCII or left half of ISO8859-1.
01/11 02/14 04/01
GL right half of ISO8859-1.
01/11 02/14 04/02
GL right half of ISO8859-2.
01/11 02/14 04/03
GL right half of ISO8859-3.
01/11 02/14 04/04
GL right half of ISO8859-4.
01/11 02/14 04/06
GL right half of ISO8859-7.
01/11 02/14 04/07
GL right half of ISO8859-6.
01/11 02/14 04/08
GL right half of ISO8859-8.
01/11 02/14 04/12
GL right half of ISO8859-5.
01/11 02/14 04/13
GL right half of ISO8859-9.
01/11 02/08 04/09
GL right half of JIS X0201.1976-0.
01/11 02/08 04/10
GL left half of JIS X0201.1976.
01/11 02/04 04/02
GL JIS X0208.1983-0.
01/11 02/04 02/08 04/02
GL JIS X0208.1983-0.
01/11 02/04 02/08 04/00
GL JISX0208.1978-0.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GL Japanese (IBM-udcJP) user-definable characters.
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/07 00/02
UCS-2 encoded as base64; used only for those characters not encoded by any of the other 7-bit escape sequences listed above.
When converting from a code set to fold7, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JISX0208.1983-0 characters use 01/11 02/04 04/02 as the designation.
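The escape sequences in this table are written in the column/row notation used by ISO 2022, where a token cc/rr stands for the byte value cc * 16 + rr. For example, 01/11 is 0x1B, the ESC character. The following sketch turns such a string into bytes. The function name parse_escape is an assumption, and the helper deliberately rejects anything that is not a cc/rr token, including the M and L placeholders that appear in the longer sequences.

```c
#include <stdio.h>

/*
 * Parse column/row notation such as "01/11 02/04 04/02" into bytes.
 * Returns the number of bytes stored in out, or -1 on a malformed or
 * unsupported token or if out cannot hold the result.
 */
int parse_escape(const char *text, unsigned char *out, int cap)
{
    int count = 0;

    while (*text != '\0') {
        unsigned col, row;
        int used = 0;

        while (*text == ' ')          /* skip separators between tokens */
            text++;
        if (*text == '\0')
            break;
        if (sscanf(text, "%2u/%2u%n", &col, &row, &used) != 2 ||
            col > 15 || row > 15 || count >= cap)
            return -1;
        out[count++] = (unsigned char)(col * 16 + row);
        text += used;                 /* advance past the parsed token */
    }
    return count;
}
```

Applied to the designation 01/11 02/04 04/02, the helper yields the three bytes 0x1B 0x24 0x42.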
Files
The following list describes the fold7 converters that are found in the /usr/lib/nls/loc/iconv directory:
Converter
Description
fold7_IBM-850
Interchange format to IBM-850
fold7_IBM-921
Interchange format to IBM-921
fold7_IBM-922
Interchange format to IBM-922
fold7_IBM-932
Interchange format to IBM-932
fold7_IBM-943
Interchange format to IBM-943
fold7_IBM-1124
Interchange format to IBM-1124
fold7_IBM-1129
Interchange format to IBM-1129
fold7_IBM-eucCN
Interchange format to IBM-eucCN
fold7_IBM-eucJP
Interchange format to IBM-eucJP
fold7_IBM-eucKR
Interchange format to IBM-eucKR
fold7_IBM-eucTW
Interchange format to IBM-eucTW
fold7_ISO8859-1
Interchange format to ISO8859-1
fold7_ISO8859-2
Interchange format to ISO8859-2
fold7_ISO8859-3
Interchange format to ISO8859-3
fold7_ISO8859-4
Interchange format to ISO8859-4
fold7_ISO8859-5
Interchange format to ISO8859-5
fold7_ISO8859-6
Interchange format to ISO8859-6
fold7_ISO8859-7
Interchange format to ISO8859-7
fold7_ISO8859-8
Interchange format to ISO8859-8
fold7_ISO8859-9
Interchange format to ISO8859-9
fold7_TIS-620
Interchange format to TIS-620
fold7_UTF-8
Interchange format to UTF-8
fold7_big5
Interchange format to big5
fold7_GBK
Interchange format to GBK
IBM-921_fold7
IBM-921 to interchange format
IBM-922_fold7
IBM-922 to interchange format
IBM-850_fold7
IBM-850 to interchange format
IBM-932_fold7
IBM-932 to interchange format
IBM-943_fold7
IBM-943 to interchange format
IBM-1124_fold7
IBM-1124 to interchange format
IBM-1129_fold7
IBM-1129 to interchange format
IBM-eucCN_fold7
IBM-eucCN to interchange format
IBM-eucJP_fold7
IBM-eucJP to interchange format
IBM-eucKR_fold7
IBM-eucKR to interchange format
IBM-eucTW_fold7
IBM-eucTW to interchange format
ISO8859-1_fold7
ISO8859-1 to interchange format
ISO8859-2_fold7
ISO8859-2 to interchange format
ISO8859-3_fold7
ISO8859-3 to interchange format
ISO8859-4_fold7
ISO8859-4 to interchange format
ISO8859-5_fold7
ISO8859-5 to interchange format
ISO8859-6_fold7
ISO8859-6 to interchange format
ISO8859-7_fold7
ISO8859-7 to interchange format
ISO8859-8_fold7
ISO8859-8 to interchange format
ISO8859-9_fold7
ISO8859-9 to interchange format
TIS-620_fold7
TIS-620 to interchange format
UTF-8_fold7
UTF-8 to interchange format
big5_fold7
big5 to interchange format
GBK_fold7
GBK to interchange format
Interchange Converters: 8-bit
This converter provides conversions between internal code and 8-bit standard interchange formats (fold8). The fold8 name identifies encodings that can be used to pass text data through 8-bit mail protocols. The encodings are based on ISO2022. For more information about fold8, see Understanding libiconv.
The fold8 converters convert characters from a specific code set encoding to a canonical 8-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:
IBM-850 <-> ISO8859-1
Common Latin characters
IBM-932 <-> IBM-eucJP
Common Japanese characters
The following escape sequences designate standard code sets.
Escape Sequence
Standard Code Set
01/11 02/04 02/09 04/01
GR right half of GB2312.1980-0.
01/11 02/13 04/01
GR right half of ISO8859-1.
01/11 02/13 04/02
GR right half of ISO8859-2.
01/11 02/13 04/03
GR right half of ISO8859-3.
01/11 02/13 04/04
GR right half of ISO8859-4.
01/11 02/13 04/06
GR right half of ISO8859-7.
01/11 02/13 04/07
GR right half of ISO8859-6.
01/11 02/13 04/08
GR right half of ISO8859-8.
01/11 02/13 04/12
GR right half of ISO8859-5.
01/11 02/13 04/13
GR right half of ISO8859-9.
01/11 02/09 04/09
GR right half of JIS X0201.1976-1.
01/11 02/04 02/09 04/02
GR JIS X0208.1983-1.
01/11 02/04 02/09 04/00
GR JISX0208.1978-1.
01/11 02/09 04/02
GR 7-bit ASCII or left half of ISO8859-1.
01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02
GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GR right half of Japanese user-definable characters.
01/11 02/08 04/02
GL 7-bit ASCII or left half of ISO8859-1.
01/11 02/14 04/01
GL right half of ISO8859-1.
01/11 02/14 04/02
GL right half of ISO8859-2.
01/11 02/14 04/03
GL right half of ISO8859-3.
01/11 02/14 04/04
GL right half of ISO8859-4.
01/11 02/14 04/06
GL right half of ISO8859-7.
01/11 02/14 04/07
GL right half of ISO8859-6.
01/11 02/14 04/08
GL right half of ISO8859-8.
01/11 02/14 04/12
GL right half of ISO8859-5.
01/11 02/14 04/13
GL right half of ISO8859-9.
01/11 02/08 04/09
GL right half of JIS X0201.1976-0.
01/11 02/08 04/10
GL left half of JIS X0201.1976.
01/11 02/04 02/08 04/02
GL JIS X0208.1983-0.
01/11 02/04 04/02
GL JIS X0208.1983-0.
01/11 02/04 04/00
GL JIS X0208.1978-0.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GL Japanese (IBM-udcJP) user-definable characters.
01/11 02/04 02/09 04/03
GR KSC5601-1987.
01/11 02/04 02/09 03/00
GR CNS11643-1986-1.
01/11 02/04 02/10 03/01
GR CNS11643-1986-2.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02
GR right half of Traditional Chinese user-definable characters.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02
GR right half of IBM-850 unique symbols.
01/11 02/04 02/08 04/03
GL KSC5601-1987.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02
GL Traditional Chinese (IBM-udcTW) user-definable characters.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02
GL Traditional Chinese IBM-850 unique symbols (IBM-sbdTW) user-definable characters.
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/08 00/02
UCS-2 encoded as UTF-8; used only for those characters not encoded by any of the escape sequences listed above.
When converting from a code set to fold8, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JISX0208.1983-0 characters use 01/11 02/04 02/08 04/02 as the designation.
Files
The following list describes the fold8 converters found in the /usr/lib/nls/loc/iconv directory:
Converter
Description
fold8_IBM-850
Interchange format to IBM-850
fold8_IBM-921
Interchange format to IBM-921
fold8_IBM-922
Interchange format to IBM-922
fold8_IBM-932
Interchange format to IBM-932
fold8_IBM-943
Interchange format to IBM-943
fold8_IBM-1124
Interchange format to IBM-1124
fold8_IBM-1129
Interchange format to IBM-1129
fold8_IBM-eucCN
Interchange format to IBM-eucCN
fold8_IBM-eucJP
Interchange format to IBM-eucJP
fold8_IBM-eucKR
Interchange format to IBM-eucKR
fold8_IBM-eucTW
Interchange format to IBM-eucTW
fold8_ISO8859-1
Interchange format to ISO8859-1
fold8_ISO8859-2
Interchange format to ISO8859-2
fold8_ISO8859-3
Interchange format to ISO8859-3
fold8_ISO8859-4
Interchange format to ISO8859-4
fold8_ISO8859-5
Interchange format to ISO8859-5
fold8_ISO8859-6
Interchange format to ISO8859-6
fold8_ISO8859-7
Interchange format to ISO8859-7
fold8_ISO8859-8
Interchange format to ISO8859-8
fold8_ISO8859-9
Interchange format to ISO8859-9
fold8_TIS-620
Interchange format to TIS-620
fold8_UTF-8
Interchange format to UTF-8
fold8_big5
Interchange format to big5
fold8_GBK
Interchange format to GBK
IBM-921_fold8
IBM-921 to interchange format
IBM-922_fold8
IBM-922 to interchange format
IBM-850_fold8
IBM-850 to interchange format
IBM-932_fold8
IBM-932 to interchange format
IBM-943_fold8
IBM-943 to interchange format
IBM-1124_fold8
IBM-1124 to interchange format
IBM-1129_fold8
IBM-1129 to interchange format
IBM-eucCN_fold8
IBM-eucCN to interchange format
IBM-eucJP_fold8
IBM-eucJP to interchange format
IBM-eucKR_fold8
IBM-eucKR to interchange format
IBM-eucTW_fold8
IBM-eucTW to interchange format
ISO8859-1_fold8
ISO8859-1 to interchange format
ISO8859-2_fold8
ISO8859-2 to interchange format
ISO8859-3_fold8
ISO8859-3 to interchange format
ISO8859-4_fold8
ISO8859-4 to interchange format
ISO8859-5_fold8
ISO8859-5 to interchange format
ISO8859-6_fold8
ISO8859-6 to interchange format
ISO8859-7_fold8
ISO8859-7 to interchange format
ISO8859-8_fold8
ISO8859-8 to interchange format
ISO8859-9_fold8
ISO8859-9 to interchange format
TIS-620_fold8
TIS-620 to interchange format
UTF-8_fold8
UTF-8 to interchange format
big5_fold8
big5 to interchange format
GBK_fold8
GBK to interchange format
Interchange Converters: Compound Text
Compound text interchange converters convert between compound text and internal code sets.
Compound text is an interchange encoding defined by the X Consortium. It is used to communicate text between X clients. Compound text is based on ISO2022 and can encode most character sets using standard escape sequences. It also provides extensions for encoding private character sets. The supported code sets provide a converter to and from compound text. The name used to identify the compound text encoding is ct.
The following escape sequences are used to designate standard code sets in the order listed below.
01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02
GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GR right half of Japanese user-definable characters.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GL Japanese (IBM-udcJP) user-definable characters.
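The escape sequences above are written in the column/row notation used by ISO 2022, in which a token cc/rr stands for the byte value cc * 16 + rr (so 01/11 is the ESC byte, 0x1B). The following is a minimal sketch of that arithmetic; colrow_to_byte is a hypothetical helper name and is not part of the iconv library.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the iconv library): converts one
 * "cc/rr" token in column/row notation to its byte value, cc * 16 + rr.
 * Returns -1 if the token is not of that form or is out of range.
 */
int colrow_to_byte(const char *token)
{
    int col, row;
    char extra;

    /* Exactly two numbers and nothing after them must be present. */
    if (sscanf(token, "%d/%d%c", &col, &row, &extra) != 2)
        return -1;
    if (col < 0 || col > 15 || row < 0 || row > 15)
        return -1;
    return col * 16 + row;
}
```

With this helper, the leading tokens 01/11 02/05 02/15 map to the bytes 0x1B 0x25 0x2F. The M and L tokens are not fixed byte values (in the X Consortium compound text specification they carry the length of the extended segment), so the helper rejects them.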
Files
The following list describes the compound text converters that are found in the /usr/lib/nls/loc/iconv directory:
During conversion from uucode, 62 bytes at a time (including the new-line character trailing the record) are converted, generating 45 bytes in outbuf.
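The 62-to-45 ratio follows from the traditional uuencode line layout: one length character, 60 characters that each carry 6 bits (45 bytes in total), and a new-line. The sketch below illustrates that layout only. It assumes the uucode converter uses the same line format as the uuencode command, and uu_encode_line and uu_decode_line are hypothetical helper names.

```c
#include <stddef.h>

/* Traditional uuencode maps a 6-bit value to a printable character. */
static char uu_enc(unsigned v) { v &= 0x3f; return v ? (char)(v + 32) : '`'; }
static unsigned uu_dec(char c) { return ((unsigned char)c - 32) & 0x3f; }

/*
 * Hypothetical helper: encodes up to 45 input bytes as one uuencode line
 * (length character, 4 characters per 3 input bytes, new-line).
 * Returns the number of bytes written to out, which must hold 62 bytes.
 */
size_t uu_encode_line(const unsigned char *in, size_t n, char *out)
{
    size_t i, o = 0;

    out[o++] = uu_enc((unsigned)n);
    for (i = 0; i < n; i += 3) {
        unsigned b0 = in[i];
        unsigned b1 = (i + 1 < n) ? in[i + 1] : 0;
        unsigned b2 = (i + 2 < n) ? in[i + 2] : 0;

        out[o++] = uu_enc(b0 >> 2);
        out[o++] = uu_enc((b0 << 4) | (b1 >> 4));
        out[o++] = uu_enc((b1 << 2) | (b2 >> 6));
        out[o++] = uu_enc(b2);
    }
    out[o++] = '\n';
    return o;
}

/*
 * Hypothetical helper: decodes one uuencode line back to bytes.
 * Returns the number of bytes written to out (at most 45).
 */
size_t uu_decode_line(const char *line, unsigned char *out)
{
    size_t n = uu_dec(line[0]), i, o = 0;
    const char *p = line + 1;

    for (i = 0; i < n; i += 3, p += 4) {
        unsigned c0 = uu_dec(p[0]), c1 = uu_dec(p[1]);
        unsigned c2 = uu_dec(p[2]), c3 = uu_dec(p[3]);

        if (o < n) out[o++] = (unsigned char)((c0 << 2) | (c1 >> 4));
        if (o < n) out[o++] = (unsigned char)((c1 << 4) | (c2 >> 2));
        if (o < n) out[o++] = (unsigned char)((c2 << 6) | c3);
    }
    return o;
}
```

Encoding a full 45-byte record yields 1 + 60 + 1 = 62 bytes, and decoding that line returns the original 45 bytes.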
Files
The following list describes the uucode converters found in the /usr/lib/nls/loc/iconv directory:
Converter
Description
IBM-850_uucode
IBM-850 to uucode
IBM-921_uucode
IBM-921 to uucode
IBM-922_uucode
IBM-922 to uucode
IBM-932_uucode
IBM-932 to uucode
IBM-943_uucode
IBM-943 to uucode
IBM-1124_uucode
IBM-1124 to uucode
IBM-1129_uucode
IBM-1129 to uucode
IBM-eucJP_uucode
IBM-eucJP to uucode
IBM-eucKR_uucode
IBM-eucKR to uucode
IBM-eucTW_uucode
IBM-eucTW to uucode
IBM-eucCN_uucode
IBM-eucCN to uucode
ISO8859-1_uucode
ISO8859-1 to uucode
ISO8859-2_uucode
ISO8859-2 to uucode
ISO8859-3_uucode
ISO8859-3 to uucode
ISO8859-4_uucode
ISO8859-4 to uucode
ISO8859-5_uucode
ISO8859-5 to uucode
ISO8859-6_uucode
ISO8859-6 to uucode
ISO8859-7_uucode
ISO8859-7 to uucode
ISO8859-8_uucode
ISO8859-8 to uucode
ISO8859-9_uucode
ISO8859-9 to uucode
TIS-620_uucode
TIS-620 to uucode
big5_uucode
big5 to uucode
GBK_uucode
GBK to uucode
uucode_IBM-850
uucode to IBM-850
uucode_IBM-921
uucode to IBM-921
uucode_IBM-922
uucode to IBM-922
uucode_IBM-932
uucode to IBM-932
uucode_IBM-943
uucode to IBM-943
uucode_IBM-1124
uucode to IBM-1124
uucode_IBM-1129
uucode to IBM-1129
uucode_IBM-eucCN
uucode to IBM-eucCN
uucode_IBM-eucJP
uucode to IBM-eucJP
uucode_IBM-eucKR
uucode to IBM-eucKR
uucode_IBM-eucTW
uucode to IBM-eucTW
uucode_ISO8859-1
uucode to ISO8859-1
uucode_ISO8859-2
uucode to ISO8859-2
uucode_ISO8859-3
uucode to ISO8859-3
uucode_ISO8859-4
uucode to ISO8859-4
uucode_ISO8859-5
uucode to ISO8859-5
uucode_ISO8859-6
uucode to ISO8859-6
uucode_ISO8859-7
uucode to ISO8859-7
uucode_ISO8859-8
uucode to ISO8859-8
uucode_ISO8859-9
uucode to ISO8859-9
uucode_TIS-620
uucode to TIS-620
uucode_big5
uucode to big5
uucode_GBK
uucode to GBK
UCS-2 Interchange Converters
UCS-2 uses a universal 16-bit encoding. Conversions for each code set are provided in both directions, between the code set and UCS-2. For more information, see Code Sets for National Language Support.
UCS-2 converters are found in the /usr/lib/nls/loc/uconvTable and /usr/lib/nls/loc/uconv directories. The uconvdef command is used to generate new converters or to customize existing UCS-2 converters.
Converter
Description
ISO8859-1
UCS-2 <-> ISO Latin-1
ISO8859-2
UCS-2 <-> ISO Latin-2
ISO8859-3
UCS-2 <-> ISO Latin-3
ISO8859-4
UCS-2 <-> ISO Baltic
ISO8859-5
UCS-2 <-> ISO Cyrillic
ISO8859-6
UCS-2 <-> ISO Arabic
ISO8859-7
UCS-2 <-> ISO Greek
ISO8859-8
UCS-2 <-> ISO Hebrew
ISO8859-9
UCS-2 <-> ISO Turkish
JISX0201.1976-0
UCS-2 <-> Japanese JISX0201-0
JISX0208.1983-0
UCS-2 <-> Japanese JISX0208-0
CNS11643.1986-1
UCS-2 <-> Chinese CNS11643-1
CNS11643.1986-2
UCS-2 <-> Chinese CNS11643-2
KSC5601.1987-0
UCS-2 <-> Korean KSC5601-0
IBM-eucCN
UCS-2 <-> Simplified Chinese EUC
IBM-udcCN
UCS-2 <-> Simplified Chinese user-defined characters
IBM-sbdCN
UCS-2 <-> Simplified Chinese IBM-specific characters
GB2312.1980-0
UCS-2 <-> Simplified Chinese GB
IBM-1381
UCS-2 <-> Simplified Chinese PC data code
IBM-935
UCS-2 <-> Simplified Chinese EBCDIC
IBM-936
UCS-2 <-> Simplified Chinese PC5550
IBM-eucJP
UCS-2 <-> Japanese EUC
IBM-eucKR
UCS-2 <-> Korean EUC
IBM-eucTW
UCS-2 <-> Traditional Chinese EUC
IBM-udcJP
UCS-2 <-> Japanese user-defined characters
IBM-udcTW
UCS-2 <-> Traditional Chinese user-defined characters
IBM-sbdTW
UCS-2 <-> Traditional Chinese IBM-specific characters
UTF-8
UCS-2 <-> UTF-8
IBM-437
UCS-2 <-> USA PC data code
IBM-850
UCS-2 <-> Latin-1 PC data code
IBM-852
UCS-2 <-> Latin-2 PC data code
IBM-857
UCS-2 <-> Turkish PC data code
IBM-860
UCS-2 <-> Portuguese PC data code
IBM-861
UCS-2 <-> Icelandic PC data code
IBM-863
UCS-2 <-> French Canadian PC data code
IBM-865
UCS-2 <-> Nordic PC data code
IBM-869
UCS-2 <-> Greek PC data code
IBM-921
UCS-2 <-> Baltic Multilingual data code
IBM-922
UCS-2 <-> Estonian data code
IBM-932
UCS-2 <-> Japanese PC data code
IBM-943
UCS-2 <-> Japanese PC data code
IBM-934
UCS-2 <-> Korea PC data code
IBM-936
UCS-2 <-> People's Republic of China PC data code
IBM-938
UCS-2 <-> Taiwanese PC data code
IBM-942
UCS-2 <-> Extended Japanese PC data code
IBM-944
UCS-2 <-> Korean PC data code
IBM-946
UCS-2 <-> People's Republic of China SAA data code
IBM-948
UCS-2 <-> Traditional Chinese PC data code
IBM-1124
UCS-2 <-> Ukrainian PC data code
IBM-1129
UCS-2 <-> Vietnamese PC data code
TIS-620
UCS-2 <-> Thailand PC data code
IBM-037
UCS-2 <-> USA, Canada EBCDIC
IBM-273
UCS-2 <-> Germany, Austria EBCDIC
IBM-277
UCS-2 <-> Denmark, Norway EBCDIC
IBM-278
UCS-2 <-> Finland, Sweden EBCDIC
IBM-280
UCS-2 <-> Italy EBCDIC
IBM-284
UCS-2 <-> Spain, Latin America EBCDIC
IBM-285
UCS-2 <-> United Kingdom EBCDIC
IBM-297
UCS-2 <-> France EBCDIC
IBM-500
UCS-2 <-> International EBCDIC
IBM-875
UCS-2 <-> Greek EBCDIC
IBM-930
UCS-2 <-> Japanese Katakana-Kanji EBCDIC
IBM-933
UCS-2 <-> Korean EBCDIC
IBM-937
UCS-2 <-> Traditional Chinese EBCDIC
IBM-939
UCS-2 <-> Japanese Latin-Kanji EBCDIC
IBM-1026
UCS-2 <-> Turkish EBCDIC
IBM-1112
UCS-2 <-> Baltic Multilingual EBCDIC
IBM-1122
UCS-2 <-> Estonian EBCDIC
IBM-1124
UCS-2 <-> Ukrainian EBCDIC
IBM-1129
UCS-2 <-> Vietnamese EBCDIC
TIS-620
UCS-2 <-> Thailand EBCDIC
UTF-8 Interchange Converters
UTF-8 is a universal, multibyte encoding described in UCS-2 and UTF-8. Conversions for each code set are provided in both directions, between the code set and UTF-8.
UTF-8 conversions are usually done by using the Universal_UCS_Conv and /usr/lib/nls/loc/uconv/UTF-8 converters. For more information, see UCS-2 Interchange Converters.
Converter
Description
ISO8859-1
UTF-8 <-> ISO Latin-1
ISO8859-2
UTF-8 <-> ISO Latin-2
ISO8859-3
UTF-8 <-> ISO Latin-3
ISO8859-4
UTF-8 <-> ISO Baltic
ISO8859-5
UTF-8 <-> ISO Cyrillic
ISO8859-6
UTF-8 <-> ISO Arabic
ISO8859-7
UTF-8 <-> ISO Greek
ISO8859-8
UTF-8 <-> ISO Hebrew
ISO8859-9
UTF-8 <-> ISO Turkish
JISX0201.1976-0
UTF-8 <-> Japanese JISX0201-0
JISX0208.1983-0
UTF-8 <-> Japanese JISX0208-0
CNS11643.1986-1
UTF-8 <-> Chinese CNS11643-1
CNS11643.1986-2
UTF-8 <-> Chinese CNS11643-2
KSC5601.1987-0
UTF-8 <-> Korean KSC5601-0
IBM-eucCN
UTF-8 <-> Simplified Chinese EUC
IBM-eucJP
UTF-8 <-> Japanese EUC
IBM-eucKR
UTF-8 <-> Korean EUC
IBM-eucTW
UTF-8 <-> Traditional Chinese EUC
IBM-udcJP
UTF-8 <-> Japanese user-defined characters
IBM-udcTW
UTF-8 <-> Traditional Chinese user-defined characters
IBM-sbdTW
UTF-8 <-> Traditional Chinese IBM-specific characters
UCS-2
UTF-8 <-> UCS-2
IBM-437
UTF-8 <-> USA PC data code
IBM-850
UTF-8 <-> Latin-1 PC data code
IBM-852
UTF-8 <-> Latin-2 PC data code
IBM-857
UTF-8 <-> Turkish PC data code
IBM-860
UTF-8 <-> Portuguese PC data code
IBM-861
UTF-8 <-> Icelandic PC data code
IBM-863
UTF-8 <-> French Canadian PC data code
IBM-865
UTF-8 <-> Nordic PC data code
IBM-869
UTF-8 <-> Greek PC data code
IBM-921
UTF-8 <-> Baltic Multilingual data code
IBM-922
UTF-8 <-> Estonian data code
IBM-932
UTF-8 <-> Japanese PC data code
IBM-943
UTF-8 <-> Japanese PC data code
IBM-934
UTF-8 <-> Korea PC data code
IBM-935
UTF-8 <-> Simplified Chinese EBCDIC
IBM-936
UTF-8 <-> People's Republic of China PC data code
IBM-938
UTF-8 <-> Taiwanese PC data code
IBM-942
UTF-8 <-> Extended Japanese PC data code
IBM-944
UTF-8 <-> Korean PC data code
IBM-946
UTF-8 <-> People's Republic of China SAA data code
IBM-948
UTF-8 <-> Traditional Chinese PC data code
IBM-1124
UTF-8 <-> Ukrainian PC data code
IBM-1129
UTF-8 <-> Vietnamese PC data code
TIS-620
UTF-8 <-> Thailand PC data code
IBM-037
UTF-8 <-> USA, Canada EBCDIC
IBM-273
UTF-8 <-> Germany, Austria EBCDIC
IBM-277
UTF-8 <-> Denmark, Norway EBCDIC
IBM-278
UTF-8 <-> Finland, Sweden EBCDIC
IBM-280
UTF-8 <-> Italy EBCDIC
IBM-284
UTF-8 <-> Spain, Latin America EBCDIC
IBM-285
UTF-8 <-> United Kingdom EBCDIC
IBM-297
UTF-8 <-> France EBCDIC
IBM-500
UTF-8 <-> International EBCDIC
IBM-875
UTF-8 <-> Greek EBCDIC
IBM-930
UTF-8 <-> Japanese Katakana-Kanji EBCDIC
IBM-933
UTF-8 <-> Korean EBCDIC
IBM-937
UTF-8 <-> Traditional Chinese EBCDIC
IBM-939
UTF-8 <-> Japanese Latin-Kanji EBCDIC
IBM-1026
UTF-8 <-> Turkish EBCDIC
IBM-1112
UTF-8 <-> Baltic Multilingual EBCDIC
IBM-1122
UTF-8 <-> Estonian EBCDIC
IBM-1124
UTF-8 <-> Ukrainian EBCDIC
IBM-1129
UTF-8 <-> Vietnamese EBCDIC
IBM-1381
UTF-8 <-> Simplified Chinese PC data code
GB18030
UTF-8 <-> Simplified Chinese
TIS-620
UTF-8 <-> Thailand EBCDIC
Miscellaneous Converters
A set of low-level converters, called miscellaneous converters, supports the code set and interchange converters. Some of the interchange converters use them internally. Using these converters directly is discouraged because they are intended only to support other converters.
Files
The following list describes the miscellaneous converters found in the /usr/lib/nls/loc/iconv and /usr/lib/nls/loc/iconvTable directories:
Converter
Description
IBM-932_JISX0201.1976-0
IBM-932 to JISX0201.1976-0
IBM-932_JISX0208.1983-0
IBM-932 to JISX0208.1983-0
IBM-932_IBM-udcJP
IBM-932 to IBM-udcJP (Japanese user-defined characters)
IBM-943_JISX0201.1976-0
IBM-943 to JISX0201.1976-0
IBM-943_JISX0208.1983-0
IBM-943 to JISX0208.1983-0
IBM-943_IBM-udcJP
IBM-943 to IBM-udcJP (Japanese user-defined characters)
IBM-eucJP_JISX0201.1976-0
IBM-eucJP to JISX0201.1976-0
IBM-eucJP_JISX0208.1983-0
IBM-eucJP to JISX0208.1983-0
IBM-eucJP_IBM-udcJP
IBM-eucJP to IBM-udcJP (Japanese user-defined characters)
Any converter installed in the system can be used through the iconv command, which uses the iconv library. The iconv command acts as a filter for converting from one code set to another. For example, the following command filters data from PC Code (IBM-850) to ISO8859-1:
The iconv command converts the encoding of characters read from either standard input or the specified file and then writes the results to standard output.
Understanding libiconv
The iconv application programming interface (API) consists of the following subroutines that accomplish conversion:
iconv_open
Performs the initialization required to convert characters from the code set specified by the FromCode parameter to the code set specified by the ToCode parameter. The strings specified are dependent on the converters installed in the system. If initialization is successful, the converter descriptor, iconv_t, is returned in its initial state.
iconv
Invokes the converter function using the descriptor obtained from the iconv_open subroutine. The inbuf parameter points to the first character in the input buffer, and the inbytesleft parameter indicates the number of bytes to the end of the buffer being converted. The outbuf parameter points to the first available byte in the output buffer, and the outbytesleft parameter indicates the number of available bytes to the end of the buffer.
For state-dependent encoding, the subroutine is placed in its initial state by a call for which the inbuf value is a null pointer. Subsequent calls with the inbuf parameter as something other than a null pointer cause the internal state of the function to be altered as necessary.
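The call sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: convert_buffer is a hypothetical helper name, the prototype used is the POSIX one in which inbuf is a char **, and the code set names passed in must match converters installed on the system.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: converts inlen bytes from `in` into `out`
 * (capacity outcap). Returns the number of bytes produced, or
 * (size_t)-1 on any error.
 */
size_t convert_buffer(const char *tocode, const char *fromcode,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    char *ip = (char *)in;  /* iconv() takes char **; the input bytes are not modified */
    char *op = out;
    size_t ileft = inlen, oleft = outcap;
    size_t r;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* ip/ileft and op/oleft are advanced past the converted data. */
    r = iconv(cd, &ip, &ileft, &op, &oleft);

    /* A null inbuf returns the descriptor to its initial state. */
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &op, &oleft);

    iconv_close(cd);
    return (r == (size_t)-1) ? (size_t)-1 : outcap - oleft;
}
```

For example, convert_buffer("ISO8859-1", "UTF-8", text, length, out, sizeof out) converts a UTF-8 buffer to ISO8859-1, provided both names are known to the local iconv library.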
Method to choose | Same code set, 7-bit only protocol | Same code set, 8-bit protocol | Different code set (or receiver's code set is unknown), 7-bit only protocol | Different code set (or receiver's code set is unknown), 8-bit protocol
as is | Not valid | Best choice | Not valid | Not valid if remote code set is unknown
fold7 | OK | OK | Best choice | OK
fold8 | Not valid | OK | Not valid | Best choice
uucode | Best choice | OK | Not valid | Not valid
If the sender uses the same code set as the receiver, the following possibilities exist:
When protocol allows 8-bit data, the data can be sent without conversions.
When protocol allows only 7-bit data, the 8-bit code points must be mapped to 7-bit values. Use the iconv interface and one of the following methods:
uucode
Provides the same mapping as the uuencode and uudecode commands. This is the recommended method. For more information, see Interchange Converters: uucode.
7-bit
Converts internal code sets using 7-bit data. This method passes ASCII without any change. For more information, see Interchange Converters: 7-bit.
If the sender uses a code set different from the receiver, there are two possibilities:
When protocol allows only 7-bit data, use the fold7 method.
When protocol allows 8-bit data and you know the receiver's code set, use the iconv interface to convert the data. If you do not know the receiver's code set, use the following method:
8-bit
Converts internal code sets to standard interchange formats. The 8-bit data is transmitted and the information is preserved so that the receiver can reconstruct the data in its code set. For more information, see Interchange Converters: 8-bit.
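The selection rules above can be condensed into a small decision function. The sketch below is illustrative only: choose_method is a hypothetical name, and the string "direct" is an assumed label for converting straight to the receiver's code set with the iconv interface.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns the method suggested by the guidelines
 * above. "as is" means the data is sent without conversion; "direct"
 * means a plain code set conversion to the receiver's code set.
 */
const char *choose_method(bool same_code_set, bool protocol_is_8bit,
                          bool receiver_code_set_known)
{
    if (same_code_set)
        return protocol_is_8bit ? "as is" : "uucode";
    if (!protocol_is_8bit)
        return "fold7";
    /* 8-bit protocol, different code sets */
    return receiver_code_set_known ? "direct" : "fold8";
}
```

The third argument matters only when the code sets differ and the protocol allows 8-bit data.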
Using the iconv_open Subroutine
The following examples illustrate how to use the iconv_open subroutine in different situations:
When the sender and receiver use the same code sets, and if the protocol allows 8-bit data, you can send data without converting it. If the protocol allows only 7-bit data, do the following:
Sender:
cd = iconv_open("uucode", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "uucode");
When the sender and receiver use different code sets, and if the protocol allows 8-bit data and the receiver's code set is unknown, do the following:
Sender:
cd = iconv_open("fold8", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET),"fold8" );
If the protocol allows only 7-bit data, do the following:
Sender:
cd = iconv_open("fold7", nl_langinfo(CODESET));
Receiver:
cd = iconv_open(nl_langinfo(CODESET), "fold7" );
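The calls above omit error handling. A hedged sketch of the sender side with a check on the returned descriptor might look like the following; open_sender is a hypothetical helper name, and whether a given method name such as uucode, fold7, or fold8 can be opened depends on the converters installed on the system.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdio.h>

/*
 * Hypothetical helper: opens a converter from the current locale's
 * code set to `method`. Returns (iconv_t)-1 and reports the error
 * if no such converter is available.
 */
iconv_t open_sender(const char *method)
{
    iconv_t cd = iconv_open(method, nl_langinfo(CODESET));

    if (cd == (iconv_t)-1)
        perror("iconv_open");
    return cd;
}
```

The receiver side is symmetric, with the two iconv_open arguments exchanged.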
The iconv_open subroutine uses the LOCPATH environment variable to search for a converter whose name is in the following form:
iconv/FromCodeSet_ToCodeSet
The FromCodeSet string represents the sender's code set, and the ToCodeSet string represents the receiver's code set. The underscore character separates the two strings.
Note:
All setuid and setgid programs ignore the LOCPATH environment variable.
Because the iconv converter is a loadable object module, a different object is required when running in the 64-bit environment. In the 64-bit environment, the iconv_open routine uses the LOCPATH environment variable to search for a converter whose name is in the following form:
iconv/FromCodeSet_ToCodeSet__64
The iconv library automatically chooses whether to load the standard converter object or the 64-bit converter object. If the iconv_open subroutine does not find the converter, it uses the from,to pair to search for a file that defines a table-driven conversion. The file contains a conversion table created by the genxlt command.
The iconvTable converter uses the LOCPATH environment variable to search for a file whose name is in the following form:
iconvTable/FromCodeSet_ToCodeSet
If the converter is found, it performs a load operation and is initialized. The converter descriptor, iconv_t, is returned in its initial state.
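The file names that this search produces can be illustrated with a short formatting routine. This is only a sketch of the naming scheme: converter_path is a hypothetical helper, it handles a single directory rather than a full LOCPATH list, and it assumes that the __64 suffix applies to converter objects only and not to iconvTable files.

```c
#include <stdio.h>
#include <stdbool.h>

/*
 * Hypothetical helper: formats the name of a converter under one
 * directory taken from LOCPATH. Returns the length that snprintf()
 * reports.
 */
int converter_path(char *buf, size_t size, const char *locdir,
                   const char *from, const char *to,
                   bool table, bool is64)
{
    return snprintf(buf, size, "%s/%s/%s_%s%s", locdir,
                    table ? "iconvTable" : "iconv", from, to,
                    (!table && is64) ? "__64" : "");
}
```

For the directory /usr/lib/nls/loc and the pair IBM-850, ISO8859-1, the three candidates are iconv/IBM-850_ISO8859-1, iconv/IBM-850_ISO8859-1__64, and iconvTable/IBM-850_ISO8859-1 under that directory.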
Converter Programs versus Tables
Converter programs are executable functions that convert data according to a set of rules. Converter tables are single-byte conversion tables that perform stateless conversions. Programs and tables are in separate directories, as follows:
/usr/lib/nls/loc/iconv
Converter programs
/usr/lib/nls/loc/iconvTable
Converter tables
After a converter program is compiled and linked with the libiconv.a library, the program is placed in the /usr/lib/nls/loc/iconv directory.
To build a table converter, build a source converter table file. Use the genxlt command to compile translation tables into a format understood by the table converter. The output file is then placed in the /usr/lib/nls/loc/iconvTable directory.
Unicode and Universal Converters
Unicode (or UCS-2) conversion tables are found in:
$LOCPATH/uconvTable/*CodeSet*
The $LOCPATH/uconv/UCSTBL converter program is used to perform the conversion to and from UCS-2 using the iconv utilities.
A universal converter program is provided that can convert between any two code sets whose conversions to and from UCS-2 are defined. Given the following uconv tables:
X -> UCS-2
UCS-2 -> Y
a universal conversion can be defined that maps the following:
X -> UCS-2 -> Y
by using the $LOCPATH/iconv/Universal_UCS_Conv converter.
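The two-step mapping can be imitated in application code by chaining two ordinary conversions through an intermediate buffer. The sketch below shows the idea only and is not how Universal_UCS_Conv is implemented: convert_via_ucs2 and step are hypothetical names, the pivot name "UCS-2BE" is an assumption chosen to fix the byte order, and the pivot buffer limits the input size.

```c
#include <iconv.h>
#include <stddef.h>

/* One conversion step; returns bytes written, or (size_t)-1 on error. */
static size_t step(const char *to, const char *from,
                   char *in, size_t inlen, char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    size_t oleft = outcap, r;

    if (cd == (iconv_t)-1)
        return (size_t)-1;
    r = iconv(cd, &in, &inlen, &out, &oleft);
    iconv_close(cd);
    return (r == (size_t)-1) ? (size_t)-1 : outcap - oleft;
}

/*
 * X -> UCS-2 -> Y through a fixed-size pivot buffer. Input that does
 * not fit in the pivot buffer makes the first step fail.
 */
size_t convert_via_ucs2(const char *to, const char *from,
                        char *in, size_t inlen, char *out, size_t outcap)
{
    char pivot[512];
    size_t n = step("UCS-2BE", from, in, inlen, pivot, sizeof pivot);

    if (n == (size_t)-1)
        return (size_t)-1;
    return step(to, "UCS-2BE", pivot, n, out, outcap);
}
```

Any character of X that has no UCS-2 mapping, or any UCS-2 value that has no mapping in Y, is handled according to the rules of the underlying converters.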
Universal UCS Converter
UCS-2 is a universal 16-bit encoding that can be used as an interchange medium to provide conversion capability between virtually any code sets. The conversion can be accomplished using the Universal UCS Converter, which converts between any two code sets XXX and YYY as follows:
XXX <-> UTF-32 <-> YYY
The XXX and YYY conversions must be included in the supported List of UCS-2 Interchange Converters, and must be installed on the system.
The universal converter is installed as the file /usr/lib/nls/loc/iconv/Universal_UCS_Conv.
The conversion between multibyte and wide character code depends on the current locale setting. Do not exchange wide character codes between two processes unless you know that every locale that might be used handles wide character codes consistently. Most locales for this operating system use the Unicode character value as a wide character code, except locales based on IBM-eucTW code sets.
Using Converters
The iconv interface consists of the iconv_open, iconv, and iconv_close subroutines, which open, perform, and close conversions.
The following example shows how you can use these subroutines to create a code set conversion filter that accepts the ToCode and FromCode parameters as input arguments:
        /*
         * After the next operation, ibuf will
         * contain new data plus any truncated
         * data left from the previous read.
         */
        ileft += fread(ibuf + ileft, 1, BUFSIZ - ileft, stdin);
        do {
            ip = ibuf;
            op = obuf;
            oleft = BUFSIZ;
            r = iconv(cd, &ip, &ileft, &op, &oleft);
            if (ICONV_INVAL()) {
                fprintf(stderr,
                    catgets(catd, NL_SETD, ERROR, "invalid input\n"));
                exit(2);
            }
            fwrite(obuf, 1, BUFSIZ - oleft, stdout);
            if (ICONV_TRUNC() || ICONV_OVER())
                /*
                 * Data remaining in buffer: move it to the
                 * beginning. The regions can overlap, so
                 * memmove is used rather than memcpy.
                 */
                memmove(ibuf, ip, ileft);
            /*
             * Loop until all characters in the input
             * buffer have been converted.
             */
        } while (ICONV_OVER());
    }
    if (ileft != 0) {
        /*
         * This can only happen if the last call
         * to iconv() returned ICONV_TRUNC, meaning
         * the last data in the input stream was
         * incomplete.
         */
        fprintf(stderr, catgets(catd, NL_SETD, INCOMP, "input incomplete\n"));
        exit(3);
    }
    iconv_close(cd);
    exit(0);
}
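The listing above begins partway through the program, so its declarations and the ICONV_* test macros are not shown. The following self-contained sketch expresses the same read, convert, and carry-over loop as a function. It is an approximation under stated assumptions: convert_stream is a hypothetical name, the errno values EILSEQ, E2BIG, and EINVAL stand in for the invalid, overflow, and truncated conditions, and message catalogs are left out.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: converts everything readable from `in` and
 * writes it to `out`. Returns 0 on success, 1 on an I/O or unexpected
 * error, 2 on invalid input, and 3 if the input ends mid-character.
 */
int convert_stream(iconv_t cd, FILE *in, FILE *out)
{
    char ibuf[BUFSIZ], obuf[BUFSIZ];
    size_t ileft = 0;

    while (!feof(in)) {
        size_t got = fread(ibuf + ileft, 1, sizeof ibuf - ileft, in);
        char *ip = ibuf;
        int err;

        if (got == 0 && ferror(in))
            return 1;
        ileft += got;
        do {
            char *op = obuf;
            size_t oleft = sizeof obuf;
            size_t r = iconv(cd, &ip, &ileft, &op, &oleft);

            err = (r == (size_t)-1) ? errno : 0;
            if (err == EILSEQ)
                return 2;
            if (err != 0 && err != E2BIG && err != EINVAL)
                return 1;
            fwrite(obuf, 1, sizeof obuf - oleft, out);
        } while (err == E2BIG);  /* output buffer was full: convert the rest */

        /* Keep a truncated trailing sequence for the next read. */
        memmove(ibuf, ip, ileft);
    }
    return ileft != 0 ? 3 : 0;
}
```

A caller would open the descriptor with iconv_open(ToCode, FromCode), pass stdin and stdout to convert_stream, and close the descriptor with iconv_close afterward.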
Naming Converters
Code set names are in the form CodesetRegistry-CodesetEncoding where:
CodesetRegistry
Identifies the registration authority for the encoding. The CodesetRegistry must be made of characters from the portable code set (usually A-Z and 0-9).
CodesetEncoding
Identifies the coded character set defined by the registered authority.
The from,to variable used by the iconv command and iconv_open subroutine identifies a file whose name should be in the form /usr/lib/nls/loc/iconv/%f_%t or /usr/lib/nls/loc/iconvTable/%f_%t, where:
%f
Represents the FromCode set name
%t
Represents the ToCode set name
List of Converters
Converters change data from one code set to another. The sets of converters supported with the iconv library are listed in the following sections. All converters shipped with the BOS Runtime Environment are located in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.
These directories also contain private converters; that is, they are used by other converters. However, users and programs should only depend on the converters in the following lists.
Any converter shipped with the BOS Runtime Environment and not listed here should be considered private and subject to change or deletion. Converters supplied by other products can be placed in the /usr/lib/nls/loc/iconv/* or /usr/lib/nls/loc/iconvTable/* directory.
Programmers are encouraged to use registered code set names or code set names associated with an application. The X Consortium maintains a registry of code set names for reference. See Code Sets for National Language Support for more information about code sets.
PC, ISO, and EBCDIC Code Set Converters
These converters provide conversion between PC, ISO, and EBCDIC single-byte stateless code sets. The following types of conversions are supported: PC to/from ISO, PC to/from EBCDIC, and ISO to/from EBCDIC.
Conversion is provided between compatible code sets such as Latin-1 to Latin-1 and Greek to Greek. However, conversion between different EBCDIC national code sets is not supported. For information about converting between incompatible character sets, refer to Interchange Converters: 7-bit and Interchange Converters: 8-bit.
Conversion tables in the iconvTable directory are created by the genxlt command.
Compatible Code Set Names
The following table lists code set names that are compatible. Each line defines to/from strings that may be used when requesting a converter.
Note:
The PC and ISO code sets are ASCII-based.
Code Set Compatibility
Character Set
Languages
PC
ISO
EBCDIC
Latin-1
U.S. English, Portuguese, Canadian French
N/A
ISO8859-1
IBM-037
Latin-1
Danish, Norwegian
N/A
ISO8859-1
IBM-277
Latin-1
Finnish, Swedish
N/A
ISO8859-1
IBM-278
Latin-1
Italian
N/A
ISO8859-1
IBM-280
Latin-1
Japanese
N/A
ISO8859-1
IBM-281
Latin-1
Spanish
N/A
ISO8859-1
IBM-284
Latin-1
U.K. English
N/A
ISO8859-1
IBM-285
Latin-1
German
N/A
ISO8859-1
IBM-273
Latin-1
French
N/A
ISO8859-1
IBM-297
Latin-1
Belgian, Swiss German
N/A
ISO8859-1
IBM-500
Latin-2
Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene
IBM-852
ISO8859-2
IBM-870
Cyrillic
Bulgarian, Macedonian, Serbian Cyrillic, Russian
IBM-855
ISO8859-5
IBM-880 IBM-1025
Cyrillic
Russian
IBM-866
ISO8859-5
IBM-1025
Hebrew
Hebrew
IBM-856 IBM-862
ISO8859-8
IBM-424 IBM-803
Turkish
Turkish
IBM-857
ISO8859-9
IBM-1026
Arabic
Arabic
IBM-864 IBM-1046
ISO8859-6
IBM-420
Greek
Greek
IBM-869
ISO8859-7
IBM-875
Baltic
Lithuanian, Latvian, Estonian
IBM-921 IBM-922
ISO8859-4
IBM-1112 IBM-1122
Note:
A character that exists in the source code set but does not exist in the target code set is converted to a converter-defined substitute character.
Files
The following table describes the iconvTable converters found in the /usr/lib/nls/loc/iconvTable directory:
iconvTable Converters
Converter Table
Description
Language
IBM-037_IBM-850
IBM-037 to IBM-850
U.S. English, Portuguese, Canadian-French
IBM-273_IBM-850
IBM-273 to IBM-850
German
IBM-277_IBM-850
IBM-277 to IBM-850
Danish, Norwegian
IBM-278_IBM-850
IBM-278 to IBM-850
Finnish, Swedish
IBM-280_IBM-850
IBM-280 to IBM-850
Italian
IBM-281_IBM-850
IBM-281 to IBM-850
Japanese-Latin
IBM-284_IBM-850
IBM-284 to IBM-850
Spanish
IBM-285_IBM-850
IBM-285 to IBM-850
U.K. English
IBM-297_IBM-850
IBM-297 to IBM-850
French
IBM-420_IBM-1046
IBM-420 to IBM-1046
Arabic
IBM-424_IBM-856
IBM-424 to IBM-856
Hebrew
IBM-424_IBM-862
IBM-424 to IBM-862
Hebrew
IBM-500_IBM-850
IBM-500 to IBM-850
Belgian, Swiss German
IBM-803_IBM-856
IBM-803 to IBM-856
Hebrew
IBM-803_IBM-862
IBM-803 to IBM-862
Hebrew
IBM-850_IBM-037
IBM-850 to IBM-037
U.S. English, Portuguese, Canadian-French
IBM-850_IBM-273
IBM-850 to IBM-273
German
IBM-850_IBM-277
IBM-850 to IBM-277
Danish, Norwegian
IBM-850_IBM-278
IBM-850 to IBM-278
Finnish, Swedish
IBM-850_IBM-280
IBM-850 to IBM-280
Italian
IBM-850_IBM-281
IBM-850 to IBM-281
Japanese-Latin
IBM-850_IBM-284
IBM-850 to IBM-284
Spanish
IBM-850_IBM-285
IBM-850 to IBM-285
U.K. English
IBM-850_IBM-297
IBM-850 to IBM-297
French
IBM-850_IBM-500
IBM-850 to IBM-500
Belgian, Swiss German
IBM-856_IBM-424
IBM-856 to IBM-424
Hebrew
IBM-856_IBM-803
IBM-856 to IBM-803
Hebrew
IBM-856_IBM-862
IBM-856 to IBM-862
Hebrew
IBM-862_IBM-424
IBM-862 to IBM-424
Hebrew
IBM-862_IBM-803
IBM-862 to IBM-803
Hebrew
IBM-862_IBM-856
IBM-862 to IBM-856
Hebrew
IBM-864_IBM-1046
IBM-864 to IBM-1046
Arabic
IBM-921_IBM-1112
IBM-921 to IBM-1112
Lithuanian, Latvian
IBM-922_IBM-1122
IBM-922 to IBM-1122
Estonian
IBM-1112_IBM-921
IBM-1112 to IBM-921
Lithuanian, Latvian
IBM-1122_IBM-922
IBM-1122 to IBM-922
Estonian
IBM-1046_IBM-420
IBM-1046 to IBM-420
Arabic
IBM-1046_IBM-864
IBM-1046 to IBM-864
Arabic
IBM-037_ISO8859-1
IBM-037 to ISO8859-1
U.S. English, Portuguese, Canadian French
IBM-273_ISO8859-1
IBM-273 to ISO8859-1
German
IBM-277_ISO8859-1
IBM-277 to ISO8859-1
Danish, Norwegian
IBM-278_ISO8859-1
IBM-278 to ISO8859-1
Finnish, Swedish
IBM-280_ISO8859-1
IBM-280 to ISO8859-1
Italian
IBM-281_ISO8859-1
IBM-281 to ISO8859-1
Japanese-Latin
IBM-284_ISO8859-1
IBM-284 to ISO8859-1
Spanish
IBM-285_ISO8859-1
IBM-285 to ISO8859-1
U.K. English
IBM-297_ISO8859-1
IBM-297 to ISO8859-1
French
IBM-420_ISO8859-6
IBM-420 to ISO8859-6
Arabic
IBM-424_ISO8859-8
IBM-424 to ISO8859-8
Hebrew
IBM-500_ISO8859-1
IBM-500 to ISO8859-1
Belgian, Swiss German
IBM-803_ISO8859-8
IBM-803 to ISO8859-8
Hebrew
IBM-852_ISO8859-2
IBM-852 to ISO8859-2
Croatian, Czechoslovakian, Hungarian, Polish, Romanian, Serbian Latin, Slovak, Slovene
The following list describes the Multibyte Code Set converters that are found in the /usr/lib/nls/loc/iconv directory.
Converter
Description
IBM-eucJP_IBM-932
IBM-eucJP to IBM-932
IBM-eucJP_IBM-943
IBM-eucJP to IBM-943
IBM-eucJP_IBM-930
IBM-eucJP to IBM-930
IBM-eucCN_IBM-936(PC5550)
IBM-eucCN to IBM-936(PC5550)
IBM-eucCN_IBM-935
IBM-eucCN to IBM-935
IBM-eucJP_IBM-939
IBM-eucJP to IBM-939
IBM-eucCN_IBM-1381
IBM-eucCN to IBM-1381
IBM-943_IBM-932
IBM-943 to IBM-932
IBM-932_IBM-943
IBM-932 to IBM-943
IBM-930_IBM-932
IBM-930 to IBM-932
IBM-930_IBM-943
IBM-930 to IBM-943
IBM-930_IBM-eucJP
IBM-930 to IBM-eucJP
IBM-932_IBM-eucJP
IBM-932 to IBM-eucJP
IBM-932_IBM-930
IBM-932 to IBM-930
IBM-943_IBM-eucJP
IBM-943 to IBM-eucJP
IBM-943_IBM-930
IBM-943 to IBM-930
IBM-936(PC5550)_IBM-935
IBM-936(PC5550) to IBM-935
IBM-936_IBM-935
IBM-936 to IBM-935
IBM-932_IBM-939
IBM-932 to IBM-939
IBM-939_IBM-932
IBM-939 to IBM-932
IBM-943_IBM-939
IBM-943 to IBM-939
IBM-939_IBM-943
IBM-939 to IBM-943
IBM-935_IBM-936(PC5550)
IBM-935 to IBM-936(PC5550)
IBM-935_IBM-936
IBM-935 to IBM-936
IBM-1381_IBM-935
IBM-1381 to IBM-935
IBM-935_IBM-1381
IBM-935 to IBM-1381
IBM-935_IBM-eucCN
IBM-935 to IBM-eucCN
IBM-936(PC5550)_IBM-eucCN
IBM-936(PC5550) to IBM-eucCN
IBM-eucTW_IBM-eucCN
IBM-eucTW to IBM-eucCN
big5_IBM-eucCN
big5 to IBM-eucCN
IBM-1381_IBM-eucCN
IBM-1381 to IBM-eucCN
IBM-939_IBM-eucJP
IBM-939 to IBM-eucJP
IBM-eucKR_IBM-934
IBM-eucKR to IBM-934
IBM-934_IBM-eucKR
IBM-934 to IBM-eucKR
IBM-eucKR_IBM-933
IBM-eucKR to IBM-933
IBM-933_IBM-eucKR
IBM-933 to IBM-eucKR
IBM-eucTW_IBM-937
IBM-eucTW to IBM-937
IBM-938_IBM-937
IBM-938 to IBM-937
big-5_IBM-937
big-5 to IBM-937
IBM-eucCN_IBM-eucTW
IBM-eucCN to IBM-eucTW
IBM-937_IBM-eucTW
IBM-937 to IBM-eucTW
IBM-937_IBM-938
IBM-937 to IBM-938
IBM-eucTW_IBM-938
IBM-eucTW to IBM-938
IBM-eucCN_big5
IBM-eucCN to big5
IBM-eucTW_big-5
IBM-eucTW to big-5
IBM-937_big-5
IBM-937 to big-5
CNS11643.1992-3_IBM-eucTW
CNS11643.1992-3 to IBM-eucTW
CNS11643.1992-3-GL_IBM-eucTW
CNS11643.1992-3-GL to IBM-eucTW
CNS11643.1992-3-GR_IBM-eucTW
CNS11643.1992-3-GR to IBM-eucTW
CNS11643.1992-4_IBM-eucTW
CNS11643.1992-4 to IBM-eucTW
CNS11643.1992-4-GL_IBM-eucTW
CNS11643.1992-4-GL to IBM-eucTW
CNS11643.1992-4-GR_IBM-eucTW
CNS11643.1992-4-GR to IBM-eucTW
IBM-eucTW_CNS11643.1992-3
IBM-eucTW to CNS11643.1992-3
IBM-eucTW_CNS11643.1992-3-GL
IBM-eucTW to CNS11643.1992-3-GL
IBM-eucTW_CNS11643.1992-3-GR
IBM-eucTW to CNS11643.1992-3-GR
IBM-eucTW_CNS11643.1992-4
IBM-eucTW to CNS11643.1992-4
IBM-eucTW_CNS11643.1992-4-GL
IBM-eucTW to CNS11643.1992-4-GL
IBM-eucTW_CNS11643.1992-4-GR
IBM-eucTW to CNS11643.1992-4-GR
IBM-eucCN_GB2312.1980-1
IBM-eucCN to GB2312.1980-1
IBM-eucCN_GB2312.1980-1-GL
IBM-eucCN to GB2312.1980-1-GL
IBM-eucCN_GB2312.1980-1-GR
IBM-eucCN to GB2312.1980-1-GR
IBM-937_csic
IBM-937 to csic
csic_IBM-937
csic to IBM-937
IBM-938_csic
IBM-938 to csic
csic_IBM-938
csic to IBM-938
IBM-eucTW_ccdc
IBM-eucTW to ccdc
ccdc_IBM-eucTW
ccdc to IBM-eucTW
IBM-eucTW_cns
IBM-eucTW to cns
cns_IBM-eucTW
cns to IBM-eucTW
IBM-eucTW_csic
IBM-eucTW to csic
csic_IBM-eucTW
csic to IBM-eucTW
IBM-eucTW_sops
IBM-eucTW to sops
sops_IBM-eucTW
sops to IBM-eucTW
IBM-eucTW_tca
IBM-eucTW to tca
tca_IBM-eucTW
tca to IBM-eucTW
big5_cns
big5 to cns
cns_big5
cns to big5
big5_csic
big5 to csic
csic_big5
csic to big5
big5_ttc
big5 to ttc
ttc_big5
ttc to big5
big5_ttcmin
big5 to ttcmin
ttcmin_big5
ttcmin to big5
big5_unicode
big5 to unicode
unicode_big5
unicode to big5
big5_wang
big5 to wang
wang_big5
wang to big5
ccdc_csic
ccdc to csic
csic_ccdc
csic to ccdc
csic_sops
csic to sops
sops_csic
sops to csic
CNS11643.1986-1_big5
CNS11643.1986-1 to big5
big5_CNS11643.1986-1
big5 to CNS11643.1986-1
CNS11643.1986-1-GR_big5
CNS11643.1986-1-GR to big5
big5_CNS11643.1986-1-GR
big5 to CNS11643.1986-1-GR
CNS11643.1986-2_big5
CNS11643.1986-2 to big5
big5_CNS11643.1986-2
big5 to CNS11643.1986-2
CNS11643.1986-2-GR_big5
CNS11643.1986-2-GR to big5
big5_CNS11643.1986-2-GR
big5 to CNS11643.1986-2-GR
CNS11643.CT-GR_big5
CNS11643.CT-GR to big5
big5_CNS11643.CT-GR
big5 to CNS11643.CT-GR
IBM-sbdTW-GR_big5
IBM-sbdTW-GR to big5
big5_IBM-sbdTW-GR
big5 to IBM-sbdTW-GR
IBM-sbdTW.CT-GR_big5
IBM-sbdTW.CT-GR to big5
big5_IBM-sbdTW.CT-GR
big5 to IBM-sbdTW.CT-GR
IBM-sbdTW_big5
IBM-sbdTW to big5
big5_IBM-sbdTW
big5 to IBM-sbdTW
IBM-udcTW-GR_big5
IBM-udcTW-GR to big5
big5_IBM-udcTW-GR
big5 to IBM-udcTW-GR
IBM-udcTW.CT-GR_big5
IBM-udcTW.CT-GR to big5
big5_IBM-udcTW.CT-GR
big5 to IBM-udcTW.CT-GR
ISO8859-1_big5
ISO8859-1 to big5
big5_ISO8859-1
big5 to ISO8859-1
IBM-sbdTW_big5
IBM-sbdTW to big5
big5_IBM-sbdTW
big5 to IBM-sbdTW
big5_ASCII-GR
big5 to ASCII-GR
ASCII-GR_big5
ASCII-GR to big5
GBK_big5
GBK to big5
big5_GBK
big5 to GBK
GBK_IBM-eucTW
GBK to IBM-eucTW
IBM-eucTW_GBK
IBM-eucTW to GBK
CNS11643.1986-1_GBK
CNS11643.1986-1 to GBK
GBK_CNS11643.1986-1
GBK to CNS11643.1986-1
CNS11643.1986-2_GBK
CNS11643.1986-2 to GBK
GBK_CNS11643.1986-2
GBK to CNS11643.1986-2
CNS11643.1986-1-GR_GBK
CNS11643.1986-1-GR to GBK
GBK_CNS11643.1986-1-GR
GBK to CNS11643.1986-1-GR
CNS11643.1986-2-GR_GBK
CNS11643.1986-2-GR to GBK
GBK_CNS11643.1986-2-GR
GBK to CNS11643.1986-2-GR
CNS11643.1986-1-GL_GBK
CNS11643.1986-1-GL to GBK
GBK_CNS11643.1986-1-GL
GBK to CNS11643.1986-1-GL
CNS11643.1986-2-GL_GBK
CNS11643.1986-2-GL to GBK
GBK_CNS11643.1986-2-GL
GBK to CNS11643.1986-2-GL
CNS11643.CT-GR_GBK
CNS11643.CT-GR to GBK
GBK_CNS11643.CT-GR
GBK to CNS11643.CT-GR
GB2312.1980.CT-GR_GBK
GB2312.1980.CT-GR to GBK
GBK_GB2312.1980.CT-GR
GBK to GB2312.1980.CT-GR
GB2312.1980-0_GBK
GB2312.1980-0 to GBK
GBK_GB2312.1980-0
GBK to GB2312.1980-0
GB2312.1980-0-GR_GBK
GB2312.1980-0-GR to GBK
GBK_GB2312.1980-0-GR
GBK to GB2312.1980-0-GR
GB2312.1980-0-GL_GBK
GB2312.1980-0-GL to GBK
GBK_GB2312.1980-0-GL
GBK to GB2312.1980-0-GL
ASCII-GR_GBK
ASCII-GR to GBK
GBK_ASCII-GR
GBK to ASCII-GR
ISO8859-1_GBK
ISO8859-1 to GBK
GBK_ISO8859-1
GBK to ISO8859-1
IBM-eucCN_GBK
IBM-eucCN to GBK
GBK_IBM-eucCN
GBK to IBM-eucCN
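The listings above pair each converter file name with a "FromCode to ToCode" description, which suggests that the name is the two code set names joined by a single underscore. The following Python sketch splits a name on that assumption. The helper name split_converter_name is hypothetical, and the rule is inferred from the listings rather than stated by them.

```python
def split_converter_name(name):
    """Split a converter file name such as 'big5_GBK' into (from_code, to_code).

    Assumes exactly one underscore separates the two code set names,
    as in the listings above.
    """
    from_code, sep, to_code = name.partition("_")
    if not sep or not from_code or not to_code or "_" in to_code:
        raise ValueError("not a FromCode_ToCode name: %r" % name)
    return from_code, to_code


for name in ("big5_GBK", "GBK_IBM-eucTW", "CNS11643.1986-1-GR_big5"):
    source, target = split_converter_name(name)
    print(name + ": " + source + " to " + target)
```

The pair returned for each name corresponds to the FromCode and ToCode strings that would be passed to iconv_open.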
Interchange Converters: 7-bit
These converters provide conversion between internal code and 7-bit standard interchange formats (fold7). The fold7 name identifies encodings that can be used to pass text data through 7-bit mail protocols. The encodings are based on ISO2022. For more information about fold7, see Understanding libiconv.
The fold7 converters convert characters from a code set to a canonical 7-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:
IBM-850 <-> ISO8859-1
Common Latin characters
IBM-932 <-> IBM-eucJP
Common Japanese characters
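To see why a canonical encoding helps when two code sets share a character set, the following Python sketch encodes one Latin character in both code sets of the first pair. It is only an illustration: Python's "cp850" and "latin-1" codecs stand in for IBM-850 and ISO8859-1, and no fold7 converter is involved.

```python
# Python's "cp850" and "latin-1" codecs stand in for IBM-850 and ISO8859-1.
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, present in both code sets

ibm850_bytes = text.encode("cp850")
iso8859_1_bytes = text.encode("latin-1")

print(ibm850_bytes.hex())     # 82
print(iso8859_1_bytes.hex())  # e9

# Reading the IBM-850 byte as ISO8859-1 silently yields a different character.
# A canonical interchange encoding that identifies each character avoids this.
misread = ibm850_bytes.decode("latin-1")
print(misread == text)        # False
```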
The following escape sequences designate standard code sets:
Escape Sequence
Standard Code Set
01/11 02/04 04/00
GL JIS X0208.1978-0.
01/11 02/04 02/08 04/01
GL left half of GB2312.1980-0.
01/11 02/08 04/02
GL 7-bit ASCII or left half of ISO8859-1.
01/11 02/14 04/01
GL right half of ISO8859-1.
01/11 02/14 04/02
GL right half of ISO8859-2.
01/11 02/14 04/03
GL right half of ISO8859-3.
01/11 02/14 04/04
GL right half of ISO8859-4.
01/11 02/14 04/06
GL right half of ISO8859-7.
01/11 02/14 04/07
GL right half of ISO8859-6.
01/11 02/14 04/08
GL right half of ISO8859-8.
01/11 02/14 04/12
GL right half of ISO8859-5.
01/11 02/14 04/13
GL right half of ISO8859-9.
01/11 02/08 04/09
GL right half of JIS X0201.1976-0.
01/11 02/08 04/10
GL left half of JIS X0201.1976.
01/11 02/04 04/02
GL JIS X0208.1983-0.
01/11 02/04 02/08 04/02
GL JIS X0208.1983-0.
01/11 02/04 02/08 04/00
GL JISX0208.1978-0.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GL Japanese (IBM-udcJP) user-definable characters.
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/07 00/02
UCS-2 encoded as base64; used only for those characters not encoded by any of the other 7-bit escape sequences listed above.
When converting from a code set to fold7, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JISX0208.1983-0 characters use 01/11 02/04 04/02 as the designation.
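The escape sequences in the table are written in column/row notation, where a position cc/rr is the byte value 16 * cc + rr. The following Python sketch turns such a sequence into bytes. The helper name parse_positions is hypothetical, and the sketch does not handle the extended sequences that contain the M L length placeholders.

```python
def parse_positions(sequence):
    """Convert column/row positions such as '01/11 02/04 04/02' to bytes."""
    out = bytearray()
    for position in sequence.split():
        column, row = position.split("/")
        out.append(int(column) * 16 + int(row))  # for example, 01/11 -> 0x1B (ESC)
    return bytes(out)


# Designation listed in the table for GL JIS X0208.1983-0.
print(parse_positions("01/11 02/04 04/02"))  # b'\x1b$B'
# Designation listed for GL 7-bit ASCII or left half of ISO8859-1.
print(parse_positions("01/11 02/08 04/02"))  # b'\x1b(B'
```

Read this way, 01/11 02/04 04/02 is the familiar ESC $ B sequence and 01/11 02/08 04/02 is ESC ( B.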
Files
The following list describes the fold7 converters that are found in the /usr/lib/nls/loc/iconv directory:
Converter
Description
fold7_IBM-850
Interchange format to IBM-850
fold7_IBM-921
Interchange format to IBM-921
fold7_IBM-922
Interchange format to IBM-922
fold7_IBM-932
Interchange format to IBM-932
fold7_IBM-943
Interchange format to IBM-943
fold7_IBM-1124
Interchange format to IBM-1124
fold7_IBM-1129
Interchange format to IBM-1129
fold7_IBM-eucCN
Interchange format to IBM-eucCN
fold7_IBM-eucJP
Interchange format to IBM-eucJP
fold7_IBM-eucKR
Interchange format to IBM-eucKR
fold7_IBM-eucTW
Interchange format to IBM-eucTW
fold7_ISO8859-1
Interchange format to ISO8859-1
fold7_ISO8859-2
Interchange format to ISO8859-2
fold7_ISO8859-3
Interchange format to ISO8859-3
fold7_ISO8859-4
Interchange format to ISO8859-4
fold7_ISO8859-5
Interchange format to ISO8859-5
fold7_ISO8859-6
Interchange format to ISO8859-6
fold7_ISO8859-7
Interchange format to ISO8859-7
fold7_ISO8859-8
Interchange format to ISO8859-8
fold7_ISO8859-9
Interchange format to ISO8859-9
fold7_TIS-620
Interchange format to TIS-620
fold7_UTF-8
Interchange format to UTF-8
fold7_big5
Interchange format to big5
fold7_GBK
Interchange format to GBK
IBM-921_fold7
IBM-921 to interchange format
IBM-922_fold7
IBM-922 to interchange format
IBM-850_fold7
IBM-850 to interchange format
IBM-932_fold7
IBM-932 to interchange format
IBM-943_fold7
IBM-943 to interchange format
IBM-1124_fold7
IBM-1124 to interchange format
IBM-1129_fold7
IBM-1129 to interchange format
IBM-eucCN_fold7
IBM-eucCN to interchange format
IBM-eucJP_fold7
IBM-eucJP to interchange format
IBM-eucKR_fold7
IBM-eucKR to interchange format
IBM-eucTW_fold7
IBM-eucTW to interchange format
ISO8859-1_fold7
ISO8859-1 to interchange format
ISO8859-2_fold7
ISO8859-2 to interchange format
ISO8859-3_fold7
ISO8859-3 to interchange format
ISO8859-4_fold7
ISO8859-4 to interchange format
ISO8859-5_fold7
ISO8859-5 to interchange format
ISO8859-6_fold7
ISO8859-6 to interchange format
ISO8859-7_fold7
ISO8859-7 to interchange format
ISO8859-8_fold7
ISO8859-8 to interchange format
ISO8859-9_fold7
ISO8859-9 to interchange format
TIS-620_fold7
TIS-620 to interchange format
UTF-8_fold7
UTF-8 to interchange format
big5_fold7
big5 to interchange format
GBK_fold7
GBK to interchange format
Interchange Converters: 8-bit
These converters provide conversion between internal code and 8-bit standard interchange formats (fold8). The fold8 name identifies encodings that can be used to pass text data through 8-bit mail protocols. The encodings are based on ISO2022. For more information about fold8, see Understanding libiconv.
The fold8 converters convert characters from a specific code set encoding to a canonical 8-bit encoding that identifies each character. This type of conversion is useful in networks where clients communicate with different code sets but use the same character sets. For example:
IBM-850 <-> ISO8859-1
Common Latin characters
IBM-932 <-> IBM-eucJP
Common Japanese characters
The following escape sequences designate standard code sets.
Escape Sequence
Standard Code Set
01/11 02/04 02/09 04/01
GR right half of GB2312.1980-0.
01/11 02/13 04/01
GR right half of ISO8859-1.
01/11 02/13 04/02
GR right half of ISO8859-2.
01/11 02/13 04/03
GR right half of ISO8859-3.
01/11 02/13 04/04
GR right half of ISO8859-4.
01/11 02/13 04/06
GR right half of ISO8859-7.
01/11 02/13 04/07
GR right half of ISO8859-6.
01/11 02/13 04/08
GR right half of ISO8859-8.
01/11 02/13 04/12
GR right half of ISO8859-5.
01/11 02/13 04/13
GR right half of ISO8859-9.
01/11 02/09 04/09
GR right half of JIS X0201.1976-1.
01/11 02/04 02/09 04/02
GR JIS X0208.1983-1.
01/11 02/04 02/09 04/00
GR JISX0208.1978-1.
01/11 02/09 04/02
GR 7-bit ASCII or left half of ISO8859-1.
01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02
GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GR right half of Japanese user-definable characters.
01/11 02/08 04/02
GL 7-bit ASCII or left half of ISO8859-1.
01/11 02/14 04/01
GL right half of ISO8859-1.
01/11 02/14 04/02
GL right half of ISO8859-2.
01/11 02/14 04/03
GL right half of ISO8859-3.
01/11 02/14 04/04
GL right half of ISO8859-4.
01/11 02/14 04/06
GL right half of ISO8859-7.
01/11 02/14 04/07
GL right half of ISO8859-6.
01/11 02/14 04/08
GL right half of ISO8859-8.
01/11 02/14 04/12
GL right half of ISO8859-5.
01/11 02/14 04/13
GL right half of ISO8859-9.
01/11 02/08 04/09
GL right half of JIS X0201.1976-0.
01/11 02/08 04/10
GL left half of JIS X0201.1976.
01/11 02/04 02/08 04/02
GL JIS X0208.1983-0.
01/11 02/04 04/02
GL JIS X0208.1983-0.
01/11 02/04 04/00
GL JIS X0208.1978-0.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GL Japanese (IBM-udcJP) user-definable characters.
01/11 02/04 02/09 04/03
GR KSC5601-1987.
01/11 02/04 02/09 03/00
GR CNS11643-1986-1.
01/11 02/04 02/10 03/01
GR CNS11643-1986-2.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02
GR right half of Traditional Chinese user-definable characters.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02
GR right half of IBM-850 unique symbols.
01/11 02/04 02/08 04/03
GL KSC5601-1987.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 05/05 05/08 00/02
GL Traditional Chinese (IBM-udcTW) user-definable characters.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/03 06/02 06/04 05/05 05/08 00/02
GL Traditional Chinese IBM-850 unique symbols (IBM-sbdTW) user-definable characters.
01/11 02/05 02/15 03/00 M L 05/05 05/04 04/06 02/13 03/08 00/02
UCS-2 encoded as UTF-8; used only for those characters not encoded by any of the escape sequences listed above.
When converting from a code set to fold8, the escape sequence used to designate the code set is chosen according to the order listed. For example, the JISX0208.1983-0 characters use 01/11 02/04 02/08 04/02 as the designation.
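The table lists both GL (7-bit, high bit clear) and GR (8-bit, high bit set) designations for the same character sets. The following Python sketch illustrates how the two forms relate for a JIS X0208 character. It is not a fold8 conversion: Python's "iso2022_jp" and "euc_jp" codecs are used only because they place JIS X0208 codes in GL and GR respectively.

```python
text = "\u3042"  # HIRAGANA LETTER A, a JIS X0208 character

gl_form = text.encode("iso2022_jp")  # 7-bit: escape sequence followed by GL bytes
gr_form = text.encode("euc_jp")      # 8-bit: the same code with the high bit set

print(gl_form)  # b'\x1b$B$"\x1b(B'
print(gr_form)  # b'\xa4\xa2'

# Strip the designation (ESC $ B) and the return to ASCII (ESC ( B),
# then set the high bit of each remaining byte.
gl_code = gl_form[3:-3]
print(bytes(b | 0x80 for b in gl_code) == gr_form)  # True
```

The leading ESC $ B in the 7-bit form corresponds to the 01/11 02/04 04/02 designation listed for GL JIS X0208.1983-0.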
Files
The following list describes the fold8 converters found in the /usr/lib/nls/loc/iconv directory:
Converter
Description
fold8_IBM-850
Interchange format to IBM-850
fold8_IBM-921
Interchange format to IBM-921
fold8_IBM-922
Interchange format to IBM-922
fold8_IBM-932
Interchange format to IBM-932
fold8_IBM-943
Interchange format to IBM-943
fold8_IBM-1124
Interchange format to IBM-1124
fold8_IBM-1129
Interchange format to IBM-1129
fold8_IBM-eucCN
Interchange format to IBM-eucCN
fold8_IBM-eucJP
Interchange format to IBM-eucJP
fold8_IBM-eucKR
Interchange format to IBM-eucKR
fold8_IBM-eucTW
Interchange format to IBM-eucTW
fold8_ISO8859-1
Interchange format to ISO8859-1
fold8_ISO8859-2
Interchange format to ISO8859-2
fold8_ISO8859-3
Interchange format to ISO8859-3
fold8_ISO8859-4
Interchange format to ISO8859-4
fold8_ISO8859-5
Interchange format to ISO8859-5
fold8_ISO8859-6
Interchange format to ISO8859-6
fold8_ISO8859-7
Interchange format to ISO8859-7
fold8_ISO8859-8
Interchange format to ISO8859-8
fold8_ISO8859-9
Interchange format to ISO8859-9
fold8_TIS-620
Interchange format to TIS-620
fold8_UTF-8
Interchange format to UTF-8
fold8_big5
Interchange format to big5
fold8_GBK
Interchange format to GBK
IBM-921_fold8
IBM-921 to interchange format
IBM-922_fold8
IBM-922 to interchange format
IBM-850_fold8
IBM-850 to interchange format
IBM-932_fold8
IBM-932 to interchange format
IBM-943_fold8
IBM-943 to interchange format
IBM-1124_fold8
IBM-1124 to interchange format
IBM-1129_fold8
IBM-1129 to interchange format
IBM-eucCN_fold8
IBM-eucCN to interchange format
IBM-eucJP_fold8
IBM-eucJP to interchange format
IBM-eucKR_fold8
IBM-eucKR to interchange format
IBM-eucTW_fold8
IBM-eucTW to interchange format
ISO8859-1_fold8
ISO8859-1 to interchange format
ISO8859-2_fold8
ISO8859-2 to interchange format
ISO8859-3_fold8
ISO8859-3 to interchange format
ISO8859-4_fold8
ISO8859-4 to interchange format
ISO8859-5_fold8
ISO8859-5 to interchange format
ISO8859-6_fold8
ISO8859-6 to interchange format
ISO8859-7_fold8
ISO8859-7 to interchange format
ISO8859-8_fold8
ISO8859-8 to interchange format
ISO8859-9_fold8
ISO8859-9 to interchange format
TIS-620_fold8
TIS-620 to interchange format
UTF-8_fold8
UTF-8 to interchange format
big5_fold8
big5 to interchange format
GBK_fold8
GBK to interchange format
Interchange Converters: Compound Text
Compound text interchange converters convert between compound text and internal code sets.
Compound text is an interchange encoding defined by the X Consortium. It is used to communicate text between X clients. Compound text is based on ISO2022 and can encode most character sets using standard escape sequences. It also provides extensions for encoding private character sets. The supported code sets provide a converter to and from compound text. The name used to identify the compound text encoding is ct.
The following escape sequences are used to designate standard code sets in the order listed below.
01/11 02/05 02/15 03/01 M L 04/09 04/02 04/13 02/13 03/08 03/05 03/00 00/02
GR right half of IBM-850 unique characters. Characters common to ISO8859-1 should not use this escape sequence.
01/11 02/05 02/15 03/02 M L 04/09 04/02 04/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GR right half of Japanese user-definable characters.
01/11 02/05 02/15 03/01 M L 06/09 06/02 06/13 02/13 03/08 03/05 03/00 00/02
GL right half of IBM-850 unique characters. Characters common to ISO8859-1 do not use this escape sequence.
01/11 02/05 02/15 03/02 M L 06/09 06/02 06/13 02/13 07/05 06/04 06/03 04/10 05/00 00/02
GL Japanese (IBM-udcJP) user-definable characters.
Files
The following list describes the compound text converters that are found in the /usr/lib/nls/loc/iconv directory:
During conversion from uucode, 62 bytes at a time (including the new-line character trailing the record) are converted, generating 45 bytes in outbuf.
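The 62-byte and 45-byte figures match the classic uuencode record layout: one length character, 60 characters that carry 45 bytes of data (4 characters for every 3 bytes), and a trailing new-line character. The following Python sketch checks that arithmetic with the standard binascii module. It assumes the uucode converters use this same record layout, which the text above suggests but does not state.

```python
import binascii

record = bytes(range(45))        # 45 bytes of arbitrary data
line = binascii.b2a_uu(record)   # one uuencoded record

# 1 length character + 60 data characters + 1 new-line character
print(len(line))                 # 62
print(line.endswith(b"\n"))      # True

decoded = binascii.a2b_uu(line)  # back to the original 45 bytes
print(len(decoded), decoded == record)  # 45 True
```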
Files
The following list describes the uucode converters found in the /usr/lib/nls/loc/iconv directory:
Converter
Description
IBM-850_uucode
IBM-850 to uucode
IBM-921_uucode
IBM-921 to uucode
IBM-922_uucode
IBM-922 to uucode
IBM-932_uucode
IBM-932 to uucode
IBM-943_uucode
IBM-943 to uucode
IBM-1124_uucode
IBM-1124 to uucode
IBM-1129_uucode
IBM-1129 to uucode
IBM-eucJP_uucode
IBM-eucJP to uucode
IBM-eucKR_uucode
IBM-eucKR to uucode
IBM-eucTW_uucode
IBM-eucTW to uucode
IBM-eucCN_uucode
IBM-eucCN to uucode
ISO8859-1_uucode
ISO8859-1 to uucode
ISO8859-2_uucode
ISO8859-2 to uucode
ISO8859-3_uucode
ISO8859-3 to uucode
ISO8859-4_uucode
ISO8859-4 to uucode
ISO8859-5_uucode
ISO8859-5 to uucode
ISO8859-6_uucode
ISO8859-6 to uucode
ISO8859-7_uucode
ISO8859-7 to uucode
ISO8859-8_uucode
ISO8859-8 to uucode
ISO8859-9_uucode
ISO8859-9 to uucode
TIS-620_uucode
TIS-620 to uucode
big5_uucode
big5 to uucode
GBK_uucode
GBK to uucode
uucode_IBM-850
uucode to IBM-850
uucode_IBM-921
uucode to IBM-921
uucode_IBM-922
uucode to IBM-922
uucode_IBM-932
uucode to IBM-932
uucode_IBM-943
uucode to IBM-943
uucode_IBM-1124
uucode to IBM-1124
uucode_IBM-1129
uucode to IBM-1129
uucode_IBM-eucCN
uucode to IBM-eucCN
uucode_IBM-eucJP
uucode to IBM-eucJP
uucode_IBM-eucKR
uucode to IBM-eucKR
uucode_IBM-eucTW
uucode to IBM-eucTW
uucode_ISO8859-1
uucode to ISO8859-1
uucode_ISO8859-2
uucode to ISO8859-2
uucode_ISO8859-3
uucode to ISO8859-3
uucode_ISO8859-4
uucode to ISO8859-4
uucode_ISO8859-5
uucode to ISO8859-5
uucode_ISO8859-6
uucode to ISO8859-6
uucode_ISO8859-7
uucode to ISO8859-7
uucode_ISO8859-8
uucode to ISO8859-8
uucode_ISO8859-9
uucode to ISO8859-9
uucode_TIS-620
uucode to TIS-620
uucode_big5
uucode to big5
uucode_GBK
uucode to GBK
UCS-2 Interchange Converters
UCS-2 uses a universal 16-bit encoding. Conversions for each code set are provided in both directions, between the code set and UCS-2. For more information, see Code Sets for National Language Support.
UCS-2 converters are found in the /usr/lib/nls/loc/uconvTable and /usr/lib/nls/loc/uconv directories. The uconvdef command is used to generate new converters or to customize existing UCS-2 converters.
Converter
Description
ISO8859-1
UCS-2 <-> ISO Latin-1
ISO8859-2
UCS-2 <-> ISO Latin-2
ISO8859-3
UCS-2 <-> ISO Latin-3
ISO8859-4
UCS-2 <-> ISO Baltic
ISO8859-5
UCS-2 <-> ISO Cyrillic
ISO8859-6
UCS-2 <-> ISO Arabic
ISO8859-7
UCS-2 <-> ISO Greek
ISO8859-8
UCS-2 <-> ISO Hebrew
ISO8859-9
UCS-2 <-> ISO Turkish
JISX0201.1976-0
UCS-2 <-> Japanese JISX0201-0
JISX0208.1983-0
UCS-2 <-> Japanese JISX0208-0
CNS11643.1986-1
UCS-2 <-> Chinese CNS11643-1
CNS11643.1986-2
UCS-2 <-> Chinese CNS11643-2
KSC5601.1987-0
UCS-2 <-> Korean KSC5601-0
IBM-eucCN
UCS-2 <-> Simplified Chinese EUC
IBM-udcCN
UCS-2 <-> Simplified Chinese user-defined characters
IBM-sbdCN
UCS-2 <-> Simplified Chinese IBM-specific characters
GB2312.1980-0
UCS-2 <-> Simplified Chinese GB
IBM-1381
UCS-2 <-> Simplified Chinese PC data code
IBM-935
UCS-2 <-> Simplified Chinese EBCDIC
IBM-936
UCS-2 <-> Simplified Chinese PC5550
IBM-eucJP
UCS-2 <-> Japanese EUC
IBM-eucKR
UCS-2 <-> Korean EUC
IBM-eucTW
UCS-2 <-> Traditional Chinese EUC
IBM-udcJP
UCS-2 <-> Japanese user-defined characters
IBM-udcTW
UCS-2 <-> Traditional Chinese user-defined characters
IBM-sbdTW
UCS-2 <-> Traditional Chinese IBM-specific characters
UTF-8
UCS-2 <-> UTF-8
IBM-437
UCS-2 <-> USA PC data code
IBM-850
UCS-2 <-> Latin-1 PC data code
IBM-852
UCS-2 <-> Latin-2 PC data code
IBM-857
UCS-2 <-> Turkish PC data code
IBM-860
UCS-2 <-> Portuguese PC data code
IBM-861
UCS-2 <-> Icelandic PC data code
IBM-863
UCS-2 <-> French Canadian PC data code
IBM-865
UCS-2 <-> Nordic PC data code
IBM-869
UCS-2 <-> Greek PC data code
IBM-921
UCS-2 <-> Baltic Multilingual data code
IBM-922
UCS-2 <-> Estonian data code
IBM-932
UCS-2 <-> Japanese PC data code
IBM-943
UCS-2 <-> Japanese PC data code
IBM-934
UCS-2 <-> Korean PC data code
IBM-936
UCS-2 <-> People's Republic of China PC data code
IBM-938
UCS-2 <-> Taiwanese PC data code
IBM-942
UCS-2 <-> Extended Japanese PC data code
IBM-944
UCS-2 <-> Korean PC data code
IBM-946
UCS-2 <-> People's Republic of China SAA data code
IBM-948
UCS-2 <-> Traditional Chinese PC data code
IBM-1124
UCS-2 <-> Ukrainian PC data code
IBM-1129
UCS-2 <-> Vietnamese PC data code
TIS-620
UCS-2 <-> Thailand PC data code
IBM-037
UCS-2 <-> USA, Canada EBCDIC
IBM-273
UCS-2 <-> Germany, Austria EBCDIC
IBM-277
UCS-2 <-> Denmark, Norway EBCDIC
IBM-278
UCS-2 <-> Finland, Sweden EBCDIC
IBM-280
UCS-2 <-> Italy EBCDIC
IBM-284
UCS-2 <-> Spain, Latin America EBCDIC
IBM-285
UCS-2 <-> United Kingdom EBCDIC
IBM-297
UCS-2 <-> France EBCDIC
IBM-500
UCS-2 <-> International EBCDIC
IBM-875
UCS-2 <-> Greek EBCDIC
IBM-930
UCS-2 <-> Japanese Katakana-Kanji EBCDIC
IBM-933
UCS-2 <-> Korean EBCDIC
IBM-937
UCS-2 <-> Traditional Chinese EBCDIC
IBM-939
UCS-2 <-> Japanese Latin-Kanji EBCDIC
IBM-1026
UCS-2 <-> Turkish EBCDIC
IBM-1112
UCS-2 <-> Baltic Multilingual EBCDIC
IBM-1122
UCS-2 <-> Estonian EBCDIC
IBM-1124
UCS-2 <-> Ukrainian EBCDIC
IBM-1129
UCS-2 <-> Vietnamese EBCDIC
TIS-620
UCS-2 <-> Thailand EBCDIC
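As an illustration of the fixed 16-bit width of UCS-2, the following Python sketch converts ISO8859-1 data to 16-bit code units and back. It is a sketch under stated assumptions, not the uconv implementation: the helper names to_ucs2 and from_ucs2 are hypothetical, Python's "utf-16-be" codec stands in for UCS-2 (the two agree for characters in the 16-bit range), and big-endian byte order is an assumption.

```python
def to_ucs2(data, codeset):
    """Decode data from codeset and return big-endian 16-bit code units."""
    text = data.decode(codeset)
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the 16-bit range")
    return text.encode("utf-16-be")  # assumption: big-endian byte order


def from_ucs2(data, codeset):
    """Convert big-endian 16-bit code units back to codeset."""
    return data.decode("utf-16-be").encode(codeset)


latin1 = b"caf\xe9"                          # "cafe" with e-acute in ISO8859-1
ucs2 = to_ucs2(latin1, "latin-1")
print(ucs2.hex())                            # 00630061006600e9
print(len(ucs2) == 2 * len(latin1))          # True: two bytes per character
print(from_ucs2(ucs2, "latin-1") == latin1)  # True: round trip
```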
UTF-8 Interchange Converters
UTF-8 is a universal, multibyte encoding described in UCS-2 and UTF-8. Conversions for each code set are provided in both directions, between the code set and UTF-8.
UTF-8 conversions are usually done by using the Universal_UCS_Conv and /usr/lib/nls/loc/uconv/UTF-8 converters. For more information, see UCS-2 Interchange Converters.
Converter
Description
ISO8859-1
UTF-8 <-> ISO Latin-1
ISO8859-2
UTF-8 <-> ISO Latin-2
ISO8859-3
UTF-8 <-> ISO Latin-3
ISO8859-4
UTF-8 <-> ISO Baltic
ISO8859-5
UTF-8 <-> ISO Cyrillic
ISO8859-6
UTF-8 <-> ISO Arabic
ISO8859-7
UTF-8 <-> ISO Greek
ISO8859-8
UTF-8 <-> ISO Hebrew
ISO8859-9
UTF-8 <-> ISO Turkish
JISX0201.1976-0
UTF-8 <-> Japanese JISX0201-0
JISX0208.1983-0
UTF-8 <-> Japanese JISX0208-0
CNS11643.1986-1
UTF-8 <-> Chinese CNS11643-1
CNS11643.1986-2
UTF-8 <-> Chinese CNS11643-2
KSC5601.1987-0
UTF-8 <-> Korean KSC5601-0
IBM-eucCN
UTF-8 <-> Simplified Chinese EUC
IBM-eucJP
UTF-8 <-> Japanese EUC
IBM-eucKR
UTF-8 <-> Korean EUC
IBM-eucTW
UTF-8 <-> Traditional Chinese EUC
IBM-udcJP
UTF-8 <-> Japanese user-defined characters
IBM-udcTW
UTF-8 <-> Traditional Chinese user-defined characters
IBM-sbdTW
UTF-8 <-> Traditional Chinese IBM-specific characters
UCS-2
UTF-8 <-> UCS-2
IBM-437
UTF-8 <-> USA PC data code
IBM-850
UTF-8 <-> Latin-1 PC data code
IBM-852
UTF-8 <-> Latin-2 PC data code
IBM-857
UTF-8 <-> Turkish PC data code
IBM-860
UTF-8 <-> Portuguese PC data code
IBM-861
UTF-8 <-> Icelandic PC data code
IBM-863
UTF-8 <-> French Canadian PC data code
IBM-865
UTF-8 <-> Nordic PC data code
IBM-869
UTF-8 <-> Greek PC data code
IBM-921
UTF-8 <-> Baltic Multilingual data code
IBM-922
UTF-8 <-> Estonian data code
IBM-932
UTF-8 <-> Japanese PC data code
IBM-943
UTF-8 <-> Japanese PC data code
IBM-934
UTF-8 <-> Korean PC data code
IBM-935
UTF-8 <-> Simplified Chinese EBCDIC
IBM-936
UTF-8 <-> People's Republic of China PC data code
IBM-938
UTF-8 <-> Taiwanese PC data code
IBM-942
UTF-8 <-> Extended Japanese PC data code
IBM-944
UTF-8 <-> Korean PC data code
IBM-946
UTF-8 <-> People's Republic of China SAA data code
IBM-948
UTF-8 <-> Traditional Chinese PC data code
IBM-1124
UTF-8 <-> Ukrainian PC data code
IBM-1129
UTF-8 <-> Vietnamese PC data code
TIS-620
UTF-8 <-> Thailand PC data code
IBM-037
UTF-8 <-> USA, Canada EBCDIC
IBM-273
UTF-8 <-> Germany, Austria EBCDIC
IBM-277
UTF-8 <-> Denmark, Norway EBCDIC
IBM-278
UTF-8 <-> Finland, Sweden EBCDIC
IBM-280
UTF-8 <-> Italy EBCDIC
IBM-284
UTF-8 <-> Spain, Latin America EBCDIC
IBM-285
UTF-8 <-> United Kingdom EBCDIC
IBM-297
UTF-8 <-> France EBCDIC
IBM-500
UTF-8 <-> International EBCDIC
IBM-875
UTF-8 <-> Greek EBCDIC
IBM-930
UTF-8 <-> Japanese Katakana-Kanji EBCDIC
IBM-933
UTF-8 <-> Korean EBCDIC
IBM-937
UTF-8 <-> Traditional Chinese EBCDIC
IBM-939
UTF-8 <-> Japanese Latin-Kanji EBCDIC
IBM-1026
UTF-8 <-> Turkish EBCDIC
IBM-1112
UTF-8 <-> Baltic Multilingual EBCDIC
IBM-1122
UTF-8 <-> Estonian EBCDIC
IBM-1124
UTF-8 <-> Ukrainian EBCDIC
IBM-1129
UTF-8 <-> Vietnamese EBCDIC
IBM-1381
UTF-8 <-> Simplified Chinese PC data code
GB18030
UTF-8 <-> Simplified Chinese
TIS-620
UTF-8 <-> Thailand EBCDIC
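The UTF-8 conversions described in this section go through a universal converter rather than a dedicated converter for each pair of code sets. The following Python sketch shows the general idea of pivoting through a universal character representation. It does not reproduce Universal_UCS_Conv: the helper name convert is hypothetical, and Python's "cp850" codec stands in for IBM-850.

```python
def convert(data, from_code, to_code):
    """Convert between two code sets by pivoting through Unicode."""
    return data.decode(from_code).encode(to_code)


ibm850 = b"caf\x82"                       # "cafe" with e-acute in IBM-850
utf8 = convert(ibm850, "cp850", "utf-8")

print(utf8)                               # b'caf\xc3\xa9'
print(len(ibm850), len(utf8))             # 4 5: UTF-8 is a multibyte encoding
print(convert(utf8, "utf-8", "cp850") == ibm850)  # True: round trip
```

The single accented character grows from one byte to two, while the ASCII characters keep their one-byte form.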
Miscellaneous Converters
A set of low-level converters, called miscellaneous converters, is provided to support the code set and interchange converters. Some of the interchange converters may use them. Direct use of these converters is discouraged because they are intended only to support other converters.
Files
The following list describes the miscellaneous converters found in the /usr/lib/nls/loc/iconv and /usr/lib/nls/loc/iconvTable directories:
Converter
Description
IBM-932_JISX0201.1976-0
IBM-932 to JISX0201.1976-0
IBM-932_JISX0208.1983-0
IBM-932 to JISX0208.1983-0
IBM-932_IBM-udcJP
IBM-932 to IBM-udcJP (Japanese user-defined characters)
IBM-943_JISX0201.1976-0
IBM-943 to JISX0201.1976-0
IBM-943_JISX0208.1983-0
IBM-943 to JISX0208.1983-0
IBM-943_IBM-udcJP
IBM-943 to IBM-udcJP (Japanese user-defined characters)
IBM-eucJP_JISX0201.1976-0
IBM-eucJP to JISX0201.1976-0
IBM-eucJP_JISX0208.1983-0
IBM-eucJP to JISX0208.1983-0
IBM-eucJP_IBM-udcJP
IBM-eucJP to IBM-udcJP (Japanese user-defined characters)