Character set support in Xerces

travelsky2008 posted on 2008-07-17 21:12

1. Xerces-C natively supports only a handful of encodings, with no Chinese encodings among them. The English documentation gives the details:
Xerces-C has intrinsic support for ASCII, UTF-8, UTF-16 (Big/Small
   Endian), UCS4 (Big/Small Endian), EBCDIC code pages IBM037, IBM1047 and IBM1140
   encodings, ISO-8859-1 (aka Latin1) and Windows-1252. This means that it can
   parse input XML files in these above mentioned encodings.
2. ICU, another open-source project backed by IBM, provides more than 100 character sets.

XML4C -- the version of Xerces-C available from IBM -- combines Xerces-C
   and International Components for Unicode (ICU) and extends the encoding
   support to over 100 different encodings that are allowed by ICU. In
   particular, all the encodings registered with the Internet Assigned
   Numbers Authority (IANA) are supported in XML4C.

Some implementations or ports of Xerces-C provide support for
   additional encodings. The exact set will depend on the supplier of the parser
   and on the character set transcoding services in use.
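
If the parser build at hand has only the intrinsic encodings quoted above, one possible workaround is to convert the document to UTF-8 before handing it to the parser. The following is a minimal Python sketch of that idea, not part of Xerces-C or ICU. The helper names `declared_encoding` and `to_utf8` are hypothetical, the spellings in `INTRINSIC` are illustrative labels for the quoted list rather than names taken from the Xerces-C source, and it assumes the declared encoding name is one that Python's codec registry recognizes.

```python
import re
from typing import Optional

# Labels for the encodings the quoted documentation lists as intrinsic.
# These spellings are an assumption made for this sketch.
INTRINSIC = {
    "us-ascii", "ascii", "utf-8", "utf-16", "utf-16be", "utf-16le",
    "ucs-4", "iso-8859-1", "windows-1252", "ibm037", "ibm1047", "ibm1140",
}

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of a byte stream in an ASCII-compatible encoding.
_DECL_BYTES = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)
_DECL_TEXT = re.compile(r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2')


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None."""
    match = _DECL_BYTES.match(data)
    return match.group(2).decode("ascii") if match else None


def to_utf8(data: bytes) -> bytes:
    """Re-encode a document as UTF-8 if its declared encoding is not intrinsic."""
    encoding = declared_encoding(data)
    if encoding is None or encoding.lower() in INTRINSIC:
        # No declaration found, or already parseable: leave untouched.
        return data
    text = data.decode(encoding)
    # Rewrite the declaration so it agrees with the new byte encoding.
    text = _DECL_TEXT.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")


if __name__ == "__main__":
    source = '<?xml version="1.0" encoding="GB2312"?><a>中文</a>'.encode("gb2312")
    print(to_utf8(source).decode("utf-8"))
```

The declaration is rewritten together with the bytes because a parser that trusts the declaration would otherwise try to read the UTF-8 output as GB2312. With an ICU-enabled build such as XML4C this preprocessing step should not be needed.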
http://blogimg.chinaunix.net/blog/upfile2/080717213209.gif

This article comes from the ChinaUnix blog. To view the original, see: http://blog.chinaunix.net/u2/63150/showart_1084766.html
