Originally posted by hareqiqi on 2007-2-1 14:44 (post #1)
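Some material online says that neither Expat nor LibXml handles Chinese well, and that text has to be transcoded, which may hurt performance.
TinyXml seems to handle Chinese, but it parses with DOM (building the whole tree in memory), which may be less efficient than SAX-style parsing.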
If all of your data is UTF-8, no transcoding is needed at all.
So if you don't want to transcode, just keep your data in UTF-8.
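Before deciding whether any transcoding is needed, you can check whether a byte buffer already is well-formed UTF-8. The sketch below is a hand-written validator; the name `is_valid_utf8` is hypothetical and not part of libxml2 (libxml2 does expose `xmlCheckUTF8()` in `xmlstring.h` for NUL-terminated strings). For example, 中 encoded in GBK (`D6 D0`) is rejected, while its UTF-8 form (`E4 B8 AD`) passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libxml2 function): returns true if s[0..len)
 * is well-formed UTF-8, i.e. it could be handed to libxml2 as-is.
 */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* decoded code point */

        if (c < 0x80) {    /* ASCII byte */
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return false; /* stray continuation byte or invalid lead byte */

        if (len - i <= n)  /* sequence truncated by end of buffer */
            return false;
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* reject overlong forms, UTF-16 surrogates and values past U+10FFFF */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return false;
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += n + 1;
    }
    return true;
}
```

A buffer that fails this check (for instance GBK or ISO-8859-1 bytes) is the case where a conversion step is still required.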
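After working with it some more, I found that LibXML2 already supports Chinese encodings on its own. All of its APIs simply operate on UTF-8 data, so you only need to convert when reading data in and when writing it out. libxml2 also integrates iconv (when built with it) for encodings it does not implement natively. The code is below. `flags` selects the direction: 1 converts from `encoding` to UTF-8 (for reading data in), 0 converts from UTF-8 back to `encoding` (for writing data out). Tested and working.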
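```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/encoding.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/*
 * Convert a NUL-terminated string between `encoding` and UTF-8.
 *   flags != 0: encoding -> UTF-8 (handler->input), use when reading data in
 *   flags == 0: UTF-8 -> encoding (handler->output), use when writing data out
 * Returns a malloc'd NUL-terminated string (caller frees), or NULL on error.
 * Written against the classic libxml2 2.x API, where input/output are
 * plain function pointers in xmlCharEncodingHandler.
 */
xmlChar *convert(const xmlChar *in, const char *encoding, int flags)
{
    xmlCharEncodingHandlerPtr handler;
    xmlChar *out, *tmp;
    int ret, size, out_size, temp;

    if (in == NULL)
        return NULL;
    handler = xmlFindCharEncodingHandler(encoding);
    if (handler == NULL) {
        fprintf(stderr, "no encoding handler for '%s'\n", encoding);
        return NULL;
    }
    /* iconv-backed handlers (often the case for GBK/GB2312) may leave
     * input/output NULL; calling a NULL pointer would crash */
    if ((flags && handler->input == NULL) || (!flags && handler->output == NULL)) {
        fprintf(stderr, "no direct conversion function for '%s'\n", encoding);
        xmlCharEncCloseFunc(handler);
        return NULL;
    }

    size = (int) strlen((const char *) in) + 1;
    out_size = size * 2 - 1;            /* each input byte may grow to 2 bytes */
    out = malloc((size_t) out_size);
    if (out == NULL) {
        fprintf(stderr, "no mem\n");
        xmlCharEncCloseFunc(handler);
        return NULL;
    }

    temp = size - 1;                    /* in: bytes to convert; out: bytes consumed */
    if (flags)
        ret = handler->input(out, &out_size, in, &temp);
    else
        ret = handler->output(out, &out_size, in, &temp);
    xmlCharEncCloseFunc(handler);       /* release the handler (matters for iconv ones) */

    if (ret < 0 || temp != size - 1) {  /* error, or input only partly converted */
        if (ret < 0)
            fprintf(stderr, "conversion wasn't successful.\n");
        else
            fprintf(stderr, "conversion wasn't successful. converted: %d octets\n", temp);
        free(out);
        return NULL;
    }

    tmp = realloc(out, (size_t) out_size + 1);  /* out_size now = bytes written */
    if (tmp == NULL) {
        free(out);
        return NULL;
    }
    tmp[out_size] = 0;                  /* NUL-terminate */
    return tmp;
}

int main(int argc, char **argv)
{
    const char *encoding = "ISO-8859-1";  /* alternative: "UTF-8" */
    xmlChar *utf8, *back;
    xmlDocPtr doc;
    xmlNodePtr rootnode;

    LIBXML_TEST_VERSION

    if (argc <= 1) {
        printf("Usage: %s content\n", argv[0]);
        return 1;
    }

    utf8 = convert((const xmlChar *) argv[1], encoding, 1);  /* read: encoding -> UTF-8 */
    if (utf8 == NULL)
        return 1;
    back = convert(utf8, encoding, 0);                        /* write: UTF-8 -> encoding */
    if (back == NULL) {
        free(utf8);
        return 1;
    }
    printf("UTF-8: %s\n", utf8);
    printf("%s: %s\n", encoding, back);

    doc = xmlNewDoc(BAD_CAST "1.0");
    rootnode = xmlNewDocNode(doc, NULL, BAD_CAST "root", utf8);  /* tree holds UTF-8 */
    xmlDocSetRootElement(doc, rootnode);
    xmlSaveFormatFileEnc("-", doc, encoding, 1);  /* "-" = stdout, re-encoded on save */

    xmlFreeDoc(doc);
    free(utf8);
    free(back);
    xmlCleanupParser();
    return 0;
}
```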
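To see why `convert()` allocates `size * 2 - 1` bytes, here is a hand-rolled sketch of the ISO-8859-1 to UTF-8 direction used in the demo. `latin1_to_utf8` is a hypothetical name and this is not libxml2's own handler code. Every Latin-1 byte equals its Unicode code point, so bytes below 0x80 copy through and the rest become exactly two UTF-8 bytes, meaning the output never exceeds twice the input. For Chinese the ratio is lower: 中 (U+4E2D) is two bytes in GBK (`D6 D0`) and three in UTF-8 (`E4 B8 AD`), so the same 2x allocation also covers GBK to UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative stand-in (hypothetical, not libxml2's implementation) for an
 * ISO-8859-1 -> UTF-8 input function. Returns the number of bytes written,
 * or -1 if out_cap is too small to hold the result.
 */
long latin1_to_utf8(const unsigned char *in, size_t in_len,
                    unsigned char *out, size_t out_cap)
{
    size_t o = 0;

    assert(in != NULL || in_len == 0);
    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {                /* ASCII: 1 byte -> 1 byte */
            if (o + 1 > out_cap)
                return -1;
            out[o++] = c;
        } else {                       /* 0x80..0xFF: 1 byte -> 2 bytes */
            if (o + 2 > out_cap)
                return -1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));    /* 110xxxxx */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));  /* 10xxxxxx */
        }
    }
    return (long) o;
}
```

With "caf\xE9" (café in Latin-1) and a large enough buffer it writes 5 bytes ending in `C3 A9`; with a buffer that is too small it returns -1 instead of writing past the end.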