
Post #1, posted 2006-10-10 09:43

The code consists of a single statement:
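#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    /* Wide string literal containing non-ASCII characters;
       this is the line the compiler rejects. */
    wchar_t wsz[80] = L"中国123";
    (void)wsz;
    return 0;
}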

Test systems: Fedora Core 5 and Red Flag WS 4.0
Compile command: gcc -c -o main.o -c -g main.c
Compiler output: error: converting to execution character set: Invalid or incomplete multibyte or wide character

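Originally posted by albcamus on 2006-10-12 11:00
Using Unicode encoding in C/C++ is impractical, because Unicode does not avoid '\0' bytes, and '\0' is precisely what marks the end of a string in C/C++.

Windows is the same: I once used UE on a friend's machine to convert a source file to Unicode, and VC could not compile it either.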
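
A small sketch on the '\0' point in the quote above. The concern holds for the 16-bit and 32-bit forms of Unicode (UTF-16, UTF-32), where ASCII characters carry zero bytes, but not for UTF-8, which produces a 0x00 byte only for U+0000. The byte arrays are hand-encoded forms of "中国123", and count_zero_bytes is a hypothetical helper name, not something from the thread.

```c
#include <stddef.h>

/* "中国123" encoded by hand as UTF-16LE: each ASCII digit carries a 0x00 byte. */
static const unsigned char text_utf16le[] = {
    0x2D, 0x4E, 0xFD, 0x56,             /* U+4E2D, U+56FD */
    0x31, 0x00, 0x32, 0x00, 0x33, 0x00  /* '1', '2', '3'  */
};

/* The same text encoded by hand as UTF-8: no byte is 0x00. */
static const unsigned char text_utf8[] = {
    0xE4, 0xB8, 0xAD, 0xE5, 0x9B, 0xBD, /* U+4E2D, U+56FD */
    0x31, 0x32, 0x33                    /* '1', '2', '3'  */
};

/* Count how many bytes in buf[0..len) are zero. */
size_t count_zero_bytes(const unsigned char *buf, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00)
            zeros++;
    }
    return zeros;
}
```

Calling count_zero_bytes(text_utf16le, sizeof text_utf16le) gives 3, while the same call on text_utf8 gives 0, so a UTF-8 string can be held in an ordinary char array and passed to strlen, strcpy and similar functions.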

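Originally posted by cuicp on 2006-10-12 13:21

I don't quite understand this. C and C++ source code isn't Unicode, is it?
But for this problem, surely the source file doesn't need to be converted to Unicode; isn't it enough to convert just the string to Unicode?
I'm not sure whether that's right, so I'd welcome corrections from the experts!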

I agree with this! If the source file is converted to Unicode, VC6 no longer recognizes it, but VC2005 can still open it and compile it normally.

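But that is not the same thing as what I was trying to say. I don't know whether the moderator has read the XMPP specification; it requires XMPP content to be UTF-8 encoded, as follows: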
11.5. Character Encoding
Implementations MUST support the UTF-8 (RFC 3629 (Yergeau, F., “UTF-8, a transformation format of ISO 10646,” November 2003.) [UTF-8]) transformation of Universal Character Set (ISO/IEC 10646-1 (International Organization for Standardization, “Information Technology - Universal Multiple-octet coded Character Set (UCS) - Amendment 2: UCS Transformation Format 8 (UTF-8),” October 1996.) [UCS2]) characters, as required by RFC 2277 (Alvestrand, H., “IETF Policy on Character Sets and Languages,” January 1998.) [CHARSET]. Implementations MUST NOT attempt to use any other encoding.

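Chinese text, however, is in the ANSI encoding (the GB extension of ASCII), so the question becomes how to convert an ANSI-encoded string to UTF-8.
The usual approach is to first convert the ANSI-encoded string to Unicode, and then store that Unicode string in UTF-8 (each character is converted according to its position in the Unicode table).

Alternatively, write the string as Unicode directly in the source code, and then store that Unicode string in UTF-8 (again converting each character according to its position in the Unicode table).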
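
The first approach can be sketched with the POSIX iconv interface, which does the ANSI to Unicode to UTF-8 step in one call. This is only a sketch under assumptions: it takes the "ANSI" bytes to be GBK, it assumes the system's iconv knows the charset name "GBK", and gbk_to_utf8 is a hypothetical helper name.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated GBK string to a NUL-terminated UTF-8 string.
 * Returns the number of UTF-8 bytes written (excluding the terminator),
 * or (size_t)-1 on failure: unsupported charset, invalid input,
 * or an output buffer that is too small. */
size_t gbk_to_utf8(const char *src, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("UTF-8", "GBK");  /* arguments are (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;           /* iconv takes a non-const char ** */
    size_t in_left = strlen(src);
    char *out = dst;
    size_t out_left = dst_size - 1;   /* keep one byte for the terminator */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *out = '\0';
    return (size_t)(out - dst);
}
```

For the GBK bytes D6 D0 B9 FA 31 32 33 ("中国123"), the function should write the nine UTF-8 bytes E4 B8 AD E5 9B BD 31 32 33. On some platforms iconv lives in a separate library and needs -liconv at link time.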

Waiting for guidance from the experts.

[ Last edited by xiaoligang on 2006-10-12 13:42 ]


xiaoligang


Post #2, posted 2006-10-10 15:33

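The question now is how to convert the string ac in

char ac[80] = "中国123";

into a string stored in UTF-8.

Or, alternatively, how to get the compiler to accept

wchar_t wsz[80] = L"中国123";

and then convert this wsz string (in the Unicode character set) into UTF-8 for storage.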
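
A minimal sketch of the second route, assuming a 32-bit wchar_t that holds one Unicode code point per element (as with glibc on Linux). The name wcs_to_utf8 is hypothetical, and UTF-16 surrogate pairs (16-bit wchar_t) are deliberately not handled.

```c
#include <stddef.h>
#include <wchar.h>

/* Convert a NUL-terminated wide string to NUL-terminated UTF-8 (RFC 3629).
 * Assumes each wchar_t is one complete Unicode code point.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t)-1 if dst is too small or a value is not a valid code point. */
size_t wcs_to_utf8(const wchar_t *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    if (dst_size == 0)
        return (size_t)-1;

    for (; *src != L'\0'; src++) {
        unsigned long cp = (unsigned long)*src;
        size_t len;

        /* The code point's position in the Unicode table fixes the length. */
        if (cp < 0x80)                          len = 1;
        else if (cp < 0x800)                    len = 2;
        else if (cp >= 0xD800 && cp <= 0xDFFF)  return (size_t)-1; /* surrogate */
        else if (cp < 0x10000)                  len = 3;
        else if (cp <= 0x10FFFF)                len = 4;
        else                                    return (size_t)-1;

        if (n + len >= dst_size)                /* keep room for '\0' */
            return (size_t)-1;

        if (len == 1) {
            dst[n] = (char)cp;
        } else if (len == 2) {
            dst[n]     = (char)(0xC0 | (cp >> 6));
            dst[n + 1] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            dst[n]     = (char)(0xE0 | (cp >> 12));
            dst[n + 1] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n + 2] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[n]     = (char)(0xF0 | (cp >> 18));
            dst[n + 1] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n + 2] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n + 3] = (char)(0x80 | (cp & 0x3F));
        }
        n += len;
    }

    dst[n] = '\0';
    return n;
}
```

A call such as wcs_to_utf8(L"\u4e2d\u56fd123", buf, sizeof buf) writes the literal with universal character names, which keeps the source file pure ASCII and so avoids any dependence on how the compiler decodes the file. If the source file is saved in GBK instead, the error in post #1 is likely an input-charset mismatch, and GCC's -finput-charset option is the usual way to declare the file's encoding.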

[ Last edited by xiaoligang on 2006-10-12 13:47 ]