Sign extension and zero extension of data, and their effect on fgetc()

Posted 2007-05-16 17:20
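This is from the same assembly language book I mentioned a couple of days ago. Where it covers sign extension and zero extension of binary numbers, it discusses, from the standpoint of how they are implemented in assembly, what happens when signed and unsigned data are promoted to a wider type, and a common mistake in using fgetc() that follows from this. The explanation is concise and clear, so I am recording it here for future reference.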
Extension of unsigned and signed integers also occurs in C. Variables in
C may be declared as either signed or unsigned (int is signed). Consider
the code in Figure 2.1. In line 3, the value of uchar is extended using the
rules for unsigned values (using MOVZX) and stored in a, but in line 4, the
signed rules are used for schar (using MOVSX) and the result is stored in b.
1 unsigned char uchar = 0xFF;
2 signed char schar = 0xFF;
3 int a = (int) uchar; /* a = 255 (0x000000FF) */
4 int b = (int) schar; /* b = -1 (0xFFFFFFFF) */
Figure 2.1:
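The two kinds of extension in Figure 2.1 can be checked with a small sketch. The function names widen_unsigned, widen_signed and show_extension are made up for this illustration, and the result for the signed case assumes an 8-bit, two's-complement char, which is what common compilers provide.

```c
#include <stdio.h>

/* Hypothetical helper: widen an unsigned byte to int.
   The upper bits are filled with zeros (what MOVZX does). */
int widen_unsigned(unsigned char byte)
{
    return (int)byte;
}

/* Hypothetical helper: widen a signed byte to int.
   The upper bits are copies of the sign bit (what MOVSX does). */
int widen_signed(signed char byte)
{
    return (int)byte;
}

/* Print both widenings of the same bit pattern, FF. */
void show_extension(void)
{
    unsigned char uchar = 0xFF;
    /* Converting 0xFF to signed char is implementation-defined;
       it is assumed here to give -1, as on two's-complement machines. */
    signed char schar = (signed char)0xFF;

    printf("unsigned: %d (0x%08X)\n",
           widen_unsigned(uchar), (unsigned)widen_unsigned(uchar));
    printf("signed:   %d (0x%08X)\n",
           widen_signed(schar), (unsigned)widen_signed(schar));
}
```

Calling show_extension() from main() should print 255 for the unsigned case and -1 for the signed case, matching the comments in Figure 2.1.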
char ch;
while( (ch = fgetc(fp)) != EOF ) {
/* do something with ch */
}
Figure 2.2:
There is a common C programming bug that directly relates to this
subject. Consider the code in Figure 2.2. The prototype of fgetc() is:
int fgetc( FILE * );
One might ask why the function returns an int when it reads
characters. The reason is that it normally does return a character (an
unsigned char extended to an int value using zero extension). However,
there is one value that it may return that is not a character: EOF. This is
a macro that is usually defined as -1. Thus, fgetc() either returns a byte
extended to an int value (which looks like 000000xx in hex) or EOF (which
looks like FFFFFFFF in hex).
The basic problem with the program in Figure 2.2 is that fgetc() returns
an int, but this value is stored in a char. C will truncate the high-order
bits to fit the int value into the char. The problem is that the
numbers (in hex) 000000FF and FFFFFFFF will both be truncated to the
byte FF. Thus, the while loop cannot distinguish between reading the byte
FF from the file and end of file.
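The truncation step can be isolated in a short sketch. The helper name low_byte is invented for this example; it stores an int in an unsigned char, which keeps only the low eight bits (assuming 8-bit bytes).

```c
/* Hypothetical helper: keep only the low byte of an int, which is
   what happens when fgetc()'s result is stored in a byte-sized variable.
   Conversion to unsigned char is well defined: the value is reduced
   modulo 256 when char is 8 bits wide. */
unsigned char low_byte(int value)
{
    return (unsigned char)value;
}
```

With this helper, low_byte(0x000000FF) and low_byte(-1) both yield FF, so the information that distinguishes a data byte from EOF is gone as soon as the assignment happens.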
Exactly what the code does in this case depends on whether char is
signed or unsigned. Why? Because in line 2 of Figure 2.2 (the while
condition), ch is compared with EOF. Since EOF is an int value, ch will be
extended to an int so that the two values being compared are of the same
size. As Figure 2.1 showed, whether the variable is signed or unsigned is
very important.
If char is unsigned, FF is extended to be 000000FF. This is compared to
EOF (FFFFFFFF) and found to be not equal. Thus, the loop never ends!
If char is signed, FF is extended to FFFFFFFF. This does compare as
equal and the loop ends. However, since the byte FF may have been read
from the file, the loop could be ending prematurely.
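Both outcomes can be reproduced without a file by feeding a value through a byte-sized variable and comparing it with EOF. This is only a sketch: the function names are invented, and the signed case assumes an 8-bit, two's-complement char and an EOF of -1, as described above.

```c
#include <stdio.h>

/* Hypothetical helper: store fgetc()'s result in an unsigned char,
   widen it back to int, and report whether it equals EOF. */
int unsigned_char_matches_eof(int returned)
{
    unsigned char ch = (unsigned char)returned; /* truncate to one byte */
    int widened = ch;                           /* zero extension */
    return widened == EOF;
}

/* Hypothetical helper: the same test with a signed char.
   The conversion of an out-of-range value to signed char is
   implementation-defined; two's-complement wraparound is assumed. */
int signed_char_matches_eof(int returned)
{
    signed char ch = (signed char)returned;     /* truncate to one byte */
    int widened = ch;                           /* sign extension */
    return widened == EOF;
}
```

Under those assumptions, unsigned_char_matches_eof(EOF) returns 0, so a real end of file is never detected, while signed_char_matches_eof(0xFF) returns 1, so a data byte FF is mistaken for end of file.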
The solution to this problem is to define the ch variable as an int, not a
char. When this is done, no truncation or extension occurs in line 2 of
Figure 2.2. Inside the loop, it is safe to truncate the value since ch must
actually be a simple byte there.
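A corrected version of the loop in Figure 2.2 might look like the following sketch. The function name count_bytes is made up for this example; the only essential change is that ch is declared as an int.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in an open stream.
   ch is an int, so EOF (a negative value) stays distinct from every
   byte value 0..255 that fgetc() can return. */
long count_bytes(FILE *fp)
{
    long count = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF) {
        /* Inside the loop ch holds a real byte, so narrowing is safe. */
        unsigned char byte = (unsigned char)ch;
        (void)byte;   /* placeholder for "do something with ch" */
        count++;
    }
    return count;
}
```

A stream that contains the byte FF is now read to the end: the loop neither stops early at FF nor runs forever at end of file.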
               
               
               

This post comes from a ChinaUnix blog. To read the original, go to: http://blog.chinaunix.net/u/13667/showart_302599.html