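Originally posted by Namelessxp on 2006-11-27 08:19
I hadn't paid much attention to this. Right now both my PHP and Perl code run on the Win platform, so this looks like a hidden risk. Lesson learned.
The following is excerpted from perldoc perlport: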
- Newlines
- In most operating systems, lines in files are terminated by newlines. Just what is used as a newline may vary from OS to OS. Unix traditionally uses \012, one type of DOSish I/O uses \015\012, and Mac OS uses \015.
- Perl uses \n to represent the "logical" newline, where what is logical may depend on the platform in use. In MacPerl, \n always means \015. In DOSish perls, \n usually means \012, but when accessing a file in "text" mode, STDIO translates it to (or from) \015\012, depending on whether you're reading or writing. Unix does the same thing on ttys in canonical mode. \015\012 is commonly referred to as CRLF.
- A common cause of unportable programs is the misuse of chop() to trim newlines:
- # XXX UNPORTABLE!
- while(<FILE>) {
- chop;
- @array = split(/:/);
- #...
- }
- You can get away with this on Unix and Mac OS (they have a single character end-of-line), but the same program will break under DOSish perls because you're only chop()ing half the end-of-line. Instead, chomp() should be used to trim newlines. The Dunce::Files module can help audit your code for misuses of chop().
- When dealing with binary files (or text files in binary mode) be sure to explicitly set $/ to the appropriate value for your file format before using chomp().
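The difference between chop() and chomp() with an explicit $/ can be seen on a single CRLF-terminated record. This is only a minimal sketch: the record contents are made up for illustration, and the string stands in for a line read from a file in binary mode.

```perl
use strict;
use warnings;

# Illustrative record, as it would look when read from a CRLF file in binary mode.
my $record = "alice:1001:/home/alice\015\012";

# chop removes exactly one character, so the CR stays behind.
my $chopped = $record;
chop $chopped;

# chomp removes whatever $/ currently holds, so set it to the real terminator.
my $chomped = $record;
{
    local $/ = "\015\012";
    chomp $chomped;
}

my @fields = split /:/, $chomped;

# tr with an empty replacement list only counts the matching characters.
printf "CRs left after chop: %d, after chomp: %d\n",
    ($chopped =~ tr/\015//), ($chomped =~ tr/\015//);
```

Localizing $/ inside a block keeps the change from leaking into other code that reads files.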
- Because of the "text" mode translation, DOSish perls have limitations in using seek and tell on a file accessed in "text" mode. Stick to seek-ing to locations you got from tell (and no others), and you are usually free to use seek and tell even in "text" mode. Using seek or tell or other file operations may be non-portable. If you use binmode on a file, however, you can usually seek and tell with arbitrary values in safety.
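The "seek only to locations you got from tell" advice might look like the following sketch. The temporary file and its three lines are assumptions made up for the example; binmode is applied on both the write and the read side so that byte offsets match what is on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write three CRLF-terminated lines in binary mode so nothing is translated.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "one\015\012", "two\015\012", "three\015\012";
close $fh or die "close: $!";

open my $in, '<', $path or die "open: $!";
binmode $in;

# Record where each line starts by calling tell before every read.
my @offsets;
while (1) {
    my $pos  = tell $in;
    my $line = <$in>;
    last unless defined $line;
    push @offsets, $pos;
}

# Seek back to an offset that tell reported, then re-read that line.
seek $in, $offsets[1], 0 or die "seek: $!";
my $second = do {
    local $/ = "\015\012";   # the file's real terminator
    my $line = <$in>;
    chomp $line;
    $line;
};
close $in;

printf "line 2 starts at byte %d and reads '%s'\n", $offsets[1], $second;
```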
- A common misconception in socket programming is that \n eq \012 everywhere. When using protocols such as common Internet protocols, \012 and \015 are called for specifically, and the values of the logical \n and \r (carriage return) are not reliable.
- print SOCKET "Hi there, client!\r\n"; # WRONG
- print SOCKET "Hi there, client!\015\012"; # RIGHT
- However, using \015\012 (or \cM\cJ, or \x0D\x0A) can be tedious and unsightly, as well as confusing to those maintaining the code. As such, the Socket module supplies the Right Thing for those who want it.
- use Socket qw(:DEFAULT :crlf);
- print SOCKET "Hi there, client!$CRLF"; # RIGHT
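To check what actually goes out on the wire, one option is to write to an in-memory filehandle and dump the bytes. This is a sketch only: the in-memory handle is a stand-in for a real socket, and the hex dump exists purely for inspection.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT :crlf);

# Stand-in for a socket: an in-memory filehandle that captures the output.
my $sent = '';
open my $out, '>', \$sent or die "open: $!";
binmode $out;                       # make sure no newline translation applies
print {$out} "Hi there, client!$CRLF";
close $out or die "close: $!";

# Dump the bytes in hex so the terminator is visible.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $sent;
print "$hex\n";
```

The last two bytes printed should be 0D 0A, whatever the local meaning of \n is.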
- When reading from a socket, remember that the default input record separator $/ is \n, but robust socket code will recognize either \012 or \015\012 as end of line:
- while (<SOCKET>) {
- # ...
- }
- Because both CRLF and LF end in LF, the input record separator can be set to LF and any CR stripped later. Better to write:
- use Socket qw(:DEFAULT :crlf);
- local($/) = LF; # not needed if $/ is already \012
- while (<SOCKET>) {
- s/$CR?$LF/\n/; # not sure if socket uses LF or CRLF, OK
- # s/\015?\012/\n/; # same thing
- }
- This example is preferred over the previous one--even for Unix platforms--because now any \015's (\cM's) are stripped out (and there was much rejoicing).
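The same loop can be tried without a network connection by reading from an in-memory filehandle. In this sketch the handle and the "HELO" lines are made-up stand-ins for socket traffic, and the terminator is stripped entirely rather than replaced with \n, which is a small variation on the loop above.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT :crlf);

# Stand-in for socket data: one buffer with mixed CRLF and LF terminators.
my $wire = "HELO one${CRLF}HELO two${LF}HELO three${CRLF}";
open my $sock, '<', \$wire or die "open: $!";
binmode $sock;

my @lines;
{
    local $/ = LF;                  # split records on LF only
    while (my $line = <$sock>) {
        $line =~ s/$CR?$LF\z//;     # drop LF and an optional preceding CR
        push @lines, $line;
    }
}
close $sock;

print join('|', @lines), "\n";
```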
- Similarly, functions that return text data--such as a function that fetches a web page--should sometimes translate newlines before returning the data, if they've not yet been translated to the local newline representation. A single line of code will often suffice:
- $data =~ s/\015?\012/\n/g;
- return $data;
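Wrapped in a function, that one-liner might look like this. normalize_newlines is a hypothetical helper name, and the sample string stands in for data fetched from somewhere else.

```perl
use strict;
use warnings;

# Hypothetical helper: convert LF- or CRLF-terminated text to the local "\n".
sub normalize_newlines {
    my ($data) = @_;
    $data =~ s/\015?\012/\n/g;
    return $data;
}

# Sample input with mixed terminators, standing in for a fetched page.
my $fetched = "line 1\015\012line 2\012line 3\015\012";
my $local   = normalize_newlines($fetched);
my @lines   = split /\n/, $local;

printf "%d lines after normalization\n", scalar @lines;
```

The function works on a copy of its argument, so the caller's original data is left untouched.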
- Some of this may be confusing. Here's a handy reference to the ASCII CR and LF characters. You can print it out and stick it in your wallet.
- LF eq \012 eq \x0A eq \cJ eq chr(10) eq ASCII 10
- CR eq \015 eq \x0D eq \cM eq chr(13) eq ASCII 13
-      | Unix | DOS  | Mac  |
- ---------------------------
- \n   | LF   | LF   | CR   |
- \r   | CR   | CR   | LF   |
- \n * | LF   | CRLF | CR   |
- \r * | CR   | CR   | LF   |
- ---------------------------
- * text-mode STDIO
- The Unix column assumes that you are not accessing a serial line (like a tty) in canonical mode. If you are, then CR on input becomes "\n", and "\n" on output becomes CRLF.
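To see which row of the table applies to a given perl, a short script can report the code points behind \n and \r. This is a sketch only; it covers the in-memory values, not what text-mode STDIO does on output.

```perl
use strict;
use warnings;

# Report the code points this perl uses for the logical \n and \r.
my %logical = ('\n' => ord("\n"), '\r' => ord("\r"));
for my $name (sort keys %logical) {
    printf "%s is chr(%d), octal \\%03o\n",
        $name, $logical{$name}, $logical{$name};
}

# The octal escapes always denote code points 10 and 13.
my ($lf, $cr) = ("\012", "\015");
printf "\\n eq \\012? %s\n", ("\n" eq $lf ? "yes" : "no");
printf "\\r eq \\015? %s\n", ("\r" eq $cr ? "yes" : "no");
```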
- These are just the most common definitions of \n and \r in Perl. There may well be others. For example, on an EBCDIC implementation such as z/OS (OS/390) or OS/400 (using the ILE, the PASE is ASCII-based) the above material is similar to "Unix" but the code numbers change:
- LF eq \025 eq \x15 eq \cU eq chr(21) eq CP-1047 21
- LF eq \045 eq \x25 eq chr(37) eq CP-0037 37
- CR eq \015 eq \x0D eq \cM eq chr(13) eq CP-1047 13
- CR eq \015 eq \x0D eq \cM eq chr(13) eq CP-0037 13
-      | z/OS | OS/400 |
- ----------------------
- \n   | LF   | LF     |
- \r   | CR   | CR     |
- \n * | LF   | LF     |
- \r * | CR   | CR     |
- ----------------------
- * text-mode STDIO