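I have two final questions about this code.
The first is robustness. Not every paragraph necessarily contains tags like `<span face=XXX><span class=XXX><span lang=XXX>`. In that case `my @content = $p->content_list;` returns an empty list, and the program errors out.
I tried modifying it as below, but it still fails with an error: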
```perl
my $encode = "GBK";
my $h = HTML::TreeBuilder->new_from_content( decode($encode, $s) );

for my $p ($h->look_down(_tag => q{p}) ) {

    for my $span( $h->look_down(_tag => q{span}) ) {
        if ( defined $span->attr('lang') ) {
            $span->attr(lang=>undef);
            $mainspan = $span;
            last;
        } else {
            $span = "";
        }
    }

    for my $span( $h->look_down(_tag => q{span}) ) {

        if ( $span == "" ) {
            last;
        } else {
            $span->replace_with_content($span->content_refs_list);

            my @content = $p->content_list;
            $p->detach_content();

            $mainspan->push_content(@content);
            $p->push_content($mainspan);
        }
    }
}
$s = encode( $encode, $h->as_HTML('<>&',' ',{}) ), "\n";
```
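A minimal sketch of one way to make the per-paragraph processing robust, assuming the goal (as the sample output further down suggests) is: in each `<p>`, keep the first `<span>` that carries a `lang` attribute as the single wrapper around the paragraph's content (with `lang` removed), and unwrap every other `<span>`. The helper name `merge_spans_in_paragraphs` is hypothetical. Compared with the code above, it declares `$mainspan` fresh inside the loop (if the script runs under `use strict`, the undeclared `$mainspan` is by itself a compile-time error, which may be the error you are seeing), skips paragraphs that have no such span instead of calling `push_content` on an undefined value, and searches with `$p->look_down` so only the current paragraph is touched. Note also that `$span == ""` compares an element reference numerically with an empty string, which is never true for a reference, and `$span = ""` in the first loop only changes that loop's temporary list, so the `last` branch in the second loop can never fire.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: for every <p> that contains a <span lang=...>, unwrap all
# <span>s in that paragraph and re-wrap its content in that one span, minus lang.
sub merge_spans_in_paragraphs {
    my ($tree) = @_;

    for my $p ( $tree->look_down(_tag => 'p') ) {
        # Search the current paragraph only ($p, not the whole tree), using a
        # fresh variable per paragraph. In scalar context look_down returns the
        # first match in document order, or undef if there is none.
        my $mainspan = $p->look_down(
            _tag => 'span',
            sub { defined $_[0]->attr('lang') },
        );

        # Robustness: leave paragraphs without a <span lang=...> untouched.
        next unless defined $mainspan;

        $mainspan->attr(lang => undef);    # setting undef removes the attribute

        # Unwrap every <span> in this <p> (outer ones come first). This also
        # detaches $mainspan, which keeps its remaining attributes (style).
        $_->replace_with_content for $p->look_down(_tag => 'span');

        # Move the paragraph's children into $mainspan and put it back in <p>.
        my @content = $p->content_list;
        $p->detach_content;
        $mainspan->push_content(@content);
        $p->push_content($mainspan);
    }
    return $tree;
}

# Usage on a small inline document; the second <p> has no <span lang=...>.
my $h = HTML::TreeBuilder->new_from_content(
    q{<p><span face=Arial><span lang=EN-US style="font-size:10.0pt">}
  . q{<b>AAA</b><span face=Arial>BBB</span></span></span></p>}
  . q{<p>no span here</p>}
);
merge_spans_in_paragraphs($h);
print $h->as_HTML('<>&', ' ', {}), "\n";
$h->delete;    # free the tree when done
```

In the original script this call would sit between `HTML::TreeBuilder->new_from_content(decode($encode, $s))` and the final `encode(...)`. Separately, in `$s = encode(...), "\n";` the comma binds more loosely than `=`, so `"\n"` is evaluated and thrown away (under `use warnings` Perl warns about a useless constant in void context); write `. "\n"` if a trailing newline is intended.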
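The second problem is that handling multiple paragraphs is still wrong: only the last paragraph is processed completely correctly.
For example, given this input HTML: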
```html
<body>
<p style='margin-top:0pt;margin-right:0pt;margin-bottom:0pt;margin-left:.0pt; '><span face=Arial><span class=GramE><span class=grame><span lang=EN-US style='font-size:10.0pt; font-family:Arial'><b> AAAAA </b><span face=Arial>BBBBB</span></b><span face=Arial>CCCCC </span><b><span face=Arial> AAAA</span></b><span face=Arial>DDDDD</span><span
face=Arial><span style="mso-spacerun:yes">EEEEE</span></span><span face=Arial>FFFFF</span>
<span face=Arial>GGGGG</span><o:p></o:p></span></span></span></span></p>
<p style='margin-top:0pt;margin-right:0pt;margin-bottom:0pt;margin-left:.0pt; '><span face=Arial><span class=GramE><span class=grame><span lang=EN-US style='font-size:10.0pt; font-family:Arial'><b> AAAAA </b><span face=Arial>BBBBB</span></b><span face=Arial>CCCCC </span><b><span face=Arial> AAAA</span></b><span face=Arial>DDDDD</span><span
face=Arial><span style="mso-spacerun:yes">EEEEE</span></span><span face=Arial>FFFFF</span>
<span face=Arial>GGGGG</span><o:p></o:p></span></span></span></span></p>

<p style='margin-top:0pt;margin-right:0pt;margin-bottom:0pt;margin-left:.0pt; '><span face=Arial><span class=GramE><span class=grame><span lang=EN-US style='font-size:10.0pt; font-family:Arial'><b> AAAAA </b><span face=Arial>BBBBB</span></b><span face=Arial>CCCCC </span><b><span face=Arial> AAAA</span></b><span face=Arial>DDDDD</span><span
face=Arial><span style="mso-spacerun:yes">EEEEE</span></span><span face=Arial>FFFFF</span>
<span face=Arial>GGGGG</span><o:p></o:p></span></span></span></span></p>
</body>
```
After processing, the result is:
```html
<body>
<p style="margin-top:0pt;margin-right:0pt;margin-bottom:0pt;margin-left:.0pt; "><b> AAAAA </b>BBBBBCCCCC <b> AAAA</b>DDDDDEEEEEFFFFF GGGGG</p>
<p style="margin-top:0pt;margin-right:0pt;margin-bottom:0pt;margin-left:.0pt; "><b> AAAAA </b>BBBBBCCCCC <b> AAAA</b>DDDDDEEEEEFFFFF GGGGG</p>
<p style="margin-top:0pt;margin-right:0pt;margin-bottom:0pt;margin-left:.0pt; "><span style="font-size:10.0pt; font-family:Arial"><b> AAAAA </b>BBBBBCCCCC <b> AAAA</b>DDDDDEEEEEFFFFF GGGGG</span></p>
</body>
```
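In the first two paragraphs, the `<span style="font-size:10.0pt; font-family:Arial">` tag has been removed.
I must be missing something obvious, but I just can't figure out why. Any advice would be appreciated.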
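Reading the code above, a likely explanation is the search scope. Both inner loops call `$h->look_down`, which searches the whole document rather than the current `$p`. During the first iteration, the second loop therefore unwraps every `<span>` in all three paragraphs, including the `lang` spans of paragraphs 2 and 3. `$mainspan` is never reset, so in the second and third iterations the first loop finds no `lang` span, `$mainspan` still refers to the span taken from paragraph 1, and the second loop unwraps that span from its current paragraph and re-wraps the current paragraph's content in it. The single styled span thus moves from paragraph 1 to 2 to 3, which is consistent with the output shown. The short sketch below (with a hypothetical helper `count_spans`) only illustrates the scope difference between `$h->look_down` and `$p->look_down`.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: count the <span> elements at or below $root.
sub count_spans {
    my ($root) = @_;
    my @spans = $root->look_down(_tag => 'span');
    return scalar @spans;
}

# Three paragraphs, each with one <span lang=...>, a much shorter stand-in
# for the sample input above.
my $html = '';
$html .= "<p><span lang=EN-US style=s$_>P$_</span></p>" for 1 .. 3;
my $h = HTML::TreeBuilder->new_from_content($html);

for my $p ( $h->look_down(_tag => 'p') ) {
    # $h->look_down sees every paragraph; $p->look_down sees only this one.
    printf "%s: whole tree has %d span(s), this <p> has %d\n",
        $p->as_text, count_spans($h), count_spans($p);
}
$h->delete;
```

Each line reports 3 spans for the whole tree but 1 for the paragraph, so anything done to the result of `$h->look_down` inside the `for my $p` loop also affects the other paragraphs.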