Chinaunix

Original poster: frankhyk

Help: removing duplicate characters

#11
Posted 2012-09-26 17:18
Reply to #10 yinyuemi

There is a bug: the non-duplicated characters Q and T are lost.

# echo 'AQBATBABCCDD' |perl -nle 'print join "\t",map{s/(\w).*\1/\1/g;$_}split;'
ABCD
   

#12
Posted 2012-09-26 17:59
Last edited by yizhengming on 2012-09-26 18:02
echo 'ABABCCDD' |perl -nle 's/(\w)(?=.*\1)//g; print'
ABCD
echo 'CCDDENG' |perl -nle 's/(\w)(?=.*\1)//g; print'
CDENG
echo 'AQBATBABCCDD' |perl -nle 's/(\w)(?=.*\1)//g; print'
QTABCD

#13
Posted 2012-09-26 21:37
Reply to #11 jason680


    Thanks,
echo 'AQBATBABCCDD' |perl -nle 'print join "\t",map{while(s/(\w)(.*)\1/\1\2/g){};$_}split;'
AQBTCD

#14
Posted 2012-09-26 23:22
Last edited by yizhengming on 2012-09-26 23:24

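So while(s///g){} and while(m//g){} really do behave differently. With s///g, every iteration runs a global substitution starting again from the beginning of the string, and the loop ends when an iteration finds nothing left to replace.
With m//g, each iteration resumes from the position where the previous match ended, and the loop stops once no further match is found before the end of the string.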

#15
Posted 2012-09-26 23:37
Reply to #1 frankhyk

echo "AABBCCDD          EEFFGG
> HHIIJJKKLLMM    NNOOPPQQRRSS
> TTUU                  VVWWXXYYZZaabbcc
>
> "|perl -lpe 'tr/a-zA-Z//s;'
ABCD          EFG
HIJKLM    NOPQRS
TU                  VWXYZabc


   

#16
Posted 2012-09-27 00:41
Reply to #2 jason680
What is the difference between \1 and $1 here?

   

#17
Posted 2012-09-27 08:31
Reply to #16 inchonline

It is hard to list every difference between $1 and \1 briefly.

A simple rule for everyday use:
use \1 on the left side  of s///.   e.g. s/(..)\1/.../
use $1 on the right side of s///.   e.g. s/....../$1/

Please refer to "perlre" for details, or see the excerpt below:

$ perldoc perlre
NAME
       perlre - Perl regular expressions
DESCRIPTION
       This page describes the syntax of regular expressions in Perl.
    ...

       Capture buffers

       The bracketing construct "( ... )" creates capture buffers. To refer to the
       current contents of a buffer later on, within the same pattern, use \1 for the
       first, \2 for the second, and so on.  Outside the match use "$" instead of "\".
       (The \<digit> notation works in certain circumstances outside the match.  See
       "Warning on \1 Instead of $1" below for details.)  Referring back to another part
       of the match is called a backreference.

   ...
   Warning on \1 Instead of $1
       Some people get too used to writing things like:

           $pattern =~ s/(\W)/\\\1/g;

       This is grandfathered (for \1 to \9) for the RHS of a substitute to avoid
       shocking the sed addicts, but it's a dirty habit to get into.  That's because in
       PerlThink, the righthand side of an "s///" is a double-quoted string.  "\1" in
       the usual double-quoted string means a control-A.  The customary Unix meaning of
       "\1" is kludged in for "s///".  However, if you get into the habit of doing that,
       you get yourself into trouble if you then add an "/e" modifier.

           s/(\d+)/ \1 + 1 /eg;        # causes warning under -w

       Or if you try to do

           s/(\d+)/\1000/;

       You can't disambiguate that by saying "\{1}000", whereas you can fix it with
       "${1}000".  The operation of interpolation should not be confused with the
       operation of matching a backreference.  Certainly they mean two different things
       on the left side of the "s///".



#18
Posted 2012-09-27 08:39
There is a special form of this construct, called \K , which causes the regex engine to "keep" everything it had matched prior to the \K and not include it in $& .  For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string. For instance:
  s/(foo)bar/$1/g;
can be rewritten as the much more efficient:
  s/foo\Kbar//g;

#19
Posted 2012-09-27 09:34
Last edited by sjdy521 on 2012-09-27 09:34

Reply to #17 jason680


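    \1 is the backreference form supported by POSIX regular expressions, while $1 is Perl's regex capture variable. It is that simple. Perl is kind enough not to complain when you mistakenly write \1 where you meant $1, but we should not overcomplicate a simple matter.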

#20
Posted 2012-09-27 09:59
Last edited by jason680 on 2012-09-27 10:06

Reply to #19 sjdy521

Would you please explain what the sentence "Certainly they mean two different things on the left side of the "s///"" means in the "Warning on \1 Instead of $1" section of perlre?



Warning on \1 Instead of $1
       ...
       You can't disambiguate that by saying "\{1}000", whereas you can fix it with
       "${1}000".  The operation of interpolation should not be confused with the
       operation of matching a backreference.  Certainly they mean two different things
       on the left side of the "s///".

