
Thread starter: frankhyk


Question: deleting repeated characters

Post #11

Posted 2012-09-26 17:18

Reply to #10 yinyuemi

There is a bug: the Q and T that sit between the repeated A's are deleted as well.

# echo 'AQBATBABCCDD' |perl -nle 'print join "\t",map{s/(\w).*\1/\1/g;$_}split;'
ABCD


Post #12

Posted 2012-09-26 17:59

Last edited by yizhengming on 2012-09-26 18:02

echo 'ABABCCDD' |perl -nle 's/(\w)(?=.*\1)//g; print'
ABCD
echo 'CCDDENG' |perl -nle 's/(\w)(?=.*\1)//g; print'
CDENG
echo 'AQBATBABCCDD' |perl -nle 's/(\w)(?=.*\1)//g; print'
QTABCD


Post #13

Posted 2012-09-26 21:37

Reply to #11 jason680

Thanks,
echo 'AQBATBABCCDD' |perl -nle 'print join "\t",map{while(s/(\w)(.*)\1/\1\2/g){};$_}split;'
AQBTCD


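Post #14

Posted 2012-09-26 23:22

Last edited by yizhengming on 2012-09-26 23:24

So while(s///g){} and while(m//g){} behave differently. With s///g, every iteration performs a global substitution that starts again from the beginning of the string, and the loop ends when an iteration finds nothing left to replace.
With m//g, every iteration resumes from the position where the previous match ended, and the loop runs until the end of the string is reached.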


Post #15

Posted 2012-09-26 23:37

Reply to #1 frankhyk

echo "AABBCCDD       EEFFGG
> HHIIJJKKLLMM NNOOPPQQRRSS
> TTUU                VVWWXXYYZZaabbcc
>
> "|perl -lpe 'tr/a-zA-Z//s;'
ABCD       EFG
HIJKLM NOPQRS
TU                VWXYZabc


Post #16

Posted 2012-09-27 00:41

Reply to #2 jason680
What is the difference between \1 and $1 here?


Post #17

Posted 2012-09-27 08:31

Reply to #16 inchonline

It is hard to state the difference between $1 and \1 in one sentence.

A simple rule for everyday use:
use \1 on the left side  of s/// (the pattern).     ex: s/(..)\1/.../
use $1 on the right side of s/// (the replacement). ex: s/....../$1/

Please refer to "perlre" for details, or see the excerpt below:

$ perldoc perlre
NAME
   perlre - Perl regular expressions
DESCRIPTION
   This page describes the syntax of regular expressions in Perl.
...

   Capture buffers

   The bracketing construct "( ... )" creates capture buffers. To refer to the
   current contents of a buffer later on, within the same pattern, use \1 for the
   first, \2 for the second, and so on.  Outside the match use "$" instead of "\".
   (The \<digit> notation works in certain circumstances outside the match.  See
   "Warning on \1 Instead of $1" below for details.)  Referring back to another part
   of the match is called a backreference.

...
Warning on \1 Instead of $1
   Some people get too used to writing things like:

         $pattern =~ s/(\W)/\\\1/g;

   This is grandfathered (for \1 to \9) for the RHS of a substitute to avoid
   shocking the sed addicts, but it's a dirty habit to get into.  That's because in
   PerlThink, the righthand side of an "s///" is a double-quoted string.  "\1" in
   the usual double-quoted string means a control-A.  The customary Unix meaning of
   "\1" is kludged in for "s///".  However, if you get into the habit of doing that,
   you get yourself into trouble if you then add an "/e" modifier.

         s/(\d+)/ \1 + 1 /eg;       # causes warning under -w

   Or if you try to do

         s/(\d+)/\1000/;

   You can't disambiguate that by saying "\{1}000", whereas you can fix it with
   "${1}000".  The operation of interpolation should not be confused with the
   operation of matching a backreference.  Certainly they mean two different things
   on the left side of the "s///".


Post #18

Posted 2012-09-27 08:39

There is a special form of this construct, called \K , which causes the regex engine to "keep" everything it had matched prior to the \K and not include it in $& .  For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string. For instance:
  s/(foo)bar/$1/g;
can be rewritten as the much more efficient:
  s/foo\Kbar//g;


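Post #19

Posted 2012-09-27 09:34

Last edited by sjdy521 on 2012-09-27 09:34

Reply to #17 jason680

\1 is the backreference form supported by POSIX regular expressions, and $1 is Perl's capture variable; it is that simple. Perl is kind enough not to complain (at least without -w) when you write \1 where you meant $1, but we should not make a simple matter complicated.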


Post #20

Posted 2012-09-27 09:59

Last edited by jason680 on 2012-09-27 10:06

Reply to #19 sjdy521

Could you please explain what the sentence "Certainly they mean two different things on the left side of the "s///"" means in the "Warning on \1 Instead of $1" section of perlre?

Warning on \1 Instead of $1
   ...
   You can't disambiguate that by saying "\{1}000", whereas you can fix it with
   "${1}000".  The operation of interpolation should not be confused with the
   operation of matching a backreference.  Certainly they mean two different things
   on the left side of the "s///".
