Chinaunix

How do I extract the IPs from a file?

#1, posted 2012-08-14 23:13
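The file follows these patterns:
1. IPs mixed in with br (HTML) tags
2. IPs inside td (HTML) tags
3. IP and port split across two consecutive td (HTML) tags
4. Plain IPs listed one after another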
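A minimal sketch of why two regular expressions should be enough for the four layouts: cases 1, 2 and 4 all contain a literal `ip:port` token, while case 3 splits it across two cells. The snippets are copied from the sample data; relying on the `class=ip` / `class=port` attributes is an assumption taken from that sample, and `shape_of`, `TOGETHER` and `SPLIT` are hypothetical names.

```python
import re

# Shape "together": IP and port written as one "ip:port" token.
TOGETHER = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")
# Shape "split": IP and port in two consecutive <td> cells (assumes the
# class=ip / class=port attributes seen in the sample data).
SPLIT = re.compile(
    r"class=ip>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td class=port>(\d{1,5})<",
    re.IGNORECASE,
)

# One short snippet per case, copied from the sample data.
SNIPPETS = {
    "case 1": "<br>122.232.228.217:6675<br>",
    "case 2": "<tr><td>79.173.37.19:8080</td><td>transparent </td>",
    "case 3": "<td class=ip>161.139.195.99</td><td class=port>80</td>",
    "case 4": "94.23.154.55:3128",
}


def shape_of(snippet):
    """Return "together", "split", or None for a snippet of the file."""
    if TOGETHER.search(snippet):
        return "together"
    if SPLIT.search(snippet):
        return "split"
    return None


for name, snippet in SNIPPETS.items():
    print(name, shape_of(snippet))
```

Running it reports `together` for cases 1, 2 and 4 and `split` for case 3, so one pattern per shape covers all four layouts.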

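Is there a way to filter the file down to a plain list of IPs, one per line, like this:
122.232.228.217:6675
190.207.107.17:8080
61.18.76.127:9415
That is the format I want.
In the file below, the four types are mixed together.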

Many thanks!!!
```
Case 1<br>190.74.185.65:8080<br>122.232.228.217:6675<br>201.208.228.243:8080<br>201.243.35.106:8080<br>193.116.157.195:80<br>2.49.91.33:8118<br>190.42.25.190:8080<br>190.207.228.95:8080<br>110.139.100.35:3128<br>200.54.92.187:80<br>180.183.147.93:3128<br>190.207.107.17:8080<br>61.18.76.127:9415<br>189.107.78.42:8080<br>190.42.129.112:8080<br>24.234.146.189:80<br>190.199.150.191:8080<br>190.202.230.192:8080<br>187.78.64.25:8080<br>66.39.5.184:80<br>177.36.242.93:8080<br>92.96.146.148:8118<br>82.99.254.146:8080<br>201.249.114.63:8080<br>209.190.7.250:80<br>190.78.25.207:8080<br>180.243.235.40:8080<br>201.249.20.161:8080<br>190.201.102.213:8080<br>190.198.112.37:8080<br>92.96.192.126:8118<br>92.97.95.150:8118<br>201.211.141.143:8080<br>122.113.39.40:80<br>91.121.115.66:3128<br>190.199.96.186:8080<br>201.210.8.152:8080<br>201.208.110.155:8080<br>217.119.81.106:3128<br>110.77.219.114:3128<br>92.99.136.27:8118<br>41.35.48.91:8080<br>123.100.5.57:8080<br>190.38.185.109:8080<br>186.94.21.83:8080<br>123.153.168.11:6675<br>190.202.124.18:3128<br>195.175.37.8:80<br>201.211.239.149:8080<br>187.78.147.66:8080<br>190.206.155.213:8080<br>190.198.156.244:8080<br>186.93.218.196:8080<br>222.214.130.89:8909<br>212.122.235.220:57<br>201.211.6.20:8080<br>92.99.162.242:8118<br>186.94.255.135:8080<br>190.199.33.200:8080<br>37.59.125.190:3128<br>176.31.99.49:80<br>110.77.213.27:3128<br>60.185.223.143:8909<br>188.255.147.65:6666<br>186.92.119.119:8080<br>186.95.65.79:8080<br>151.100.152.48:80<br>58.222.141.118:3128<br>186.93.105.14:8080<br>110.77.232.124:3128<br>61.138.6.89:8080<br>107.20.182.77:3128<br>125.209.94.12:8080<br>190.77.211.189:8080<br>186.88.67.109:8080<br>125.39.93.69:8888<br>27.8.126.96:6675<br>187.32.63.54:8080<br>203.223.47.246:3128<br>190.79.106.205:8080<br>202.232.97.11:8080<br>190.200.166.5:8080<br>92.99.248.84:8118<br>189.111.209.224:80<br>67.228.176.65:443<br>2.49.209.252:8118<br><br>
                  <tr class="list_sorted">
                  </tr>
Case 2
<tr><td>201.219.17.45:3128</td><td>transparent </td><td>832 minutes ago</td><td>Ecuador</td></tr><tr><td>79.173.37.19:8080</td><td>transparent </td><td>1431 minutes ago</td><td>Poland</td></tr><tr><td>189.13.207.196:8080</td><td>transparent </td><td>1202 minutes ago</td><td>Brazil</td></tr><tr><td>85.185.95.194:8080</td><td>transparent </td><td>585 minutes ago</td><td>Iran, Islamic Republic of</td></tr><tr><td style="padding-top:10px; padding-bottom:10px" align="center" colspan="4">
        <div style="margin: 0 auto; padding: 10px; font-size:15px; background-color:#FFF; color: #483651; text-align:center"><a style="color:#00F" href="http://proxy.lc/"><strong>Personal Premium Proxy</strong></a><!--<br />Think. Feel. Enjoy--></div></td></tr><tr><td>118.97.44.154:8080</td><td>transparent </td><td>873 minutes ago</td><td>Indonesia</td></tr><tr><td>92.99.128.80:8118</td><td>transparent </td><td>842 minutes ago</td><td>United Arab Emirates</td></tr><tr><td>201.251.62.137:8080</td><td>transparent </td><td>607 minutes ago</td><td>Argentina</td></tr><tr><td>202.165.88.109:80</td><td>transparent </td><td>564 minutes ago</td><td>Australia</td></tr><tr><td>177.36.242.97:8080</td><td>transparent </td><td>1052 minutes ago</td><td>Brazil</td></tr><tr><td>89.218.94.166:3128</td><td>transparent </td><td>645 minutes ago</td><td>Kazakstan</td></tr><tr><td>177.36.242.61:8080</td><td>transparent </td><td>992 minutes ago</td><td>Brazil</td></tr><tr><td>92.99.207.231:8118</td><td>transparent </td><td>1040 minutes ago</td><td>United Arab Emirates</td></tr></table>
          <div style="width: 780px; text-align:center; margin: 15px 0 15px 0">
<!--
    <table  class="navbar" border="0" cellspacing="5" cellpadding="0">
      <tr>
      </tr>
      <tr>
        <td><a href="http://proxy.lc/index.html#France">Buy Now - just  &euro;5,00</a></td>
        <td><a href="http://proxy.lc/index.html#USA">Buy Now - just  $6.50</a></td>
        <td><a href="http://proxy.lc/index.html#UK">Buy Now - just &pound;5.00</a></td>
      </tr>
    </table>
-->
Case 3
<td class=ip>41.35.48.134</td><td class=port>8080</td><td class=isssl>Yes</td><td class=proxytype>Transparent</td><td class=cc><img width=20 height=20 src=img/flags/EG.png> EG Egypt</td><td class=registredto>All-01</td><td class=latency><div class=speedbar><div class=fast style="width:86%"></div><div class=value>86%</div></div></td><td class=reliability><div class=speedbar><div class=slow style="width:4%"></div><div class=value>4%</div></div></td><td class=uptime>0.17</td></tr><tr id=453><td class=ip>161.139.195.99</td><td class=port>80</td><td class=isssl>Yes</td><td class=proxytype>Anonymous</td><td class=cc><img width=20 height=20 src=img/flags/MY.png> MY Malaysia</td><td class=registredto></td><td class=latency><div class=speedbar><div class=fast style="width:79%"></div><div class=value>79%</div></div></td><td class=reliability><div class=speedbar><div class=slow style="width:1%"></div><div class=value>1%</div></div></td><td class=uptime>0.18</td></tr><tr id=3699><td class=ip>190.95.206.178</td><td class=port>8080</td>
Case 4
178.32.5.178:3128
94.23.154.55:3128
178.32.5.190:3128
178.32.5.164:3128
178.32.5.161:3128
212.45.5.172:3128
81.200.26.217:3128
134.121.64.4:3127
222.52.99.131:8081
61.190.28.166:8080
94.232.65.104:3128
137.165.1.111:3127
82.199.113.2:3128
212.23.70.188:3128
92.50.152.62:3128
69.163.96.2:8080
95.31.2.114:3128
82.91.170.116:8080
218.204.240.26:8080
60.63.79.127:8909
114.241.36.11:8909
85.237.46.141:8080
209.97.203.64:8080
```
If another scripting language makes this easier, that works too. Thanks again.

#2, posted 2012-08-14 23:31
Waiting online for an answer, please help!

#3, posted 2012-08-15 00:00
```python
import re

content = """
Case 1<br>190.74.185.65:8080<br>122.232.228.217:6675<br>201.208.228.243:8080<br>201.243.35.106:8080<br>193.116.157.195:80<br>2.49.91.33:8118<br>190.42.25.190:8080<br>190.207.228.95:8080<br>110.139.100.35:3128<br>200.54.92.187:80<br>180.183.147.93:3128<br>190.207.107.17:8080<br>61.18.76.127:9415<br>189.107.78.42:8080<br>190.42.129.112:8080<br>24.234.146.189:80<br>190.199.150.191:8080<br>190.202.230.192:8080<br>187.78.64.25:8080<br>66.39.5.184:80<br>177.36.242.93:8080<br>92.96.146.148:8118<br>82.99.254.146:8080<br>201.249.114.63:8080<br>209.190.7.250:80<br>190.78.25.207:8080<br>180.243.235.40:8080<br>201.249.20.161:8080<br>190.201.102.213:8080<br>190.198.112.37:8080<br>92.96.192.126:8118<br>92.97.95.150:8118<br>201.211.141.143:8080<br>122.113.39.40:80<br>91.121.115.66:3128<br>190.199.96.186:8080<br>201.210.8.152:8080<br>201.208.110.155:8080<br>217.119.81.106:3128<br>110.77.219.114:3128<br>92.99.136.27:8118<br>41.35.48.91:8080<br>123.100.5.57:8080<br>190.38.185.109:8080<br>186.94.21.83:8080<br>123.153.168.11:6675<br>190.202.124.18:3128<br>195.175.37.8:80<br>201.211.239.149:8080<br>187.78.147.66:8080<br>190.206.155.213:8080<br>190.198.156.244:8080<br>186.93.218.196:8080<br>222.214.130.89:8909<br>212.122.235.220:57<br>201.211.6.20:8080<br>92.99.162.242:8118<br>186.94.255.135:8080<br>190.199.33.200:8080<br>37.59.125.190:3128<br>176.31.99.49:80<br>110.77.213.27:3128<br>60.185.223.143:8909<br>188.255.147.65:6666<br>186.92.119.119:8080<br>186.95.65.79:8080<br>151.100.152.48:80<br>58.222.141.118:3128<br>186.93.105.14:8080<br>110.77.232.124:3128<br>61.138.6.89:8080<br>107.20.182.77:3128<br>125.209.94.12:8080<br>190.77.211.189:8080<br>186.88.67.109:8080<br>125.39.93.69:8888<br>27.8.126.96:6675<br>187.32.63.54:8080<br>203.223.47.246:3128<br>190.79.106.205:8080<br>202.232.97.11:8080<br>190.200.166.5:8080<br>92.99.248.84:8118<br>189.111.209.224:80<br>67.228.176.65:443<br>2.49.209.252:8118<br><br>
                  <tr class="list_sorted">
                  </tr>
Case 2
<tr><td>201.219.17.45:3128</td><td>transparent </td><td>832 minutes ago</td><td>Ecuador</td></tr><tr><td>79.173.37.19:8080</td><td>transparent </td><td>1431 minutes ago</td><td>Poland</td></tr><tr><td>189.13.207.196:8080</td><td>transparent </td><td>1202 minutes ago</td><td>Brazil</td></tr><tr><td>85.185.95.194:8080</td><td>transparent </td><td>585 minutes ago</td><td>Iran, Islamic Republic of</td></tr><tr><td style="padding-top:10px; padding-bottom:10px" align="center" colspan="4">
        <div style="margin: 0 auto; padding: 10px; font-size:15px; background-color:#FFF; color: #483651; text-align:center"><a style="color:#00F" href="http://proxy.lc/"><strong>Personal Premium Proxy</strong></a><!--<br />Think. Feel. Enjoy--></div></td></tr><tr><td>118.97.44.154:8080</td><td>transparent </td><td>873 minutes ago</td><td>Indonesia</td></tr><tr><td>92.99.128.80:8118</td><td>transparent </td><td>842 minutes ago</td><td>United Arab Emirates</td></tr><tr><td>201.251.62.137:8080</td><td>transparent </td><td>607 minutes ago</td><td>Argentina</td></tr><tr><td>202.165.88.109:80</td><td>transparent </td><td>564 minutes ago</td><td>Australia</td></tr><tr><td>177.36.242.97:8080</td><td>transparent </td><td>1052 minutes ago</td><td>Brazil</td></tr><tr><td>89.218.94.166:3128</td><td>transparent </td><td>645 minutes ago</td><td>Kazakstan</td></tr><tr><td>177.36.242.61:8080</td><td>transparent </td><td>992 minutes ago</td><td>Brazil</td></tr><tr><td>92.99.207.231:8118</td><td>transparent </td><td>1040 minutes ago</td><td>United Arab Emirates</td></tr></table>
          <div style="width: 780px; text-align:center; margin: 15px 0 15px 0">
<!--
    <table  class="navbar" border="0" cellspacing="5" cellpadding="0">
      <tr>
      </tr>
      <tr>
        <td><a href="http://proxy.lc/index.html#France">Buy Now - just  &euro;5,00</a></td>
        <td><a href="http://proxy.lc/index.html#USA">Buy Now - just  $6.50</a></td>
        <td><a href="http://proxy.lc/index.html#UK">Buy Now - just &pound;5.00</a></td>
      </tr>
    </table>
-->
Case 3
<td class=ip>41.35.48.134</td><td class=port>8080</td><td class=isssl>Yes</td><td class=proxytype>Transparent</td><td class=cc><img width=20 height=20 src=img/flags/EG.png> EG Egypt</td><td class=registredto>All-01</td><td class=latency><div class=speedbar><div class=fast style="width:86%"></div><div class=value>86%</div></div></td><td class=reliability><div class=speedbar><div class=slow style="width:4%"></div><div class=value>4%</div></div></td><td class=uptime>0.17</td></tr><tr id=453><td class=ip>161.139.195.99</td><td class=port>80</td><td class=isssl>Yes</td><td class=proxytype>Anonymous</td><td class=cc><img width=20 height=20 src=img/flags/MY.png> MY Malaysia</td><td class=registredto></td><td class=latency><div class=speedbar><div class=fast style="width:79%"></div><div class=value>79%</div></div></td><td class=reliability><div class=speedbar><div class=slow style="width:1%"></div><div class=value>1%</div></div></td><td class=uptime>0.18</td></tr><tr id=3699><td class=ip>190.95.206.178</td><td class=port>8080</td>
Case 4
178.32.5.178:3128
94.23.154.55:3128
178.32.5.190:3128
178.32.5.164:3128
178.32.5.161:3128
212.45.5.172:3128
81.200.26.217:3128
134.121.64.4:3127
222.52.99.131:8081
61.190.28.166:8080
94.232.65.104:3128
137.165.1.111:3127
82.199.113.2:3128
212.23.70.188:3128
92.50.152.62:3128
69.163.96.2:8080
95.31.2.114:3128
82.91.170.116:8080
218.204.240.26:8080
60.63.79.127:8909
114.241.36.11:8909
85.237.46.141:8080
209.97.203.64:8080
"""

def parse_ipv4(content):
    patterns = [
        # Cases 1, 2 and 4: IP and port written together as "ip:port".
        r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d+)',
        # Case 3: "<td class=ip>IP</td><td class=port>PORT"; the lazy .*?
        # stops at the nearest "port>" on the same line.
        r'ip>(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*?port>(\d+)',
    ]

    for pattern in patterns:
        # Each pattern has two groups, so findall yields (ip, port) tuples.
        for ip, port in re.findall(pattern, content, re.I):
            print(ip + ":" + port)

parse_ipv4(content)
```
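The script above prints every `ip:port` match first and the split-cell (case 3) matches second, so the output does not follow the order of the file, and any entry that appears twice is printed twice. A minimal single-pass sketch, assuming the same `class=ip` / `class=port` markup as the sample: one combined regex scanned with `re.finditer` keeps file order, a dict drops duplicates, and out-of-range octets or ports are skipped. `extract_proxies`, `PROXY_RE` and the command-line handling are hypothetical, not part of the original answer.

```python
import re
import sys

# Alternative 1 (cases 1, 2 and 4): "a.b.c.d:port" written together.
# Alternative 2 (case 3): <td class=ip>a.b.c.d</td><td class=port>port<
PROXY_RE = re.compile(
    r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b"
    r"|class=ip>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td class=port>(\d{1,5})<",
    re.IGNORECASE,
)


def extract_proxies(text):
    """Return unique "ip:port" strings in the order they first appear."""
    found = {}  # dicts keep insertion order (Python 3.7+)
    for m in PROXY_RE.finditer(text):
        # Exactly one alternative matched; the other one's groups are None.
        if m.group(1) is not None:
            ip, port = m.group(1), m.group(2)
        else:
            ip, port = m.group(3), m.group(4)
        # Skip values that look like an address but are out of range.
        if all(int(octet) <= 255 for octet in ip.split(".")) and 0 < int(port) <= 65535:
            found.setdefault(f"{ip}:{port}", None)
    return list(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_proxies.py proxies.html  (hypothetical file names)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for proxy in extract_proxies(f.read()):
            print(proxy)
```

Putting both shapes into one alternation is what preserves order: `finditer` walks the text once, left to right, whichever shape each entry uses.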

#4, posted 2012-08-15 08:31
Nice use of regular expressions, learned something!