The code is as follows:
import re
content = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xxxx">
<html xmlns="xxxx">
XXXX
</html>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xxxx">
<html xmlns="xxxx">
XXXX
</html>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xxxx">
<html xmlns="xxxx">
XXXX
</html>"""
# Remove everything from each </html> through the next <html ...> opening tag,
# so the three documents collapse into one. DOTALL lets "." match newlines;
# the non-greedy ".*?" stops at the nearest following <html ...> tag.
replacer = re.compile("</html>.*?<html .*?>", re.DOTALL)
result = replacer.sub("", content)
print(result)
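However, after the replacement a blank line is left at each spot where text was removed, because the newline before each </html> is not part of the match.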
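If the leftover blank lines are unwanted, one possible fix is to let the pattern consume the whitespace in front of </html> as well. The following is only a minimal sketch under that assumption, not necessarily the best way to do it; the names doc and joiner are made up for illustration, and the sample input just repeats the placeholder document from the post three times.

```python
import re

# Hypothetical sample input: the placeholder document from the post,
# concatenated three times with a newline between copies.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xxxx">\n'
    '<html xmlns="xxxx">\n'
    'XXXX\n'
    '</html>'
)
content = "\n".join([doc] * 3)

# The leading \s* swallows the newline in front of </html>, so only the
# newline that follows the next <html ...> tag remains and no blank line
# is left behind. [^>]* keeps the tag match inside a single tag.
joiner = re.compile(r"\s*</html>.*?<html\s[^>]*>", re.DOTALL)
result = joiner.sub("", content)
print(result)
```

The final </html> is not followed by another <html ...> tag, so it never matches and stays in place. Like the original, this is a plain regex approach and assumes every opening tag looks like <html ...> with at least one attribute.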