Last edited by 脑子 on 2018-2-8 23:17
Given the following two snippets:
- <a href="/htmlnews/2018/2/401948.shtm" target="_blank" onmouseover="javascript:this.style.color='#ba1414'" onmouseout="javascript:this.style.color='#333333'" style="color: rgb(51, 51, 51);">六年200高校更名:想成名校不能只改校名</a>
- <a href="/htmlnews/2018/2/402593.shtm" target="_blank">
- 香港设立抗战博物馆
- </a>
I want to extract
- htmlnews/2018/2/401948.shtm" target="_blank" onmouseover="javascript:this.style.color='#ba1414'" onmouseout="javascript:this.style.color='#333333'" style="color: rgb(51, 51, 51);">六年200高校更名:想成名校不能只改校名</a>
and
- htmlnews/2018/2/402593.shtm" target="_blank">
- 香港设立抗战博物馆
- </a>
with a regular expression.
However...
- >>> p=ur'/htmlnews/[^\n]+.shtm"[^。]*</a>'
- >>> urls=re.findall(p,'''cellspacing="0" width="100%">
-
- <tr onmouseover="javascript:this.style.backgroundColor='#f5ecec'" onmouseout="javascript:this.style.backgroundColor=''">
- <td align="left" valign="top" width="60%" style="font-size:5px">
- <img src="/images/t11.gif" alt="" /> <a href='/htmlnews/2018/2/402590.shtm' target="_blank" >
- 武警部队组织第一期新军事训练大纲集训
- </a><a href="/htmlnews/2018/2/401948.shtm" target="_blank" onmouseover="javascript:this.style.color='#ba1414'" onmouseout="javascript:this.style.color='#333333'" style="color: rgb(51, 51, 51);">六年200高校更名:想成名校不能只改校名</a>''')
- >>> urls
- ['/htmlnews/2018/2/401948.shtm" target="_blank" onmouseover="javascript:this.style.color=\'#ba1414\'" onmouseout="javascript:this.style.color=\'#333333\'" style="color: rgb(51, 51, 51);">\xc1\xf9\xc4\xea200\xb8\xdf\xd0\xa3\xb8\xfc\xc3\xfb\xa3\xba\xcf\xeb\xb3\xc9\xc3\xfb\xd0\xa3\xb2\xbb\xc4\xdc\xd6\xbb\xb8\xc4\xd0\xa3\xc3\xfb</a>']
As shown above, my regex only matches one of them.
Urgently need help!!!
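One way to narrow down why only one link comes back is to check which quote character follows each `.shtm` in the test string. The sketch below is Python 3 (the session above is Python 2) and uses a trimmed copy of that HTML with the extra attributes dropped, which is an assumption made for readability. The original pattern is kept unchanged.

```python
import re

# Trimmed copy of the HTML from the session above (attributes shortened; an assumption).
html = '''<a href='/htmlnews/2018/2/402590.shtm' target="_blank" >
武警部队组织第一期新军事训练大纲集训
</a><a href="/htmlnews/2018/2/401948.shtm" target="_blank">六年200高校更名:想成名校不能只改校名</a>'''

# The pattern from the question, unchanged: it requires a double quote after "shtm".
original = r'/htmlnews/[^\n]+.shtm"[^。]*</a>'

# Count how each href is closed.
double_quoted = re.findall(r'\.shtm"', html)
single_quoted = re.findall(r"\.shtm'", html)
print(len(double_quoted), len(single_quoted))

# Only the double-quoted link can satisfy the pattern.
matches = re.findall(original, html)
print(len(matches))
```

In this test string the first link is written as `href='...'`, so a pattern that insists on `.shtm"` can never start a match there, whatever the rest of the pattern does.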
Last edited by lies_for_L on 2018-2-9 20:57
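!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
.*?
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Use the non-greedy .*? instead of a greedy quantifier: it stops at the first </a> rather than the last, and with re.DOTALL the dot also matches line breaks, so each link is matched separately.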
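To make the difference concrete, here is a minimal Python 3 sketch that runs a greedy and a non-greedy version of the same pattern over the two snippets from the question. The `html` string is a shortened copy with the extra attributes removed, and the names `greedy` and `lazy` are only illustrative.

```python
import re

# Shortened copy of the two snippets from the question (attributes trimmed; an assumption).
html = '''<a href="/htmlnews/2018/2/401948.shtm" target="_blank">六年200高校更名:想成名校不能只改校名</a>
<a href="/htmlnews/2018/2/402593.shtm" target="_blank">
香港设立抗战博物馆
</a>'''

# Greedy: ".+" and ".*" grab as much as possible, so the match runs to the last </a>.
greedy = re.findall(r'/htmlnews/.+\.shtm".*</a>', html, re.DOTALL)

# Non-greedy: ".+?" and ".*?" grab as little as possible, so each match ends at the first </a>.
lazy = re.findall(r'/htmlnews/.+?\.shtm".*?</a>', html, re.DOTALL)

print(len(greedy))  # both links are swallowed into one match
print(len(lazy))    # one match per link
```

With the greedy version both links end up inside a single match; the non-greedy version returns them separately.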
- # Test code
- import re
-
- a = '''<a href="/htmlnews/2018/2/401948.shtm" target="_blank" onmouseover="javascript:this.style.color='#ba1414'" onmouseout="javascript:this.style.color='#333333'" style="color: rgb(51, 51, 51);">六年200高校更名:想成名校不能只改校名</a>
- <a href="/htmlnews/2018/2/402593.shtm" target="_blank">
- 香港设立抗战博物馆
- </a>'''
-
- # Patterns to compare. The "." before "shtm" is escaped so it only matches a literal dot.
- patterns = [
-     r'/htmlnews/.+?\.shtm".*?</a>',                # from "/htmlnews/" to the closing tag
-     r'<a href="/htmlnews/[\w/]+.*?</a>',           # the whole <a>...</a> element
-     r'/htmlnews/[\w/]+\.shtm',                     # the URL only
-     r'<a href="(/htmlnews/[\w/]+\.shtm).*?</a>',   # whole element, but findall returns only the captured URL
- ]
-
- for p in patterns:
-     print('Testing pattern: %s' % p)
-     # re.DOTALL lets "." match newlines, so a link spread over several lines is still found.
-     result = re.compile(p, re.DOTALL).findall(a)
-     print('Number of matches: %d' % len(result))
-     for count, i in enumerate(result, 1):
-         print('Match %d: %s' % (count, i))
-     print('###########################################################')
Result
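Building on the capturing-group idea from the last test pattern, the following Python 3 sketch pulls out both the URL and the link text in one pass. It is only an illustration under a few assumptions: the helper name `extract_links`, the group names `url` and `title`, and the choice to accept either quote style around `href` are not from the thread, and the sample HTML is a shortened copy of the question's test string.

```python
import re

# Shortened copy of the question's test string (attributes trimmed; an assumption).
html = '''<a href='/htmlnews/2018/2/402590.shtm' target="_blank" >
武警部队组织第一期新军事训练大纲集训
</a><a href="/htmlnews/2018/2/401948.shtm" target="_blank">六年200高校更名:想成名校不能只改校名</a>'''

# (["']) captures the opening quote and \1 requires the same quote to close the href.
# [^>]* skips the remaining attributes; .*? is non-greedy so it stops at the first </a>.
LINK_RE = re.compile(
    r'''<a\s+href=(["'])(?P<url>/htmlnews/[\w/]+\.shtm)\1[^>]*>(?P<title>.*?)</a>''',
    re.DOTALL,
)

def extract_links(text):
    """Return a list of (url, title) pairs; hypothetical helper for illustration."""
    return [(m.group('url'), m.group('title').strip()) for m in LINK_RE.finditer(text)]

links = extract_links(html)
for url, title in links:
    print(url, title)
```

`finditer` is used here instead of `findall` so the named groups can be read directly. For anything beyond quick scraping of a known page layout, a real HTML parser is usually more robust than a regex.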