When scraping titles with a regular expression, the en dash comes out as &#8211;. What should I do?
import re
import requests

link = "http://www.santostang.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'}
r = requests.get(link, headers=headers)
html = r.text
# Capture the link text inside each post-title heading
comment = re.compile(r'<h1 class="post-title"><a href=.*?>(.*?)</a>', flags=re.DOTALL)
title_list = comment.findall(html)
for each in title_list:
    # The dash in each title is printed as the raw entity &#8211;
    print(each)

import re
import requests

link = "http://www.santostang.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'}
r = requests.get(link, headers=headers)
html = r.text
comment = re.compile(r'<h1 class="post-title"><a href=.*?>(.*?)</a>', flags=re.DOTALL)
title_list = comment.findall(html)
for each in title_list:
    # Replace the raw entity with a space
    print(each.replace('&#8211;', ' '))

That should do it.

import re
import requests

link = "http://www.santostang.com/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36'}
r = requests.get(link, headers=headers)
html = r.text
comment = re.compile(r'<h1 class="post-title"><a href=.*?>(.*?)</a>', flags=re.DOTALL)
title_list = comment.findall(html)
for each in title_list:
    # chr(8211) is the en dash character (U+2013) that the entity &#8211; stands for
    print(each.replace('&#8211;', chr(8211)))

kaohsing posted on 2020-4-30 14:49
So it needs to be decoded with unescape? Thanks.

永恒的蓝色梦想 posted on 2020-4-29 23:24
That should do it.
Haha.

zltzlt posted on 2020-4-30 13:11
Thanks.
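
The unescape decoding mentioned in the thread can be sketched roughly as follows. This is only an illustration, assuming the standard library's html.unescape is what was meant. The sample_html string is a made-up stand-in for the downloaded page, so the snippet runs without network access. The names pattern and titles are also illustrative; the original code calls its variable html, which would shadow the html module if imported alongside it.

```python
import html
import re

# Hypothetical sample standing in for r.text from the real page
sample_html = (
    '<h1 class="post-title"><a href="http://www.santostang.com/a">'
    'Chapter 4 &#8211; Dynamic pages</a></h1>'
)

# Same pattern as in the thread
pattern = re.compile(r'<h1 class="post-title"><a href=.*?>(.*?)</a>', flags=re.DOTALL)

# html.unescape turns character references such as &#8211; into the
# characters they stand for, here the en dash (U+2013)
titles = [html.unescape(t) for t in pattern.findall(sample_html)]
for title in titles:
    print(title)
```

Compared with replace('&#8211;', chr(8211)), this should also cover any other entities that turn up in the titles, such as &amp;, without listing each one by hand.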