The code is as follows:
import urllib.request
import re

def open_url(url):
    req=urllib.request.Request(url)
    req=req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0')
    response=urllib.request.urlopen(req)
    html=response.read().decode('utf-8')
    return html

def get_img(html):
    p=r'<img class="BDE_Image" src="[^"]+\.jpg"'
    imglist=re.findall(p,html)
    for each in imglist:
        print(each)

if __name__=='__main__':
    url='https://tieba.baidu.com/p/3563409202'
    get_img(open_url(url))
Error:
Traceback (most recent call last):
  File "E:/pythonpro/lesson53.py", line 20, in <module>
    get_img(open_url(url))
  File "E:/pythonpro/lesson53.py", line 7, in open_url
    response=urllib.request.urlopen(req)
  File "D:\Python36_64\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "D:\Python36_64\lib\urllib\request.py", line 517, in open
    req.timeout = timeout
AttributeError: 'NoneType' object has no attribute 'timeout'
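Looking at the traceback, the error happens when opening the URL. Has Baidu added some kind of anti-scraping mechanism? I can't figure it out.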
def open_url(url):
    req = urllib.request.Request(url)
    # add_header() modifies req in place and returns None, so do not reassign req
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0')
    response = urllib.request.urlopen(req)
    html = response.read().decode('utf-8')
    return html
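
This does not look like an anti-scraping measure: the traceback fails at `req.timeout = timeout` inside `urllib`, before anything is sent to Baidu, because `req` is already `None` at that point. `Request.add_header()` sets the header on the existing object and returns `None`, so `req=req.add_header(...)` throws the request away. Here is a minimal sketch that shows this without touching the network; `build_request` is a hypothetical helper name and the short `'Mozilla/5.0'` string is just a placeholder User-Agent, neither is from the original post.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request with a User-Agent header set (no network access)."""
    req = urllib.request.Request(url)
    # add_header() mutates req in place; its return value is None,
    # so the result must NOT be assigned back to req.
    req.add_header('User-Agent', user_agent)
    return req


if __name__ == '__main__':
    url = 'https://tieba.baidu.com/p/3563409202'

    # What the original code effectively did: keep the return value.
    broken = urllib.request.Request(url).add_header('User-Agent', 'Mozilla/5.0')
    print(broken)  # None

    # The corrected pattern: keep the Request object itself.
    req = build_request(url, 'Mozilla/5.0')
    # urllib stores header names capitalized as 'User-agent'.
    print(req.get_header('User-agent'))
```

Alternatively, the header can be passed at construction time with `urllib.request.Request(url, headers={'User-Agent': ...})`, which avoids the separate `add_header()` call altogether.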