Scraping help: why can't I open the Douban website?
import urllib.request
response = urllib.request.urlopen("http://www.douban.com")
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
response = urllib.request.urlopen("http://www.douban.com")
File "D:\文件\Python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "D:\文件\Python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "D:\文件\Python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "D:\文件\Python38\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "D:\文件\Python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "D:\文件\Python38\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 418:
I've stopped using urllib altogether; the requests library is much nicer, give it a try. Your URL is also off: Douban is https. You could try something like this:
import requests
url = 'https://book.douban.com/'
response = requests.get(url)
print(response.text)
(This post was last edited by 疾风怪盗 on 2020-9-7 17:44.)
<Response >
Looks like the anti-scraping kicked in.
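That bare `<Response >` print is the Response object's repr with the status code stripped by the forum. Printing `response.status_code` shows whether the block happened; here is a sketch that fakes Douban's 418 reply locally, so nothing touches the network:

```python
import requests

# Build a bare Response and stamp it with 418, the code Douban
# returned in the traceback above -- no network call involved.
response = requests.models.Response()
response.status_code = 418

print(response)              # <Response [418]>
print(response.status_code)  # 418
print(response.ok)           # False: a 4xx means the request was rejected
```

With a real `requests.get`, checking `response.status_code` instead of printing the object itself makes the 418 obvious at a glance.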
Try selenium.
The site has anti-scraping. With the requests library plus headers, two or three lines of code are enough for a successful request. urllib is more cumbersome and takes more code; try this:
import urllib.request
headers = {}  # add your own headers here
url = 'http://www.douban.com/'
re = urllib.request.Request(url=url, headers=headers)
html = urllib.request.urlopen(re)
print(html.read().decode())
(This post was last edited by suchocolate on 2020-9-7 19:28.)
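For comparison, the requests-plus-headers version mentioned earlier really is only a couple of lines. A sketch — the browser-style User-Agent string and the `fetch_douban` helper name are illustrative examples, not something from the thread:

```python
import requests

def fetch_douban(url="https://www.douban.com/"):
    # requests' default UA is "python-requests/x.y", which Douban
    # rejects; a browser-like UA (example string) gets through.
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    return requests.get(url, headers=headers, timeout=10)

# r = fetch_douban()
# print(r.status_code)  # 200 once the UA passes the check
```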
You have to change the UA. The default UA is Python-urllib, and Douban's anti-scraping blocks it.
from urllib import request
headers = {'user-agent': 'firefox'}
req = request.Request('http://www.douban.com', headers=headers)
r = request.urlopen(req)
print(r.read().decode('utf-8'))
This is the packet capture without changing the UA.
And this is the capture after changing the UA:
https://xxx.ilovefishc.com/album/202009/07/192724u7rw7vyuvno6ap73.png
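The difference between those two captures comes down to the `User-agent` header. A quick stdlib-only check (no network) of what urllib sends by default versus what the explicit header puts on the Request:

```python
from urllib import request

# The opener's default headers carry "Python-urllib/<version>",
# the UA that Douban's anti-scraping rejects.
opener = request.build_opener()
print(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.8')]

# A Request built with an explicit User-Agent carries that instead.
req = request.Request("http://www.douban.com",
                      headers={"User-Agent": "firefox"})
print(req.get_header("User-agent"))  # firefox
```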