Crawler help: why can't I open the Douban website?, Python Discussion, Programming Languages, 鱼C论坛

病名为孙笑川 posted on 2020-9-5 08:47:47

Crawler help: why can't I open the Douban website?

import urllib.request
response = urllib.request.urlopen("http://www.douban.com")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    response = urllib.request.urlopen("http://www.douban.com")
  File "D:\文件\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "D:\文件\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "D:\文件\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "D:\文件\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "D:\文件\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "D:\文件\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 418:

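bonst posted on 2020-9-7 14:59:00

Hardly anyone uses urllib anymore; the requests library is much nicer, so give it a try. Your URL is also wrong: Douban uses https.

bonst posted on 2020-9-7 15:00:34

You could try it like this:

import requests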

url = 'https://book.douban.com/'
response = requests.get(url)
print(response.text)

疾风怪盗 posted on 2020-9-7 17:40:13

This post was last edited by 疾风怪盗 on 2020-9-7 17:44

<Response >
Looks like anti-scraping kicked in.
Try selenium.

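YunGuo posted on 2020-9-7 18:16:56

The site has anti-scraping measures. With the requests library, adding headers gets the request through in two or three lines of code. With urllib it is more cumbersome and takes more code. Try this.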

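import urllib.request

headers = {}  # add your own headers here, e.g. a browser User-Agent
url = 'http://www.douban.com/'
# named req rather than re, so it does not look like the standard re module
req = urllib.request.Request(url=url, headers=headers)
html = urllib.request.urlopen(req)
print(html.read().decode())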
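
As a minimal sketch of the headers approach above, the snippet below fills in a User-Agent and catches the HTTPError from the first post instead of letting the traceback escape. The User-Agent string and the helper names build_request and fetch are assumptions made for illustration, not something from this thread, and whether Douban accepts a given value may change over time.

```python
import urllib.error
import urllib.request

# Assumed browser-like value; any realistic User-Agent string can be substituted.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url, headers=None):
    """Return a Request that carries custom headers (hypothetical helper)."""
    return urllib.request.Request(url=url, headers=headers or DEFAULT_HEADERS)


def fetch(url, headers=None, timeout=10):
    """Return (status, text); on an HTTP error return (code, None) instead of raising."""
    req = build_request(url, headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # err.code holds the numeric status, e.g. 418 in the traceback above.
        return err.code, None
```

Calling fetch("https://www.douban.com/") should then return either the page text with its status, or a pair such as (418, None) if the server still rejects the request.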

suchocolate posted on 2020-9-7 18:36:32

This post was last edited by suchocolate on 2020-9-7 19:28

You need to change the UA. The default UA is python-urllib, and Douban blocks it as a scraper.
from urllib import request

headers = {'user-agent': 'firefox'}
req = request.Request('http://www.douban.com', headers=headers)
r = request.urlopen(req)
print(r.read().decode('utf-8'))
This is the packet capture without changing the UA.

This is the packet capture with the UA changed:
https://xxx.ilovefishc.com/album/202009/07/192724u7rw7vyuvno6ap73.png
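
For anyone who cannot view the captures, the difference can also be checked locally without sending anything. This is only a sketch: default_user_agent and request_with_ua are hypothetical helper names, and it relies on urllib storing header names in capitalized form ("User-agent").

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that urllib sends when none is set."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs added to requests by default.
    return dict(opener.addheaders)["User-agent"]


def request_with_ua(url, user_agent):
    """Build a Request whose User-Agent replaces the default (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


print(default_user_agent())  # Python-urllib/<major>.<minor>
req = request_with_ua("http://www.douban.com", "firefox")
print(req.get_header("User-agent"))  # firefox
```

A header set on the Request takes priority, because urllib only adds the opener's default when the request does not already carry that header.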
