病名为孙笑川 posted on 2020-9-5 08:47:47

Scraper help: why can't I open the Douban website?

import urllib.request
response = urllib.request.urlopen("http://www.douban.com")
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
    response = urllib.request.urlopen("http://www.douban.com")
File "D:\文件\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
File "D:\文件\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
File "D:\文件\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
File "D:\文件\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
File "D:\文件\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
File "D:\文件\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 418:

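bonst posted on 2020-9-7 14:59:00

Hardly anyone uses urllib these days; using the requests library directly is much nicer, so give it a try. Your URL is also off: Douban is served over https.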

bonst posted on 2020-9-7 15:00:34

You could try something like this:

import requests

url = 'https://book.douban.com/'
response = requests.get(url)
print(response.text)

疾风怪盗 posted on 2020-9-7 17:40:13

Last edited by 疾风怪盗 on 2020-9-7 17:44

<Response >
Looks like the site's anti-scraping kicked in.
Try selenium.

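YunGuo posted on 2020-9-7 18:16:56

The site has anti-scraping measures. With the requests library plus headers, two or three lines of code are enough for the request to succeed. urllib is more cumbersome and takes more code; try this.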

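import urllib.request

headers = {}    # add your own headers here
url = 'http://www.douban.com/'
# Wrap the URL and headers in a Request object (named req so it is not mistaken for the re module)
req = urllib.request.Request(url=url, headers=headers)
html = urllib.request.urlopen(req)
print(html.read().decode())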
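
For comparison, the requests version with headers might look like the minimal sketch below. The User-Agent string is only an illustrative assumption (any browser-like value should behave similarly, and what Douban accepts may change over time), and `fetch` is a hypothetical helper name, not part of requests.

```python
import requests

# Assumption: a browser-like User-Agent; the exact string is illustrative only.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) "
                  "Gecko/20100101 Firefox/80.0"
}


def fetch(url, timeout=10):
    """Send a GET request with custom headers and return the decoded body."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx statuses (such as 418)
    # instead of quietly returning an error page.
    response.raise_for_status()
    return response.text
```

Calling `print(fetch('https://www.douban.com/'))` would then print the page source if the server accepts the request, or raise an exception showing the status code if it does not.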

suchocolate posted on 2020-9-7 18:36:32

Last edited by suchocolate on 2020-9-7 19:28

You have to change the UA (User-Agent). urllib's default UA is Python-urllib, which Douban's anti-scraping rejects.
from urllib import request

headers = {'user-agent': 'firefox'}
req = request.Request('http://www.douban.com', headers=headers)
r = request.urlopen(req)
print(r.read().decode('utf-8'))
This is the packet capture without changing the UA:

This is the packet capture with the UA changed:
https://xxx.ilovefishc.com/album/202009/07/192724u7rw7vyuvno6ap73.png
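
The difference between the two captures can also be checked roughly without a packet sniffer. The sketch below sends nothing over the network; it assumes that `build_opener()` yields the same kind of default opener that `urlopen()` falls back to, and simply inspects the headers on each side.

```python
from urllib import request

# The default opener keeps the headers it adds to every request in addheaders.
opener = request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers)  # the User-agent value starts with "Python-urllib/"

# A Request that carries its own User-Agent; inspected here, not sent.
req = request.Request('http://www.douban.com', headers={'user-agent': 'firefox'})
# Request stores header names capitalized, so the key is 'User-agent'.
print(req.get_header('User-agent'))
```

Because the request already has a User-agent header, the opener's default value is not added when the request is sent.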