My first taste of web scraping was not friendly at all.
urllib.error.HTTPError: HTTP Error 418:
It threw a 418 right out of the gate, and now I'm panicking hard.
import urllib.request
import chardet

# Read one URL per line from urls.txt and print each page's detected encoding
with open("urls.txt", mode="r", encoding="utf-8") as url:
    list_url = url.read().split()

for i in list_url:
    url_ = urllib.request.urlopen(i).read()  # no headers, so the default Python-urllib User-Agent is sent
    print(chardet.detect(url_)["encoding"])
urls.txt
http://www.fishc.com
http://www.baidu.com
http://www.douban.com
http://www.zhihu.com
http://www.taobao.com
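A side note on debugging this: wrapping each urlopen call in try/except shows which URL triggers the 418 (or fails to resolve) instead of the first error killing the whole loop. A rough sketch of that idea, using the same urls.txt and libraries as above:

import urllib.request
import urllib.error
import chardet

with open("urls.txt", mode="r", encoding="utf-8") as f:
    links = f.read().split()

for link in links:
    try:
        data = urllib.request.urlopen(link).read()
        print(link, chardet.detect(data)["encoding"])
    except urllib.error.HTTPError as e:
        print(link, "-> HTTP Error", e.code)   # e.g. the 418 reported above
    except urllib.error.URLError as e:
        print(link, "-> URL Error:", e.reason)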
Adding a request header (browser User-Agent):
import urllib.request
import chardet

s = ["http://www.fishc.com",
     "http://www.baidu.com",
     "http://www.douban.com",
     "http://www.zhihu.com",
     "http://www.taobao.com"
     ]
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
# with open("urls.txt", mode="r", encoding="utf-8") as url:
#     list_url = url.read().split()
for i in s:
    request = urllib.request.Request(url=i, headers=headers)  # browser-like headers to avoid the anti-scraping check
    url_ = urllib.request.urlopen(request).read()
    print(chardet.detect(url_)["encoding"])
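If you would rather keep reading the URLs from urls.txt (the part commented out above) instead of hard-coding the list, the same Request + headers pattern works. A minimal sketch along those lines, reusing the headers dict from the snippet above:

import urllib.request
import chardet

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}

with open("urls.txt", mode="r", encoding="utf-8") as f:
    list_url = f.read().split()

for i in list_url:
    request = urllib.request.Request(url=i, headers=headers)  # same browser User-Agent trick
    url_ = urllib.request.urlopen(request).read()
    print(chardet.detect(url_)["encoding"])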