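"On the Self-Cultivation of a Web Crawler", lesson 1 homework: the code throws an error as soon as it reaches the douban URL. Any help?
Last question of the hands-on section: I ran the code from the reference answer, and it fails as soon as it reaches www.douban.com, as shown below: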
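Why does this happen, and how can I fix it? Error description: status 418 is returned by the site's anti-crawler protection. The explanation I found online for 418 I'm a teapot is: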
The HTTP 418 I'm a teapot client error response code indicates that the server refuses to brew coffee because it is a teapot. This error is a reference to Hyper Text Coffee Pot Control Protocol which was an April Fools' joke in 1998.
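To confirm that the failure really is a 418 and not some other problem, the status code can be read from the exception that `urllib.request.urlopen` raises. The following is a minimal sketch; the helper names `describe_failure` and `try_fetch` are made up for illustration and are not part of the course material.

```python
import urllib.error
import urllib.request


def describe_failure(error):
    """Turn an HTTPError into a short message that includes its status code."""
    if error.code == 418:
        return "418: the site probably rejected the request as a crawler"
    return f"{error.code}: {error.reason}"


def try_fetch(url):
    """Return the page bytes, or None if the server answered with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        print(describe_failure(error))
        return None
```

Calling `try_fetch("https://www.douban.com")` would then print the status message instead of stopping the whole script with a traceback.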
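Solution: add headers to the request. You will get to this later in the course.
Post your code and I'll add the headers for you.
zltzlt posted on 2020-4-4 19:55
Post your code and I'll add the headers for you.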
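import urllib.request
import chardet

# Read one URL per line from urls.txt
with open("urls.txt") as file:
    eachlines = file.readlines()

times = 1
for eachline in eachlines:
    # Strip the trailing newline before requesting the page
    html = urllib.request.urlopen(eachline.strip()).read()
    # Guess the page encoding; GBK is a superset of GB2312
    encode = chardet.detect(html)['encoding']
    if encode == 'GB2312':
        encode = 'GBK'
    # Save each page as url_1.txt, url_2.txt, ...
    with open("url_%d.txt" % times, 'w', encoding=encode) as new:
        new.write(html.decode(encode, 'ignore'))
    times += 1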
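The "add headers" fix suggested above could look roughly like the sketch below. It wraps each URL in a `urllib.request.Request` that carries a browser-like `User-Agent`, since the default `Python-urllib` value is what many sites block. The `USER_AGENT` string and the helper names `build_request` and `fetch` are assumptions made for illustration, and there is no guarantee that a given site accepts this particular value.

```python
import urllib.request

# Assumed browser-like User-Agent string; any current browser's value should work similarly.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/80.0 Safari/537.36")


def build_request(url):
    """Wrap a URL in a Request object that carries a browser-like User-Agent."""
    return urllib.request.Request(url.strip(), headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=10):
    """Download the raw bytes of a page using the request built above."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

In the loop above, `urllib.request.urlopen(eachline).read()` would then become `fetch(eachline)`; the rest of the script can stay as it is.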
Thanks!