Python FAQ 031: Youdao Translate crawler code returns an error
Why does the following code print {"errorCode":50} instead of a dictionary holding the translation result? What went wrong?
import urllib.request
import urllib.parse
url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
data = {}
data['i'] = 'I love FishC.com!'
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['bv'] = '70244e0061db49a9ee62d341c5fed82a'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'
data = urllib.parse.urlencode(data).encode('utf-8')
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
print(html)
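Answer
You need to remove the _o from the URL (translate_o becomes translate), because the site has added an anti-crawler mechanism. Change the code like this and it works: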
import urllib.request
import urllib.parse
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"  # changed: "_o" removed
data = {}
data['i'] = 'I love FishC.com!'
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['bv'] = '70244e0061db49a9ee62d341c5fed82a'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'
data = urllib.parse.urlencode(data).encode('utf-8')
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
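print(html)
This crawler nearly drove me crazy. Along the way I tried all sorts of approaches and changed all sorts of parameters, and my conclusion is that it simply cannot be crawled without removing the _o.
sam_wu posted on 2020-4-19 21:19:
This crawler nearly drove me crazy. Along the way I tried all sorts of approaches and changed all sorts of parameters, and my conclusion is that it simply cannot be crawled without removing the _o.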
https://www.jianshu.com/p/5001c75a23c4
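This solved my urgent problem, thank you so much!
Wow!! I looked up so many complicated approaches, and it turns out that removing a single _o is all it takes!! So simple, thanks to the original poster!!!
Why does it work once the _o is removed?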
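The question expects a dictionary holding the translation result, while the code in the answer prints the raw response text. Below is a minimal sketch of turning that text into Python data and checking the error code before using it. The function name parse_translation is hypothetical, and the layout of a successful response (a "translateResult" list of paragraphs, each a list of "src"/"tgt" pairs) is an assumption, not something confirmed in this thread. Only the {"errorCode":50} response comes from the question itself.

```python
import json


def parse_translation(html):
    """Parse the JSON text from the server into (error_code, translations)."""
    result = json.loads(html)
    error_code = result.get("errorCode")
    # Assumed layout: "translateResult" is a list of paragraphs, each a list
    # of {"src": ..., "tgt": ...} pairs. The key is absent on an error reply.
    translations = [
        pair["tgt"]
        for paragraph in result.get("translateResult", [])
        for pair in paragraph
    ]
    return error_code, translations


# The error response quoted in the question: a code, but no translation.
print(parse_translation('{"errorCode":50}'))

# Hypothetical success response, written by hand for illustration only.
sample = '{"errorCode": 0, "translateResult": [[{"src": "hello", "tgt": "你好"}]]}'
print(parse_translation(sample))
```

In the answer's code, this would be called as parse_translation(html) in place of print(html). An empty list of translations together with a non-zero error code is the sign that the request was rejected.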