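Python FAQ 031: Youdao Translate crawler code returns an error
Question
Why does the following code print {"errorCode":50} instead of the dictionary that holds the translation result? What went wrong?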
import urllib.request
import urllib.parse
url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
data = {}
data['i'] = 'I love FishC.com!'
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['bv'] = '70244e0061db49a9ee62d341c5fed82a'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'
data = urllib.parse.urlencode(data).encode('utf-8')
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
print(html)
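Answer
Remove the _o from the URL. The site has added an anti-crawler mechanism, and the translate_o address rejects this request with errorCode 50. With the change below it works: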
import urllib.request
import urllib.parse
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"  # changed: translate_o -> translate
data = {}
data['i'] = 'I love FishC.com!'
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['bv'] = '70244e0061db49a9ee62d341c5fed82a'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_CLICKBUTTION'
data = urllib.parse.urlencode(data).encode('utf-8')
response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')
print(html)
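If you want the script to tell you when the request was rejected, instead of just printing the raw text, you can parse the reply and check the error code first. The following is only a rough sketch: the helper name parse_translate_response is made up for this example, and it assumes that a non-zero "errorCode" (like the 50 in the question) means failure and that a missing or zero code means success. The site's actual response format may differ or change.

```python
import json


def parse_translate_response(html):
    """Turn the JSON text from the server into a dict, or raise on an error code.

    Hypothetical helper for illustration; not part of the original code.
    """
    result = json.loads(html)
    # Assumption: a non-zero "errorCode" means the request was rejected,
    # as with the {"errorCode":50} reply shown in the question.
    code = result.get("errorCode", 0)
    if code != 0:
        raise RuntimeError("translate request failed, errorCode=%s" % code)
    return result


# Feed it the reply from the question to see the failure path.
try:
    parse_translate_response('{"errorCode":50}')
except RuntimeError as err:
    print(err)
```

In the script above you would call parse_translate_response(html) in place of print(html) and then print whatever fields of the returned dict you need.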