Please help! Python Youdao Translate crawler question from a totally confused newbie
Last edited by 王跌宕 on 2021-8-8 01:41
I can't make sense of these three lines in the JS:
t = p("#docUploadFile").val(),
i = t.split("\\"),
o = i,
What exactly is this o?
It keeps failing with {'errorCode': 50}.
Everything else is sorted out; this is the only part I'm unsure about, and as a total newbie it has me completely lost.

Last edited by 王跌宕 on 2021-8-8 01:43
The relevant JS source:
define("newweb/common/docTrans", ["./form", "./md5", "./jquery-1.7", "./account", "./log", "../langSelect", "./TranslateState", "./star", "./select", "./utils"],
function(e, t) {
    function n() {
        var e = p("#language").val().split("2"),
            t = p("#docUploadFile").val(),
            n = t.split("."),
            r = n,
            i = t.split("\\"),
            o = i,
            a = p("#docUploadFile").files,
            s = 1e3,
            l = (new Date).getTime(),
            c = p.md5("new-fanyiweb" + l + "ydsecret://newfanyiweb.doctran/sign/0j9n2{3mLSN-$Lg]K4o0N2}" + o);
        return a && a && a.size && (s = a.size),
        {
            from: e,
            to: e,
            type: r,
            filename: o,
            client: "docserver",
            keyfrom: "new-fanyiweb",
            size: s,
            sign: c,
            salt: l
        }
    }
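If you want to rebuild that request body in Python, here is a rough sketch. It assumes o and r are meant to be the last elements of their respective splits (the minified assignments above look truncated; a later reply in this thread says the same about o), and the secret string is copied verbatim from the JS. None of this is verified against the live site.

```python
import hashlib
import time

def build_doc_trans_fields(upload_path):
    """Sketch of the form fields built by the JS function n() above."""
    filename = upload_path.split("\\")[-1]   # o: last path component
    ext = upload_path.split(".")[-1]         # r: the file extension
    salt = str(int(time.time() * 1000))      # l: (new Date).getTime()
    secret = "ydsecret://newfanyiweb.doctran/sign/0j9n2{3mLSN-$Lg]K4o0N2}"
    sign = hashlib.md5(                      # c: p.md5("new-fanyiweb" + l + secret + o)
        ("new-fanyiweb" + salt + secret + filename).encode("utf-8")
    ).hexdigest()
    return {
        "type": ext,
        "filename": filename,
        "client": "docserver",
        "keyfrom": "new-fanyiweb",
        "sign": sign,
        "salt": salt,
    }
```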
The relevant HTML source:
<input name="your_file" type="file" id="docUploadFile" class="doc__upload--file">

I'm about to lose my mind here.

王跌宕 posted on 2021-8-8 01:37
The relevant JS source:

Is this your code? To scrape Youdao Translate, don't you capture the request with your browser's dev tools? Why dig through the page source?

import requests
dat = input('Enter the sentence to translate: ')
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
data = {}
data['i'] = dat
data['from'] = 'AUTO'
data['to'] = 'AUTO'
data['smartresult'] = 'dict'
data['client'] = 'fanyideskweb'
data['salt'] = '16038541804405'  # hardcoded; the real page regenerates this per request
data['sign'] = 'a0c52b875aa481825e8411c6d7b0f6b0'  # hardcoded; tied to the salt above
data['lts'] = '1603854180440'
data['bv'] = '8269b35cc1594b7635631cdd3a301112'
data['doctype'] = 'json'
data['version'] = '2.1'
data['keyfrom'] = 'fanyi.web'
data['action'] = 'FY_BY_REALTlME'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"}
req = requests.post(url, headers=headers, data=data)
target = req.json()
print(target)
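A note on why the hardcoded fields above go stale: salt and sign are regenerated on every request. A sketch of the scheme most 2021 write-ups describe for this endpoint (an assumption; the secret string is the one circulating in those write-ups and may have rotated since):

```python
import hashlib
import random
import time

def make_salt_sign(word):
    """Regenerate the per-request salt/sign fields (assumed scheme)."""
    lts = str(int(time.time() * 1000))      # 13-digit millisecond timestamp
    salt = lts + str(random.randint(0, 9))  # timestamp plus one random digit
    # sign = md5("fanyideskweb" + word + salt + secret)
    secret = "Tbh5E8=q6U3EXe+&L[4c@"
    sign = hashlib.md5(
        ("fanyideskweb" + word + salt + secret).encode("utf-8")
    ).hexdigest()
    return lts, salt, sign
```

These three values would then replace the hardcoded 'lts', 'salt', and 'sign' entries in the data dict above.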
The point is to lift the code out, not to read through all of it.

大马强 posted on 2021-8-8 06:43
Don't you capture the request to scrape Youdao Translate? Why the page source?

The page source isn't mine. I was following an online tutorial, which says the site has an anti-scraping mechanism: the salt and sign values in the form data change on every request, so you have to find where those two values are generated in the page's JS and reproduce that. But I can't make sense of how salt is computed.

南归 posted on 2021-8-8 07:48
The point is to lift the code out, not to read through all of it.

I don't get it, I'm a total newbie. Do you mean the JS code can be copied straight into Python?

大马强 posted on 2021-8-8 06:49
The URL now appends _o after translate, so it's translate_o, and it keeps returning error 50. Searching around, Youdao apparently added an anti-scraping mechanism that MD5-signs the sign field in the form data. This is rough.

# V1.0
"""
文件 YDtranslate.py
时间 2021/02/23 20:26:37
"""
import requests
import time
import random
import hashlib
url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
i = input("Enter a word: ")
lts = str(int(round(time.time() * 1000)))  # 13-digit millisecond timestamp
salt = lts + str(random.randint(0, 9))  # timestamp plus one random digit
s = "fanyideskweb" + i + salt + "Tbh5E8=q6U3EXe+&L[4c@"
sign = hashlib.md5(s.encode("utf-8")).hexdigest()
bv = hashlib.md5(
"5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36".encode(
"utf-8"
)
).hexdigest()
headers = {
"Referer": "http://fanyi.youdao.com/",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
"Connection": "keep-alive",
"Accept": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
"X-Requested-With": "XMLHttpRequest",
"Content-Type": "application/x-www-form-urlencoded;",# ! charset=UTF-8导致无法将中文翻译成英文
"Origin": "http://fanyi.youdao.com",
"Accept-Language": "zh-CN,zh;q=0.9",
"Cookie": "OUTFOX_SEARCH_USER_ID=-368708839@10.108.160.18; JSESSIONID=aaaL2DMAbpTgg8Qpc2xUw; OUTFOX_SEARCH_USER_ID_NCOO=1451460344.418452; ___rl__test__cookies=1561684330987",
}
data = {
"i": i,
"from": "AUTO",
"to": "AUTO",
"smartresult": "dict",
"client": "fanyideskweb",
"salt": salt,
"sign": sign,
"lts": lts,
"bv": bv,
"doctype": "json",
"version": "2.1",
"keyfrom": "fanyi.web",
"action": "FY_BY_REALTlME",
}
response = requests.post(url=url, headers=headers, data=data).json()
result = response["translateResult"][0][0]["tgt"]  # translateResult is a list of lists of {src, tgt} dicts
print(result)
王跌宕 posted on 2021-8-8 15:20
The URL now appends _o after translate, so it's translate_o, and it keeps returning error 50. Searching around, Youdao apparently added an anti- ...

Mine works as posted.

That o is the last element of the array i, i.e. the bare filename left after splitting the upload path on backslashes.
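In Python, picking the last element of that split is just negative indexing. The path below is a made-up example (browsers typically report C:\fakepath\... for file inputs):

```python
# Python equivalent of the JS i = t.split("\\") followed by taking the last element
path = "C:\\fakepath\\essay.docx"  # hypothetical value of #docUploadFile
parts = path.split("\\")           # i in the JS
filename = parts[-1]               # o: the bare filename
print(filename)                    # essay.docx
```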