Crawler returns status code 521
This problem has had me stuck for a week, friends, please help! The page is https://www.yidaiyilu.gov.cn/xwzx/gnxw/87373.htm
So, as usual, I started with requests:
import requests
url = "https://www.yidaiyilu.gov.cn/xwzx/gnxw/87373.htm"
res = requests.get(url)
print(res.status_code)  # prints 521 here
print(res.text)
##################
<script>document.cookie=('_')+('_')+('j')+('s')+('l')+('_')+('c')+('l')+('e')+('a')+('r')+('a')+('n')+('c')+('e')+('_')+('s')+('=')+(-~[]+'')+(3+3+'')+(~~[]+'')+((1+)/+'')+(3+4+'')+(~~{}+'')+(7+'')+(2+4+'')+(1+4+'')+((+0>>2)+'')+('.')+(2+3+'')+((+0>>2)+'')+(3+3+'')+('|')+('-')+(-~false+'')+('|')+('u')+('g')+('D')+('L')+('x')+('A')+('W')+('M')+('l')+('h')+('q')+((1<<3)+'')+('Z')+('g')+('s')+('%')+(2+'')+('B')+('D')+('f')+(-~+'')+('r')+(-~1+'')+('J')+('z')+('R')+('f')+('C')+('s')+('%')+((2^1)+'')+('D')+(';')+('m')+('a')+('x')+('-')+('a')+('g')+('e')+('=')+(3+'')+(-~+'')+(~~[]+'')+((+false)+'')+(';')+('p')+('a')+('t')+('h')+('=')+('/');location.href=location.pathname+location.search</script>
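Everything I found online deals with responses that contain var and function. What is this thing I'm getting back???
What can I do about it??? (I also tried selenium, with no success. Frustrating.)
The site has anti-crawler protection; the returned content is encrypted.
Twilight6 posted on 2020-11-18 22:25
The site has anti-crawler protection; the returned content is encrypted.
Yeah, I know, I just don't know what to do about it.
用编程搞垮道盟 posted on 2020-11-18 22:41
Yeah, I know, I just don't know what to do about it.
Sorry, I only know the basics of crawling; all I can tell is that the response is encrypted.
Twilight6 posted on 2020-11-18 22:49
Sorry, I only know the basics of crawling; all I can tell is that the response is encrypted.
OK, thanks. You're the only one who has replied to me so far.
用编程搞垮道盟 posted on 2020-11-18 22:50
OK, thanks. You're the only one who has replied to me so far.
No problem, let's wait and see whether an expert shows up~
用编程搞垮道盟 posted on 2020-11-18 22:50
OK, thanks. You're the only one who has replied to me so far.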
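Adding a Cookie should fix it. For anti-crawler protection like this, a cookie or an IP pool is usually enough.
This page requires a cookie containing two parameters:
__jsluid_s: visible in the headers of the 521 response:
import requests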
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0'}
url = "https://www.yidaiyilu.gov.cn/xwzx/gnxw/87373.htm"
r = requests.get(url, headers=headers)
r.encoding = 'utf-8'
for k, v in r.cookies.items():
    print(k, '=', v)
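__jsl_clearance_s: this one is computed by JS. I'm not familiar with JS, but I found an article you can use as a reference: https://blog.csdn.net/qq_39138295/article/details/100705405
If you don't want to work out how it is generated, you can also copy the value straight from the browser:
import requests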
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0', 'cookie': '__jsluid_s=b37d6e95fe6d3b0d462eb76b2ab93002; __jsl_clearance_s=1605747177.992|0|EnIyB0bctVOOAs8Tklgeghe%2Bx44%3D'}
url = "https://www.yidaiyilu.gov.cn/xwzx/gnxw/87373.htm"
r = requests.get(url, headers=headers)
r.encoding = 'utf-8'
print(r.status_code)
print(r.text)
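The two cookie parameters described above can be assembled into a single Cookie header without copying the whole string by hand. The following is only a sketch using the standard library: build_cookie_header is a made-up helper name, and the Set-Cookie value in the usage note is an invented example, not a real response from the site.

```python
from http.cookies import SimpleCookie


def build_cookie_header(set_cookie_value, clearance_value):
    """Combine the server-issued cookie with a __jsl_clearance_s value.

    set_cookie_value: raw Set-Cookie header of the 521 response (assumed format).
    clearance_value:  the value computed by the JS challenge or copied from a browser.
    """
    jar = SimpleCookie()
    # load() parses "name=value; max-age=...; path=/" and stores attributes
    # such as max-age and path on the morsel, not as separate cookies.
    jar.load(set_cookie_value)
    pairs = [f"{name}={morsel.value}" for name, morsel in jar.items()]
    pairs.append(f"__jsl_clearance_s={clearance_value}")
    return "; ".join(pairs)
```

For example, build_cookie_header("__jsluid_s=abc123; max-age=31536000; path=/; HttpOnly", "1605747177.992|0|EnIy%3D") yields a string that can be passed as the 'cookie' entry of the headers dict shown above. A copied __jsl_clearance_s expires (the challenge sets max-age itself), so a hard-coded value only works for a limited time.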
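None of the methods above work. I tested it; this is JS encryption.
笨鸟学飞 posted on 2020-11-19 09:37
None of the methods above work. I tested it; this is JS encryption.
I can scrape it here.
I had some free time and analyzed it. The cookie is not hard to generate. Judging from the browser, three requests are made in total: the first two only serve to generate the cookie, and the third one returns the real response:
1. The first response is a piece of JS code; once executed, it adds a cookie to the browser: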
<script>document.cookie=('_')+('_')+('j')+('s')+('l')+('_')+('c')+('l')+('e')+('a')+('r')+('a')+('n')+('c')+('e')+('_')+('s')+('=')+(-~false+'')+(*(3)+'')+(~~''+'')+((+0>>2)+'')+(-~+'')+((1+)/+'')+((+0>>2)+'')+(1+1+'')+((2^1)+'')+(-~+'')+('.')+(*(3)+'')+(-~0+'')+(1+3+'')+('|')+('-')+((+true)+'')+('|')+('w')+('Q')+('z')+('u')+(-~false+'')+('M')+('i')+('b')+('l')+('V')+('B')+(4+5+'')+('e')+('K')+('b')+('%')+(1+1+'')+('B')+('V')+('o')+('J')+('B')+('y')+('A')+('Q')+('A')+((2)*+'')+('h')+((1<<2)+'')+('%')+((1|2)+'')+('D')+(';')+('m')+('a')+('x')+('-')+('a')+('g')+('e')+('=')+(-~+'')+(-~+'')+(~~[]+'')+(~~{}+'')+(';')+('p')+('a')+('t')+('h')+('=')+('/');location.href=location.pathname+location.search</script>
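Analysis and implementation: extract the JS expression with a regular expression, then evaluate it with the execjs module to get a cookie named __jsl_clearance_s (this is not the final cookie value). Also read the Set-Cookie header of the same response; both values are sent with the next request.
# Cookie parameter jsluid: keep only "name=value" from the Set-Cookie header
jsluid = response.headers.get('set-cookie').split(';')[0]
# Extract the JS expression assigned to document.cookie
js_clearance = re.findall('cookie=(.*?);location.href=', response.text)[0]
# Evaluate it and keep only "name=value" of the resulting cookie string
result = execjs.eval(js_clearance).split(';')[0]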
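The extraction step can be tried offline before involving execjs. The sketch below uses only the standard library; extract_cookie_expression and first_cookie_pair are hypothetical helper names, and the sample page is a shortened stand-in for the real first-stage response, so evaluating the expression itself is left to a JS engine.

```python
import re

# Non-greedy match between "document.cookie=" and the redirect statement.
CHALLENGE_RE = re.compile(r"document\.cookie=(.*?);location\.href=", re.S)


def extract_cookie_expression(html):
    """Return the JS expression assigned to document.cookie, or None."""
    match = CHALLENGE_RE.search(html)
    return match.group(1) if match else None


def first_cookie_pair(cookie_string):
    """Keep only "name=value" from "name=value;max-age=...;path=/"."""
    return cookie_string.split(";", 1)[0].strip()


# Shortened stand-in for the first-stage response body.
sample = ("<script>document.cookie=('a')+('=')+(1+'');"
          "location.href=location.pathname+location.search</script>")
print(extract_cookie_expression(sample))
print(first_cookie_pair("__jsl_clearance_s=1605.05|-1|abc%3D;max-age=3600;path=/"))
```

Returning None instead of indexing into the findall result makes it easy to tell a challenge page from a normal page: a normal 200 response simply has no match.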
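2. The second request carries the two cookie parameters obtained from the previous request, and the response is a second piece of JS code. This code is obfuscated; after running it through a deobfuscation tool it looks like this: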
function hash(_0x1b66b8) {
function _0x35c6e5(_0x268dd8, _0xea5bd4) {
return _0x268dd8 << _0xea5bd4 | _0x268dd8 >>> 32 - _0xea5bd4;
}
function _0x1eaf4b(_0x31d866, _0x14e06e) {
var _0x157f3b, _0x51ff9a, _0x2bf573, _0x434e16, _0x3f57f0;
_0x2bf573 = _0x31d866 & 2147483648;
_0x434e16 = _0x14e06e & 2147483648;
_0x157f3b = _0x31d866 & 1073741824;
_0x51ff9a = _0x14e06e & 1073741824;
_0x3f57f0 = (_0x31d866 & 1073741823) + (_0x14e06e & 1073741823);
if (_0x157f3b & _0x51ff9a) {
return _0x3f57f0 ^ 2147483648 ^ _0x2bf573 ^ _0x434e16;
}
if (_0x157f3b | _0x51ff9a) {
if (_0x3f57f0 & 1073741824) {
return _0x3f57f0 ^ 3221225472 ^ _0x2bf573 ^ _0x434e16;
} else {
return _0x3f57f0 ^ 1073741824 ^ _0x2bf573 ^ _0x434e16;
}
} else {
return _0x3f57f0 ^ _0x2bf573 ^ _0x434e16;
}
}
function _0x296d1d(_0x3ec120, _0x19f2dd, _0x5c9060) {
return _0x3ec120 & _0x19f2dd | ~_0x3ec120 & _0x5c9060;
}
function _0x2e22ab(_0x1b4bee, _0x5b3ded, _0x1f786e) {
return _0x1b4bee & _0x1f786e | _0x5b3ded & ~_0x1f786e;
}
function _0x9c1e12(_0x583030, _0x2fb4b0, _0x1e223e) {
return _0x583030 ^ _0x2fb4b0 ^ _0x1e223e;
}
function _0x21943e(_0x507d21, _0x593ceb, _0x12d837) {
return _0x593ceb ^ (_0x507d21 | ~_0x12d837);
}
function _0x30c4a8(_0x11c9c5, _0x2d92d7, _0x5443b6, _0xf48f8, _0x224d79, _0x640128, _0x4788bf) {
_0x11c9c5 = _0x1eaf4b(_0x11c9c5, _0x1eaf4b(_0x1eaf4b(_0x296d1d(_0x2d92d7, _0x5443b6, _0xf48f8), _0x224d79), _0x4788bf));
return _0x1eaf4b(_0x35c6e5(_0x11c9c5, _0x640128), _0x2d92d7);
}
function _0x2145f8(_0x53d7e0, _0xf63c6, _0x1eddd0, _0x5af86a, _0x4e89ac, _0x42dfbd, _0x4e866b) {
_0x53d7e0 = _0x1eaf4b(_0x53d7e0, _0x1eaf4b(_0x1eaf4b(_0x2e22ab(_0xf63c6, _0x1eddd0, _0x5af86a), _0x4e89ac), _0x4e866b));
return _0x1eaf4b(_0x35c6e5(_0x53d7e0, _0x42dfbd), _0xf63c6);
}
function _0x311b76(_0x39b6f5, _0x5a7109, _0x3a29c6, _0x4fb375, _0xcadb59, _0x508c0e, _0x234182) {
_0x39b6f5 = _0x1eaf4b(_0x39b6f5, _0x1eaf4b(_0x1eaf4b(_0x9c1e12(_0x5a7109, _0x3a29c6, _0x4fb375), _0xcadb59), _0x234182));
return _0x1eaf4b(_0x35c6e5(_0x39b6f5, _0x508c0e), _0x5a7109);
}
function _0x361b6d(_0x3d0c62, _0x300099, _0x537e35, _0x6f09e1, _0x45e6a4, _0x1d7856, _0x2506bc) {
_0x3d0c62 = _0x1eaf4b(_0x3d0c62, _0x1eaf4b(_0x1eaf4b(_0x21943e(_0x300099, _0x537e35, _0x6f09e1), _0x45e6a4), _0x2506bc));
return _0x1eaf4b(_0x35c6e5(_0x3d0c62, _0x1d7856), _0x300099);
}
function _0x57b771(_0x3c91e6) {
var _0x10f282;
var _0xc362bc = _0x3c91e6["length"];
var _0x41aff5 = _0xc362bc + 8;
var _0x24fc0a = (_0x41aff5 - _0x41aff5 % 64) / 64;
var _0x1c8987 = (_0x24fc0a + 1) * 16;
var _0x281eac = Array(_0x1c8987 - 1);
var _0x11a5cc = 0;
var _0x3f48ef = 0;
while (_0x3f48ef < _0xc362bc) {
_0x10f282 = (_0x3f48ef - _0x3f48ef % 4) / 4;
_0x11a5cc = _0x3f48ef % 4 * 8;
_0x281eac[_0x10f282] = _0x281eac[_0x10f282] | _0x3c91e6["charCodeAt"](_0x3f48ef) << _0x11a5cc;
_0x3f48ef++;
}
_0x10f282 = (_0x3f48ef - _0x3f48ef % 4) / 4;
_0x11a5cc = _0x3f48ef % 4 * 8;
_0x281eac[_0x10f282] = _0x281eac[_0x10f282] | 128 << _0x11a5cc;
_0x281eac[_0x1c8987 - 2] = _0xc362bc << 3;
_0x281eac[_0x1c8987 - 1] = _0xc362bc >>> 29;
return _0x281eac;
}
function _0x1b0e3e(_0x1bc183) {
var _0x2d342c = "",
_0x486522 = "",
_0x45875a,
_0x2a3b5e;
for (_0x2a3b5e = 0; _0x2a3b5e <= 3; _0x2a3b5e++) {
_0x45875a = _0x1bc183 >>> _0x2a3b5e * 8 & 255;
_0x486522 = "0" + _0x45875a["toString"](16);
_0x2d342c = _0x2d342c + _0x486522["substr"](_0x486522["length"] - 2, 2);
}
return _0x2d342c;
}
var _0x198c42 = Array();
var _0x556dd6, _0x3e947b, _0x217e9f, _0x8545c6, _0x3ed023, _0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7;
var _0x5153e3 = 7,
_0xa71763 = 12,
_0x509ea6 = 17,
_0x4288bc = 22;
var _0x7ad2f4 = 5,
_0x45d017 = 9,
_0x41614c = 14,
_0x354464 = 20;
var _0x307adc = 4,
_0x1cc902 = 11,
_0x5bb242 = 16,
_0x4ed1a4 = 23;
var _0x418318 = 6,
_0xb85eab = 10,
_0x2a7231 = 15,
_0x5cca29 = 21;
_0x198c42 = _0x57b771(_0x1b66b8);
_0x244b3a = 1732584193;
_0x47e9c7 = 4023233417;
_0x4f689f = 2562383102;
_0x27bcf7 = 271733878;
for (_0x556dd6 = 0; _0x556dd6 < _0x198c42["length"]; _0x556dd6 += 16) {
_0x3e947b = _0x244b3a;
_0x217e9f = _0x47e9c7;
_0x8545c6 = _0x4f689f;
_0x3ed023 = _0x27bcf7;
_0x244b3a = _0x30c4a8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 0], _0x5153e3, 3614090360);
_0x27bcf7 = _0x30c4a8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 1], _0xa71763, 3905402710);
_0x4f689f = _0x30c4a8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 2], _0x509ea6, 606105819);
_0x47e9c7 = _0x30c4a8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 3], _0x4288bc, 3250441966);
_0x244b3a = _0x30c4a8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 4], _0x5153e3, 4118548399);
_0x27bcf7 = _0x30c4a8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 5], _0xa71763, 1200080426);
_0x4f689f = _0x30c4a8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 6], _0x509ea6, 2821735955);
_0x47e9c7 = _0x30c4a8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 7], _0x4288bc, 4249261313);
_0x244b3a = _0x30c4a8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 8], _0x5153e3, 1770035416);
_0x27bcf7 = _0x30c4a8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 9], _0xa71763, 2336552879);
_0x4f689f = _0x30c4a8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 10], _0x509ea6, 4294925233);
_0x47e9c7 = _0x30c4a8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 11], _0x4288bc, 2304563134);
_0x244b3a = _0x30c4a8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 12], _0x5153e3, 1804603682);
_0x27bcf7 = _0x30c4a8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 13], _0xa71763, 4254626195);
_0x4f689f = _0x30c4a8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 14], _0x509ea6, 2792965006);
_0x47e9c7 = _0x30c4a8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 15], _0x4288bc, 1236535329);
_0x244b3a = _0x2145f8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 1], _0x7ad2f4, 4129170786);
_0x27bcf7 = _0x2145f8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 6], _0x45d017, 3225465664);
_0x4f689f = _0x2145f8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 11], _0x41614c, 643717713);
_0x47e9c7 = _0x2145f8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 0], _0x354464, 3921069994);
_0x244b3a = _0x2145f8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 5], _0x7ad2f4, 3593408605);
_0x27bcf7 = _0x2145f8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 10], _0x45d017, 38016083);
_0x4f689f = _0x2145f8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 15], _0x41614c, 3634488961);
_0x47e9c7 = _0x2145f8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 4], _0x354464, 3889429448);
_0x244b3a = _0x2145f8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 9], _0x7ad2f4, 568446438);
_0x27bcf7 = _0x2145f8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 14], _0x45d017, 3275163606);
_0x4f689f = _0x2145f8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 3], _0x41614c, 4107603335);
_0x47e9c7 = _0x2145f8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 8], _0x354464, 1163531501);
_0x244b3a = _0x2145f8(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 13], _0x7ad2f4, 2850285829);
_0x27bcf7 = _0x2145f8(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 2], _0x45d017, 4243563512);
_0x4f689f = _0x2145f8(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 7], _0x41614c, 1735328473);
_0x47e9c7 = _0x2145f8(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 12], _0x354464, 2368359562);
_0x244b3a = _0x311b76(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 5], _0x307adc, 4294588738);
_0x27bcf7 = _0x311b76(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 8], _0x1cc902, 2272392833);
_0x4f689f = _0x311b76(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 11], _0x5bb242, 1839030562);
_0x47e9c7 = _0x311b76(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 14], _0x4ed1a4, 4259657740);
_0x244b3a = _0x311b76(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 1], _0x307adc, 2763975236);
_0x27bcf7 = _0x311b76(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 4], _0x1cc902, 1272893353);
_0x4f689f = _0x311b76(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 7], _0x5bb242, 4139469664);
_0x47e9c7 = _0x311b76(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 10], _0x4ed1a4, 3200236656);
_0x244b3a = _0x311b76(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 13], _0x307adc, 681279174);
_0x27bcf7 = _0x311b76(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 0], _0x1cc902, 3936430074);
_0x4f689f = _0x311b76(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 3], _0x5bb242, 3572445317);
_0x47e9c7 = _0x311b76(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 6], _0x4ed1a4, 76029189);
_0x244b3a = _0x311b76(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 9], _0x307adc, 3654602809);
_0x27bcf7 = _0x311b76(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 12], _0x1cc902, 3873151461);
_0x4f689f = _0x311b76(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 15], _0x5bb242, 530742520);
_0x47e9c7 = _0x311b76(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 2], _0x4ed1a4, 3299628645);
_0x244b3a = _0x361b6d(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 0], _0x418318, 4096336452);
_0x27bcf7 = _0x361b6d(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 7], _0xb85eab, 1126891415);
_0x4f689f = _0x361b6d(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 14], _0x2a7231, 2878612391);
_0x47e9c7 = _0x361b6d(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 5], _0x5cca29, 4237533241);
_0x244b3a = _0x361b6d(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 12], _0x418318, 1700485571);
_0x27bcf7 = _0x361b6d(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 3], _0xb85eab, 2399980690);
_0x4f689f = _0x361b6d(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 10], _0x2a7231, 4293915773);
_0x47e9c7 = _0x361b6d(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 1], _0x5cca29, 2240044497);
_0x244b3a = _0x361b6d(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 8], _0x418318, 1873313359);
_0x27bcf7 = _0x361b6d(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 15], _0xb85eab, 4264355552);
_0x4f689f = _0x361b6d(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 6], _0x2a7231, 2734768916);
_0x47e9c7 = _0x361b6d(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 13], _0x5cca29, 1309151649);
_0x244b3a = _0x361b6d(_0x244b3a, _0x47e9c7, _0x4f689f, _0x27bcf7, _0x198c42[_0x556dd6 + 4], _0x418318, 4149444226);
_0x27bcf7 = _0x361b6d(_0x27bcf7, _0x244b3a, _0x47e9c7, _0x4f689f, _0x198c42[_0x556dd6 + 11], _0xb85eab, 3174756917);
_0x4f689f = _0x361b6d(_0x4f689f, _0x27bcf7, _0x244b3a, _0x47e9c7, _0x198c42[_0x556dd6 + 2], _0x2a7231, 718787259);
_0x47e9c7 = _0x361b6d(_0x47e9c7, _0x4f689f, _0x27bcf7, _0x244b3a, _0x198c42[_0x556dd6 + 9], _0x5cca29, 3951481745);
_0x244b3a = _0x1eaf4b(_0x244b3a, _0x3e947b);
_0x47e9c7 = _0x1eaf4b(_0x47e9c7, _0x217e9f);
_0x4f689f = _0x1eaf4b(_0x4f689f, _0x8545c6);
_0x27bcf7 = _0x1eaf4b(_0x27bcf7, _0x3ed023);
}
var _0x35900a = _0x1b0e3e(_0x244b3a) + _0x1b0e3e(_0x47e9c7) + _0x1b0e3e(_0x4f689f) + _0x1b0e3e(_0x27bcf7);
return _0x35900a["toLowerCase"]();
}
function go(_0x30b50d) {
function _0x3dbf67() {
var _0x5f1114 = window["navigator"]["userAgent"],
_0x2ed046 = ["Phantom"];
for (var _0x1869b0 = 0; _0x1869b0 < _0x2ed046["length"]; _0x1869b0++) {
if (_0x5f1114["indexOf"](_0x2ed046[_0x1869b0]) != -1) {
return true;
}
}
if (window["callPhantom"] || window["_phantom"] || window["Headless"] || window["navigator"]["webdriver"] || window["navigator"]["__driver_evaluate"] || window["navigator"]["__webdriver_evaluate"]) {
return true;
}
}
if (_0x3dbf67()) {
return;
}
var _0x26a47f = new Date();
function _0x3df5bc(_0x5da4a3, _0x2d77c8) {
var _0xad821a = _0x30b50d["chars"]["length"];
for (var _0x42a4ac = 0; _0x42a4ac < _0xad821a; _0x42a4ac++) {
for (var _0x250ad6 = 0; _0x250ad6 < _0xad821a; _0x250ad6++) {
var _0x5f1c4c = _0x2d77c8[0] + _0x30b50d["chars"]["substr"](_0x42a4ac, 1) + _0x30b50d["chars"]["substr"](_0x250ad6, 1) + _0x2d77c8[1];
if (hash(_0x5f1c4c) == _0x5da4a3) {
return [_0x5f1c4c, new Date() - _0x26a47f];
}
}
}
}
var _0x1d6c97 = _0x3df5bc(_0x30b50d["ct"], _0x30b50d["bts"]);
if (_0x1d6c97) {
var _0x5c31f9;
if (_0x30b50d["wt"]) {
_0x5c31f9 = parseInt(_0x30b50d["wt"]) > _0x1d6c97[1] ? parseInt(_0x30b50d["wt"]) - _0x1d6c97[1] : 500;
} else {
_0x5c31f9 = 1500;
}
setTimeout(function () {
document["cookie"] = _0x30b50d["tn"] + "=" + _0x1d6c97[0] + ";Max-age=" + _0x30b50d["vt"] + "; path = /";
location["href"] = location["pathname"] + location["search"];
}, _0x5c31f9);
} else {
alert("\u8BF7\u6C42\u9A8C\u8BC1\xE5\xA4\xB1\xE8\xB4\xA5");
}
}
go({
"bts": ["1605770555.059|0|DGK", "s4dADq0wDGWCiURT3yX7ds%3D"],
"chars": "AdFF3xaKjaNVFXqbiTdKR4",
"ct": "40ed0871cd9830417eda6370eef68d78",
"ha": "md5",
"tn": "__jsl_clearance_s",
"vt": "3600",
"wt": "1500"
});
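Analysis and implementation: a quick read shows that this code calls go with a single argument, and that argument is what the second cookie is generated from. From here it is simple: extract the argument with a regular expression, then make a few changes to the JS code.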
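Pulling the go argument out of the page can be sketched as follows. This assumes the argument is valid JSON, as it is in the deobfuscated listing above (double-quoted keys and string values); extract_go_params is a hypothetical helper name and the sample page is a shortened stand-in, not a real response.

```python
import json
import re

# Matches "go({...})" but not the definition "function go(_0x...)",
# because the definition has no "{" right after the parenthesis.
GO_CALL_RE = re.compile(r"go\((\{.*?\})\)", re.S)


def extract_go_params(html):
    """Return the object passed to go() as a Python dict."""
    match = GO_CALL_RE.search(html)
    if match is None:
        raise ValueError("no go({...}) call found in the response")
    return json.loads(match.group(1))


# Shortened stand-in for the second-stage response body.
sample = ('<script>function go(a){};go({"bts": ["x|0|", "y%3D"], '
          '"chars": "abc", "ct": "00", "ha": "md5", '
          '"tn": "__jsl_clearance_s", "vt": "3600", "wt": "1500"})</script>')
params = extract_go_params(sample)
print(params["ha"], params["bts"])
```

Raising an error when nothing matches makes a changed page layout fail loudly instead of producing an IndexError further down.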
The code below appears to check whether the client is a crawler; in my tests it can be deleted without affecting the result:
function _0x3dbf67() {
var _0x5f1114 = window["navigator"]["userAgent"],
_0x2ed046 = ["Phantom"];
for (var _0x1869b0 = 0; _0x1869b0 < _0x2ed046["length"]; _0x1869b0++) {
if (_0x5f1114["indexOf"](_0x2ed046[_0x1869b0]) != -1) {
return true;
}
}
if (window["callPhantom"] || window["_phantom"] || window["Headless"] || window["navigator"]["webdriver"] || window["navigator"]["__driver_evaluate"] || window["navigator"]["__webdriver_evaluate"]) {
return true;
}
}
if (_0x3dbf67()) {
return;
}
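Next, change the code that sets the cookie so that calling go returns the cookie directly:
// original code
setTimeout(function () {
document["cookie"] = _0x30b50d["tn"] + "=" + _0x1d6c97[0] + ";Max-age=" + _0x30b50d["vt"] + "; path = /";
location["href"] = location["pathname"] + location["search"];
}, _0x5c31f9);
} else {
alert("\u8BF7\u6C42\u9A8C\u8BC1\xE5\xA4\xB1\xE8\xB4\xA5");
// changed to
return _0x30b50d["tn"] + "=" + _0x1d6c97[0] + ";Max-age=" + _0x30b50d["vt"] + "; path = /";
Finally, delete the go call at the end of the JS code and save the file. (Note that the site has three variants of the second-stage cookie code; each of the three has to be modified in the same way and saved separately.)
Convert the extracted argument to a dict, then check its ha field to choose the matching cookie-generation code.
Run the JS code with the execjs module, passing in the argument, to get the final cookie. Send it together with the jsluid obtained earlier, and the response contains the real page.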
import json
import re

import execjs
import requests
import urllib3

# Suppress the warning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
}
url = 'https://www.yidaiyilu.gov.cn/xwzx/gnxw/87373.htm'


def get_page():
    response = requests.get(url, headers=headers, verify=False)
    return response


def get_parameter(response):
    # Cookie parameter jsluid: "name=value" part of the Set-Cookie header
    jsluid = response.headers.get('set-cookie').split(';')[0]
    # Extract the JS expression assigned to document.cookie
    js_clearance = re.findall('cookie=(.*?);location.href=', response.text)[0]
    # Evaluate it to get the first-stage __jsl_clearance_s
    result = execjs.eval(js_clearance).split(';')[0]
    headers.update({'cookie': jsluid + '; ' + result})
    response = get_page()
    # Extract the argument passed to go() and convert it to a dict
    parameter = json.loads(re.findall(r'};go\((.*?)\)</script>', response.text)[0])
    js_file = ''
    # Choose the cookie-generation file that matches the hash variant
    if parameter['ha'] == 'sha1':
        js_file = 'sha1.js'
    elif parameter['ha'] == 'sha256':
        js_file = 'sha256.js'
    elif parameter['ha'] == 'md5':
        js_file = 'md5.js'
    return parameter, js_file, jsluid


def get_cookie(param, file):
    parameter = {
        "bts": param['bts'],
        "chars": param['chars'],
        "ct": param['ct'],
        "ha": param['ha'],
        "tn": param['tn'],
        "vt": param['vt'],
        "wt": param['wt']
    }
    with open(file, 'r') as f:
        js = f.read()
    ctx = execjs.compile(js)
    # Call the modified go function with the extracted argument
    clearance = ctx.call('go', parameter)
    return clearance


def run():
    response = get_page()
    parameter, js_file, jsluid = get_parameter(response)
    # Keep only "name=value"; the JS also returns Max-age and path
    clearance = get_cookie(parameter, js_file).split(';')[0]
    headers.update({'cookie': jsluid + '; ' + clearance})
    html = requests.get(url, headers=headers, verify=False)
    print(html.content.decode())


if __name__ == '__main__':
    run()
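As an alternative to keeping three patched JS files, the search that go performs can probably be reproduced directly in Python with hashlib. This is a sketch and not the method used in the script above: it assumes that the sha1 and sha256 variants differ from the md5 listing only in the hash function, which the thread does not confirm. solve_clearance and HASHERS are names made up for this example, and the parameter values in the demo are synthetic.

```python
import hashlib
from itertools import product

# Assumed mapping from the "ha" field to a hash function.
HASHERS = {"md5": hashlib.md5, "sha1": hashlib.sha1, "sha256": hashlib.sha256}


def solve_clearance(params):
    """Find the two characters that make the hash equal params["ct"].

    Mirrors the nested loop in go(): candidate = bts[0] + c1 + c2 + bts[1].
    Returns "name=value" for the Cookie header, or None if nothing matches.
    """
    hasher = HASHERS[params["ha"]]
    prefix, suffix = params["bts"]
    for first, second in product(params["chars"], repeat=2):
        candidate = prefix + first + second + suffix
        if hasher(candidate.encode("utf-8")).hexdigest() == params["ct"]:
            return f'{params["tn"]}={candidate}'
    return None


# Synthetic demo: build a target hash from a known answer, then recover it.
demo = {
    "bts": ["1605770555.059|0|DGK", "s4dADq0wDGWCiURT3yX7ds%3D"],
    "chars": "AdFF3xaKjaNVFXqbiTdKR4",
    "ha": "sha1",
    "tn": "__jsl_clearance_s",
}
known = demo["bts"][0] + "F3" + demo["bts"][1]
demo["ct"] = hashlib.sha1(known.encode("utf-8")).hexdigest()
print(solve_clearance(demo))
```

The search space is at most len(chars) squared candidates, so this finishes almost instantly. Unlike the JS version, the function returns only name=value and drops Max-age and path, since those attributes do not belong in a request's Cookie header.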
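suchocolate posted on 2020-11-19 09:49
I can scrape it here.
?????? How did you get that to work? Amazing.
YunGuo posted on 2020-11-19 17:03
I had some free time and analyzed it. The cookie is not hard to generate. Judging from the browser, three requests are made in total: the first two only serve to generate ...
Incredible, thank you. I don't quite follow the part after the second-stage JS code, though; I'll have to study it some more.
This post was last edited by 用编程搞垮道盟 on 2020-11-19 23:40
YunGuo posted on 2020-11-19 17:03
I had some free time and analyzed it. The cookie is not hard to generate. Judging from the browser, three requests are made in total: the first two only serve to generate ...
What are those JS files (md5.js, sha1.js, sha256.js)? Also, how do you know that the argument go should be called with is whatever is inside the go(...) call at the end of the JS code? If you don't mind, please explain.
用编程搞垮道盟 posted on 2020-11-19 23:14
Incredible, thank you. I don't quite follow the part after the second-stage JS code, though; I'll have to study it some more.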
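This JS code is not very complicated, but it has to be deobfuscated first or it is unreadable. Since we know the code exists to generate a cookie, it must contain a statement that sets the cookie in the browser (you can simply search the code for "cookie"). Once you have found that statement you are halfway there: you don't need to understand how the JS computes the value, you only need to change the cookie-setting code so that the function returns the result when called. A little JS knowledge is enough. The other thing to watch for is that the site has three sets of cookie-generation code, so you have to send several requests to collect all three. To tell them apart, look at the ha value in the argument passed to go at the end of the code. It has three possible values, md5, sha1 and sha256, one for each set of cookie-generation code.
YunGuo posted on 2020-11-19 23:42
This JS code is not very complicated, but it has to be deobfuscated first or it is unreadable. Since we know the code exists to generate a cookie, it ...
Which deobfuscation tool did you use? I can't get it to decode. In my copy I can't even find the word cookie; it is still a pile of unreadable code.
用编程搞垮道盟 posted on 2020-11-19 23:54
Which deobfuscation tool did you use? I can't get it to decode. In my copy I can't even find the word cookie; it is still a pile of unreadable code.
ob混淆专解测试版V0.1 (an online decoder for "ob" obfuscation, beta V0.1)
http://tool.yuanrenxue.com/decode_obfuscator
用编程搞垮道盟 posted on 2020-11-19 23:54
Which deobfuscation tool did you use? I can't get it to decode. In my copy I can't even find the word cookie; it is still a pile of unreadable code.
It is an online deobfuscator; I posted the URL but it is still awaiting moderation. As for how to know that the argument is whatever is inside go(...): anyone who knows a little JS can see that the go(...) at the very end calls the go function and passes it one argument (in JS terms, an object). If you are still unsure, look at the go function above. It takes a single parameter, _0x30b50d. Search for _0x30b50d and the first place it is used is var _0xad821a = _0x30b50d["chars"]["length"]; which defines _0xad821a as the length of chars in _0x30b50d. Now go back to the end of the code: the argument passed in has a chars field. Keep looking and you will find the others too, including the code that finally sets the cookie, which uses _0x30b50d["tn"] and _0x30b50d["vt"]. So the argument passed in must be the parameters used to generate the cookie.
YunGuo posted on 2020-11-20 00:26
It is an online deobfuscator; I posted the URL but it is still awaiting moderation. As for how to know that the argument is whatever is inside go(...): anyone who knows a little ...
In fact the deobfuscation is incomplete and the code is still partly obfuscated, but that does not get in the way of reading the cookie-setting code further down. Only the variable and function names remain obfuscated (and they differ every time the code is fetched).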