Problem logging in to TextNow with a crawler, Python Discussion, Programming Languages, 鱼C论坛 (FishC Forum)

unsinx posted on 2020-7-24 21:03:14

Problem logging in to TextNow with a crawler

Last edited by unsinx on 2020-7-25 09:55

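I'm new to web crawling. I want to log in to TextNow with a crawler and then have a server send messages on a schedule so that my number doesn't get reclaimed. So far I've captured the login POST request in the browser, shown in the screenshots below.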

https://i.loli.net/2020/07/24/3nacAHimZQy9hf1.png
https://i.loli.net/2020/07/24/W3cYnpesH8i4NjL.png
Apart from the cookie, most of the request headers are identical. But when I send the same request, the login fails and returns the following:
https://i.loli.net/2020/07/24/LnsZJwRgfyQ7O9K.png
My current guess is that the headers in the screenshot above don't include x-csrf-token and x-tn-captcha-v3.
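One way to test the "most headers match" assumption is to diff the headers copied from DevTools against what a requests session would actually send. A minimal sketch, assuming the browser headers are pasted into a dict by hand; `BROWSER_HEADERS` and `missing_headers` are hypothetical names and the values are placeholders. `Session.prepare_request` merges the session's headers and cookies into a request without sending anything over the network.

```python
import requests

# Hypothetical: headers copied by hand from the browser's DevTools (values are placeholders).
BROWSER_HEADERS = {
    "content-type": "application/json;charset=utf-8",
    "origin": "https://www.textnow.com",
    "x-csrf-token": "<copied value>",
    "x-tn-captcha-v3": "<copied value>",
}


def missing_headers(browser_headers, session, url):
    """Return the header names the browser sent that this session would not send."""
    # prepare_request builds the final request (session headers merged in) offline.
    prepared = session.prepare_request(requests.Request("POST", url))
    # prepared.headers is case-insensitive, so "Origin" and "origin" count as the same.
    return sorted(name for name in browser_headers if name not in prepared.headers)


s = requests.Session()
s.headers.update({"content-type": "application/json;charset=utf-8",
                  "origin": "https://www.textnow.com"})
print(missing_headers(BROWSER_HEADERS, s, "https://www.textnow.com/api/sessions"))
```

With the session above, the two custom headers are reported as missing, which is exactly what the guess above predicts; the same check can be run against the real session before the login POST.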
After I click the login button, the first request the browser sends is shown below. My guess is that this request is what sets x-tn-captcha-v3:
https://i.loli.net/2020/07/25/lIV6va9Wg71iXo2.png
https://i.loli.net/2020/07/25/emsEud5byB2r9HN.png
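I don't know how to extract its requestPayload: it contains unreadable characters, so I can't submit the request payload with requests.post.
As for the csrf-token, I currently have no idea how it is obtained.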
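A common CSRF scheme is the "double-submit" pattern: the server sets a cookie (Angular's HttpClient, for instance, reads a cookie named `XSRF-TOKEN`) and the page's JavaScript echoes its value back in a request header. The cookie string in the posted code does contain `XSRF-TOKEN=...`, so one hypothesis worth testing is that `x-csrf-token` is filled from that cookie. This is an assumption, not confirmed TextNow behavior; note that the commented-out `x-csrf-token` value in the code differs from the cookie value, which may mean the values were captured at different times or that the hypothesis is wrong. `attach_csrf_header` is a hypothetical helper.

```python
import requests


def attach_csrf_header(session, cookie_name="XSRF-TOKEN", header_name="x-csrf-token"):
    """Copy a CSRF cookie from the session's cookie jar into a request header.

    Assumption: the server expects the cookie value echoed back unchanged.
    Returns the token, or None if the cookie is not in the jar yet.
    """
    token = session.cookies.get(cookie_name)
    if token is not None:
        session.headers[header_name] = token
    return token


s = requests.Session()
# In real use the cookie would be set by an earlier s.get(...) of the login page;
# it is set by hand here so the sketch runs offline.
s.cookies.set("XSRF-TOKEN", "example-token", domain="www.textnow.com", path="/")
attach_csrf_header(s)
print(s.headers["x-csrf-token"])  # example-token
```

This only works if the cookie lives in the session's cookie jar. A hand-written `cookie` header, as in the posted code, is sent as-is but is not parsed into `s.cookies`, so the helper would find nothing there.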
Could someone experienced take a look at how to simulate this login with a Python crawler?

suchocolate posted on 2020-7-25 10:03:02

I can't see your headers in the screenshots.

unsinx posted on 2020-7-25 10:30:00

suchocolate posted on 2020-7-25 10:03
I can't see your headers in the screenshots.

https://i.loli.net/2020/07/25/gbLOe8F3jmQHDCU.png
https://i.loli.net/2020/07/25/j7wUtqmEzOhbZJ2.png
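I currently suspect two possible causes. One is the csrf-token; I don't yet know how it's generated. The other is x-tn-captcha-v3: the POST request that sets it submits a lot of content in its request payload, including unreadable characters, so I can't copy it, and I also don't yet know which requests.post parameter to use to submit this request payload.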
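On the "which requests.post parameter" question: `json=` serializes a Python object and sets `Content-Type: application/json` automatically, while `data=` given a `str` or `bytes` sends it unchanged, which is how a non-JSON or binary payload (for example, bytes saved to a file with `open(path, "rb").read()`) would be sent. The sketch below builds both with `Request.prepare()` so the exact body can be inspected offline; the field values and the placeholder bytes are made up. It only shows the mechanics; whether a server accepts a replayed payload is a separate question.

```python
import requests

URL = "https://www.textnow.com/api/sessions"  # endpoint taken from the posted code

# 1) JSON body: requests serializes the dict and sets Content-Type itself.
json_req = requests.Request("POST", URL, json={"remember": False, "username": "user"}).prepare()
print(json_req.headers["Content-Type"])  # application/json
print(json_req.body)  # b'{"remember": false, "username": "user"}'

# 2) Raw body: bytes passed via data= go out byte-for-byte, so unreadable or
#    binary content survives; Content-Type must then be chosen by hand.
raw_payload = b"\x00\x9f\xffplaceholder"  # hypothetical stand-in for captured bytes
raw_req = requests.Request(
    "POST", URL, data=raw_payload, headers={"Content-Type": "application/octet-stream"}
).prepare()
print(raw_req.body == raw_payload)  # True
```

In the posted code, `data=json.dumps(data)` combined with the hand-set `content-type` header produces the same JSON text as `json=data`, so the choice of parameter is probably not what makes the login fail.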

suchocolate posted on 2020-7-25 10:43:35

Last edited by suchocolate on 2020-7-25 10:45

unsinx posted on 2020-7-25 10:30
I currently suspect two possible causes. One is the csrf-token; I don't yet know how it's generated. The other is x-tn- ...

Your code screenshot doesn't show the request carrying any headers.
Please post the code as text instead of screenshots.

unsinx posted on 2020-7-25 10:49:02

suchocolate posted on 2020-7-25 10:43
Your code screenshot doesn't show the request carrying any headers.
Please post the code as text instead of screenshots.

Sorry.

unsinx posted on 2020-7-25 10:50:49

suchocolate posted on 2020-7-25 10:43
Your code screenshot doesn't show the request carrying any headers.
Please post the code as text instead of screenshots.

import requests
import json

headers = {
'cache-control': 'no-cache',
'accept': 'application/json, text/plain, */*; q=0.01',
'content-type': 'application/json;charset=utf-8',
'cookie': '_ga=GA1.2.1232791235.1560919095; G_ENABLED_IDPS=google; _fbp=fb.1.1560919111593.2093970008; UserDidVisitApp=true; __cfduid=dce5092d9271875c09a8b3e9f73cac0f11570075385; tatari-session-cookie=690d0e2c-0fbf-ef35-2487-8498d30ea678; d7s_uid=k9l9yyrdxonmmq; tatari-cookie-test=26184446; stc117823=env:1588256264%7C20200531141744%7C20200430144744%7C1%7C1073241:20210430141744|uid:1565917487242.1073637811.2107635.117823.779348737.:20210430141744|srchist:1073241%3A1588256264%3A20200531141744:20210430141744|tsa:1588256264963.238488725.49476194.430271162756376.8:20200430144744; _gaexp=GAX1.2.MYMMPqR8QfGrxDn3HtIglg.18508.0; _gid=GA1.2.1430836992.1594873661; puntCookie=true; __zlcmid=zDjHsiZOWFKFSb; tnExp%3AfilteringExperiment2=0; tnExp%3Av3DeviceExperiment2=0; __ssid=a0aa129aa072111096ed1d407ad5c38; FirehoseSession-messaging=true; language=zh; cxz.kerviaNotifPermissionsGranted=true; XSRF-TOKEN=aamnWtsV-86RLjkVaVqz9bcoODC1kurw-50M',
'origin': 'https://www.textnow.com',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9;zh',
'Connection':'keep-alive',
#'x-tn-captcha-v3':'03AGdBq25djUkzVxhpuSoWIo-TmMgB2yjQEjJfulxb7b16lRdyL2nzlXYYgSHDFRsOha0v0WWcgja-5hqI__3PVbr-lWC1gSgG76rLPXN2lDBfkha8auuGAftB0G9MWmptRkXGm1Zf_HZwNcijsIdOWpHSoGi2PM02NJbTJJddazscgrwyCjbrEneXzFGx3LwFQzbDMOYVqB6qOJo9moAPx7wtEQujeP7v8Xi9Xv3EqPr_HTuaoGt0t96o48xQ2UIchCg2SdkzbPnZLr4EkNJIlOARlP0wTmWThAWs4GHbvIbJCLocCHqgTvqiKNwATXnxq3zfgPMIQRw3KdBf5ES3Ze7bL24N89NUiPED25fzJiz0pZTgAre_ITBmDF5fQXD6rlZq7_5KFwbF',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36',
#'x-csrf-token': 'rb5QMkbZ-HSa0lMSf4xnPQkX981Yyv9rQvQE',
#'x-requested-with': 'XMLHttpRequest',
}

s = requests.session()
s.headers.update(headers)
# s.get('https://www.textnow.com/api/location')
# s.get('https://www.textnow.com/api/client-features/version')
# s.get('https://www.textnow.com/api/client-features/features')
# s.get('https://www.facebook.com/x/oauth/status?client_id=134162068005&input_token&origin=1&redirect_uri=https://www.textnow.com/login&sdk=joey&wants_cookie_data=true', proxies=proxy)
#s.post('https://www.google.com/recaptcha/api2/reload?k=6Ld5K0IUAAAAABGVv54NtC-G_0ygR8vF1rTrwLj2')

resp = s.get('http://www.textnow.com/login')
# print(resp.content.decode('ISO-8859-1'))
# print(s.headers)
# print(s.cookies)

url_1 = 'https://www.textnow.com/api/sessions'
data = {
"json":{
"remember": False,
"username":"<redacted>",
"password":"<redacted>"
}
}
# print(json.dumps(data))
resp = s.post(url_1, data=json.dumps(data))
print(resp.text)
s.close()

unsinx posted on 2020-7-25 10:57:22

suchocolate posted on 2020-7-25 10:43
Your code screenshot doesn't show the request carrying any headers.
Please post the code as text instead of screenshots.

Posting code always has to go through moderation (testing whether this reply needs approval).

unsinx posted on 2020-7-25 11:05:36

suchocolate posted on 2020-7-25 10:43
Your code screenshot doesn't show the request carrying any headers.
Please post the code as text instead of screenshots.

Posting code triggers moderation, so I'll post a screenshot for now.
https://i.loli.net/2020/07/25/juyrdFsEH8p7wML.png

suchocolate posted on 2020-7-25 11:33:30

Last edited by suchocolate on 2020-7-25 11:35

unsinx posted on 2020-7-25 11:05
Posting code triggers moderation, so I'll post a screenshot for now.

nahongyan1997 posted on 2020-7-25 18:21:58

unsinx posted on 2020-7-25 10:50
import requests
import json

Change this part:
data = {
"json":{
"remember": False,
"username":"<redacted>",
"password":"<redacted>"
}
}

to:

data = {"remember": False,
"username":"<redacted>",
"password":"<redacted>"}
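To see concretely what the server receives in each case, the snippet below prints the JSON text produced by the original nested payload and by the flattened one, which can be compared with the "view source" form of Request Payload in DevTools. The credentials are placeholders. With default settings, requests' `json=` parameter serializes the same way as `json.dumps`, so these strings match the request bodies.

```python
import json

# Placeholder credentials; the real values were withheld in the thread.
wrapped = {"json": {"remember": False, "username": "user", "password": "pass"}}
flat = {"remember": False, "username": "user", "password": "pass"}

# Each string is the body text the server would receive for that payload.
print(json.dumps(wrapped))
# {"json": {"remember": false, "username": "user", "password": "pass"}}
print(json.dumps(flat))
# {"remember": false, "username": "user", "password": "pass"}
```

In the posted code, sending the flattened version would be `s.post(url_1, json=flat)`; whichever shape matches the browser's captured payload is the one to keep.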

unsinx posted on 2020-7-25 18:35:14

nahongyan1997 posted on 2020-7-25 18:21
Change this part:

I tried that before; it still doesn't work.
