yuyang123999 posted on 2021-5-26 16:02:04

[Help] Web scraping: how do I use a regex to extract the information I want?


import re
import urllib.request

url = 'https://appgallery.huawei.com/#/app/C10149151'
req = urllib.request.Request(url)
req.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36')
resp = urllib.request.urlopen(req)
html = resp.read().decode('utf-8')
# Capture the version number shown in the "info_val" div
pattern = r'<div data-v-14ed2345 class="info_val">(\d\.\d{1,}\.\d{3,})</div>'
version = re.findall(pattern, html)
print('Current version on Huawei:', version)
print(' ------------------------------------------------\n')


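This code is meant to get the game's version number (e.g. 10.5.111). The same approach works on other platforms, but on this one it finds nothing.
Could someone help me fix this?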
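Separately from how the page is loaded, the pattern itself can be checked offline. Its first component is a single `\d`, so it cannot match a version such as 10.5.111, whose first component has two digits. The sketch below runs the posted pattern and a looser one against a hypothetical HTML snippet (`SAMPLE_HTML` and `LOOSER` are illustrative assumptions, not taken from the real page). The looser pattern also drops the `data-v-14ed2345` attribute, since attributes of that form are typically generated by Vue's scoped styles and may change between site builds.

```python
import re

# Hypothetical snippet shaped like the markup the question targets;
# the real page may differ.
SAMPLE_HTML = '<div data-v-14ed2345 class="info_val">10.5.111</div>'

# Pattern as posted: the leading \d accepts exactly one digit.
POSTED = r'<div data-v-14ed2345 class="info_val">(\d\.\d{1,}\.\d{3,})</div>'

# Hypothetical looser pattern: any number of digits per component,
# and no dependence on the generated data-v-... attribute.
LOOSER = r'class="info_val">(\d+(?:\.\d+)+)</div>'

def find_versions(pattern: str, html: str) -> list:
    """Return every version string captured by the pattern's group."""
    return re.findall(pattern, html)

print(find_versions(POSTED, SAMPLE_HTML))  # []
print(find_versions(LOOSER, SAMPLE_HTML))  # ['10.5.111']
```

Even with a correct pattern, `re.findall` can only find text that is present in the HTML the server returns, which is the issue raised in the reply.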

suchocolate posted on 2021-5-26 18:19:03

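This page uses AJAX: the version info is loaded from a separate URL, and that URL seems to be computed by JavaScript, so it may not be easy to obtain:
import requests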

url = 'https://web-drcn.hispace.dbankcloud.cn/uowap/index?method=internal.getTabDetail&serviceType=20&reqPageNum=1&maxResults=25&uri=app%7CC10149151&shareTo=&currentUrl=https%253A%252F%252Fappgallery.huawei.com%252F%2523%252Fapp%252FC10149151&accessId=&appid=C10149151&zone=&locale=en'
headers= {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'}
r = requests.get(url, headers=headers)
print(r.json()['layoutData']['dataList']['versionName'])
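The thread does not show the actual shape of this JSON response. If `layoutData` or `dataList` turns out to be a list rather than a dict, indexing it with a string key raises `TypeError`. A more defensive option is to search the parsed JSON for the key at any depth. The helper `iter_values` and the `sample` data below are hypothetical illustrations, not part of any API.

```python
from typing import Any, Iterator

def iter_values(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth, in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep searching inside the value as well
            yield from iter_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_values(item, key)

# Hypothetical response shape, for illustration only; the real JSON may differ.
sample = {"layoutData": [{"dataList": [{"name": "demo", "versionName": "10.5.111"}]}]}
print(next(iter_values(sample, "versionName"), None))  # 10.5.111
```

With the reply's request, this would be `next(iter_values(r.json(), 'versionName'), None)`, which returns `None` instead of raising an error when the key is absent.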