Hi all, why does this scraper not return any data?
```python
import requests
from lxml import html
etree = html.etree
url = 'http://61.163.88.227:8006/hwsq.aspx?sr=0nkRxv6s9CTRMlwRgmfFF6jTpJPtAv87'
data = {'ctl00$ContentLeft$menuDate1$TextBox11': '2023-03-04'}
headers = {'User-Agent': 'Mozilla/5.0'}
resp = requests.get(url, data=data, headers=headers).text
tree = etree.HTML(resp)
title = tree.xpath('/html/body/form/div[3]/table/tbody/tr/td/table/tbody/tr/td[2]/div/table[4]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]/text()')
print(title)
```
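A second likely reason the XPath above returns an empty list is that it was copied from browser devtools and passes through `tbody`. When browsers build the DOM, they insert `<tbody>` into tables that lack one. lxml parses the raw HTML exactly as the server sent it, so a path that goes through `tbody` matches nothing if the server never wrote that element. The working XPath in the answer appears to omit `tbody` for this reason. The small table below is invented for illustration:

```python
from lxml import etree

# Invented sample: the raw HTML has no <tbody>, as server-generated tables
# often don't. A browser would still show <tbody> in devtools.
RAW = """
<html><body>
  <table id="t">
    <tr><td>name</td><td>level</td></tr>
    <tr><td>station A</td><td>12.3</td></tr>
  </table>
</body></html>
"""

tree = etree.HTML(RAW)

# Devtools-style path through tbody: lxml's tree has no tbody, so no match.
with_tbody = tree.xpath('//table[@id="t"]/tbody/tr[2]/td[2]/text()')

# Same path with tbody removed: matches the cell text.
without_tbody = tree.xpath('//table[@id="t"]/tr[2]/td[2]/text()')

print(with_tbody)
print(without_tbody)
```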
This post was last edited by isdkz on 2023-3-2 22:06.
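I've adjusted your code a bit; ask if anything is unclear. The parameters you were sending were incomplete. The request lacked `__VIEWSTATE`, `__EVENTVALIDATION` and the async-postback fields, so the server never returned the AJAX response containing the data.
You need to install the texttable library (`pip install texttable`):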
```python
import requests
import texttable as tt
from lxml import html

etree = html.etree

url = 'http://61.163.88.227:8006/hwsq.aspx?sr=0nkRxv6s9CTRMlwRgmfFF6jTpJPtAv87'
headers = {'User-Agent': 'Mozilla/5.0'}

# Form fields for an ASP.NET async postback that simulates a click
# on ctl00$ContentLeft$Button1 with the chosen date
data = {
    'ctl00$ContentLeft$menuDate1$TextBox11': '2023-03-04',
    '__ASYNCPOST': 'true',
    'ctl00$ScriptManager1': 'ctl00$ScriptManager1|ctl00$ContentLeft$Button1'
}

sess = requests.Session()
sess.headers.update(headers)

# 1. GET the page once to read the hidden state fields the server expects back
resp = sess.get(url).text
tree = etree.HTML(resp)
data['__VIEWSTATE'] = tree.xpath('//*[@id="__VIEWSTATE"]/@value')[0]
data['__EVENTVALIDATION'] = tree.xpath('//*[@id="__EVENTVALIDATION"]/@value')[0]

# 2. POST the form back; the response is the partial (AJAX) page update
resp = sess.post(url, data=data).text

# To inspect the raw response, uncomment:
# with open('tmp.html', 'w', encoding='utf-8') as f:
#     print(resp, file=f)

tree = etree.HTML(resp)
# Path as shown in browser devtools:
# /html/body/table[4]/tbody/tr/td[1]/table/tbody/tr[2]/td[2]
# The raw HTML has no <tbody>, so the XPath below leaves it out.
tds = tree.xpath('//*[@id="ContentRight_divfff"]/table[2]/tr/td')
for td in tds:
    trs = td.xpath('./table/tr')
    header = trs[0].xpath('./td/text()')   # first row holds the column names
    table = tt.Texttable()
    table.header(header)
    for row in trs[1:]:
        table.add_row(row.xpath('./td/text()'))
    print(table.draw())
    print()
```
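The answer copies only `__VIEWSTATE` and `__EVENTVALIDATION` by id. A browser submits every hidden input in the form, and WebForms pages can carry others (for example `__VIEWSTATEGENERATOR`). A slightly more defensive approach is to copy all hidden inputs and then add the fields the answer sets by hand. The sketch below does this. `hidden_fields` is a hypothetical helper, and the sample page and its values are invented for illustration:

```python
from lxml import etree


def hidden_fields(page_html):
    """Return {name: value} for every <input type="hidden"> that has a name.

    Hypothetical helper (not a library API): it generalises the two
    __VIEWSTATE / __EVENTVALIDATION lookups so no hidden field is missed.
    """
    tree = etree.HTML(page_html)
    return {
        inp.get('name'): inp.get('value', '')
        for inp in tree.xpath('//input[@type="hidden"][@name]')
    }


# Invented page fragment standing in for the real GET response.
SAMPLE = """<html><body><form method="post">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="vs123" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="ev456" />
<input type="text" name="ctl00$ContentLeft$menuDate1$TextBox11" value="2023-03-01" />
</form></body></html>"""

# Start from every hidden field, then add the fields the answer sets by hand.
payload = hidden_fields(SAMPLE)
payload.update({
    'ctl00$ContentLeft$menuDate1$TextBox11': '2023-03-04',
    '__ASYNCPOST': 'true',
    'ctl00$ScriptManager1': 'ctl00$ScriptManager1|ctl00$ContentLeft$Button1',
})

for name in sorted(payload):
    print(name, '=', payload[name])
```

With the real page, `hidden_fields(sess.get(url).text)` would take the place of the two XPath lookups, and the resulting `payload` would be passed to `sess.post(url, data=payload)` as in the answer.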