Help with a Python web scraper
The page is paginated, and from what I can see it is ddlPage that changes between pages. But the code below still only scrapes one page. Could someone point me in the right direction?

import requests
from bs4 import BeautifulSoup
import re
import time

def gethtml():
    url = "https://www.zjgrc.com/posSearchRslt.aspx?textPosKey=沙钢集团"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
    }
    for i in range(0,2):
        data = {
            "ddlPage": 0,
            "ddlPage": 1
        }
        r = requests.post(url, data=data)
        time.sleep(2)
        print(r.text)
    print("Done")

if __name__ == '__main__':
    gethtml()

路神 (posted 2021-5-22 21:21):
for i in range(0,2):
    data = {
        "ddlPage": i
    }
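
For anyone comparing the two snippets, here is a minimal sketch of why the original payload could never have asked for two different pages. Python keeps only the last value when a dict literal repeats a key, so the first snippet posts ddlPage=1 on both iterations. The name `payloads` is only for illustration.

```python
# Duplicate keys in a dict literal: the later value silently replaces the earlier one.
data = {
    "ddlPage": 0,
    "ddlPage": 1,
}
print(data)  # {'ddlPage': 1}

# Building the payload from the loop variable gives one distinct payload per iteration.
payloads = [{"ddlPage": i} for i in range(0, 2)]
print(payloads)  # [{'ddlPage': 0}, {'ddlPage': 1}]
```

This only fixes the payload itself. Whether the server accepts a bare ddlPage value is a separate question.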

代码小白liu (posted 2021-5-22 21:25):
That doesn't work. That's how I wrote it at first, and it still only scraped one page.

路神 (posted 2021-5-22 21:50):
You need to fill in the other parameters too.
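
The "other parameters" are most likely the hidden form fields that an ASP.NET WebForms page expects to get back on every postback (the `__VIEWSTATE` family; this is an assumption based on the `.aspx` URL). Below is a minimal sketch of collecting them with the standard library. `HiddenFieldParser`, `extract_hidden_fields` and the sample markup are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are also routed through this method.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical sample markup, only to show the shape of the result.
sample = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="txtDw" value="" />
</form>
"""
print(extract_hidden_fields(sample))
# {'__VIEWSTATE': 'abc123', '__EVENTVALIDATION': 'xyz789'}
```

The returned dict can be merged into the POST data, so every hidden field is sent back without having to list each one by hand.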

代码小白liu (posted 2021-5-22 22:02):
Could you share some code, if it's convenient?

import requests
import re

url = 'https://www.zjgrc.com/posSearchRslt.aspx?textPosKey=%E6%B2%99%E9%92%A2%E9%9B%86%E5%9B%A2'
headers = {
    'User-Agent': 'Mozilla/5.0',
}

# Page 1
res = requests.get(url, headers=headers)

# Extract the paging parameters from the hidden form fields.
# re.findall returns a list; requests sends a one-element list as a single form field.
state = re.findall('__VIEWSTATE" value="(.*?)" />', res.text)
state_generator = re.findall('__VIEWSTATEGENERATOR" value="(.*?)" />', res.text)
event_validation = re.findall('__EVENTVALIDATION" value="(.*?)" />', res.text)

# Go to the next page (page 2)
data = {
    '__EVENTTARGET': 'lbNext',
    '__EVENTARGUMENT': '',
    '__LASTFOCUS': '',
    '__VIEWSTATE': state,
    '__VIEWSTATEGENERATOR': state_generator,
    '__EVENTVALIDATION': event_validation,
    'hfKey': '沙钢集团',
    'txtDw': '',
    'txtDw_TextBoxWatermarkExtender_ClientState': '',
    'txtPos': '',
    'txtPos_TextBoxWatermarkExtender_ClientState': '',
    'ddlPage': 0
}
res1 = requests.post(url, headers=headers, data=data)
print(res1.text)
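
The snippet above fetches page 2 only. To go further, the same idea can probably be repeated in a loop, re-reading the hidden fields from each response before posting again. What follows is a rough sketch under several assumptions that have not been checked against the live site: that the hidden fields change with every response, that `lbNext` keeps working as the "next page" target, and that `ddlPage` should carry the index of the page currently shown. The function names (`hidden_value`, `next_page_payload`, `crawl_pages`) are hypothetical, and `session` is expected to behave like a `requests.Session`.

```python
import re
import time

HIDDEN_FIELDS = ("__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION")


def hidden_value(html, name):
    """Return the value of a hidden field, or '' if the page does not contain it."""
    match = re.search(re.escape(name) + r'" value="(.*?)"', html)
    return match.group(1) if match else ""


def next_page_payload(html, keyword, current_index):
    """Build the postback form for the 'next page' link from the current page's HTML."""
    payload = {
        "__EVENTTARGET": "lbNext",
        "__EVENTARGUMENT": "",
        "__LASTFOCUS": "",
        "hfKey": keyword,
        "txtDw": "",
        "txtDw_TextBoxWatermarkExtender_ClientState": "",
        "txtPos": "",
        "txtPos_TextBoxWatermarkExtender_ClientState": "",
        # Assumption: the dropdown holds the index of the page currently displayed.
        "ddlPage": current_index,
    }
    for name in HIDDEN_FIELDS:
        payload[name] = hidden_value(html, name)
    return payload


def crawl_pages(session, url, keyword, max_pages, delay=2.0):
    """Fetch up to max_pages pages, taking the hidden fields from each previous response."""
    pages = [session.get(url).text]
    while len(pages) < max_pages:
        time.sleep(delay)  # pause between requests
        payload = next_page_payload(pages[-1], keyword, len(pages) - 1)
        pages.append(session.post(url, data=payload).text)
    return pages
```

With requests installed, something like `crawl_pages(requests.Session(), url, '沙钢集团', max_pages=3)` would be the intended call, with the User-Agent header set on the session beforehand. The sketch does not detect the last page, so `max_pages` has to be chosen by hand.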