[Solved] Crawler pagination question?

kelby · posted 2019-7-15 10:55:20

Last edited by kelby on 2020-11-8 10:48

……

Best answer

空青

2019-7-15 11:20:03

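When you click through the pages, the URLs look like http://www.example.com/xxx?page_a01, http://www.example.com/xxx?page_a02 and so on, so just iterate over them.
Since you said the first page's address ends in a01 and the following ones in a02, a03 ... a20 ..., I take it the total number of pages is below 100. If the suffixes were a1, a2 ... a99 you could iterate over the numbers directly, but here the number after the a is padded to two digits: when it has fewer than two digits, it is left-padded with 0.

As an example, suppose there are 20 pages.

Method 1: add a check inside the loop. If the page number is below 10, prepend a 0; otherwise leave it unchanged.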

import requests

base_url = 'http://www.example.com/xxx?page_a{}'
for page in range(1, 21):
    # Pages 1-9 have a single digit, so prepend a '0'
    if len(str(page)) == 1:
        url = base_url.format('0' + str(page))
    else:
        url = base_url.format(str(page))
    try:
        r = requests.get(url)
        # Let requests guess the encoding from the response body
        r.encoding = r.apparent_encoding
        print(r.text)
    except Exception as e:
        print(e)

Method 2: use the string method zfill() to do the padding.

import requests

base_url = 'http://www.example.com/xxx?page_a%s'
for page in range(1, 21):
    # zfill(2) left-pads with '0' up to two characters: '1' -> '01'
    page = str(page).zfill(2)
    url = base_url % page
    try:
        r = requests.get(url)
        r.encoding = r.apparent_encoding
    except Exception as e:
        print(e)
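
As a further sketch along the same lines (not part of the original answer), the padding can also be written with a format specification instead of an explicit length check or zfill(). The helper name build_page_urls and its width parameter are assumptions made for illustration; the URL pattern and the 20-page count come from the example above. The sketch only builds the URLs and sends no requests.

```python
def build_page_urls(base_url, total_pages, width=2):
    """Return the page URLs, with each page number zero-padded to `width` digits.

    `base_url` must contain one '{}' placeholder for the padded page number.
    """
    return [
        base_url.format(f"{page:0{width}d}")  # e.g. 1 -> '01' when width is 2
        for page in range(1, total_pages + 1)
    ]


urls = build_page_urls('http://www.example.com/xxx?page_a{}', 20)
print(urls[0])   # http://www.example.com/xxx?page_a01
print(urls[-1])  # http://www.example.com/xxx?page_a20
```

Building the list first keeps the padding logic separate from the download loop, so it can be checked without touching the network.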

That's it, hope this helps.

chxchxkkk · posted 2019-7-15 12:17:43

replace('a01', 'a02')
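
One possible reading of this suggestion, sketched under assumptions: derive the next page's URL from the current one by swapping the padded suffix with str.replace(). The helper name next_page_url is hypothetical, and the sketch assumes the 'aNN' suffix occurs only once in the URL, because replace() changes every occurrence it finds.

```python
def next_page_url(url, current_page, width=2):
    """Return the URL of the page after `current_page`.

    Assumes the URL contains the suffix 'a' + zero-padded page number
    exactly once; str.replace() would otherwise alter other matches too.
    """
    old = 'a' + str(current_page).zfill(width)
    new = 'a' + str(current_page + 1).zfill(width)
    return url.replace(old, new)


print(next_page_url('http://www.example.com/xxx?page_a01', 1))
# http://www.example.com/xxx?page_a02
```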

空青 · posted 2019-7-15 14:05:19

import requests

base_url = 'http://www.example.com/xxx?page_a%s'
for page in range(1, 21):
    page = str(page).zfill(2)
    url = base_url % page
    #print(url)
    try:
        r = requests.get(url)
        r.encoding = r.apparent_encoding
    except Exception as e:
        print(e)
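
The loop above discards the response, so here is a hedged sketch of one way to collect the page text, add a timeout, and skip pages that fail. The function name fetch_pages, the injected get parameter, and the 10-second timeout are all assumptions for illustration and not part of the posts above; get is expected to behave like requests.get.

```python
def fetch_pages(base_url, total_pages, get, timeout=10):
    """Download each page and return a dict mapping URL -> response text.

    `base_url` must contain one '%s' placeholder for the padded page number.
    `get` is any callable with the same interface as requests.get.
    """
    results = {}
    for page in range(1, total_pages + 1):
        url = base_url % str(page).zfill(2)
        try:
            r = get(url, timeout=timeout)
            r.raise_for_status()              # treat HTTP 4xx/5xx as failures
            r.encoding = r.apparent_encoding  # guess encoding from the body
            results[url] = r.text
        except Exception as e:  # kept broad, as in the posts above
            print('failed:', url, e)
    return results
```

With requests imported, it could be called as fetch_pages('http://www.example.com/xxx?page_a%s', 20, requests.get). Passing get in as a parameter is only a convenience so the loop can be tried with a stand-in function before hitting the real site.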
