This is my first crawler as a newcomer. Please point out anything that could be improved.

    import re

    import requests

    # Browser-like User-Agent, sent with every request.
    HEADERS = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
    }
    # Capture each image URL (data-original) and its title (alt).
    IMAGE_RE = re.compile(r'data-original="(.*?)".*?alt="(.*?)"', re.S)


    def getDouTu(page):
        """Download every image listed on one page of the article list."""
        url = 'https://www.doutula.com/article/list/?page={}'.format(page)
        html = requests.get(url, headers=HEADERS).text
        for image_url, image_title in IMAGE_RE.findall(html):
            print(image_url, image_title)
            # URLs ending in "g" are saved as .jpg and URLs ending in "f"
            # as .gif; anything else is skipped.
            if image_url.endswith("g"):
                filename = '%s.jpg' % image_title
            elif image_url.endswith("f"):
                filename = '%s.gif' % image_title
            else:
                continue
            response = requests.get(image_url, headers=HEADERS)
            with open(filename, "wb") as f:
                f.write(response.content)


    # Crawl pages 1 through 999.
    for i in range(1, 1000):
        getDouTu(i)
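
One thing to watch: the `alt` text is used directly as the filename, so a title containing characters such as `/` or `?` would probably make `open()` fail. Below is a minimal sketch of a helper that could guard against that. The name `safe_filename` and the `default` fallback are hypothetical (not part of the script above), and it assumes that replacing the characters Windows forbids with underscores is acceptable.

```python
import re

# Characters not allowed in Windows filenames, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, extension, default="untitled"):
    """Hypothetical helper: turn an image title into a usable filename."""
    # Replace forbidden characters, then trim whitespace and trailing dots.
    cleaned = _INVALID_CHARS.sub("_", title).strip().rstrip(".")
    # Fall back to a placeholder name when nothing usable is left.
    return "%s.%s" % (cleaned or default, extension)
```

In the script it could stand in for the `'%s.jpg' % image_title` expression, for example `filename = safe_filename(image_title, 'jpg')`. Note that two images with the same title would still overwrite each other.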