- import requests,bs4
- import re
- pzn=input('Please enter the URL: ')
- headers = {
- 'authority': 'cn.apo.com',
- 'cache-control': 'max-age=0',
- 'upgrade-insecure-requests': '1',
- 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
- 'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
- 'sec-fetch-site': 'same-origin',
- 'sec-fetch-mode': 'navigate',
- 'sec-fetch-user': '?1',
- 'sec-fetch-dest': 'document',
- 'accept-language': 'zh-CN,zh;q=0.9',
- 'cookie': '__guid=5722465.4514138450663120400.1622527541951.2417; _ga=GA1.2.2073280731.1622543356; _gid=GA1.2.939110097.1626079866; SESSION=d12919a9-f80b-42df-8503-ee05e35e9fe7; csrfToken=a98f0a8c41e3e93909c7317868223b59; Hm_lvt_af0f4729a756b47aeb8f98097a94a1e1=1625815401,1626079865,1626142090,1626166025; monitor_count=484; Hm_lpvt_af0f4729a756b47aeb8f98097a94a1e1=1626167242',
- 'if-none-match': '^\\^02c3749931be82f7276467b122f391d4c^\\^',
- }
- response = requests.get(pzn, headers=headers)
- xqq=1
- bsp=bs4.BeautifulSoup(response.text,'html.parser')
- bsp=str(bsp.find_all('div',class_="introduction-body"))
- # First pass: every .jpg URL in the detail section
- b=re.findall(r'https:.*?.jpg',bsp)
- for url in b:
-     r = requests.get(url, stream=True)
-     filename='xq'+str(xqq)+'.jpg'
-     with open(filename, 'wb') as fd:
-         for chunk in r.iter_content():
-             fd.write(chunk)
-     xqq+=1
- # Second pass: every .png URL, numbered after all the .jpg files
- b=re.findall(r'https:.*?.png',bsp)
- for url in b:
-     r = requests.get(url, stream=True)
-     filename='xq'+str(xqq)+'.png'
-     with open(filename, 'wb') as fd:
-         for chunk in r.iter_content():
-             fd.write(chunk)
-     xqq+=1
- # Third pass: the matches here end in 'png', so the 'large.jpg' check never passes
- b=re.findall(r'https:.*?.png',bsp)
- for url in b:
-     if url[-9:] == 'large.jpg':
-         r = requests.get(url, stream=True)
-         filename='zt'+str(xqq)+'.png'
-         with open(filename, 'wb') as fd:
-             for chunk in r.iter_content():
-                 fd.write(chunk)
-         xqq+=1
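When I use re to scrape the product detail images from this site, a few products mix formats, with some images in JPG and others in PNG, for example: 1.png 2.jpg 3.jpg 4.png
But my code downloads all the images of one format before moving on to the other, so the files come out as: 1.jpg 2.jpg 3.png 4.png
How can I change it so the images are downloaded in the order they appear on the page?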
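One possible approach, sketched under assumptions: match both extensions with a single regular expression, so that `re.findall` returns the URLs in the order they occur in the text, then number the files in that same order. The names `IMAGE_URL`, `plan_downloads`, and `download_in_order` are hypothetical helpers, not part of the code above, and the pattern assumes the image URLs contain no quotes, whitespace, or angle brackets.

```python
import re

# A single pattern covering both extensions: findall scans left to right,
# so the matches keep the order in which they appear in the page source.
IMAGE_URL = re.compile(r'https:[^\s"\'<>]+?\.(?:jpg|png)')


def plan_downloads(fragment, prefix='xq'):
    """Return (filename, url) pairs in the order the URLs occur in fragment."""
    plan = []
    for index, url in enumerate(IMAGE_URL.findall(fragment), start=1):
        extension = url.rsplit('.', 1)[1]  # 'jpg' or 'png', taken from the URL
        plan.append((f'{prefix}{index}.{extension}', url))
    return plan


def download_in_order(fragment):
    """Download every planned image; same request/write loop as the original."""
    import requests  # imported here so plan_downloads can be tried offline

    for filename, url in plan_downloads(fragment):
        r = requests.get(url, stream=True)
        with open(filename, 'wb') as fd:
            for chunk in r.iter_content(chunk_size=8192):
                fd.write(chunk)
```

With the variables from the code above, calling `download_in_order(bsp)` after `bsp` has been converted to a string would replace the separate JPG and PNG loops. Each file keeps its own extension, so a page with images in the order PNG, JPG, JPG, PNG would yield xq1.png, xq2.jpg, xq3.jpg, xq4.png.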