Question about scraping jpg and png images with re
import re

import bs4
import requests

pzn = input('Enter the URL: ')
headers = {
    'authority': 'cn.apo.com',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cookie': '__guid=5722465.4514138450663120400.1622527541951.2417; _ga=GA1.2.2073280731.1622543356; _gid=GA1.2.939110097.1626079866; SESSION=d12919a9-f80b-42df-8503-ee05e35e9fe7; csrfToken=a98f0a8c41e3e93909c7317868223b59; Hm_lvt_af0f4729a756b47aeb8f98097a94a1e1=1625815401,1626079865,1626142090,1626166025; monitor_count=484; Hm_lpvt_af0f4729a756b47aeb8f98097a94a1e1=1626167242',
    'if-none-match': '^\\^02c3749931be82f7276467b122f391d4c^\\^',
}
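response = requests.get(pzn, headers=headers)
xqq = 1  # running counter used in every output file name
bsp = bs4.BeautifulSoup(response.text, 'html.parser')
# Keep only the product-detail container, as a string for the regexes below
bsp = str(bsp.find_all('div', class_="introduction-body"))

# Pass 1: every .jpg URL in the detail section (dot escaped to match a literal ".")
jpg_urls = re.findall(r'https:.*?\.jpg', bsp)
for url in jpg_urls:
    r = requests.get(url, stream=True)
    name = 'xq' + str(xqq) + '.jpg'
    with open(name, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=8192):
            fd.write(chunk)
    xqq += 1

# Pass 2: every .png URL, so all pngs end up numbered after all jpgs
png_urls = re.findall(r'https:.*?\.png', bsp)
for url in png_urls:
    r = requests.get(url, stream=True)
    name = 'xq' + str(xqq) + '.png'
    with open(name, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=8192):
            fd.write(chunk)
    xqq += 1

# Pass 3: files prefixed 'zt'. Note: the pattern only matches URLs ending in
# .png, so the 'large.jpg' test below can never be true as written.
png_urls = re.findall(r'https:.*?\.png', bsp)
for url in png_urls:
    if url[-9:] == 'large.jpg':
        r = requests.get(url, stream=True)
        name = 'zt' + str(xqq) + '.png'
        with open(name, 'wb') as fd:
            for chunk in r.iter_content(chunk_size=8192):
                fd.write(chunk)
        xqq += 1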
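When I use re to scrape the product detail images from this site, a few products mix formats, with some images as jpg and others as png, for example: 1.png 2.jpg 3.jpg 4.png
But my code can only scrape one format first and then the other, so I get: 1.jpg 2.jpg 3.png 4.png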
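I'd like to know how to change it so the images are downloaded in the order they appear on the page.
Please post the site URL.
In re you can use "or": *.jpg | *.png
suchocolate posted on 2021-7-29 01:04
Please post the site URL.
https://cn.apo.com/
wp231957 posted on 2021-7-29 07:21
In re you can use "or": *.jpg | *.png
Could you explain in more detail? I don't quite follow.
wp231957 posted on 2021-7-29 07:21
In re you can use "or": *.jpg | *.png
I get it now, thanks.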
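For anyone who finds this thread later, here is a rough sketch of what the "or" suggestion could look like in practice. Note that `*.jpg | *.png` is glob-style shorthand; in re the alternation goes inside a group, as in `\.(?:jpg|png)`. A single `findall` with that pattern returns matches in the order they appear in the HTML, so the numbering follows the page. The group is non-capturing on purpose: with a capturing group, `findall` would return only the extension instead of the full URL. The character class that stops a match at quotes, whitespace and angle brackets, the helper names `image_urls_in_order` and `numbered_names`, and the `example.com` sample HTML are all assumptions made for illustration, not something taken from the site.

```python
import re

# One pattern for both formats. The lazy quantifier stops at the first
# ".jpg" or ".png"; the character class keeps a match inside one attribute
# (an assumption about how the URLs appear in the HTML).
IMG_URL = re.compile(r'https:[^\s"\'<>]+?\.(?:jpg|png)')


def image_urls_in_order(html):
    """Return jpg and png URLs in the order they appear in html."""
    return IMG_URL.findall(html)


def numbered_names(urls, prefix='xq'):
    """Pair each URL with a numbered file name that keeps its own extension."""
    names = []
    for i, url in enumerate(urls, start=1):
        ext = url[url.rfind('.'):]  # ".jpg" or ".png"
        names.append((url, f'{prefix}{i}{ext}'))
    return names


# Hypothetical sample standing in for the "introduction-body" div.
sample = (
    '<div class="introduction-body">'
    '<img src="https://example.com/a.png">'
    '<img src="https://example.com/b.jpg">'
    '<img src="https://example.com/c.jpg">'
    '<img src="https://example.com/d.png">'
    '</div>'
)

for url, name in numbered_names(image_urls_in_order(sample)):
    print(name, url)
```

Each `(url, name)` pair can then be downloaded with the same `requests.get(url, stream=True)` loop as in the original code, which replaces the separate jpg and png passes with a single one.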