A web scraping question
Last edited by basketmn on 2021-8-9 14:06
import requests
import re
from lxml import etree
url='https://www.qiushibaike.com/video/'
headers={'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0'
}
response=requests.get(url=url,headers=headers)
result=etree.HTML(response.text)
#tupian=re.findall(r'<div class="thumb">.*?<img src="(.*?)" alt=.*?</div>',response.text,re.S)
tupian=result.xpath('//video[@controls="controls"]/source/@src')
print(tupian)
for img_tupian in tupian:
    video_url='https:'+img_tupian
    shipin=requests.get(url=video_url,headers=headers)
    print(shipin)
    with open('.\','wb') as f:
        f.write(shipin.content)
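Hi all, this seems to have been blocked by anti-scraping: it returns <Response [200]>. How do I fix it?
You weren't blocked. <Response [200]> means the request succeeded; 200 is the status code. The only problem in your code is the open() call; you can change it to with open('./'+img_tupian.split('/')[-1],'wb')
Last edited by basketmn on 2021-8-9 14:39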
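2012277033 wrote on 2021-8-9 14:18
You weren't blocked. <Response [200]> means the request succeeded; 200 is the status code. The only problem in your code is the open() call; you can change it to
I clearly haven't learned paths properly. I need to go over file storage again.
I changed it and there is still nothing!
2012277033 wrote on 2021-8-9 14:18
You weren't blocked. <Response [200]> means the request succeeded; 200 is the status code. The only problem in your code is the open() call; you can change it to
Thanks a lot! It works now.
basketmn wrote on 2021-8-9 14:31
I clearly haven't learned paths properly. I need to go over file storage again.
I changed it and there is still nothing!
It runs fine on my end. Check whether your folder has write permission.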
import requests
import re
from lxml import etree
url='https://www.qiushibaike.com/video/'
headers={'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0'
}
response=requests.get(url=url,headers=headers)
result=etree.HTML(response.text)
#tupian=re.findall(r'<div class="thumb">.*?<img src="(.*?)" alt=.*?</div>',response.text,re.S)
tupian=result.xpath('//video[@controls="controls"]/source/@src')
print(tupian)
for img_tupian in tupian:
    video_url='https:'+img_tupian
    shipin=requests.get(url=video_url,headers=headers)
    print(shipin)
    with open('./'+img_tupian.split('/')[-1],'wb') as f:
        f.write(shipin.content)
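On the write-permission point: here is a minimal sketch, assuming the standard-library os module is enough for the check. save_bytes is a hypothetical helper name, not something from the code in this thread. It creates the target folder if it is missing, refuses to write when the folder does not look writable, and returns the path so the file size can be checked afterwards. os.access is only advisory on some platforms (Windows in particular), so treat a passing check as a hint rather than a guarantee.

```python
import os


def save_bytes(data: bytes, directory: str, file_name: str) -> str:
    """Write data to directory/file_name and return the full path.

    Hypothetical helper, for illustration only.
    """
    # open() does not create missing folders, so create the folder first.
    os.makedirs(directory, exist_ok=True)
    # Advisory check: fail early with a clear message if writing looks impossible.
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"no write permission for {directory!r}")
    path = os.path.join(directory, file_name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Inside the loop this could be called as save_bytes(shipin.content, '.', img_tupian.split('/')[-1]), followed by os.path.getsize on the returned path to confirm that something was actually written.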
Your code is fine; the problem is the save path.
import os
import requests
from lxml import etree

url = 'https://www.qiushibaike.com/video/'
headers = {'User-Agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0'
}
response = requests.get(url=url, headers=headers)
result = etree.HTML(response.text)
#tupian=re.findall(r'<div class="thumb">.*?<img src="(.*?)" alt=.*?</div>',response.text,re.S)
tupian = result.xpath('//video[@controls="controls"]/source/@src')
print(tupian)
# open() does not create folders, so make sure ./video exists first
os.makedirs('./video', exist_ok=True)
for img_tupian in tupian:
    video_url = 'https:'+img_tupian
    shipin = requests.get(url=video_url, headers=headers)
    # use the last part of the src as the file name
    file_name = img_tupian.split("/")[-1]
    with open(f"./video/{file_name}", 'wb') as f:
        f.write(shipin.content)
    print(f"{file_name} downloaded!")
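About the save path: in the first post, '.\' cannot work as a file name, because in a Python string literal the backslash escapes the closing quote. Even written correctly it would only name the current folder, not a file. The fixes in this thread take the last part of the src instead. Below is a small sketch of that step using only the standard library. filename_from_src is a made-up helper name, and it assumes the src values are protocol-relative URLs (starting with //), as the 'https:'+img_tupian line suggests.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_src(src: str) -> str:
    """Return the last path segment of a URL such as '//host/dir/clip.mp4'.

    Hypothetical helper, for illustration only.
    """
    # urlsplit separates host, path and query, so a trailing '?x=1' is dropped.
    path = urlsplit(src).path
    # URL paths always use '/', so posixpath is the right module on any OS.
    return posixpath.basename(path)
```

Unlike img_tupian.split('/')[-1], this also drops a query string if the src has one, which would otherwise end up in the file name.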
https://static01.imgkr.com/temp/c7babf5625c34962bee16cb865a01147.jpg