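Beginner learning web scraping: I don't understand much and just copied the code from a linked thread, but it doesn't work. Hoping an expert can help clear this up! (Python Discussion, Programming Languages, 鱼C论坛)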

然小兆同学 posted on 2022-7-14 09:44:02


Python crawler that automatically downloads Baidu Images
https://fishc.com.cn/thread-200086-1-1.html
(Source: 鱼C论坛)

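When I run this code, it creates the "pics" folder, and inside it a folder named after my search term, "小甲鱼". But that folder ends up containing only one image, named 0.jpg, and I can't open it. Why does this happen?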

The code from the thread linked above is shown below ↓↓↓

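import os
import re

import requests


def getHtml(url, object):
    # Fetch url + object and return the page text, or None if the request fails.
    # The parameter name `object` shadows the built-in; it is kept as in the original.
    hd = {'User-Agent': 'Mozilla/5.0', 'Accept': r'text/html,application/xhtml+xml,*/*'}
    try:
        response = requests.get(url + object, timeout=10, headers=hd)
        response.raise_for_status()
        response.encoding = response.apparent_encoding
        return response.text
    except requests.RequestException:
        return None


object = input("Enter the search target: ")
while True:
    num = int(input("Number of images to download: "))
    if num <= 0:
        print("Invalid number, please enter it again")
    else:
        break
file_path = "D:/pics"
obj_url = "https://image.baidu.com/search/index?ct=201326592&cl=2&st=-1&lm=-1&nc=1&ie=utf-8&tn=baiduimage&ipn=r&rps=1&pv=&fm=rs5&word="
# Create D:/pics and a sub-folder named after the search target.
if not os.path.exists(file_path):
    os.mkdir(file_path)
if not os.path.exists(file_path + os.sep + object):
    os.mkdir(file_path + os.sep + object)
else:
    # Refuse to run if the target folder already contains files.
    if len(os.listdir(file_path + os.sep + object)) != 0:
        print("Files already exist")
        exit()
# Capture every thumbnail URL ending in .jpg (the dot is escaped so it matches literally).
format_str = r'thumbURL":"(https://[^"]+\.jpg)'
text = getHtml(obj_url, object)
if text is None:
    print("Failed to access the URL")
    exit()
content = re.findall(format_str, text)
content = iter(content)
for i in range(num):
    try:
        # Open the target file, then fetch the next matched URL and write its bytes.
        with open(file_path + os.sep + object + os.sep + str(i) + ".jpg", 'wb') as f:
            respon = requests.get(next(content))
            f.write(respon.content)
    except Exception:
        # Any error, including running out of matched URLs, ends the loop here.
        print("Crawler stopped early: the maximum number of search results has been reached!")
        break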
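
One reading of the symptom, based only on the loop above: the file is opened before `next(content)` is called, so if the regular expression matched nothing, `0.jpg` is created, `next` raises `StopIteration`, and an empty file is left behind. The sketch below reproduces that without any network access. `buggy_save`, `safe_save`, and `fake_fetch` are hypothetical names introduced for illustration; `fake_fetch` stands in for `requests.get(url).content`.

```python
import os
import tempfile


def buggy_save(urls, folder, num, fetch):
    """Mirrors the loop in the post: the file is opened BEFORE a URL is taken."""
    urls = iter(urls)
    for i in range(num):
        try:
            with open(os.path.join(folder, f"{i}.jpg"), "wb") as f:
                f.write(fetch(next(urls)))  # StopIteration when no URLs are left
        except Exception:
            break


def safe_save(urls, folder, num, fetch):
    """Fetch first, and create the file only once there are bytes to write."""
    saved = []
    for i, url in enumerate(list(urls)[:num]):
        try:
            data = fetch(url)
        except Exception as exc:
            print(f"skipped {url}: {exc}")
            continue
        if data:
            path = os.path.join(folder, f"{i}.jpg")
            with open(path, "wb") as f:
                f.write(data)
            saved.append(path)
    return saved


def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url).content; no network needed."""
    return b"fake image bytes for " + url.encode()


with tempfile.TemporaryDirectory() as folder:
    buggy_save([], folder, 3, fake_fetch)  # simulate a regex that found nothing
    leftover = os.path.join(folder, "0.jpg")
    print(os.path.exists(leftover), os.path.getsize(leftover))  # prints: True 0
```

With an empty URL list, `safe_save` creates no files at all, which makes it easier to see that the real problem is the missing matches rather than the download step.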

wp231957 posted on 2022-7-15 11:35:17

Crawler code is almost always time-sensitive. Once the target site changes, outdated code like this can't be used as-is.
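
One way to check whether a scraper like this has gone stale is to test its pattern against the page text before downloading anything. A minimal sketch, assuming the `thumbURL` pattern from the post; `find_thumb_urls`, `check_page`, and both sample strings are hypothetical names and made-up data, not real Baidu responses.

```python
import re

# Same idea as format_str in the post, with the dot before "jpg" escaped.
THUMB_PATTERN = re.compile(r'thumbURL":"(https://[^"]+\.jpg)')


def find_thumb_urls(html):
    """Return every thumbnail URL the pattern can find in the page text."""
    return THUMB_PATTERN.findall(html or "")


def check_page(html):
    """Return a short verdict about a downloaded page."""
    if not html:
        return "no page text: the request itself failed"
    urls = find_thumb_urls(html)
    if not urls:
        return "page downloaded, but the pattern matched nothing"
    return f"pattern still works: {len(urls)} thumbnail URL(s) found"


# Made-up page snippets, for illustration only.
old_style = '{"thumbURL":"https://example.com/a.jpg","thumbURL":"https://example.com/b.jpg"}'
new_style = "<html><body>please verify you are human</body></html>"

print(check_page(old_style))
print(check_page(new_style))
print(check_page(None))
```

If the second verdict appears for a live page, the pattern or the request headers would need updating before the download loop can do anything useful.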

suchocolate posted on 2022-7-15 13:31:55

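The code in the original thread is too messy to be good learning material.
I'd suggest picking a good tutorial and learning from that instead.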
