FishC Forum


[Solved] Python crawler throws a wall of errors when run

Posted on 2020-2-14 22:08:32 | Reward: 4 fish coins
My Python crawler throws errors when I run it:

    import requests
    import re

    Headers = {
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"}
    response = requests.get("https://www.vmgirls.com/9384.html", headers=Headers)
    # print(response.request.headers)
    html = response.text

    # print(html)
    print("------------------------------------------------------------------------------------------------")
    s = re.findall('<img alt=".*?" src="(.*?)" >', html)
    print(s)

[Screenshot of the error output: 捕获.PNG]
Best answer
2020-2-14 22:08:33
1. I ran your program. It raised no error, but it also printed nothing. I suggest reinstalling the requests library.
2. bs4 is more convenient than re for this job. I rewrote your program with bs4 for reference.

    import requests
    import bs4

    Headers = {
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"}
    response = requests.get("https://www.vmgirls.com/9384.html", headers=Headers)
    # print(response.request.headers)
    html = response.text

    # Parse the page and select the images by their CSS classes.
    soup = bs4.BeautifulSoup(html, "html.parser")
    result = soup.find_all("img", class_="alignnone size-full")
    for each in result:
        # The image URL is read from the data-src attribute.
        print(each["data-src"])

Output:

    https://static.vmgirls.com/image/2018/08/2018-08-10_13-52-47.jpg
    https://static.vmgirls.com/image/2018/08/2018-08-10_13-53-00.jpg
    https://static.vmgirls.com/image/2018/08/2018-08-10_13-53-05.jpg
    https://static.vmgirls.com/image/2018/08/2018-08-10_13-53-10.jpg
    https://static.vmgirls.com/image/2018/08/2018-08-10_13-53-20.jpg
    https://static.vmgirls.com/image/2018/08/2018-08-10_13-53-26.jpg
    https://static.vmgirls.com/image/2018/08/2018-08-10_13-53-32.jpg
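
A small offline sketch may help show why the regular expression from the question printed an empty list while the `data-src` lookup found the images. The `sample_html` string is an assumption: it only imitates a lazy-loaded tag whose URL sits in `data-src` instead of `src`, and the real page markup may differ.

```python
import re

# Hypothetical markup imitating a lazy-loaded image tag (not copied from the real page).
sample_html = (
    '<img alt="demo" class="alignnone size-full" '
    'data-src="https://example.com/a.jpg">'
)

# Pattern from the question: it needs alt="..." followed directly by src="...",
# then a space before the closing '>'. The sample tag has no plain src attribute.
original = re.findall('<img alt=".*?" src="(.*?)" >', sample_html)

# Pattern keyed on the data-src attribute instead.
lazy = re.findall(r'data-src="([^"]+)"', sample_html)

print(original)  # []
print(lazy)      # ['https://example.com/a.jpg']
```

If the real page behaves like this sample, the original script runs without error and simply prints `[]`.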
Latest courses from Xiaojiayu (小甲鱼) -> https://ilovefishc.com

Posted on 2020-2-15 14:49:34
The program raises no error but prints nothing, so I changed it a little:

    import requests
    import re

    Headers = {
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
        }
    response = requests.get("https://www.vmgirls.com/9384.html", headers=Headers)
    # print(response.request.headers)
    html = response.text

    print("------------------------------------------------------------------------------------------------")
    # Raw string, so that \. reaches the regex engine as an escaped dot.
    s = re.findall(r'size-full" data-src="(https:[^"]+\.jpg)', html)
    for each in s:
        print(each)
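
The pattern above only matches when `data-src` comes directly after a class attribute ending in `size-full`. If the attribute order on the page changes, it quietly matches nothing. As one possible alternative, here is a minimal sketch using the standard library's `html.parser`, which does not depend on attribute order. The class name `LazyImageCollector` and the `sample_html` string are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class LazyImageCollector(HTMLParser):
    """Collect data-src values from <img> tags that carry a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # A class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        url = attr_map.get("data-src")
        if self.wanted_class in classes and url:
            self.urls.append(url)


# Hypothetical markup with the attributes in different orders.
sample_html = """
<img data-src="https://example.com/1.jpg" class="alignnone size-full">
<img class="size-full alignnone" alt="x" data-src="https://example.com/2.jpg">
<img class="avatar" data-src="https://example.com/skip.jpg">
"""

collector = LazyImageCollector("size-full")
collector.feed(sample_html)
print(collector.urls)
# ['https://example.com/1.jpg', 'https://example.com/2.jpg']
```

On a real page you would pass `response.text` to `collector.feed()` instead of the sample string.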

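Posted on 2020-2-15 16:06:21
The error says the request timed out, which means the site could not be reached. There are a few possible causes: the site is blocked so you cannot access it, or there is a problem with your network.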
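
A timeout like this can be caught and reported instead of ending in a long traceback. The sketch below is only an illustration and uses the standard library's `urllib` so that it runs without third-party packages. The function `fetch_html`, its `opener` parameter, and the stub `always_times_out` are hypothetical names. With requests, the comparable approach would be to pass `timeout=` to `requests.get` and catch `requests.exceptions.Timeout` and `requests.exceptions.ConnectionError`.

```python
import socket
import urllib.error
import urllib.request


def fetch_html(url, timeout=10, opener=urllib.request.urlopen):
    """Return (html, error). Exactly one of the two values is None."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace"), None
    except (socket.timeout, TimeoutError):
        # Raised when the server does not answer within `timeout` seconds.
        return None, "timed out"
    except urllib.error.URLError as exc:
        # urlopen may also wrap a timeout inside URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return None, "timed out"
        return None, f"unreachable: {exc.reason}"


def always_times_out(request, timeout):
    """Stub opener that simulates a timeout, so the demo needs no network."""
    raise socket.timeout("simulated timeout")


html, error = fetch_html("https://example.com/", timeout=1, opener=always_times_out)
print(html, error)  # None timed out
```

Calling `fetch_html(url)` without the `opener` argument performs a real request. An error of `"timed out"` or `"unreachable: ..."` then points to the network or the site rather than to the parsing code.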

I switched to a different site and it was reachable. Also, your regular expression was not rigorous, so I revised it. Before running, please make sure the requests module is installed.

    import requests
    import re

    Headers = {
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"}
    response = requests.get(
        "https://www.meitulu.com/item/3289.html", headers=Headers)
    # print(response.request.headers)
    html = response.text

    # print(html)
    print("------------------------------------------------------------------------------------------------")
    # Capture the src value of every <img> tag whose URL ends in "jpg",
    # accepting single or double quotes and any other attributes.
    s = re.findall('<img[^>]*?src=[\'"]([^\'"]*?jpg)[\'"][^>]*?>',
                   html, flags=re.I | re.M | re.S)
    print(s)


Result:

    [
    'https://mtl.gzhuibei.com/css/logo.jpg',
    'https://mtl.gzhuibei.com/images/img/3289/1.jpg',
    'https://mtl.gzhuibei.com/images/img/3289/2.jpg',
    'https://mtl.gzhuibei.com/images/img/3289/3.jpg',
    'https://mtl.gzhuibei.com/images/img/3289/4.jpg',
    'https://mtl.gzhuibei.com/images/img/10695/0.jpg',
    'https://mtl.gzhuibei.com/images/img/16676/0.jpg',
    'https://mtl.gzhuibei.com/images/img/16029/0.jpg',
    'https://mtl.gzhuibei.com/images/img/16512/0.jpg',
    'https://mtl.gzhuibei.com/images/img/6809/0.jpg',
    'https://mtl.gzhuibei.com/images/img/17765/0.jpg',
    'https://mtl.gzhuibei.com/images/img/17831/0.jpg',
    'https://mtl.gzhuibei.com/images/img/17733/0.jpg'
    ]
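
The list above also contains the site logo and what look like images from other albums. If only the pictures of the requested album are wanted, one option is to filter by path. The sketch below assumes that the number in the page URL (`3289`) is also the folder name in the image URLs, which is inferred from the output and not confirmed. The helper `keep_album` is a hypothetical name.

```python
def keep_album(urls, album_id):
    """Keep only URLs whose path contains /img/<album_id>/ (assumed layout)."""
    marker = f"/img/{album_id}/"
    return [url for url in urls if marker in url]


# A few entries copied from the result list above.
found = [
    "https://mtl.gzhuibei.com/css/logo.jpg",
    "https://mtl.gzhuibei.com/images/img/3289/1.jpg",
    "https://mtl.gzhuibei.com/images/img/3289/2.jpg",
    "https://mtl.gzhuibei.com/images/img/10695/0.jpg",
]

album_images = keep_album(found, 3289)
for url in album_images:
    print(url)
```

With these four entries, only the two URLs under `/img/3289/` are printed.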