[Solved] Python crawler: problem saving a scraped image set

小小蛙 · Posted on 2020-3-14 18:25:14

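Last edited by 一个账号 on 2020-3-14 18:32

As the title says: I have already extracted the link and title of every image. How do I save the images to a local folder? My attempt below does not work.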

import urllib.request
import re
import os
from bs4 import BeautifulSoup
def request_url():
    url='https://www.tooopen.com/topiclist/9620.aspx'
    response =urllib.request.Request(url)
    html = urllib.request.urlopen(response)
    #html.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36')
    html = html.read().decode('utf-8')
    reg =r'<a class="pic" href="(.*?)" title="(.*?)" target="_blank">'
    photo_url= re.findall(reg,html)
    #print(photo_url[0][0])
    for i in photo_url:
        #urls =i[0]
        #name =i[1]
        urls,name =i
        r =urllib.request.urlopen(urls).read().decode('utf-8')
        #print(r)
        soup =BeautifulSoup(r,"html.parser")
        img_url = soup.find("div",class_="img-list").find_all("a")
        reg=r'http(.*?)jpg'
        link = re.findall(reg,str(img_url))
        for i in link:
            i='http'+i+'jpg'
            print(i)
            download_imgs = urllib.request.urlopen(i).read()
            with open('.jpg','wb') as f:
                f.write(i)
request_url()
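For readers following along, the save step in the code above appears to fail for two reasons. `open('.jpg','wb')` uses the same file name on every pass, and `f.write(i)` hands the URL string to a file opened in binary mode, which raises `TypeError`; the downloaded bytes are in `download_imgs`. Below is a minimal sketch of a corrected save step that needs no network access. The helper `save_image`, the `out_dir` folder and the sample bytes are made up for illustration and are not from the thread.

```python
import os
import tempfile


def save_image(data, out_dir, index):
    """Write already-downloaded image bytes to out_dir/<index>.jpg and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f'{index}.jpg')
    with open(path, 'wb') as f:
        f.write(data)  # the bytes from urlopen(...).read(), not the URL string
    return path


# Stand-in for urllib.request.urlopen(i).read(); this is not a real image.
fake_bytes = b'\xff\xd8\xff\xe0 placeholder'
out_dir = tempfile.mkdtemp()
saved = [save_image(fake_bytes, out_dir, n) for n in range(3)]
print(saved)
```

In the posted loop, the equivalent change would be to give each file its own name and write `download_imgs` instead of `i`.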

Best answer: reply by 一个账号, 2020-3-14 18:45:10

wp231957 · Posted on 2020-3-14 18:34:09

Who deleted the previous thread?

小小蛙 · Posted on 2020-3-14 18:34:34

wp231957 wrote on 2020-3-14 18:34
Who deleted the previous thread?

One of the moderators.

一个账号 · Posted on 2020-3-14 18:35:07

Last edited by 一个账号 on 2020-3-14 18:37

I have fixed the code for you:

import urllib.request
import re
import os
import random
import requests
from bs4 import BeautifulSoup


def request_url():
    url = 'https://www.tooopen.com/topiclist/9620.aspx'
    response = urllib.request.Request(url)
    html = urllib.request.urlopen(response)
    #html.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36')
    html = html.read().decode('utf-8')
    # Each match is an (album page URL, album title) pair.
    reg = r'<a class="pic" href="(.*?)" title="(.*?)" target="_blank">'
    photo_url = re.findall(reg, html)
    #print(photo_url[0][0])
    for i in photo_url:
        #urls =i[0]
        #name =i[1]
        urls, name = i
        r = urllib.request.urlopen(urls).read().decode('utf-8')
        #print(r)
        soup = BeautifulSoup(r, "html.parser")
        img_url = soup.find("div", class_="img-list").find_all("a")
        # Pull every http...jpg link out of the album's image list.
        reg = r'http(.*?)jpg'
        link = re.findall(reg, str(img_url))
        for i in link:
            i = 'http' + i + 'jpg'
            print(i)
            # Download the image once and write its bytes to a randomly numbered file.
            res = requests.get(i)
            with open(f'{random.randint(1, 1000)}.jpg', 'wb') as f:
                f.write(res.content)


request_url()
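A side note on the commented-out `add_header` line in the code above: it is called on the object returned by `urlopen`, by which point the request has already been sent. With `urllib`, headers are set on the `Request` object before it is opened. Here is a small sketch, assuming the same URL and User-Agent string as in the post; it only builds the request and does not contact the site.

```python
import urllib.request

url = 'https://www.tooopen.com/topiclist/9620.aspx'
user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36')

# Headers are attached to the Request before it is passed to urlopen().
req = urllib.request.Request(url, headers={'User-Agent': user_agent})

# urllib stores header names capitalized, hence 'User-agent' in the lookup.
print(req.get_header('User-agent'))

# Sending it would then look like:
# html = urllib.request.urlopen(req).read().decode('utf-8')
```

Whether the site actually requires a browser User-Agent is not stated in the thread, so treat this as optional.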

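小小蛙 · Posted on 2020-3-14 18:39:54

Last edited by 一个账号 on 2020-3-14 19:07

一个账号 wrote on 2020-3-14 18:36
I deleted it. What about it?

pip keeps throwing an error when I try to install the requests module. Could you show how to do it with the urllib module instead?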

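一个账号 · Posted on 2020-3-14 18:45:10

小小蛙 wrote on 2020-3-14 18:39
Hi moderator, pip keeps throwing an error when I try to install the requests module. Could you show how to do it with the urllib module instead?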

import urllib.request
import re
import os
import random
from bs4 import BeautifulSoup


def request_url():
    url = 'https://www.tooopen.com/topiclist/9620.aspx'
    response = urllib.request.Request(url)
    html = urllib.request.urlopen(response)
    #html.add_header('User-Agent','Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36')
    html = html.read().decode('utf-8')
    # Each match is an (album page URL, album title) pair.
    reg = r'<a class="pic" href="(.*?)" title="(.*?)" target="_blank">'
    photo_url = re.findall(reg, html)
    #print(photo_url[0][0])
    for i in photo_url:
        #urls =i[0]
        #name =i[1]
        urls, name = i
        r = urllib.request.urlopen(urls).read().decode('utf-8')
        #print(r)
        soup = BeautifulSoup(r, "html.parser")
        img_url = soup.find("div", class_="img-list").find_all("a")
        # Pull every http...jpg link out of the album's image list.
        reg = r'http(.*?)jpg'
        link = re.findall(reg, str(img_url))
        for i in link:
            i = 'http' + i + 'jpg'
            print(i)
            # Download the image once and write its bytes to a randomly numbered file.
            res = urllib.request.urlopen(i)
            data = res.read()
            with open(f'{random.randint(1, 1000)}.jpg', 'wb') as f:
                f.write(data)


request_url()
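One caveat with `random.randint(1, 1000)`: there are only 1000 possible names, so two images can draw the same number and the later one silently overwrites the earlier one. A possible alternative, offered here as an assumption rather than something posted in the thread, is to reuse the file name at the end of each image URL. The helper `filename_from_url` and the example URL are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(img_url, fallback='image.jpg'):
    """Return the last path component of img_url, or fallback if the path has none."""
    path = urlparse(img_url).path      # the path only, without query string or fragment
    name = posixpath.basename(path)    # URL paths always use forward slashes
    return name or fallback


# Hypothetical image URL, for illustration only.
print(filename_from_url('https://example.com/imgs/2020/03/abc123.jpg?size=large'))
```

In the inner loop this would replace the f-string, for example `with open(filename_from_url(i), 'wb') as f:`. It only avoids collisions if the site gives each image a distinct file name, which the thread does not confirm.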

小小蛙 · Posted on 2020-3-14 22:19:25

一个账号 wrote on 2020-3-14 18:45

Could the files be saved under the photo's name? I have already extracted it as name in the code above.
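The thread ends without a reply to this. One way it might be done, as a sketch under assumptions: build each file name from the album title `name` plus a running number, after replacing characters that Windows does not allow in file names. The helper `make_filename` and the sample title are hypothetical and not from the thread.

```python
import re


def make_filename(name, index, ext='.jpg'):
    """Build a file name such as '<title>_01.jpg' from an album title and a counter."""
    # Replace characters that are invalid in Windows file names, plus control characters.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', '_', name).strip(' .')
    if not cleaned:
        cleaned = 'untitled'
    return f'{cleaned}_{index:02d}{ext}'


# Hypothetical album title, for illustration only.
print(make_filename('Spring: flowers/2020?', 1))
```

In the inner loop, `enumerate` could supply the counter, for example `for n, i in enumerate(link, 1):` followed by `open(make_filename(name, n), 'wb')`. The counter is needed because every image in an album shares the same `name`.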
