
OP | Posted on 2020-6-13 15:43:19
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import re
import os


def photolists(html):
    """Extract the gallery URLs and titles from one index page."""
    photo_urls = re.findall(r'<li><a href="(.*?)" target="_blank"><img', html)          # gallery URLs
    photo_titles = re.findall(r'alt=\'(.*?)\' width=\'236\' height=\'354\' />', html)   # gallery titles
    return list(zip(photo_titles, photo_urls))


def photourl(url, headers):
    """Collect the image URL (and a file name) from every page of one gallery."""
    response = requests.get(url=url, headers=headers)
    html = response.text
    # Highest page number of the gallery
    maxpage = re.findall(r'>…</span><a href=\'.*?\'><span>(\d+)</span></a><a href=\'.*?\'><span>下一页', html)
    last_page = int(maxpage[0]) if maxpage else 1
    photo = []
    for page in range(1, last_page + 1):
        try:
            page_url = url + '/' + str(page)        # page holding a single image
            response = requests.get(url=page_url, headers=headers)
            html = response.text
            photodata = re.findall(r'<img class="blur" src="(.*?)" alt=', html)
            photo_name = photodata[0].split(".")[-2].split('/')[-1]
            photo.append([photodata[0], photo_name])
        except (requests.RequestException, IndexError):
            break
    return photo


def main():
    # Home page URL
    url = 'https://www.mzitu.com/'
    # Request headers
    headers = {
        'Referer': 'https://www.mzitu.com/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0'
    }
    response = requests.get(url, headers=headers)
    photos = photolists(response.text)              # all gallery URLs and titles on the current page
    for i in range(len(photos)):                    # open each gallery
        # Create a folder per gallery (disabled)
        # if not os.path.exists(r'D:\{}'.format(photos[i][0])):
        #     os.mkdir(r'D:\{}'.format(photos[i][0]))
        phdatas = photourl(photos[i][1], headers)   # URL of every image in the gallery
        # Download the images
        for photo in phdatas:
            print(photo[0])
            print(photo[1])
            res = requests.get(url=photo[0], headers=headers)
            with open(photo[1] + '.jpg', 'wb') as f:
                f.write(res.content)
            break   # only the first image, for testing
        break       # only the first gallery, for testing


# Entry point
if __name__ == '__main__':
    main()
I hadn't noticed that detail... I switched to the request headers you suggested, but the downloaded images are reported as corrupted...
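One possible cause (a guess, not confirmed in this thread): the site may only serve the real JPEG when the Referer header points at the photo page the image is embedded in, and hand back a placeholder or broken file when the Referer is the homepage. Below is a minimal sketch of a per-image download under that assumption; download_photo() and the photo_page_url argument are hypothetical additions, not part of the script above.

    import requests

    def download_photo(img_url, photo_page_url, base_headers):
        """Hypothetical helper: fetch one image with the Referer set to the page it came from."""
        headers = dict(base_headers)
        headers['Referer'] = photo_page_url      # assumption: per-image Referer is required
        res = requests.get(img_url, headers=headers, timeout=10)
        res.raise_for_status()                   # fail loudly instead of writing a broken file
        with open(img_url.split('/')[-1], 'wb') as f:
            f.write(res.content)

In the script above this would mean having photourl() also return page_url for each image, then calling download_photo(photo[0], photo[2], headers) in main() instead of the plain requests.get() with the homepage Referer.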