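Code source: the web-crawler lesson in 小甲鱼's video course 零基础学习Python (Learning Python from Zero), episode 论一只爬虫的自我修养4:OOXX (The Self-Cultivation of a Crawler, Part 4: OOXX).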
import urllib.request
import os

def url_open(url):
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36')
    response = urllib.request.urlopen(url)
    html = response.read()
    print(url)
    return html

def get_page(url):
    html = url_open(url).decode('utf-8')
    a = html.find('current-comment-page') + 23
    b = html.find(']', a)
    print(html[a:b])
    return html[a:b]

def find_imgs(url):
    html = url_open(url).decode('utf-8')
    img_addrs = []
    a = html.find('img src=')  # find() returns -1 if the text is not found
    while a != -1:
        b = html.find('.jpg', a, a + 255)
        if b != -1:
            img_addrs.append(html[a + 9:b + 4])
        else:
            b = a + 9
        a = html.find('img src=', b)
    for each in img_addrs:
        print(each)

def save_imgs(folder, img_addrs):
    for each in img_addrs:
        filename = each.split('/')[-1]  # save under the name after the last '/'
        with open(filename, 'wb') as f:
            img = url_open(each)
            f.write()

def download_mm(folder="ooxx", pages=10):
    os.mkdir(folder)
    os.chdir(folder)
    url = "http://jandan.net/ooxx/"
    page_num = int(get_page(url))
    for i in range(pages):
        page_num -= i
        page_url = url + 'page-' + str(page_num) + '#comments'
        img_addrs = find_imgs(page_url)
        save_imgs(folder, img_addrs)

if __name__ == '__main__':
    download_mm()
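Apart from the two errors the post asks about, the listing has two quieter problems. In `url_open`, the Request object `req` gets the User-Agent header, but `urlopen(url)` is then called with the plain URL, so the header is never sent. In `download_mm`, `page_num -= i` subtracts cumulatively: starting from page 19 the loop requests pages 19, 18, 16, 13, 9, ... rather than 19, 18, 17, .... A minimal sketch of both fixes; `page_urls` is a hypothetical helper name, the URL pattern is copied from the original code, and stopping at page 1 is an assumption about how the site numbers its pages.

```python
import urllib.request


def url_open(url):
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36')
    # Open the Request object, not the bare url, so the User-Agent header is sent
    with urllib.request.urlopen(req) as response:
        return response.read()


def page_urls(base_url, newest_page, pages):
    """Hypothetical helper: URLs for `pages` consecutive pages, counting down.

    Assumes page numbers start at 1, so nothing below page 1 is produced.
    """
    urls = []
    for i in range(pages):
        page_num = newest_page - i  # offset from the starting page, not a running total
        if page_num < 1:
            break
        urls.append(base_url + 'page-' + str(page_num) + '#comments')
    return urls


for u in page_urls('http://jandan.net/ooxx/', 19, 3):
    print(u)
```

Inside `download_mm`, the loop could then read `for page_url in page_urls(url, page_num, pages):`.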
I wrote this following the code in the video, but running it produces the following two problems:
http://jandan.net/ooxx/
19
http://jandan.net/ooxx/page-19#comments
//wx3.sinaimg.cn/mw600/0076BSS5ly1g3zmygpflsj30u00u0q5i.jpg
//wx3.sinaimg.cn/mw600/0076BSS5ly1g3zmc5d48oj30kw0kw75s.jpg
//wx1.sinaimg.cn/mw600/0076BSS5ly1g3zl3r931oj30hs0qoq4m.jpg
//wx1.sinaimg.cn/mw600/007JBZrwly1g3z17wilv0j31400u01kx.jpg
//wx2.sinaimg.cn/mw600/0076BSS5ly1g3zjw7wx8jj30u00u0161.jpg
//wx2.sinaimg.cn/mw600/d9f2e9cagy1g3zixhs73tj21jk15o7wi.jpg
//wx4.sinaimg.cn/mw600/0076BSS5ly1g3ziphbmxkj30u018y7wh.jpg
//wx1.sinaimg.cn/mw600/6d9d69baly1g3zhqngjj7j20b90gsmy6.jpg
//wx4.sinaimg.cn/mw600/9c109e01gy1g3zhnk0qxtj20c80l7ta0.jpg
//wx3.sinaimg.cn/mw600/0076BSS5ly1g3zhfhxyzhj30ku0n040n.jpg
//wx1.sinaimg.cn/mw600/0076BSS5ly1g3zgu5huiij30m80xcn4j.jpg
Traceback (most recent call last):
File "C:\Users\Glassy Sky\Desktop\download_mm.py", line 64, in <module>
download_mm()
File "C:\Users\Glassy Sky\Desktop\download_mm.py", line 60, in download_mm
save_imgs(folder,img_addrs)
File "C:\Users\Glassy Sky\Desktop\download_mm.py", line 43, in save_imgs
for each in img_addrs:
TypeError: 'NoneType' object is not iterable
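First problem: why does the `for` loop in save_imgs fail with `TypeError: 'NoneType' object is not iterable` instead of looping? It works in 小甲鱼's video.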
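The traceback points at `save_imgs`, but the cause is in `find_imgs`: the version above ends with a loop that prints each address and has no `return` statement, and a Python function that finishes without `return` gives back `None`. `download_mm` then passes that `None` to `save_imgs`, and `for each in None` raises the TypeError. A minimal sketch of the fix, keeping the original `find()`-based scanning; unlike the original, this version takes the HTML text instead of a URL (a hypothetical refactor so it can be tried without network access), and `sample` is made-up test input.

```python
def find_imgs(html):
    """Return every .jpg address found after 'img src=' in an HTML string."""
    img_addrs = []
    a = html.find('img src=')  # find() returns -1 when the text is not present
    while a != -1:
        b = html.find('.jpg', a, a + 255)  # look for '.jpg' within 255 chars of the tag
        if b != -1:
            # skip the 9 characters of 'img src="', keep the 4 characters of '.jpg'
            img_addrs.append(html[a + 9:b + 4])
        else:
            b = a + 9
        a = html.find('img src=', b)
    return img_addrs  # the missing line: without it the caller receives None


sample = ('<img src="//wx3.sinaimg.cn/mw600/a.jpg" />'
          '<img src="//wx1.sinaimg.cn/mw600/b.gif" />')
print(find_imgs(sample))  # → ['//wx3.sinaimg.cn/mw600/a.jpg']
```

With the original signature, the equivalent one-line fix is to add `return img_addrs` at the end of `find_imgs(url)`. Once the list is returned, the next line to fail is `f.write()` in `save_imgs`, which is called with no argument and raises a TypeError of its own; it needs to be `f.write(img)`.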
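Second problem: why are the image URLs printed by find_imgs(url) missing the leading "http:"? It works in 小甲鱼's video, so why doesn't it work when I write the same code? I'd be grateful if someone experienced could explain.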
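The addresses in the output start with `//`. That is a scheme-relative URL: a browser fills in the scheme of the current page, but urllib does not, and `urllib.request.Request('//wx3.sinaimg.cn/...')` raises `ValueError: unknown url type`. Presumably the site wrote full `http://` addresses when the video was recorded and has changed its HTML since; that is an assumption, not something visible in the post. A minimal sketch using the standard library's `urllib.parse.urljoin`, which resolves such an address against the page it came from (`absolute_url` is a hypothetical helper name):

```python
from urllib.parse import urljoin


def absolute_url(page_url, addr):
    """Turn '//host/path' into 'scheme://host/path' using the page's own scheme.

    Addresses that already include a scheme are returned unchanged.
    """
    return urljoin(page_url, addr)


page = 'http://jandan.net/ooxx/page-19#comments'
addr = '//wx3.sinaimg.cn/mw600/0076BSS5ly1g3zmygpflsj30u00u0q5i.jpg'
print(absolute_url(page, addr))
# → http://wx3.sinaimg.cn/mw600/0076BSS5ly1g3zmygpflsj30u00u0q5i.jpg
```

In the original `find_imgs(url)`, each address could be appended as `absolute_url(url, html[a + 9:b + 4])`. `save_imgs` still works afterwards, since `each.split('/')[-1]` yields the same file name.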