Python 2: error when downloading blog articles

wyhy921 · posted 2015-5-29 17:14:26

Last edited by wyhy921 on 2015-5-29 17:24

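The blog's article list has 7 pages. After the first page's worth of articles has been downloaded, the script fails with the error below, and I cannot find the cause.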

Error message:

Traceback (most recent call last):
  File "1.py", line 33, in <module>
    content = urllib.urlopen(url[j]).read()
  File "/usr/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 463, in open_file
    return self.open_local_file(url)
  File "/usr/lib/python2.7/urllib.py", line 477, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: ''

The code I am running:

#!/usr/bin/python
#coding=utf-8
import urllib
import time
url = ['']*350
page = 1
link = 1
# loop over the 7 list pages
while page <= 7:
    con = urllib.urlopen('http://blog.sina.com.cn/s/articlelist_1191258123_0_'+str(page)+'.html').read()
    i = 0
    title = con.find(r'<a title=')
    href = con.find(r'href=',title)
    html = con.find(r'.html',href)
    # find every link address on this page
    while title != -1 and href != -1 and html != -1 and i < 50:
        url[i] = con[href + 6:html + 5]
        print link,url[i]
        title = con.find(r'<a title=',html)
        href = con.find(r'href=',title)
        html = con.find(r'.html',href)
        i = i + 1
        link = link + 1
    else:
        print page,'finish end'
    page = page + 1
else:
    print 'all finish'
# read every article page and write its content to disk
j = 0
while j < 350:
    content = urllib.urlopen(url[j]).read()
    open(r'hanhan/'+url[j][-26:],'w+').write(content)
    print 'downloading',url[j]
    j = j + 1
    #time.sleep(20)
else:
    print 'download finish'

Reed · posted 2015-6-1 00:42:38

No such file or directory: ''
The URL logic that works on the first pass may not hold on the second. Check the values of url[j] closely, for example in 'hanhan/'+url[j][-26:].
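
One way to run that check without touching the network is sketched below. It mimics the collection loop of the posted script on made-up placeholder links (the `collect` helper and the `p7_0.html` style names are hypothetical, not from the original). In the posted script `i = 0` sits inside the page loop, so every page writes into slots 0 to 49 again, while the download loop still walks all 350 slots. The traceback shows urllib treating the value as a local file path, which is what an empty string in url[50] would produce.

```python
def collect(pages, reset_each_page):
    """Mimic the posted collection loop; pages is a list of per-page link lists."""
    url = [''] * 350
    i = 0
    for links in pages:
        if reset_each_page:
            i = 0  # the posted script does this at the top of every page
        for link in links:
            url[i] = link
            i = i + 1
    return url


# Made-up data: 7 pages with 50 placeholder links each (an assumption).
pages = [['p%d_%d.html' % (p, n) for n in range(50)] for p in range(1, 8)]

as_posted = collect(pages, reset_each_page=True)
kept = collect(pages, reset_each_page=False)

print('as posted: %d empty slots, url[50] = %r' % (as_posted.count(''), as_posted[50]))
print('index kept across pages: %d empty slots' % kept.count(''))
```

If the real script behaves the same way, only 50 links survive and the 51st download is attempted on an empty string. Moving `i = 0` above the outer loop, or stopping the download loop at the first empty entry, would be two possible fixes.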

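I only skimmed it, but the overall logic is too tangled. You should wrap it in a few functions, for example:
def get_html()
def get_blog()
def save_txt()
Then call these three functions in a loop from the main program. That would be much cleaner.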
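
A minimal sketch of that layout follows. Only the three function names come from the suggestion above; their parameters, return values, the `main` driver, the `LIST_URL` constant, and the UTF-8 decoding step are assumptions made for illustration. The parsing reuses the `find`-based approach of the posted script, and links are collected in a growing list rather than a fixed 350-slot one, so no blank entries can reach the download step. The import fallback is there only so the sketch also runs on Python 3.

```python
import os

try:                        # Python 2, as in the original post
    from urllib import urlopen
except ImportError:         # Python 3
    from urllib.request import urlopen

LIST_URL = 'http://blog.sina.com.cn/s/articlelist_1191258123_0_%d.html'


def get_html(url):
    """Fetch one page and return its raw content."""
    return urlopen(url).read()


def get_blog(con):
    """Return the article links that follow each '<a title=' marker in a list page."""
    links = []
    title = con.find('<a title=')
    while title != -1:
        href = con.find('href=', title)
        html = con.find('.html', href)
        if href == -1 or html == -1:
            break
        links.append(con[href + 6:html + 5])
        title = con.find('<a title=', html)
    return links


def save_txt(directory, link, content):
    """Write one downloaded article into directory, named after the link's last path part."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, link.rsplit('/', 1)[-1])
    with open(path, 'wb') as f:
        f.write(content)
    return path


def main():
    links = []
    for page in range(1, 8):  # the 7 list pages
        text = get_html(LIST_URL % page).decode('utf-8', 'ignore')
        links.extend(get_blog(text))
    for link in links:        # only links that were actually found
        save_txt('hanhan', link, get_html(link))
        print('downloaded %s' % link)
```

Calling `main()` would run the whole download. Because `get_blog` and `save_txt` take plain arguments, each can be tried on a saved page or a temporary directory before going to the network.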
