The blog's article list has 7 pages. The script crashes right after downloading the first page's worth of articles, and I can't find the problem.
The error:
Traceback (most recent call last):
  File "1.py", line 33, in <module>
    content = urllib.urlopen(url[j]).read()
  File "/usr/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 463, in open_file
    return self.open_local_file(url)
  File "/usr/lib/python2.7/urllib.py", line 477, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: ''
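
The last line is the real clue: urlopen() was handed an empty string. In Python 2, urllib.urlopen() treats a URL without a scheme such as http:// as a local file path, which is why the traceback goes through open_file and open_local_file. A minimal reproduction of just that error:

# Reproduce the IOError above: with no scheme, Python 2's urllib falls
# back to opening a local file, and '' is not a valid path.
import urllib

try:
    urllib.urlopen('')
except IOError as e:
    print 'reproduced:', e    # [Errno 2] No such file or directory: ''
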
The code that was run:
#!/usr/bin/python
#coding=utf-8
import urllib
import time

url = ['']*350          # 350 empty slots for the article links
page = 1
link = 1
# loop over the 7 list pages
while page <= 7:
    con = urllib.urlopen('http://blog.sina.com.cn/s/articlelist_1191258123_0_'+str(page)+'.html').read()
    i = 0
    title = con.find(r'<a title=')
    href = con.find(r'href=',title)
    html = con.find(r'.html',href)
    # find all article links on this page
    while title != -1 and href != -1 and html != -1 and i < 50:
        url[i] = con[href + 6:html + 5]
        print link,url[i]
        title = con.find(r'<a title=',html)
        href = con.find(r'href=',title)
        html = con.find(r'.html',href)
        i = i + 1
        link = link + 1
    else:
        print page,'finish end'
        page = page + 1
else:
    print 'all finish'
# fetch every collected page and write it to disk
j = 0
while j < 350:
    content = urllib.urlopen(url[j]).read()
    open(r'hanhan/'+url[j][-26:],'w+').write(content)
    print 'downloading',url[j]
    j = j + 1
    #time.sleep(20)
else:
    print 'download finish'
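
So the crash comes from the collection loop, not the download loop itself: i is reset to 0 on every page, so all 7 pages write their links into url[0]..url[49], each page overwriting the last. When the crawl ends, only one page's worth of slots holds real links, url[50]..url[349] are still the '' they were initialized with, and while j < 350 marches straight into them, which is exactly why it dies after downloading about one page of articles. Below is a sketch of a fix that keeps the same find()-based parsing (carried over untested from the original, so treat the parsing and the hanhan/ output directory as assumptions):

#!/usr/bin/python
#coding=utf-8
# Sketch of a fix: collect links with append() so there are never empty
# slots, then download only the links actually collected. The string-
# searching parse and the hanhan/ output directory (which must already
# exist) are taken from the original post.
import urllib

url = []                    # grows one link at a time; no pre-sized array
page = 1
while page <= 7:
    con = urllib.urlopen('http://blog.sina.com.cn/s/articlelist_1191258123_0_'
                         + str(page) + '.html').read()
    i = 0                   # per-page cap: a list page shows at most 50 articles
    title = con.find(r'<a title=')
    href = con.find(r'href=', title)
    html = con.find(r'.html', href)
    while title != -1 and href != -1 and html != -1 and i < 50:
        url.append(con[href + 6:html + 5])
        title = con.find(r'<a title=', html)
        href = con.find(r'href=', title)
        html = con.find(r'.html', href)
        i = i + 1
    print page, 'finish'
    page = page + 1
print 'all finish,', len(url), 'links found'

# download exactly what was collected -- never an empty string
for link in url:
    print 'downloading', link
    content = urllib.urlopen(link).read()
    open(r'hanhan/' + link[-26:], 'w+').write(content)
print 'download finish'

If you'd rather keep the 350-slot array, the one-line version of the same fix is to index with the running counter (url[link-1] = con[href + 6:html + 5]) and stop the download loop at the number of links actually found instead of the hard-coded 350.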