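About the 053 exercise on writing web page content to files
Original task: write a program that visits each site listed in a file in turn and saves the content returned by each site to a separate file. Here is my code: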
import urllib.request as ur

def test():
    count = 1
    f1 = open('urls.txt', 'r', encoding='utf-8')
    for each_line in f1:
        try:
            response = ur.urlopen(each_line).read()
            f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')
            f.write(str(response))
            f.close()
        except HTTPError as reason:
            print('Cannot access %s' % each_line)
        count += 1

test()
My question: the exception handling part always fails. Neither URLError nor HTTPError works. How should I change it?
Is the problem in my code or with the websites?
http://www.fishc.com
http://www.baidu.com
http://www.douban.com
http://www.zhihu.com
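http://www.taobao.com
It seems only two of the sites can be accessed, so only two files get written.

You need to import HTTPError from urllib.error first. The crawler is also being blocked by the sites' anti-crawler checks; adding headers fixes that.
Corrected code: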
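import urllib.request as ur
from urllib.error import HTTPError

def test():
    count = 1
    f1 = open('urls.txt', 'r', encoding='utf-8')
    for each_line in f1:
        # Each line read from the file ends with a newline, so strip it
        each_line = each_line.strip()
        try:
            # A browser-like User-Agent header gets past simple anti-crawler checks
            req = ur.Request(each_line, headers={'User-Agent': 'Mozilla/5.0'})
            response = ur.urlopen(req).read()
            f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')
            # Note: str() on bytes writes the b'...' representation, not decoded text
            f.write(str(response))
            f.close()
        except HTTPError as reason:
            print('Cannot access %s' % each_line)
        count += 1
    f1.close()
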
test()

import urllib.request as ur
from urllib.error import HTTPError

def test():
    count = 1
    f1 = open('urls.txt', 'r', encoding='utf-8')
    for each_line in f1:
        try:
            response = ur.urlopen(each_line).read()
            f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')
            f.write(str(response))
            f.close()
        except HTTPError as reason:
            print('Cannot access %s' % each_line)
        count += 1

test()
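
The question mentions that neither URLError nor HTTPError worked. Both names live in urllib.error and raise a NameError if they are used without being imported. Here is a minimal sketch of catching both. HTTPError is a subclass of URLError, so it has to come first. The helper name `fetch`, its `opener` parameter and the 10-second timeout are hypothetical, added only so the function can be tried without network access.

```python
import urllib.request as ur
from urllib.error import HTTPError, URLError


def fetch(url, opener=ur.urlopen):
    """Return (body, error); exactly one of the two is None.

    `fetch` and `opener` are illustrative names, not part of the thread's code.
    """
    # Lines read from urls.txt end with '\n', so strip before building the request.
    req = ur.Request(url.strip(), headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with opener(req, timeout=10) as response:
            return response.read(), None
    except HTTPError as reason:
        # The server answered, but with an error status such as 403 or 404.
        return None, 'HTTP error %s' % reason.code
    except URLError as reason:
        # No usable answer at all: DNS failure, refused connection, and so on.
        return None, 'URL error: %s' % reason.reason
```

Calling `fetch(each_line)` inside the loop would then give either the page bytes or a short error message to print, without the loop stopping on the first failure.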
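Quoting zltzlt (posted 2020-7-27 10:06): You need to import HTTPError from urllib.error first. The crawler is also being blocked by the sites' anti-crawler checks; adding headers fixes that.
5 seconds faster.

Quoting Twilight6 (posted 2020-7-27 10:06): 5 seconds faster.

Thank you both!

In "f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')", the encoding should be the one that matches the content being written.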
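
One way to follow the encoding advice, as a rough sketch: decode the downloaded bytes with the charset the server declares in its Content-Type header, then write the resulting text. `decode_body` and `save_page` are hypothetical helper names. Falling back to UTF-8 with replacement characters when the charset is missing or wrong is an assumption, not something stated in the thread.

```python
import urllib.request as ur


def decode_body(raw, charset=None):
    """Decode response bytes with the declared charset, else fall back to UTF-8."""
    try:
        return raw.decode(charset or 'utf-8')
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that do not match it.
        # Assumed fallback: undecodable bytes become U+FFFD.
        return raw.decode('utf-8', errors='replace')


def save_page(url, path):
    """Fetch url and write its decoded text to path as UTF-8."""
    req = ur.Request(url.strip(), headers={'User-Agent': 'Mozilla/5.0'})
    with ur.urlopen(req, timeout=10) as response:
        # get_content_charset() returns None when the header names no charset.
        charset = response.headers.get_content_charset()
        text = decode_body(response.read(), charset)
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
```

Compared with f.write(str(response)), which writes the b'...' representation of the bytes, this stores readable text. Pages that declare their charset only in an HTML meta tag would still need extra handling.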