[Solved] About the 053 exercise on writing web page content to files

狗宁 · Posted 2020-7-27 10:02:01

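Original exercise: write a program that visits each site listed in a file, in order, and saves the content returned by each site to a separate file.
Here is my code: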

import urllib.request as ur

def test():
    count = 1
    f1 = open('urls.txt', 'r', encoding='utf-8')
    for each_line in f1:
        try:
            response = ur.urlopen(each_line).read()
            f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8' )
            f.write(str(response))
            f.close()
        except HTTPError as reason:
            print('Cannot access %s' % each_line)
        count += 1

test()

My question: the exception handling part always fails. Neither URLError nor HTTPError works. How should I change it?
Is the problem on my side or with the websites?
http://www.fishc.com
http://www.baidu.com
http://www.douban.com
http://www.zhihu.com
http://www.taobao.com

狗宁 · Posted 2020-7-27 10:04:24

It seems only two of the sites can be accessed, so only two files get written.

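zltzlt · Posted 2020-7-27 10:06:15

You need to import HTTPError from urllib.error first. Also, the crawler is being blocked by the sites' anti-scraping measures; adding headers to the request is enough to get around that.

Corrected code: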

import urllib.request as ur
from urllib.error import HTTPError

def test():
    count = 1
    f1 = open('urls.txt', 'r', encoding='utf-8')
    for each_line in f1:
        try:
            # Send a browser-like User-Agent so the sites do not reject the request
            req = ur.Request(each_line, headers={'User-Agent': 'Mozilla/5.0'})
            response = ur.urlopen(req).read()
            f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')
            f.write(str(response))
            f.close()
        except HTTPError as reason:
            print('Cannot access %s' % each_line)
        count += 1
    f1.close()

test()
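
The original question mentions both URLError and HTTPError. As a minimal sketch (not part of zltzlt's answer), the request can be wrapped in a helper that catches both: HTTPError is a subclass of URLError, so it has to be listed first. The function name `fetch`, the `opener` parameter, and the 10-second timeout are assumptions introduced here for illustration; `opener` exists only so the helper can be exercised without a network connection.

```python
import urllib.request as ur
from urllib.error import HTTPError, URLError


def fetch(url, opener=ur.urlopen):
    """Return (body_bytes, error_text); exactly one of the two is None.

    `opener` is a hypothetical hook: it defaults to urllib's urlopen and
    can be swapped for a stand-in when trying the function offline.
    """
    # Strip the trailing newline that comes with each line read from urls.txt
    req = ur.Request(url.strip(), headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with opener(req, timeout=10) as response:
            return response.read(), None
    except HTTPError as err:
        # The server answered, but with an error status such as 403 or 404
        return None, 'HTTP %s %s' % (err.code, err.reason)
    except URLError as err:
        # No usable answer at all: DNS failure, refused connection, and so on
        return None, 'unreachable: %s' % err.reason
```

In the loop from the thread, `body, error = fetch(each_line)` could replace the `urlopen` call, with the file written only when `error` is None.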

Twilight6 · Posted 2020-7-27 10:06:20

This best answer was provided by Twilight6. Thanks to Twilight6 for the answer.

import urllib.request as ur
from urllib.error import HTTPError  # the import missing from the original code

def test():
    count = 1
    f1 = open('urls.txt', 'r', encoding='utf-8')
    for each_line in f1:
        try:
            response = ur.urlopen(each_line).read()
            f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')
            f.write(str(response))
            f.close()
        except HTTPError as reason:
            print('Cannot access %s' % each_line)
        count += 1
    f1.close()

test()
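
Why the missing import only shows up partway through the run: Python looks up the name in an `except` clause only when an exception actually reaches it. The sketch below simulates that without any network access. The function `simulate` and the success flags are made up for illustration, and treating the third request as the failing one is an assumption that merely matches the "only two files" report in the thread.

```python
def simulate(fetch_results):
    """Mimic the original loop; each flag says whether that fetch succeeds."""
    written = []
    try:
        for number, ok in enumerate(fetch_results, start=1):
            try:
                if not ok:
                    raise OSError('simulated failed request')
                written.append('url_%s.txt' % number)
            except HTTPError:  # deliberately NOT imported, as in the question
                print('never reached')
    except NameError as err:
        # Looking up the undefined name HTTPError raised this instead
        return written, str(err)
    return written, None


files, problem = simulate([True, True, False, True, True])
print(files)    # ['url_1.txt', 'url_2.txt']
print(problem)
```

The first two iterations never touch the `except` line, so they succeed; the first failure triggers the name lookup, and the resulting NameError ends the loop before the remaining sites are tried.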

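Twilight6 · Posted 2020-7-27 10:06:43

zltzlt posted on 2020-7-27 10:06
You need to import HTTPError from urllib.error first. Also, the crawler is being blocked by the sites' anti-scraping measures; adding headers to the request is enough to get around that.

Corrected code:

5 seconds faster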

zltzlt · Posted 2020-7-27 10:07:17

Twilight6 posted on 2020-7-27 10:06
5 seconds faster

狗宁 · Posted 2020-7-27 10:10:32

Thank you both!

zhuchuanhan · Posted 2022-4-5 17:08:36

In "f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')", the encoding should be the one that matches the actual content.
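
To make the encoding point concrete: `str(response)` on a bytes object writes the `b'...'` representation rather than readable text, so the bytes need to be decoded with the page's own charset before being written. The sketch below is one hedged way to do that. The helper name `decode_body` and the UTF-8 fallback are assumptions, not something proposed in the thread.

```python
def decode_body(raw, declared_charset=None, fallback='utf-8'):
    """Decode response bytes using the declared charset when there is one.

    Falls back to `fallback` (replacing undecodable bytes) when the charset
    is missing, unknown to Python, or does not match the data.
    """
    charset = declared_charset or fallback
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return raw.decode(fallback, errors='replace')


raw = '你好'.encode('gbk')        # stand-in for bytes returned by a GBK page
print(str(raw))                   # the b'...' form that the thread's code writes
print(decode_body(raw, 'gbk'))    # the readable text
```

With a real response object, the declared charset can be read with `response.headers.get_content_charset()`, which returns None when the server does not state one; the decoded text can then be written to a file opened with whichever encoding is wanted on disk.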
