Question about exercise 053: writing web page content to files, Python Discussion, Programming Languages, FishC Forum

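狗宁 posted on 2020-7-27 10:02:01

Question about exercise 053: writing web page content to files

Exercise: write a program that visits each site listed in a file in turn and saves the content returned by each site to a separate file.
Here is my code: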
import urllib.request as ur

def test():
    count = 1
    f1 = open('urls.txt', 'r', encoding='utf-8')
    for each_line in f1:
        try:
            response = ur.urlopen(each_line).read()
            f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8' )
            f.write(str(response))
            f.close()
        except HTTPError as reason:
            print('Cannot access %s' % each_line)
        count += 1

test()


My question: the exception handling part keeps failing. Neither URLError nor HTTPError works. How should I change it?
Is the problem on my side or with the websites?
http://www.fishc.com
http://www.baidu.com
http://www.douban.com
http://www.zhihu.com
http://www.taobao.com

狗宁 posted on 2020-7-27 10:04:24

It seems only two of the sites can be accessed, so only two files get written.

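zltzlt posted on 2020-7-27 10:06:15

You need to import HTTPError from urllib.error first. Also, the sites' anti-scraping checks are blocking the crawler; adding headers to the request fixes that.

Corrected code: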

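import urllib.request as ur
from urllib.error import HTTPError

def test():
    count = 1
    # urls.txt holds one URL per line
    with open('urls.txt', 'r', encoding='utf-8') as f1:
        for each_line in f1:
            url = each_line.strip()  # remove the trailing newline
            try:
                # Send a browser-like User-Agent instead of urllib's default
                req = ur.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
                response = ur.urlopen(req).read()
                # response is bytes: write it in binary mode, since
                # str(response) would store the b'...' repr instead
                with open('url_%s.txt' % count, 'wb') as f:
                    f.write(response)
            except HTTPError as reason:
                print('Cannot access %s' % url)
            count += 1

test()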
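
As a rough illustration of why the headers argument matters: urllib identifies itself with a Python-urllib User-Agent by default, and some sites refuse that. The sketch below only builds the request objects and makes no network call. Whether a particular site actually blocks the default value is an assumption that depends on the site.

```python
import urllib.request as ur

# The default opener announces itself as "Python-urllib/<version>".
default_agent = dict(ur.build_opener().addheaders)['User-agent']

# A Request built with an explicit header carries that value instead.
req = ur.Request('http://www.douban.com', headers={'User-Agent': 'Mozilla/5.0'})
# Request stores header names capitalized, so the key is 'User-agent'.
custom_agent = req.get_header('User-agent')

print(default_agent)
print(custom_agent)
```

Printing both values side by side makes it easy to see what the server receives in each case.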

Twilight6 posted on 2020-7-27 10:06:20

import urllib.request as ur
from urllib.error import HTTPError

def test():
    count = 1
    # urls.txt holds one URL per line
    with open('urls.txt', 'r', encoding='utf-8') as f1:
        for each_line in f1:
            url = each_line.strip()  # remove the trailing newline
            try:
                response = ur.urlopen(url).read()
                # response is bytes: write it in binary mode, since
                # str(response) would store the b'...' repr instead
                with open('url_%s.txt' % count, 'wb') as f:
                    f.write(response)
            except HTTPError as reason:
                print('Cannot access %s' % url)
            count += 1

test()
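
A small sketch of what the added import changes, using a simulated failure instead of a real request. The function name missing_name_demo and the simulated OSError are made up for illustration. Python only looks up the name in an except clause when an exception actually reaches it, which would explain why the original script ran fine until the first site failed. The second part shows that HTTPError is a subclass of URLError, so catching URLError covers both.

```python
import urllib.error


def missing_name_demo():
    """Show what happens when an except clause names a class that was never imported."""
    try:
        try:
            raise OSError('simulated network failure')
        except HTTPError:  # deliberately NOT imported in this module
            return 'handled'
    except NameError as err:
        # The lookup of HTTPError fails here, not at the top of the script.
        return 'NameError: %s' % err


# HTTPError derives from URLError, so "except URLError" also catches HTTP errors.
http_is_url_error = issubclass(urllib.error.HTTPError, urllib.error.URLError)

print(missing_name_demo())
print(http_is_url_error)
```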

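Twilight6 posted on 2020-7-27 10:06:43

zltzlt posted on 2020-7-27 10:06
You need to import HTTPError from urllib.error first. Also, the sites' anti-scraping checks are blocking the crawler; adding headers to the request fixes that.

Corrected code:

You beat me by 5 seconds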

zltzlt posted on 2020-7-27 10:07:17

Twilight6 posted on 2020-7-27 10:06
You beat me by 5 seconds

狗宁 posted on 2020-7-27 10:10:32

Thanks to both of you!

zhuchuanhan posted on 2022-4-5 17:08:36

In “f = open(('url_%s' % count + '.txt'), 'w', encoding='utf-8')”, the encoding argument should be the encoding that the page content actually uses.
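
One possible way to act on this, as a sketch only: decode the downloaded bytes with the charset the server declares and fall back to UTF-8 when none is given. The helper name decode_body and the fallback choice are assumptions for illustration. In a real fetch the declared charset could be read with response.headers.get_content_charset(), which may return None.

```python
def decode_body(raw, declared_charset=None, fallback='utf-8'):
    """Decode raw bytes with the declared charset, falling back if that fails."""
    charset = declared_charset or fallback
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown charset: decode leniently rather than crash.
        return raw.decode(fallback, errors='replace')


# Simulated page bodies standing in for urlopen(...).read().
gbk_page = '中文'.encode('gbk')
utf8_page = '中文'.encode('utf-8')

text_from_gbk = decode_body(gbk_page, 'gbk')
text_from_utf8 = decode_body(utf8_page)

# str() on bytes gives the b'...' repr, not the decoded text.
repr_text = str(b'abc')
```

The decoded text can then be written with open(..., 'w', encoding='utf-8') regardless of the page's original encoding.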
