Here is a practice crawler script:
# coding=utf-8
import requests
import re
header = {"user-agent":"Mozilla/5.0"}
url = "http://www.qiushibaike.com/hot/page/1"
r = requests.get(url,headers=header)
html = r.text
pattern = re.compile(r'<div class="content">\n<span>\n([\s\S]*?)\n</span>')
ContentList = pattern.findall(html)
print ContentList[-1]
fo = open(r'F:\pa\r.text','a')
for i in ContentList:
    fo.write(i)
fo.close()
Running it produces the following output:
======================= RESTART: F:\pa\抓取糗事段子.py =======================
出差半个多月,一回到家就闻到家里有一股子还没散去的烟味儿,随后又看到茶几上的烟灰缸里还残留着的冒烟的烟头。我当时就意识到有情况:“媳妇儿,你过来给我解释下家里的烟味和烟头是怎么回事?”媳妇儿支支吾吾半天不肯说话,我当时那叫一个气啊!“你说你啊!怎么让你戒烟就戒不掉呐!我这才出差几天啊,又抽烟!”
Traceback (most recent call last):
  File "F:\pa\抓取糗事段子.py", line 17, in <module>
    fo.write(i)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-13: ordinal not in range(128)
>>>
The script already declares # coding=utf-8 at the top.
Why does it still fail with an ASCII error?
Thanks, everyone.
# coding=utf-8
import requests
import re
header = {"user-agent":"Mozilla/5.0"}
url = "http://www.qiushibaike.com/hot/page/1"
r = requests.get(url,headers=header)
html = r.text
pattern = re.compile(r'<div class="content">\n<span>\n([\s\S]*?)\n</span>')
ContentList = pattern.findall(html)
print ContentList[-1]
fo = open('r.text','a')
for i in ContentList:
    # i is a unicode object; encode it to UTF-8 bytes before writing
    fo.write(i.encode('utf-8'))
fo.close()
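The error occurs because writing to a file involves an encoding of its own. The # coding=utf-8 line only tells Python how to read the source file; it has no effect on file output. r.text is a unicode object, and in Python 2 a file opened with open() expects byte strings, so fo.write(i) implicitly encodes the text with the default ASCII codec, which fails on Chinese characters. Calling i.encode('utf-8') before writing avoids that.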
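As an alternative to encoding each item by hand, the file itself can be opened with an explicit encoding. The following is only a minimal sketch, not part of the original answer: save_lines, the sample list, and the temporary output path are hypothetical stand-ins for ContentList and r.text, and the trailing newline per item is an addition the original script does not have. It first reproduces the ASCII failure, then writes the same text through io.open, which exists in both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Reproduce the failure: encoding Chinese text with the ASCII codec
# raises the same UnicodeEncodeError seen in the traceback.
try:
    u'媳妇儿'.encode('ascii')
    reason = None
except UnicodeEncodeError as exc:
    reason = str(exc)


def save_lines(path, items):
    """Append each unicode item to path as UTF-8 (hypothetical helper)."""
    # io.open returns a text-mode file that encodes unicode on write.
    with io.open(path, 'a', encoding='utf-8') as fo:
        for item in items:
            # The newline separator is an assumption added for readability.
            fo.write(item + u'\n')


# Hypothetical sample data standing in for ContentList.
sample = [u'出差半个多月', u'hello']
out_path = os.path.join(tempfile.mkdtemp(), 'r.text')
save_lines(out_path, sample)

# Read the file back to confirm the text survived the round trip.
with io.open(out_path, encoding='utf-8') as fi:
    saved = fi.read()
```

With this approach the loop body writes unicode directly and no per-item .encode('utf-8') call is needed.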