[Solved] Python encoding problem

AaBbCc186 · posted 2017-11-5 11:28:30

Here is some practice code for a web scraper:
# coding=utf-8

import requests
import re

header = {"user-agent":"Mozilla/5.0"}
url = "http://www.qiushibaike.com/hot/page/1"
r = requests.get(url,headers=header)
html = r.text

pattern = re.compile(r'<div class="content">\n<span>\n([\s\S]*?)\n</span>')
ContentList = pattern.findall(html)
print ContentList[-1]

fo = open(r'F:\pa\r.text','a')
for i in ContentList:
    fo.write(i)
fo.close()

The output is as follows:
======================= RESTART: F:\pa\抓取糗事段子.py =======================

出差半个多月，一回到家就闻到家里有一股子还没散去的烟味儿，随后又看到茶几上的烟灰缸里还残留着的冒烟的烟头。我当时就意识到有情况：“媳妇儿，你过来给我解释下家里的烟味和烟头是怎么回事？”媳妇儿支支吾吾半天不肯说话，我当时那叫一个气啊！“你说你啊！怎么让你戒烟就戒不掉呐！我这才出差几天啊，又抽烟！”

Traceback (most recent call last):
  File "F:\pa\抓取糗事段子.py", line 17, in <module>
    fo.write(i)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-13: ordinal not in range(128)
>>>

The code already declares # coding=utf-8 at the top.
Why does the result still report an ASCII error?
Thanks, everyone.

Best answer

Teagle · posted 2017-11-5 17:24:38

# coding=utf-8
import requests
import re
header = {"user-agent":"Mozilla/5.0"}
url = "http://www.qiushibaike.com/hot/page/1"
r = requests.get(url,headers=header)
html = r.text
pattern = re.compile(r'<div class="content">\n<span>\n([\s\S]*?)\n</span>')
ContentList = pattern.findall(html)
print ContentList[-1]
fo = open('r.text','a')
for i in ContentList:
    # r.text is unicode in Python 2; encode it to UTF-8 bytes before writing
    fo.write(i.encode('utf-8'))
fo.close()

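The error occurs because the file being written has its own encoding, separate from that of the source file. The # coding=utf-8 line only tells Python how to decode the source file itself. In Python 2, writing a unicode string to a file opened with open() implicitly encodes it with the default ASCII codec, which fails on Chinese characters. Calling .encode('utf-8') first hands the file UTF-8 bytes and avoids that step.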
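As a possible alternative to calling .encode('utf-8') on every item, here is a minimal sketch that opens the file with an explicit encoding through io.open, which is available in both Python 2.7 and Python 3. The helper name append_lines, the sample strings, the temporary output path, and the newline written after each item are assumptions for illustration and are not part of the original script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def append_lines(path, lines):
    """Append each unicode string in `lines` to `path` as UTF-8 text."""
    # io.open encodes unicode text for us, so no manual .encode() is needed
    with io.open(path, 'a', encoding='utf-8') as fo:
        for line in lines:
            fo.write(line)
            fo.write(u'\n')  # assumption: one item per line


# Stand-in for ContentList; the real script fills it from the scraped page
sample = [u'出差半个多月', u'hello']
out_path = os.path.join(tempfile.mkdtemp(), 'r.text')
append_lines(out_path, sample)

# Read the file back with the same encoding to check the round trip
with io.open(out_path, 'r', encoding='utf-8') as fi:
    round_trip = fi.read()
```

In the original script this would amount to replacing open('r.text','a') with io.open('r.text', 'a', encoding='utf-8') and keeping fo.write(i) unchanged, since i is already a unicode string.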

SixPy · posted 2017-11-5 12:00:01

r = requests.get(url,headers=header)
r.encoding = 'utf-8'
html = r.text

AaBbCc186 · posted 2017-11-5 12:36:34

SixPy posted on 2017-11-5 12:00
r = requests.get(url,headers=header)
r.encoding = 'utf-8'
html = r.text

Sorry, that does not seem to work. The output is:
======================= RESTART: F:\pa\抓取糗事段子.py =======================

我暗恋店里一个女孩很久了，……割……昨晚睡不着就微信我师父并告诉了他，他说现在的女孩子都很现实你还是老老实实上班赚钱吧。<br/>第二天发工资他又找我借钱，我算了下前后左右总共找我借了1300了，当时没借，后来我无意间在大街上看到他带着我暗恋的那个女孩子去了宾馆……

Traceback (most recent call last):
  File "F:\pa\抓取糗事段子.py", line 18, in <module>
    fo.write(i)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-184: ordinal not in range(128)
>>>

The Python version is 2.7.
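A small sketch of why setting r.encoding does not change the outcome: that attribute only controls how the response bytes are decoded into the unicode string r.text, while the traceback points at the later step where the unicode string is written out. The snippet below needs no network access; raw_bytes is a made-up stand-in for the response body, and the explicit encode('ascii') call imitates, as an assumption about what happens inside fo.write(i) on Python 2.7, the implicit conversion with the default codec.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for the raw response body (UTF-8 bytes)
raw_bytes = u'媳妇儿'.encode('utf-8')

# Roughly what r.text yields once r.encoding = 'utf-8': a unicode string
text = raw_bytes.decode('utf-8')

# Imitate the implicit ASCII conversion that a Python 2 file write performs
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding explicitly to UTF-8 succeeds and gives back the original bytes
utf8_bytes = text.encode('utf-8')
```

The decode step works either way, so the fix has to be applied where the text is written, not where it is fetched.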

