import jieba

# Read the whole novel; this call assumes the file is saved as UTF-8
with open('threekingdoms.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

# Frequent words that are not character names
excludes = {'将军', '却说', '荆州', '二人', '不可', '不能', '如此'}
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue  # skip single-character tokens
    # Merge the different names of one character into a single key
    elif word == '诸葛亮' or word == '孔明曰':
        rword = '孔明'
    elif word == '关公' or word == '云长':
        rword = '关羽'
    elif word == '玄德' or word == '玄德曰':
        rword = '刘备'
    elif word == '孟德' or word == '丞相曰':
        rword = '曹操'
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)  # no KeyError if the word never appeared
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
# Print the ten most frequent entries (fewer if the text is short)
for word, count in items[:10]:
    print('{0:<10}{1:>5}'.format(word, count))
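Why do I get "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte"? What does this error tell me?
Could it be that the colon in the path is a full-width Chinese character? Then again, reading files has never given me trouble before, no matter how I did it...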
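One possible explanation, offered as a guess rather than a diagnosis: the error comes from the file's bytes, not from the path. If threekingdoms.txt was saved in GBK instead of UTF-8, its first byte could well be 0xc8, because the character 三 is stored as the two bytes c8 fd in GBK, and UTF-8 rejects that pair. The sketch below reproduces the message with a sample string (not the real file contents) and then tries a list of encodings in turn. The names decode_bytes and read_text are hypothetical helpers, and the choice of gb18030 as the fallback is an assumption about how the file was saved.

```python
# Sample text standing in for the start of the file (an assumption).
sample = '三国演义'
raw = sample.encode('gbk')      # the bytes a GBK-saved file would contain

# Decoding GBK bytes as UTF-8 fails on the very first byte.
try:
    raw.decode('utf-8')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)


def decode_bytes(data, encodings=('utf-8', 'gb18030')):
    """Hypothetical helper: decode with the first encoding that succeeds."""
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings could decode the data')


def read_text(path, encodings=('utf-8', 'gb18030')):
    """Hypothetical helper: read a file as bytes, then decode with a fallback."""
    with open(path, 'rb') as f:
        return decode_bytes(f.read(), encodings)


print(decode_bytes(raw))
```

If this is what happened, replacing the open(...).read() line with txt = read_text('threekingdoms.txt') should get past the error; re-saving the file as UTF-8 in a text editor is the other option. Note that a fallback codec can also decode unrelated bytes into nonsense without raising, so it is worth checking the first few lines of the result by eye.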