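Steps
I. Count word frequencies in a single document
1. Replace every punctuation mark with a space, then turn the text into a list of words with split().
2. Loop over the words: if a word is already in the dictionary, add 1 to its count; if not, add it to the dictionary.
II. Build the vocabulary for the whole document collection
III. Generate each document's word-frequency list, following the word order of the vocabulary
1. Take the keys of each document's dictionary and convert them to a set.
2. Use Python's built-in set() to merge all of these sets, which gives every word that occurs anywhere in the collection, then convert the merged set to a list.
3. For each document, create a word-frequency vector as long as the vocabulary, initialized to all zeros.
Iterate over the document's dictionary and look up each word's index in the overall vocabulary.
Assign the word's frequency to the corresponding position in the document's list.
For reference only: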
    s1 = '''When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the Powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. '''

    # Lowercase, drop the possessive "'s" ("nature's" -> "nature"),
    # and replace punctuation with spaces.
    s2 = s1.lower()
    s2 = s2.replace("'s", '')
    for ch in ',.':
        s2 = s2.replace(ch, ' ')

    # split() with no argument splits on any run of whitespace.
    ls1 = s2.split()

    # Count whole words. s2.count(x) would also match x inside longer words.
    dic1 = {}
    for x in ls1:
        if x in dic1:
            dic1[x] += 1
        else:
            dic1[x] = 1

    # Print the five most frequent words.
    result = sorted(dic1.items(), key=lambda item: item[1], reverse=True)
    print(result[0:5])
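The listing above only covers part I, counting the words of one document. Parts II and III could be sketched roughly as follows. This is just one possible reading of the steps: the function names `count_words`, `build_vocabulary` and `to_vector` and the two sample sentences are made up for illustration, the vocabulary is sorted (an added choice, so the word order is reproducible), and a helper dict is used for the position lookup where `list.index()` would work as well.

```python
import string


def count_words(text):
    """Part I: map each word in text to its number of occurrences."""
    # Replace every punctuation mark with a space, then split on whitespace.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    counts = {}
    for word in text.lower().translate(table).split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def build_vocabulary(doc_counts):
    """Part II: merge the key sets of all documents into one word list."""
    vocab = set()
    for counts in doc_counts:
        vocab |= set(counts)
    # Sorting is an assumption added here; a plain list(vocab) has no fixed order.
    return sorted(vocab)


def to_vector(counts, vocabulary):
    """Part III: word-frequency vector aligned with the vocabulary order."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)  # start from all zeros
    for word, freq in counts.items():
        vector[index[word]] = freq  # write the count at the word's position
    return vector


# Hypothetical sample documents, only for demonstration.
docs = ["The cat sat.", "The dog sat, the dog barked."]
doc_counts = [count_words(d) for d in docs]
vocabulary = build_vocabulary(doc_counts)
vectors = [to_vector(c, vocabulary) for c in doc_counts]

print(vocabulary)  # ['barked', 'cat', 'dog', 'sat', 'the']
for v in vectors:
    print(v)       # [0, 1, 0, 1, 1] then [1, 0, 2, 1, 2]
```

Every vector has the same length as the vocabulary, so the documents can be compared position by position.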