鱼C论坛


[Solved] How do I represent a collection of English documents as bag-of-words feature vectors in Python?

Posted 2020-10-4 07:59:03

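Steps
1. Count the words in each document
   (1) Convert the text to a list of words: replace every punctuation mark with a space, then call split().
   (2) Loop over the words: if a word is already in the dictionary, add 1 to its count; if not, add it to the dictionary.
2. Build the vocabulary for the document collection
   (1) Take the keys of each document's dictionary and convert them to a set.
   (2) Use Python's built-in set() to merge all the sets, which gives every word that occurs anywhere in the collection, then convert the set to a list.
3. Generate each document's word-frequency list in vocabulary order
   (1) For each document, create a frequency vector as long as the vocabulary, initialized to all zeros.
   (2) Iterate over the document's dictionary and look up each word's index in the vocabulary.
   (3) Assign the word's frequency to the matching position in the vector.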
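The steps above can be sketched roughly as follows. This is only one possible reading of them, not a definitive implementation: the function names (count_words, build_vocabulary, to_vector) and the two sample sentences are made up for illustration, string.punctuation is assumed to cover the punctuation in the documents, and the vocabulary is sorted (not required by the steps) so that the column order is the same on every run.

```python
import string

def count_words(text):
    """Step 1: map each word in one document to its frequency."""
    # Replace every ASCII punctuation mark with a space, then split on whitespace.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1   # word already seen: add 1
        else:
            counts[word] = 1    # first occurrence: add it to the dictionary
    return counts

def build_vocabulary(doc_counts):
    """Step 2: merge the key sets of all documents into one word list."""
    vocab = set()
    for counts in doc_counts:
        vocab |= set(counts.keys())
    # Sorting is an assumption made here to get a deterministic order.
    return sorted(vocab)

def to_vector(counts, vocabulary):
    """Step 3: start from all zeros, then write each count at the word's index."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for word, freq in counts.items():
        vector[index[word]] = freq
    return vector

# Hypothetical two-document collection
docs = ["The cat sat on the mat.", "The dog sat, and the cat ran!"]
doc_counts = [count_words(d) for d in docs]
vocabulary = build_vocabulary(doc_counts)
vectors = [to_vector(c, vocabulary) for c in doc_counts]

print(vocabulary)  # ['and', 'cat', 'dog', 'mat', 'on', 'ran', 'sat', 'the']
for v in vectors:
    print(v)       # [0, 1, 0, 1, 1, 0, 1, 2] then [1, 1, 1, 0, 0, 1, 1, 2]
```

Building the word-to-index dictionary once per vector avoids calling vocabulary.index(word) for every word, which would rescan the list each time.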
Best answer
2020-10-4 08:25:16
For reference only:
s1 = '''When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the Powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. '''
s2 = s1.lower()
# "Nature's" is a possessive, not "Nature is", so just drop the 's
s2 = s2.replace('\'s', '')
# Replace punctuation with spaces so neighbouring words never get glued together
s2 = s2.replace(',', ' ')
s2 = s2.replace('.', ' ')
# split() with no argument splits on any whitespace and yields no empty strings
ls1 = s2.split()
set1 = set(ls1)
dic1 = {}
for x in set1:
    # Count whole words in the list; s2.count(x) would also match substrings such as 'the' inside 'them'
    dic1[x] = ls1.count(x)
# Sort by frequency, highest first, and show the five most frequent words
result = sorted(dic1.items(), key=lambda item: item[1], reverse=True)
print(result[0:5])
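As an alternative to the word count in the answer, here is a small sketch using collections.Counter from the standard library. The sample sentence and the regular expression used as a tokenizer are assumptions for illustration only. It also shows why whole words should be counted from the word list: calling str.count on the raw string matches substrings as well.

```python
import re
from collections import Counter

# Hypothetical sample text
text = "The cat and the other cat: the end."

# Assumed tokenizer: a word is a run of lowercase letters or apostrophes.
words = re.findall(r"[a-z']+", text.lower())

# Counter builds the word -> frequency dictionary in one step.
word_counts = Counter(words)
top_two = word_counts.most_common(2)
print(top_two)  # [('the', 3), ('cat', 2)]

# str.count on the raw string also matches "the" inside "other".
substring_hits = text.lower().count("the")
print(word_counts["the"], substring_hits)  # 3 4
```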
Original poster | Posted 2020-10-4 10:54:49

Thanks a lot, mate. This gave me plenty of ideas.