Original poster | Posted on 2018-9-4 19:45:23
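from gensim import corpora, models

# Read the pre-segmented text. Assumption: tokens are separated by whitespace, one document per line.
path = r'C:\Users\lenovo\Desktop\自然语言处理\白沙分词.txt'
with open(path, 'r', encoding='gbk') as f:
    texts = [line.split() for line in f if line.strip()]

# Dictionary expects a list of token lists, not one raw string wrapped in a list.
dictionary = corpora.Dictionary(texts)
print(dictionary.token2id)

# Convert each tokenized document to a bag-of-words vector (iterating over the raw string would yield single characters).
corpus = [dictionary.doc2bow(tokens) for tokens in texts]
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

total_topics = 2
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=total_topics)
for index, topic in lsi.print_topics(total_topics):
    print('Topic #' + str(index + 1))
    print(topic)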
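For anyone following along, here is a minimal pure-Python sketch of why the text has to be split into token lists before it is turned into bag-of-words vectors. It does not use gensim: `build_token2id` and `to_bow` are hypothetical stand-ins for what a dictionary and `doc2bow` do conceptually, and the sample string `raw` is made up for illustration. The ids that gensim itself assigns may differ from the ones produced here.

```python
from collections import Counter


def build_token2id(texts):
    """Assign an integer id to each distinct token (hypothetical stand-in for a dictionary)."""
    token2id = {}
    for tokens in texts:
        for tok in tokens:
            token2id.setdefault(tok, len(token2id))
    return token2id


def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs, skipping tokens not in the mapping."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Made-up sample: two lines of whitespace-separated, already segmented text.
raw = "白沙 溪 白沙\n溪 水"

# Iterating over a string yields single characters, not words.
chars = [c for c in raw]

# Splitting each non-empty line on whitespace yields one token list per document.
texts = [line.split() for line in raw.splitlines() if line.strip()]

token2id = build_token2id(texts)
bows = [to_bow(tokens, token2id) for tokens in texts]

print(chars[:3])
print(texts)
print(bows)
```

The point of the sketch is only the shape of the data: a list of documents, each a list of tokens, which then maps to a list of (id, count) pairs per document.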