A few questions about the code in the earlier post 《正确展示用代码吃王力宏的瓜》 ("The right way to follow the Wang Leehom gossip with code")

非凡 · Posted on 2022-3-4 16:50:31

The original post is here:
正确展示用代码吃王力宏的瓜
https://fishc.com.cn/thread-207059-1-1.html
(Source: FishC Forum)

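That post has an attachment containing the source code for following the gossip. There are a few parts of that code I don't understand, and I'd appreciate it if someone more experienced could explain them.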

As far as I can tell, this section of the code in comment_show.py is segmenting the scraped comments into words:

stop_words = []
with open('stop_words.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
    for line in lines:
        stop_words.append(line.strip())
content = open('comments.txt', 'rb').read()
# Segment the text into words with jieba
word_list = jieba.cut(content)
words = []
for word in word_list:
    if word not in stop_words:
        words.append(word)
wordcount = {}
for word in words:
    if word != ' ':
        wordcount[word] = wordcount.get(word, 0) + 1


1. Line 2 of the code above: where does the stop_words.txt file come from? I couldn't find anywhere in the code that creates it.

2. On line 6, comments.txt is the file that stores the scraped comments. Why is it opened in binary mode ('rb') here? Wouldn't plain read mode 'r' work?
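For comparison, here is a minimal sketch of what the two modes actually return, using a throwaway UTF-8 file with made-up sample text (the sample string and the temporary path are assumptions for illustration, not the real comments.txt). Binary mode hands back raw bytes, text mode hands back a decoded str, and decoding the bytes as UTF-8 by hand gives the same string. As far as I can tell from jieba's source, jieba.cut accepts either type and decodes bytes itself (trying UTF-8 first, then GBK). Text mode without an explicit encoding= falls back to the platform default, which on a Chinese-locale Windows system is typically GBK (cp936).

```python
import os
import tempfile

# Made-up sample text standing in for the scraped comments (an assumption,
# not the real contents of comments.txt).
sample = "吃瓜群众表示很震惊"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "comments.txt")

    # Write the sample as UTF-8 bytes so the file's encoding is known exactly.
    with open(path, "wb") as f:
        f.write(sample.encode("utf-8"))

    # Binary mode: read() returns the raw bytes, with no decoding at all.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode: read() decodes the bytes using the given encoding and
    # returns a str. Without encoding=..., Python would use the
    # platform's default encoding instead.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(type(raw).__name__, len(raw))    # bytes 27  (3 bytes per CJK character)
print(type(text).__name__, len(text))  # str 9     (one item per character)

# Decoding the bytes by hand yields exactly what text mode returned.
print(raw.decode("utf-8") == text)     # True
```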
