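Sensitive-word filter program, looking for a solution
1. Write a sensitive-word filter program. It takes two inputs, the name of the file to filter and the name of the file in which to save the result, and replaces every standalone sensitive word in the input file with a run of "*" characters of the same length.
2. Assume the file to filter and the sensitive-word list are both plain English text files. The sensitive words are stored in the text file "sensitive.txt", one per line.
3. A "standalone" sensitive word is one that may have punctuation or spaces immediately before and after it, but not letters. Matching is case-insensitive.
(Use regular expressions.)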
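import re

def foo(obj_file_path, new_file_path, key_file_path='sensitive.txt'):
    with open(obj_file_path, 'r', encoding='utf-8') as of:
        text = of.read()
    with open(key_file_path, 'r', encoding='utf-8') as kf:
        # One sensitive word per line; skip blank lines.
        keywords = [line.strip() for line in kf if line.strip()]
    for keyword in keywords:
        # Match the word only when no letter sits directly before or after it,
        # so "bad" is masked in "bad!" but left alone in "badge".
        pattern = r'(?<![A-Za-z])' + re.escape(keyword) + r'(?![A-Za-z])'
        # Replace each match with the same number of '*', ignoring case.
        text = re.sub(pattern, lambda m: '*' * len(m.group()), text,
                      flags=re.IGNORECASE)
    with open(new_file_path, 'w', encoding='utf-8') as nf:
        nf.write(text)

if __name__ == '__main__':
    # Requirement 1: both file names are entered by the user.
    ofp = input('File to filter: ')
    nfp = input('File to save the result to: ')
    kfp = 'sensitive.txt'
    foo(ofp, nfp, kfp)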
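To make the "standalone" rule from requirement 3 easier to check without touching any files, here is a minimal sketch that applies it to a string in memory. The names mask_standalone and sample are hypothetical and only for illustration. It assumes "letter" means ASCII A-Z/a-z (the files are stated to be plain English) and that the word list is non-empty. A digit or punctuation mark next to a word does not block the match, which is why lookarounds are used instead of \b.

```python
import re

def mask_standalone(text, words):
    """Replace each standalone occurrence of a word with '*' of equal length."""
    # Longest first, so a longer word wins over its own prefix at the same position.
    alternatives = '|'.join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    # No ASCII letter may sit directly before or after the matched word.
    pattern = re.compile(
        r'(?<![A-Za-z])(?:' + alternatives + r')(?![A-Za-z])',
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: '*' * len(m.group()), text)

sample = 'Bad, badge, BAD! not-bad bad1'
print(mask_standalone(sample, ['bad']))
# prints: ***, badge, ***! not-*** ***1
```

Combining all words into one pattern means the text is scanned once rather than once per word. Whether that is worth it depends on how long the word list is.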
qq1151985918 posted on 2022-11-8 17:19
Thanks so much for the expert pointers, thank you, I understand now.