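I have two files, A.txt and B.txt. Each contains a little over a hundred sets of near-synonyms, one set per line. A partial example:
A.txt
{1,2,3,4}
{5,6}
{7,8,9}
B.txt
{6,5}
{9,8}
The problem: the two files share some data, and I want to remove the duplicates. In the example above, {5,6} and {6,5} are the same set, so only one should be kept;
of {7,8,9} and {9,8}, only {7,8,9} should be kept, since it has more items.
The result should be written to C.txt, with the following content:
{1,2,3,4}
{5,6}
{7,8,9}
I'd appreciate your help and look forward to a reply. Thanks!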
This post was last edited by qq1151985918 on 2021-12-2 20:41
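def read_sets(path):
    """Read one set per line, e.g. "{1,2,3,4}", keeping file order."""
    sets = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            # Drop the braces and whitespace, then split on commas.
            items = [i.strip() for i in line.strip().strip("{}").split(",")]
            items = [i for i in items if i]
            if items:  # skip blank lines
                sets.append(items)
    return sets

all_sets = read_sets("A.txt") + read_sets("B.txt")
new_sets = []
for i, s in enumerate(all_sets):
    cur = set(s)
    # Skip s if an earlier set already contains it (or equals it).
    if any(cur <= set(x) for x in all_sets[:i]):
        continue
    # Skip s if a later set is strictly larger and contains it.
    if any(cur < set(y) for y in all_sets[i + 1:]):
        continue
    new_sets.append(s)

with open("C.txt", "w", encoding="utf-8") as h:
    # Write each set back in the original "{a,b,c}" format.
    h.write("\n".join("{" + ",".join(s) + "}" for s in new_sets))
print("OK!")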
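The same rule can be tried on in-memory data before touching any files. The following is only a minimal sketch, not the code posted above: `merge_sets` is a hypothetical helper name, and it assumes each line has already been parsed into a list of strings. It keeps the first copy of duplicate sets and drops any set that is contained in a larger one.

```python
def merge_sets(groups):
    """Return the groups that are not contained in any other group.

    `groups` is a list of lists of strings (an assumed input shape).
    Duplicates keep their first occurrence; input order is preserved.
    """
    as_sets = [frozenset(g) for g in groups]
    kept = []
    for i, cur in enumerate(as_sets):
        # An earlier set that is equal or larger already covers this one.
        if any(cur <= other for other in as_sets[:i]):
            continue
        # A later, strictly larger set covers this one.
        if any(cur < other for other in as_sets[i + 1:]):
            continue
        kept.append(groups[i])
    return kept


# Sample data taken from the question (A.txt followed by B.txt).
a = [["1", "2", "3", "4"], ["5", "6"], ["7", "8", "9"]]
b = [["6", "5"], ["9", "8"]]
result = merge_sets(a + b)
print(result)
```

Comparing every pair of sets is quadratic in the number of lines, which is fine for files with a few hundred lines.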