[Solved] Help: counting the terms from file A that appear in file B

Amgalang · posted 2022-3-24 15:15:15

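Experts, please help me out. Because of my account level I can only offer 10 fish coins, thanks.

The specific problem: I have two UTF-8 encoded .txt files, A and B. Samples of each are shown below: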

A.txt
编号
03642179N
03719538N
...

B.txt
序号,内容,编号
AA,容貌,03642179N
BB,示例,03642179N
CC,标注,03719538N
DD,特征,03719538N
...

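The result I want:
If an ID in the 编号 (ID) field of A.txt also appears in the 编号 field of B.txt more than once (that is, its frequency is strictly greater than 1), then collect the letter codes from the 序号 (sequence) field of the matching B.txt rows into a single pair of brackets []. Do the same for the Chinese text in the 内容 (content) field.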

Example result:
C.txt
03642179N,[AA,BB],[容貌,示例]
03719538N,[CC,DD],[标注,特征]
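
The frequency condition above can be checked on its own before any grouping. A minimal sketch, assuming the sample rows are already in memory as Python lists; the names `a_ids`, `b_rows`, and `frequent_ids` are illustrative, and one made-up ID and one made-up row are added so the filter has something to exclude:

```python
from collections import Counter

# IDs from A.txt (header row already dropped). The last ID is a made-up
# extra that occurs only once in B, so the filter has something to reject.
a_ids = {"03642179N", "03719538N", "09999999N"}

# Rows from B.txt as (序号, 内容, 编号). The EE row is likewise made up.
b_rows = [
    ("AA", "容貌", "03642179N"),
    ("BB", "示例", "03642179N"),
    ("CC", "标注", "03719538N"),
    ("DD", "特征", "03719538N"),
    ("EE", "其他", "09999999N"),
]

# Count how often each ID from A occurs in B's ID column.
counts = Counter(row[2] for row in b_rows if row[2] in a_ids)

# Keep only the IDs whose frequency is strictly greater than 1.
frequent_ids = [id_ for id_, n in counts.items() if n > 1]
print(frequent_ids)
```

Only the IDs that survive this filter would end up as lines in the result file.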

Thank you for your help. I'll be waiting online, thanks.

Best answer

qbw941054510 · posted 2022-3-24 15:15:16

Last edited by qbw941054510 on 2022-3-24 21:40

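Here is a rough outline of the approach:
1. Use a set to store all the IDs in A.txt
2. Create a dict whose keys are IDs and whose values are lists
3. Go through each row of B.txt, extract the ID with split, and if the ID is in the set, store the row in the dict
4. Go through the dict and write it to C.txt in the required format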

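# Step 1: collect the IDs from a.txt, skipping the header row and blank lines.
with open('a.txt', encoding='utf8') as f:
    next(f, None)  # skip the header row
    s = {line.strip() for line in f if line.strip()}

# Steps 2 and 3: group the rows of b.txt by ID.
d = {}
with open('b.txt', encoding='utf8') as f:
    next(f, None)  # skip the header row
    for line in f:
        arr = line.strip().split(',')
        if len(arr) < 3:
            continue  # skip blank or malformed lines
        order, content, serial = arr[0], arr[1], arr[2]
        if serial in s:
            if serial not in d:
                d[serial] = {'order': [], 'content': []}
            d[serial]['order'].append(order)
            d[serial]['content'].append(content)

def get_str(item):
    # Join the items with commas and wrap them in brackets, e.g. [AA,BB]
    return '[' + ','.join(item) + ']'

# Step 4: write one line per ID that occurs more than once in b.txt.
with open('c.txt', mode='w', encoding='utf8') as f:
    contents = []
    for key in d:
        order, content = d[key]['order'], d[key]['content']
        if len(order) > 1:  # the question asks for a frequency greater than 1
            contents.append(','.join([key, get_str(order), get_str(content)]))
    f.write('\n'.join(contents))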
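
The same idea can also be sketched with the standard-library `csv` module and `collections.defaultdict`. This variant assumes both files start with a header row, as in the question, and that no field contains a comma. The function name `group_by_id` and its parameters are illustrative and not part of the original answer. It applies the "more than once" condition from the question.

```python
import csv
from collections import defaultdict


def group_by_id(a_path, b_path, c_path):
    """Write one line per ID from A that occurs more than once in B."""
    # Read the IDs from A, skipping the header row and blank lines.
    with open(a_path, encoding="utf8", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)
        ids = {row[0].strip() for row in reader if row and row[0].strip()}

    # Group the first two columns of B by the ID in the third column.
    groups = defaultdict(lambda: ([], []))
    with open(b_path, encoding="utf8", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)
        for row in reader:
            if len(row) < 3:
                continue  # blank or malformed line
            order, content, serial = row[0], row[1], row[2].strip()
            if serial in ids:
                groups[serial][0].append(order)
                groups[serial][1].append(content)

    # Emit only the IDs seen more than once, in the bracketed format.
    lines = [
        f"{serial},[{','.join(orders)}],[{','.join(contents)}]"
        for serial, (orders, contents) in groups.items()
        if len(orders) > 1
    ]
    with open(c_path, mode="w", encoding="utf8") as f:
        f.write("\n".join(lines))
    return lines
```

Calling `group_by_id('a.txt', 'b.txt', 'c.txt')` on the sample data from the question would write the two lines shown in the example result.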


lfhuang · posted 2022-3-24 20:46:01

Newbie here, stopping by to learn.

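Amgalang · posted 2022-3-28 10:13:21

qbw941054510 posted on 2022-3-24 20:58
Here is a rough outline of the approach:
1. Use a set to store all the IDs in A.txt
2. Create a dict whose keys are IDs and whose values are lists

Wow, I'm learning from you. Thank you for your help, wish you have a nice day~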
