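For example, given this request:
Use Python to merge fragmented subtitles with badly placed sentence breaks into complete sentences: 1. Re-merge the text into complete sentences, using periods, question marks, and similar marks as boundaries. 2. Keep the original number of subtitle entries as far as possible. 3. Adjust the timestamps on the timeline accordingly.
It actually hard-coded a sentence from the original subtitles into the code: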
import re

def merge_subtitles(srt_content):
    # Split the content into subtitle entries
    entries = srt_content.strip().split('\n\n')
    merged_entries = []
    for entry in entries:
        lines = entry.split('\n')
        if len(lines) < 3:
            continue  # Skip incomplete entries
        # Extract the timeline and the text
        time_line = lines[1]
        text = ' '.join(lines[2:]).replace(' - ', '')  # Join the text lines and drop stray symbols
        # Use a regular expression to merge sentences
        sentences = re.split(r'(?<=[.!?]) +', text)
        merged_text = ' '.join(sentences).strip()
        # Check for and merge incomplete sentences
        # (the fragment below is hard-coded from the sample input)
        if merged_text.endswith('maybe we can'):
            # Try to merge into the next entry
            if merged_entries:
                merged_entries[-1] = merged_entries[-1].strip() + ' ' + merged_text
                continue
        # Rebuild the entry
        merged_entry = f"{lines[0]}\n{time_line}\n{merged_text}\n"
        merged_entries.append(merged_entry)
    return '\n\n'.join(merged_entries)

# Example usage
srt_content = """1
00:00:12,220 --> 00:00:12,930
- And now you want to find him.Yeah.So maybe we can

2
00:00:13,930 --> 00:00:20,850
help each other out.CI. Teresa Daniel. I think she got caught snooping on Beck and something happened to her. You need to find Quinn.

3
00:00:21,850 --> 00:00:24,060
While you're doing that, you can help me find Teresa."""
merged_srt = merge_subtitles(srt_content)
print(merged_srt)
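
For comparison, here is a minimal sketch of an approach that does not depend on any particular sentence in the input. It is only an illustration under several assumptions: the names `Cue`, `parse_srt`, `resegment`, and `format_srt` are made up for this example, a sentence is assumed to end at any of the marks in `SENTENCE_END` (so abbreviations such as "CI." will be split wrongly), and the time at which a carried-over fragment starts is estimated by assuming text is spoken at a constant rate per character. Each entry keeps everything up to its last sentence-ending mark and hands the unfinished tail to the next entry, whose start time is moved back to match.

```python
import re
from dataclasses import dataclass

TIME_RE = re.compile(r"(\d+):(\d+):(\d+),(\d+)")
SENTENCE_END = ".!?。！？"  # assumed set of sentence-ending marks

@dataclass
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str

def to_ms(stamp):
    h, m, s, ms = map(int, TIME_RE.search(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def to_stamp(ms):
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def parse_srt(content):
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip incomplete entries
        start, end = (to_ms(part) for part in lines[1].split("-->"))
        cues.append(Cue(start, end, " ".join(l.strip() for l in lines[2:])))
    return cues

def resegment(cues):
    out = []
    carry, carry_start = "", None
    for i, cue in enumerate(cues):
        text = cue.text
        # Index just past the last sentence-ending mark (0 if there is none).
        cut = max(text.rfind(ch) for ch in SENTENCE_END) + 1
        if i == len(cues) - 1:
            cut = len(text)  # the last entry keeps whatever is left
        head, tail = text[:cut].strip(), text[cut:].strip()
        # Estimated time at which the tail begins (constant rate per character).
        split_at = cue.start + (cue.end - cue.start) * cut // max(len(text), 1)
        start = carry_start if carry else cue.start
        if head:
            end = split_at if tail else cue.end
            out.append(Cue(start, end, (carry + " " + head).strip()))
            carry, carry_start = tail, split_at
        else:
            # No sentence end here: pass the whole entry on to the next one.
            carry, carry_start = (carry + " " + text).strip(), start
    if carry:  # leftover text when the final entry was empty
        out.append(Cue(carry_start, cues[-1].end, carry))
    return out

def format_srt(cues):
    return "\n\n".join(
        f"{n}\n{to_stamp(c.start)} --> {to_stamp(c.end)}\n{c.text}"
        for n, c in enumerate(cues, 1)
    )
```

Calling `print(format_srt(resegment(parse_srt(srt_content))))` on the sample above keeps three entries and moves "So maybe we can" to the start of the second one. An entry that contains no sentence-ending mark at all is absorbed into the following entry, so the entry count can drop in that case.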