Pinyin data for Chapter 1 of The Three-Body Problem, Python Discussion, Programming Languages, FishC Forum (鱼C论坛)

歌者文明清理员 posted on 2023-2-17 20:30:42

Pinyin data for Chapter 1 of The Three-Body Problem

This post was last edited by 歌者文明清理员 on 2023-2-18 10:51.

The full file is 2.73 MB, so only the first part has been uploaded.

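I originally wanted to build a project of my own that needs a complete list of Chinese characters, but many of the character lists online are incomplete, and getting their pinyin into a usable format takes ages, so I'm asking here.
Whoever answers correctly gets Best Answer.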

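A character list can be built by collecting the characters used in a book: the longer the book, the more complete the list and the fewer characters are missed.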
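
A minimal sketch of that idea, assuming the book text is already loaded into a string: it keeps each Chinese character once, in the order it first appears, using the same r'[\u4e00-\u9fa5]' range as this post. `collect_chars` is a hypothetical helper name, not part of any library.

```python
import re

# Same character range as in this post: U+4E00..U+9FA5
HANZI = re.compile(r'[\u4e00-\u9fa5]')

def collect_chars(text):
    """Return each Chinese character in text once, in order of first appearance."""
    # findall returns every match; dict.fromkeys drops repeats but keeps
    # insertion order (guaranteed for dicts since Python 3.7)
    return list(dict.fromkeys(HANZI.findall(text)))

print(collect_chars('三体，三体！abc 123'))  # ['三', '体']
```

Using a dict instead of a set keeps the characters in reading order, which makes the resulting list easier to check against the book.
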
I thought of The Three-Body Problem (三体).
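The regex for testing whether a character is a Chinese character is known to be r'[\u4e00-\u9fa5]', which covers the CJK Unified Ideographs from U+4E00 to U+9FA5.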
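
A minimal sketch of that check for a single character. It uses re.fullmatch so that strings longer than one character are rejected; `is_hanzi` is a hypothetical helper name. U+9FA5 was the end of the original CJK Unified Ideographs set; later Unicode versions added characters up to U+9FFF, and rarer characters live in extension blocks such as CJK Extension B (starting at U+20000), none of which this range matches.

```python
import re

def is_hanzi(ch):
    """True if ch is exactly one character in the range U+4E00..U+9FA5."""
    # fullmatch requires the whole string to match the one-character class,
    # so multi-character strings return False
    return re.fullmatch(r'[\u4e00-\u9fa5]', ch) is not None

print(is_hanzi('三'))          # True
print(is_hanzi('，'))          # False: full-width comma (U+FF0C) is punctuation
print(is_hanzi('a'))           # False
print(is_hanzi('\U00020000'))  # False: CJK Extension B is outside the range
```
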
The file is named 三体前1.txt.
How can I open the file, get the pinyin of each Chinese character in it, and then save the result to a binary file named "汉字.chars"?

How to save to a binary file -> https://fishc.com.cn/thread-224536-1-1.html
Getting pinyin -> https://fishc.com.cn/forum.php?mod=viewthread&tid=224579#lastpost
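
The linked threads are not reproduced here. As an assumption about the approach, the sketch below uses the standard-library `pickle` module (the same module whose `dump` the code in this post imports) to write a pinyin dictionary to a binary file and read it back. The sample data and the temporary directory are illustrative only.

```python
import os
import pickle
import tempfile

# Illustrative data: pinyin -> list of characters with that reading
data = {'san': ['三'], 'ti': ['体']}

# Write into a fresh temporary directory so the example leaves no stray files
path = os.path.join(tempfile.mkdtemp(), '汉字.chars')

# 'wb' opens the file in binary write mode, which pickle.dump requires
with open(path, 'wb') as f:
    pickle.dump(data, f)

# 'rb' opens it in binary read mode; pickle.load rebuilds the same dict
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```

Only call pickle.load on files you trust, since unpickling can execute arbitrary code.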

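from re import match
from pickle import dump
from xpinyin import Pinyin

p = Pinyin()

# Read the whole book; assumes the file is UTF-8 (try encoding='gbk' if it was saved as GBK)
with open('三体前1.txt', 'r', encoding='utf-8') as file:
    content = file.read()

# Dictionary mapping each pinyin syllable to the characters with that reading
chars = {}
for char in content:
    if match(r'[\u4e00-\u9fa5]', char):  # keep Chinese characters only
        pinyin = p.get_pinyin(char)      # look up the character's pinyin
        if pinyin in chars:
            if char not in chars[pinyin]:  # record each character only once
                chars[pinyin].append(char)
        else:
            chars[pinyin] = [char]         # first character seen for this reading

# Save the dictionary to the binary file 汉字.chars
with open('汉字.chars', 'wb') as file:
    dump(chars, file)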
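
If the saved file holds a dictionary from pinyin to characters, as the code above intends, a reverse lookup from character to pinyin can be derived from it. `invert` is a hypothetical helper, shown on a small hand-written dictionary instead of the real 汉字.chars; with the real file, load it first with `pickle.load` in 'rb' mode.

```python
def invert(chars):
    """Turn {pinyin: [char, ...]} into {char: pinyin}."""
    # If a character appears under several readings, the last one iterated wins
    return {c: py for py, group in chars.items() for c in group}

chars = {'san': ['三'], 'ti': ['体', '题']}
lookup = invert(chars)
print(lookup['题'])  # ti
```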

sfqxx posted on 2023-2-17 20:40:39

Grabbing a spot first.

Mta123456 posted on 2023-2-18 15:04:18

Count me in.
