import os
aa = ['G','A','S','T','C','V','L','I','M','P','F','Y','W','D','E','N','Q','H','K','R']
path = 'G:\\20190928\\Download\\'  # folder containing the input files
files = os.listdir(path)  # names of all entries in the folder
text = 'AA'
file1 = open(path + text, 'w', encoding='UTF-8')
print(file=file1, end="\t")
for x in aa:
    print(x.strip(), file=file1, end="\t")  # write the amino-acid column headers
print(file=file1)
for file in files:
    if '.fa' in os.path.splitext(file)[1]:  # keep files whose extension contains '.fa'
        fa_path = path + file
        content = open(fa_path, 'r')  # read the file contents
        seq1 = []
        j = {}  # dictionary of counts keyed by amino acid
        for y in aa:
            j[y] = 0  # reset every count to 0 for each file
        for seq in content:
            if '>' in seq:
                del seq  # skip FASTA header lines
            else:
                seq = seq.strip()  # strip trailing whitespace and the newline
                seq1.append(seq)
        for b in seq1:
            c = list(b)
            for each_char1_index in range(len(c)):
                AA = c[each_char1_index]
                if AA in aa:
                    j[AA] += 1
        print(file.split(".fa")[0], file=file1, end="\t")  # write the file name
        for a in aa:
            print(j[a], file=file1, end="\t")
        print(file=file1)
file1.close()
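I have run this code before without any problems, but over the last two days I reran it on different data and it failed several times with the following error:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x89 in position 1700: illegal multibyte sequence
Originally the line file1 = open (path + text, 'w', encoding='UTF-8') did not have encoding='UTF-8'. A Baidu search suggested the problem might be the encoding, so I added encoding='UTF-8' and ran it again, but I still get the same error message.
Could anyone tell me what is actually causing this?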
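Last edited by XiaoPaiShen on 2019-10-5 06:09
Line 17 needs the encoding argument as well (the open(fa_path, 'r') call that reads each file). I don't have the files locally, so I can't test this.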
import os
aa = ['G','A','S','T','C','V','L','I','M','P','F','Y','W','D','E','N','Q','H','K','R']

path = 'G:\\20190928\\Download\\'  # folder containing the input files
files = os.listdir(path)  # names of all entries in the folder
text = 'AA'
file1 = open(path + text, 'w', encoding='UTF-8')
print(file=file1, end="\t")
for x in aa:
    print(x.strip(), file=file1, end="\t")  # write the amino-acid column headers
print(file=file1)
for file in files:
    if '.fa' in os.path.splitext(file)[1]:  # keep files whose extension contains '.fa'
        fa_path = path + file
        content = open(fa_path, 'r', encoding='UTF-8')  # read the file contents as UTF-8
        seq1 = []
        j = {}  # dictionary of counts keyed by amino acid
        for y in aa:
            j[y] = 0  # reset every count to 0 for each file
        for seq in content:
            if '>' in seq:
                del seq  # skip FASTA header lines
            else:
                seq = seq.strip()  # strip trailing whitespace and the newline
                seq1.append(seq)
        for b in seq1:
            c = list(b)
            for each_char1_index in range(len(c)):
                AA = c[each_char1_index]
                if AA in aa:
                    j[AA] += 1
        print(file.split(".fa")[0], file=file1, end="\t")  # write the file name
        for a in aa:
            print(j[a], file=file1, end="\t")
        print(file=file1)

file1.close()
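Some background on why the error mentions 'gbk': when open() is called without an encoding argument, Python uses the system's preferred encoding, which is typically GBK on a Chinese-language Windows installation. The encoding given for the output file has no effect on how the input files are read. If one of the input files is not valid UTF-8 either, passing encoding='UTF-8' alone would still raise a UnicodeDecodeError. The following is only a sketch of one way around that, not a tested fix for the original data: the helper name count_residues and the constant AMINO_ACIDS are made up for illustration, and it assumes any undecodable bytes are irrelevant to the counts. It uses errors="replace", which substitutes U+FFFD for bytes that cannot be decoded instead of raising.

```python
from collections import Counter

# The 20 standard amino acids, in the same order as the aa list above.
AMINO_ACIDS = "GASTCVLIMPFYWDENQHKR"


def count_residues(fasta_path, encoding="utf-8"):
    """Count standard amino acids in one FASTA file (hypothetical helper)."""
    counts = Counter()
    # errors="replace" keeps reading past undecodable bytes instead of raising.
    with open(fasta_path, "r", encoding=encoding, errors="replace") as handle:
        for line in handle:
            # Header lines start with '>'; the original code skips any line
            # that contains '>' anywhere, which is slightly broader.
            if line.startswith(">"):
                continue
            counts.update(ch for ch in line.strip() if ch in AMINO_ACIDS)
    return counts
```

Inside the loop over files, j = count_residues(fa_path) could stand in for the manual counting, since a Counter returns 0 for amino acids that never appear. The with statement also closes each input file, which the code above never does.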