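Python newbie needs help: compare lines one by one and output each group of identical values. Urgent!!!
Last edited by poppy章鱼 on 2021-11-12 14:14
Could an expert please help? This is only my second post. Urgent!!!
Here is the data; the first column is in sequential order: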
scaffold_1..12548467 6
scaffold_1..12548468 6
scaffold_1..12548469 6
scaffold_1..12548470 6
scaffold_1..12548471 6
scaffold_1..12548472 6
scaffold_1..12548473 5
scaffold_1..12548474 5
scaffold_1..12548475 6
scaffold_1..12548476 6
scaffold_1..12548477 6
scaffold_1..12548478 5
scaffold_1..12548479 4
scaffold_1..12548480 4
scaffold_1..12548481 4
scaffold_1..12548482 4
scaffold_1..12548483 4
Desired output format:
scaffold_1..12548467 scaffold_1..12548472 6 6
scaffold_1..12548473 scaffold_1..12548474 5 2
scaffold_1..12548475 scaffold_1..12548477 6 3
scaffold_1..12548478 5 1
scaffold_1..12548479 scaffold_1..12548483 4 5
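Read the file line by line. For example, if the second column of the original data is 6 and six consecutive lines all have 6 in the second column, output one line in the format: start, end, 6, count.
If 6 appears again later, start over and output another line: start, end, 6, count.
If a run is only one line long (count 1), output only start, with no end, as in the "5 1" line of the desired output.
I'd be extremely grateful!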
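What the post describes is run-length encoding of the second column: consecutive lines with the same value collapse into one output line. A minimal sketch using itertools.groupby from the standard library (a different approach from the reply that follows), assuming every non-blank line has the form "<id> <value>" separated by whitespace. summarize_runs and SAMPLE are illustrative names; SAMPLE is the data from the post.

```python
from itertools import groupby

# The sample data from the post.
SAMPLE = """\
scaffold_1..12548467 6
scaffold_1..12548468 6
scaffold_1..12548469 6
scaffold_1..12548470 6
scaffold_1..12548471 6
scaffold_1..12548472 6
scaffold_1..12548473 5
scaffold_1..12548474 5
scaffold_1..12548475 6
scaffold_1..12548476 6
scaffold_1..12548477 6
scaffold_1..12548478 5
scaffold_1..12548479 4
scaffold_1..12548480 4
scaffold_1..12548481 4
scaffold_1..12548482 4
scaffold_1..12548483 4
"""


def summarize_runs(lines):
    """Yield one output line per run of consecutive lines sharing a value.

    Assumes each line is "<id> <value>" separated by whitespace;
    blank lines are skipped.
    """
    rows = (line.split() for line in lines if line.strip())
    # groupby only merges *adjacent* rows with the same key, so a value
    # that shows up again later (the second run of 6) starts a new group.
    for value, group in groupby(rows, key=lambda row: row[-1]):
        first = last = next(group)
        count = 1
        for row in group:  # walk the rest of the run, keeping only the last row
            last = row
            count += 1
        if count == 1:
            # One-line run: output only the start id.
            yield "{} {} {}".format(first[0], value, count)
        else:
            yield "{} {} {} {}".format(first[0], last[0], value, count)


for out in summarize_runs(SAMPLE.splitlines()):
    print(out)
```

Because summarize_runs only needs an iterable of lines, an open file object can be passed in directly instead of SAMPLE.splitlines(), so the data does not have to be loaded into a list first.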
Reply from hrpzcf (last edited on 2021-11-12 14:25):
# coding: utf-8
# string = """scaffold_1..12548467 6
# scaffold_1..12548468 6
# scaffold_1..12548469 6
# scaffold_1..12548470 6
# scaffold_1..12548471 6
# scaffold_1..12548472 6
# scaffold_1..12548473 5
# scaffold_1..12548474 5
# scaffold_1..12548475 6
# scaffold_1..12548476 6
# scaffold_1..12548477 6
# scaffold_1..12548478 5
# scaffold_1..12548479 4
# scaffold_1..12548480 4
# scaffold_1..12548481 4
# scaffold_1..12548482 4
# scaffold_1..12548483 4"""
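with open("data.txt", "rt", encoding="utf-8") as f:
    # slist = string.splitlines()
    slist = f.read().splitlines()
freq = 1
sp = slist[0].split()
start, num = sp[0], sp[-1]  # id and value of the first line
end = start
for s in slist[1:]:
    sp = s.split()
    if sp[-1] == num:  # compare the whole value, not just its last character
        freq += 1
        end = sp[0]
    else:
        # Value changed: print the finished run, then start a new one.
        if freq == 1:
            end = " " * len(end)  # one-line run: pad instead of printing end
        print("{} {} {} {}".format(start, end, num, freq))
        freq = 1
        start, num = sp[0], sp[-1]
        end = start
else:
    # The loop has no break, so this always runs: print the last run.
    if freq == 1:
        end = " " * len(end)
    print("{} {} {} {}".format(start, end, num, freq))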
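The last print in hrpzcf's script sits in an else: attached to the for loop. In Python, a for loop's else block runs once after the loop finishes, unless the loop was left with break. The script never breaks, so the else simply prints the final run, which the loop body never prints because no later line with a different value follows it. A tiny illustration; collect_until_none is a made-up example function:

```python
def collect_until_none(items):
    """Collect items until a None is seen; append "done" only if none was seen."""
    collected = []
    for item in items:
        if item is None:
            break  # leaving the loop via break skips the else clause
        collected.append(item)
    else:
        # Runs only when the loop ran to completion (also for an empty list).
        collected.append("done")
    return collected


print(collect_until_none([1, 2, 3]))     # [1, 2, 3, 'done']
print(collect_until_none([1, None, 3]))  # [1]
print(collect_until_none([]))            # ['done']
```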
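Reply from poppy章鱼, quoting hrpzcf's post of 2021-11-12 14:20:
Thank you so much! It worked when I tested it locally. But my original file is large, so I changed the script to take the input file as a command-line argument, and now it throws an error. Could you help me figure it out? Many thanks, best wishes.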
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import sys
args=sys.argv
input1=args[1]
with open(input1, 'r') as f:
    slist = f.splitlines()  # raises the AttributeError shown below
freq = 1
sp = slist[0].split()
start, num = sp[0], sp[-1]
end = start
for s in slist[1:]:
    if s[-1] == num:
        freq += 1
        end = s.split()[0]
    else:
        if freq == 1:
            end = " " * len(end)
        print("{} {} {} {}".format(start, end, num, freq))
        freq = 1
        s = s.split()
        start, num = s[0], s[-1]
        end = start
else:
    print("{} {} {} {}".format(start, end, num, freq))
Error message:
python3 coordinate.count3.py ./05.txt > 05.new.txt
Traceback (most recent call last):
File "coordinate.count3.py", line 9, in <module>
slist = f.splitlines()
AttributeError: 'file' object has no attribute 'splitlines'
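The error comes from calling splitlines() on the file object f: splitlines() is a str method, and file objects do not have it. It has to be called on the text returned by f.read(), or the file can be iterated line by line instead. A small sketch with io.StringIO standing in for the opened file (the two data lines are from the sample above). As a side note, "'file' object" is how Python 2 names its file type; Python 3 reports "'_io.TextIOWrapper' object" instead, so it may be worth checking which interpreter python3 actually runs on that machine.

```python
import io

# io.StringIO stands in for a file opened with open(); for this purpose
# both are file objects and behave the same way.
f = io.StringIO("scaffold_1..12548467 6\nscaffold_1..12548468 6\n")

# A file object is not a string, so it has no splitlines() method.
print(hasattr(f, "splitlines"))  # False

# read() returns the whole content as one str, which does have splitlines().
lines = f.read().splitlines()
print(lines)  # ['scaffold_1..12548467 6', 'scaffold_1..12548468 6']

# Alternatively, iterate the file object itself: it yields one line at a
# time (including the trailing newline) without loading everything first.
f.seek(0)
streamed = [line.rstrip("\n") for line in f]
print(streamed == lines)  # True
```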
Reply from hrpzcf, quoting poppy章鱼's post of 2021-11-12 14:42:
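slist = f.splitlines() is wrong; change it to
slist = f.read().splitlines()
However, this reads the whole file into memory at once, so it becomes a problem if the file is larger than the available memory.

Reply from poppy章鱼, quoting hrpzcf's post of 2021-11-12 14:45: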
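Thank you, it ran successfully. I'm running it on a server, so memory is not an issue. I'll study your script. Wishing you all the best.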
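hrpzcf's caveat about files larger than memory applies because f.read() loads the entire file before any processing starts. A minimal streaming sketch, assuming Python 3 and the same whitespace-separated "<id> <value>" layout: it iterates over the file object so only the current run is held in memory, compares the whole second field rather than its last character, and takes the input path from sys.argv[1] as in poppy章鱼's script. stream_runs and main are hypothetical names, and skipping blank lines is an assumption about the input.

```python
import sys


def stream_runs(lines):
    """Yield (start, end, value, count) for each run of equal second-column values.

    Only the current run is kept in memory, so `lines` may be an open file
    of any size. Blank lines are skipped (an assumption about the input).
    """
    start = end = value = None
    count = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        ident, val = parts[0], parts[-1]
        if count and val == value:
            # Same value as the current run: extend it.
            end = ident
            count += 1
        else:
            # Value changed (or first line): emit the finished run, start a new one.
            if count:
                yield start, end, value, count
            start = end = ident
            value, count = val, 1
    if count:
        # The input ended while a run was still open.
        yield start, end, value, count


def main(path):
    # Iterating over the file object reads one line at a time,
    # unlike f.read(), which loads the whole file first.
    with open(path, encoding="utf-8") as f:
        for start, end, value, count in stream_runs(f):
            if count == 1:
                print(start, value, count)
            else:
                print(start, end, value, count)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: python3 SCRIPT.py INPUT_FILE", file=sys.stderr)
```

It can be run the same way as before, e.g. python3 coordinate.count3.py ./05.txt > 05.new.txt. One-line runs are printed as start, value, count, matching the requested output format.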