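Last edited by poppy章鱼 on 2021-11-12 14:14
Could someone please help? This is my second post about this, and it is urgent.
The data is below; the first column is in sequential order.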
scaffold_1..12548467 6
scaffold_1..12548468 6
scaffold_1..12548469 6
scaffold_1..12548470 6
scaffold_1..12548471 6
scaffold_1..12548472 6
scaffold_1..12548473 5
scaffold_1..12548474 5
scaffold_1..12548475 6
scaffold_1..12548476 6
scaffold_1..12548477 6
scaffold_1..12548478 5
scaffold_1..12548479 4
scaffold_1..12548480 4
scaffold_1..12548481 4
scaffold_1..12548482 4
scaffold_1..12548483 4
Desired output format:
scaffold_1..12548467 scaffold_1..12548472 6 6
scaffold_1..12548473 scaffold_1..12548474 5 2
scaffold_1..12548475 scaffold_1..12548477 6 3
scaffold_1..12548478 5 1
scaffold_1..12548479 scaffold_1..12548483 4 5
##
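Read the data line by line. For example, in the data above the second column is 6 for 6 consecutive rows, so output one line in the format: start, end, 6, count.
If 6 shows up again later, treat it as a new run and output another line: start, end, 6, count.
If the count is 1, output only start (no end).
Many thanks!
Last edited by hrpzcf on 2021-11-12 14:25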
# coding: utf-8
# To test without a file, build slist from this sample instead:
# string = """scaffold_1..12548467 6
# scaffold_1..12548468 6
# scaffold_1..12548469 6
# scaffold_1..12548470 6
# scaffold_1..12548471 6
# scaffold_1..12548472 6
# scaffold_1..12548473 5
# scaffold_1..12548474 5
# scaffold_1..12548475 6
# scaffold_1..12548476 6
# scaffold_1..12548477 6
# scaffold_1..12548478 5
# scaffold_1..12548479 4
# scaffold_1..12548480 4
# scaffold_1..12548481 4
# scaffold_1..12548482 4
# scaffold_1..12548483 4"""
# slist = string.splitlines()

with open("data.txt", "rt", encoding="utf-8") as f:
    # Drop blank lines so split() never returns an empty list.
    slist = [line for line in f.read().splitlines() if line.strip()]


def report(start, end, num, freq):
    # A run of length 1 has no separate end, so blank out that column.
    if freq == 1:
        end = " " * len(end)
    print("{} {} {} {}".format(start, end, num, freq))


if slist:
    sp = slist[0].split()
    start, num = sp[0], sp[-1]
    end, freq = start, 1
    for s in slist[1:]:
        sp = s.split()
        # Compare the whole second column, not just the last character,
        # so multi-digit values such as 10 or 16 are handled correctly.
        if sp[-1] == num:
            freq += 1
            end = sp[0]
        else:
            report(start, end, num, freq)
            start, num = sp[0], sp[-1]
            end, freq = start, 1
    # Emit the final run, which is still pending when the loop ends.
    report(start, end, num, freq)
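
For comparison, the same run-length grouping can be sketched with `itertools.groupby` from the standard library. This is only a minimal sketch, not the code from the reply: the helper name `collapse_runs` and the short `sample` list are made up for illustration, and it assumes every non-blank line has exactly two whitespace-separated columns. For a run of length 1 it prints just start, value and count, as in the requested output, instead of padding the end column with spaces.

```python
from itertools import groupby


def collapse_runs(lines):
    """Yield one output line per run of equal second-column values.

    `collapse_runs` is a hypothetical helper name; it assumes every
    non-blank line has two whitespace-separated columns.
    """
    rows = (line.split() for line in lines if line.strip())
    # groupby only merges *consecutive* rows with the same key, so a value
    # that reappears later starts a new run.
    for value, group in groupby(rows, key=lambda row: row[1]):
        names = [row[0] for row in group]
        if len(names) == 1:
            # Single-row run: start, value, count (no end column).
            yield "{} {} {}".format(names[0], value, 1)
        else:
            yield "{} {} {} {}".format(names[0], names[-1], value, len(names))


if __name__ == "__main__":
    # Illustrative sample taken from the data in the question.
    sample = [
        "scaffold_1..12548472 6",
        "scaffold_1..12548473 5",
        "scaffold_1..12548474 5",
        "scaffold_1..12548475 6",
    ]
    for out in collapse_runs(sample):
        print(out)
```

Because `collapse_runs` accepts any iterable of lines, an open file object can probably be passed in directly (for example `collapse_runs(f)` inside the `with open(...)` block), which avoids reading the whole file into a list first.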