I have a large file that I want to reformat. Sample input:
DVL1 03220 NP_004412.2 VANGL2 02758 Q9ULK5 in vitro 12490194
PAX3 09421 NP_852124.1 MEOX2 02760 NP_005915.2 in vitro;yeast 2-hybrid 11423130
VANGL2 02758 Q9ULK5 MAGI3 11290 NP_001136254.1 in vitro;in vivo 15195140
This is what I want the output to look like:
DVL1 03220 NP_004412 VANGL2 02758 Q9ULK5
PAX3 09421 NP_852124 MEOX2 02760 NP_005915
VANGL2 02758 Q9ULK5 MAGI3 11290 NP_001136254
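To summarize:
If a line has 1 dot, remove the dot and the number after it and add a \t, so the output line has only 6 tab-separated values.
If a line has 2 dots, remove both dots and the numbers after them and add a \t, so the output line has only 6 tab-separated values.
If a line has no dots, keep the first 6 tab-separated values.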
My current attempt looks like this:
for line in infile:
    if "." in line:  # thought about this and a line.count('.') might be better, just wasn't capable to make it work
        transformed_line = line.replace('.', '\t', 2)  # only replaces the dot; want to replace dot plus next first character
        columns = transformed_line.split('\t')
        outfile.write('\t'.join(columns[:8]) + '\n')  # if i had a way to know the position of the dot(s), i could join only the desired columns
    else:
        columns = line.split('\t')
        outfile.write('\t'.join(columns[:5]) + '\n')  # this is fine
I hope I have explained myself clearly.
Thanks for your help.
Best answer
You can try something like this:
with open('data1.txt') as f:
    for line in f:
        line = line.split()[:6]  # keep only the first 6 whitespace-separated fields
        # if a field contains '.', keep only the part before the first dot;
        # otherwise keep the field as it is
        line = map(lambda x: x[:x.index('.')] if '.' in x else x, line)
        print('\t'.join(line))
Output:
DVL1 03220 NP_004412 VANGL2 02758 Q9ULK5
PAX3 09421 NP_852124 MEOX2 02760 NP_005915
VANGL2 02758 Q9ULK5 MAGI3 11290 NP_001136254
Edit:
As @mgilson suggested, the line
line=map(lambda x:x[:x.index('.')] if '.' in x else x,line)
can be replaced with the simpler line=map(lambda x:x.split('.')[0],line)
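For completeness, here is a minimal sketch that combines the answer's idea with the file writing from the question. It assumes the input really is tab-separated, as the question describes, and splits on '\t' so that values such as "in vitro" stay in one field. The names strip_versions and convert, and the n_columns parameter, are made up for illustration and do not appear in the original post.

```python
def strip_versions(line, n_columns=6):
    """Keep the first n_columns tab-separated fields, dropping any '.suffix'.

    Hypothetical helper: assumes fields are separated by tabs.
    """
    fields = line.rstrip('\n').split('\t')[:n_columns]
    # partition('.')[0] is the text before the first dot,
    # or the whole field when the field contains no dot
    return '\t'.join(field.partition('.')[0] for field in fields)


def convert(in_path, out_path):
    """Write the reformatted version of in_path to out_path (hypothetical helper)."""
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(strip_versions(line) + '\n')
```

Because every field is handled the same way, there is no need to count the dots in a line: lines with zero, one, or two dotted fields all go through the same code path. Calling convert('data1.txt', 'out.txt') would process the whole file, where 'out.txt' is just an example output name.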
This question (python - Using Python to distinguish lines with one dot from lines with two dots) comes from Stack Overflow: https://stackoverflow.com/questions/11474528/