This article covers how to delete specific lines from a large text file in Python.

Problem description

I have several large text files that all have the same structure. I want to delete the first 3 lines and then remove illegal characters from the 4th line.
I don't want to have to read the entire dataset and then modify it, as each file is over 100MB with over 4 million records.

    Range 150.0dB -64.9dBm
    Mobile unit 1 Base -17.19968 145.40369 999.8
    Fixed unit 2 Mobile -17.20180 145.29514 533.0
    Latitude Longitude Rx(dB) Best unit
    -17.06694 145.23158 -050.5 2
    -17.06695 145.23297 -044.1 2

So lines 1, 2 and 3 should be deleted, and in line 4 "Rx(dB)" should be just "Rx" and "Best Unit" should be changed to "Best_Unit". Then I can use my other scripts to geocode the data.

I can't use command-line programs like grep (as in this question) because the first 3 lines are not the same in every file: the numbers (such as 150.0dB, -64*) change from file to file, so you have to delete the whole of lines 1-3, and then grep or similar can do the search-replace on line 4.

Thanks guys,

=== EDIT: new Pythonic way to handle larger files, from @heltonbiker. It raises an error.

    import os, re
    ## infile = arcpy.GetParameter(0)
    ## chunk_size = arcpy.GetParameter(1) # number of records in each dataset
    infile='trc_emerald.txt'
    fc= open(infile)
    Name = infile[:infile.rfind('.')]
    outfile = Name+'_db.txt'
    line4 = fc.readlines(100)[3]
    line4 = re.sub('\([^\)].*?\)', '', line4)
    line4 = re.sub('Best(\s.*?)', 'Best_', line4)
    newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
    fc.close()
    newfile = open(outfile, 'w')
    newfile.write(newfilestring)
    newfile.close()
    del lines
    del outfile
    del Name
    #return chunk_size, fl
    #arcpy.SetParameterAsText(2, fl)
    print "Completed"

Solution

As wim said in the comments, sed is the right tool for this.
The following command should do what you want:

    sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' yourfile.whatever

To explain the command a little:

-i runs the command in place, that is, it writes the output back into the input file
-e executes a command
'4 s/(dB)//' on line 4, substitutes '' for '(dB)'
'4 s/Best Unit/Best_Unit/' same as above, except with different find and replace strings
'1,3 d' deletes lines 1 to 3 (inclusive) in their entirety

Note that the match is case-sensitive: the sample header in the question reads "Best unit", so the pattern has to use the capitalization that actually appears in your file.

sed is a really powerful tool that can do much more than just this, and it is well worth learning.
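Since the question asks about Python, the same transformation can also be sketched there without loading the whole file into memory. This is a minimal sketch, not the answerer's method: the function names `clean_header` and `strip_preamble` are hypothetical, it assumes the header is always line 4, and it writes to a separate output file rather than editing in place.

```python
import re
import shutil


def clean_header(line):
    """Drop the '(dB)' suffix and join 'Best unit' into 'Best_Unit'."""
    line = line.replace("(dB)", "")
    # Case-insensitive, because the sample header uses 'Best unit'.
    return re.sub(r"Best unit", "Best_Unit", line, flags=re.IGNORECASE)


def strip_preamble(src_path, dst_path, skip=3):
    """Copy src_path to dst_path, dropping `skip` lines and fixing the header.

    Hypothetical helper: only one line is held in memory at a time,
    and the remaining records are copied in fixed-size chunks.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        # Read and discard the first `skip` lines.
        for _ in range(skip):
            src.readline()
        # The next line is the column header; rewrite it.
        dst.write(clean_header(src.readline()))
        # Stream everything else through unchanged.
        shutil.copyfileobj(src, dst)
```

With the file name from the question, a call such as `strip_preamble('trc_emerald.txt', 'trc_emerald_db.txt')` would produce the cleaned copy. Writing to a new file keeps the original intact if something goes wrong; sed -i achieves the in-place effect by doing much the same thing with a temporary file.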