This article looks at what the fastest way to egrep is; the answer below may be a useful reference for anyone facing the same problem.

Problem description

I need to egrep from a large CSV file with 2 million lines, and I want to cut the egrep time down to 0.5 seconds. Is this possible at all? No, I don't want a database (sqlite3 or MySQL) at this time.

$ time wc foo.csv
 2000000 22805420 334452932 foo.csv

real    0m3.396s
user    0m3.261s
sys     0m0.115s

I've been able to cut the run time down from 40 seconds to 1.75 seconds:

$ time egrep -i "storm|broadway|parkway center|chief financial" foo.csv|wc -l

108292

real    0m40.707s
user    0m40.137s
sys     0m0.309s

$ time LC_ALL=C egrep -i "storm|broadway|parkway center|chief financial" foo.csv|wc -l

108292

real    0m1.751s
user    0m1.590s
sys     0m0.140s

But I want the egrep real time to be less than half a second. Any tricks will be greatly appreciated. The file changes continuously, so I can't use any caching mechanism...

Recommended answer

If you are just searching for fixed keywords, you can use fgrep (or grep -F) instead of egrep, which skips the regular-expression engine entirely:

LC_ALL=C grep -F -i -e storm -e broadway -e "parkway center" -e "chief financial"
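For illustration, here is the same fixed-string search run against a tiny hypothetical sample file (sample.csv is made up here; the real file in the question is the 2-million-line foo.csv):

```shell
# Build a small sample; in the real case this would be foo.csv.
printf 'Storm hits city\nquiet day\nBroadway show opens\n' > sample.csv

# -F matches each pattern as a literal string, no regex compilation.
# LC_ALL=C avoids expensive multibyte/locale-aware comparisons.
LC_ALL=C grep -F -i \
    -e storm -e broadway -e "parkway center" -e "chief financial" \
    sample.csv | wc -l
# prints 2 (the "Storm" and "Broadway" lines match)
```

The speedup comes from both flags working together: -F removes the regex engine from the hot loop, and LC_ALL=C lets grep compare raw bytes instead of locale-collated characters.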

The next thing to try would be factoring out -i, which is probably now the bottleneck. If you're sure that only the first letter might be capitalized, for example, you could do:

LC_ALL=C grep -F \
   -e{S,s}torm -e{B,b}roadway -e{P,p}"arkway "{C,c}enter -e{C,c}"hief "{F,f}inancial
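Note that the `{S,s}` groups are expanded by the shell (brace expansion, a bash/zsh feature, not POSIX sh) before grep ever runs, so grep receives one literal `-e` pattern per case variant. A quick way to see what the command above actually passes to grep:

```shell
# The shell, not grep, generates the case variants.
# Two groups multiply out to all combinations, in left-to-right order:
printf '%s\n' {P,p}"arkway "{C,c}enter
# prints:
#   Parkway Center
#   Parkway center
#   parkway Center
#   parkway center
```

Dropping -i this way keeps the search case-sensitive (cheap byte comparison) while still covering the capitalizations you actually expect in the data.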

This concludes the article on what the fastest egrep is; I hope the recommended answer is helpful.
