Obtain patterns in one file from another using ack or awk or a better way than grep?

Is there a way to obtain patterns in one file (a list of patterns) from another file using ack, the way the -f option works in grep? I see there is an -f option in ack, but it is different from the -f in grep.

Perhaps an example will give you a better idea. Suppose I have file1:

    a
    c
    e

And file2:

    a 1
    b 2
    c 3
    d 4
    e 5

And I want to obtain all the patterns in file1 from file2 to give:

    a 1
    c 3
    e 5

Can ack do this? Otherwise, is there a better way to handle the job (such as awk or using a hash)? I have millions of records in both files and really need an efficient way to do this. Thanks!

Solution

Here's a Perl one-liner that uses a hash to hold the set of wanted keys from file1, giving O(1) (amortized) lookups per iteration over the lines of file2.
So it will run in O(m+n) time, where m is the number of lines in your key set and n is the number of lines in the file you're testing.

    perl -ne 'BEGIN { open K, shift @ARGV; chomp(@a = <K>); @hash{@a} = () } m/^(\p{alpha}+)\s/ && exists $hash{$1} && print' tkeys file2

Here tkeys is the key file (file1 in the question). The key set is held in memory while file2 is tested line by line against the keys.

Here's the same thing using Perl's -a command line option, which autosplits each line into @F so that $F[0] is the first whitespace-separated field:

    perl -ane 'BEGIN { open G, shift @ARGV; chomp(@a = <G>); @h{@a} = () } exists $h{$F[0]} && print' tkeys file2

The second version is probably a little easier on the eyes. ;)

One thing you have to remember here is that you're more likely to be IO bound than processor bound, so the goal should be to minimize IO. Holding the entire lookup key set in a hash that offers O(1) amortized lookups does that: the key file is read only once.

The advantage this solution may have over other solutions is that some (slower) solutions have to run through your key file (file1) once for each line of file2. That sort of solution is O(m*n), where m is the size of your key file and n is the size of file2. The hash approach, on the other hand, runs in O(m+n) time. With millions of records in both files, that is the difference between roughly 10^12 comparisons and a few million operations. It benefits by eliminating linear searches through the key set, and benefits further by reading the keys via IO only one time.
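Since the question also asks about awk, the same hash-lookup idea can be sketched in awk. This is a minimal sketch and not part of the original answer. It assumes the keys in file1 are one per line and should match the first whitespace-separated field of file2; the scratch directory and the `result` variable are only there to make the example self-contained.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (illustrative setup).
tmpdir=$(mktemp -d)
printf '%s\n' a c e > "$tmpdir/file1"
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 5' > "$tmpdir/file2"

# NR==FNR holds only while awk reads the first file: store each line as a key
# in an associative array, then skip to the next line.
# For the second file, print every line whose first field is a stored key.
result=$(awk 'NR==FNR { keys[$0]; next } $1 in keys' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On real data the core command would be `awk 'NR==FNR { keys[$0]; next } $1 in keys' file1 file2`. Like the Perl versions, it reads the key file once, keeps the keys in memory, and makes a single pass over file2.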