java - Parsing a log file to extract queries

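I want to extract certain URLs from a log file, but only those queries that were ranked 1 or 2. The log file contains a column, itemRank, that gives the rank.
So far I have been able to extract certain URLs by scanning through the text. But I don't know how to implement the condition that only URLs clicked at rank 1 or 2 are extracted.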

For example, this is what part of the log file looks like:

(The columns are ID, date, time, RANK, URL)

763570 2006-03-06 14:09:48 2 http://something.com

763570 2006-03-06 14:09:48 3 http://something.com

Here, I only want to extract the first query, because it has rank 2.

This is my code so far:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// The class must not be named Scanner, or it would shadow java.util.Scanner.
public class LogScanner {

    public static void main(String[] args) throws FileNotFoundException {

        File testFile = new File("C:/Users/Zyaad/logs.txt");
        Scanner s = new Scanner(testFile);
        int count = 0;

        String pattern = "http://ontology.buffalo.edu";
        while (s.hasNextLine()) {
            String line = s.nextLine();

            // Counts every line containing the URL, regardless of its rank.
            if (line.contains(pattern)) {
                count++;

                System.out.println(count + ".query: ");
                System.out.println(line);
            }
        }
        System.out.println("url was clicked: " + count + " times");

        s.close();
    }
}

What should I do to print only the 1. query? I tried a regex such as [\t\n\b\r\f] [1,2]{1}[\t\n\b\r\f], but that did not work.

Best answer

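A simple (and possibly simplistic) approach would be to:

Determine the number you are looking for (severity?)
Determine the starting format of the URL

Example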

// assume this is the content of the file you're parsing, so I don't have to
// repeat the whole Scanner part here
// (needs java.util.regex.Matcher and java.util.regex.Pattern)
String theFile = "763570 2006-03-06 14:09:48 2 http://something2.com\r\n" +
        "763570 2006-03-06 14:09:48 3 http://something3.com";
//                           | your starting digit of choice
//                           | | one white space
//                           | | | group 1 start
//                           | | | | partial protocol of the URL
//                           | | | |  | any character following in 1+ instances
//                           | | | |  | | end of group 1
//                           | | | |  | |
Pattern p = Pattern.compile("2\\s(http.+)");
Matcher m = p.matcher(theFile);
while (m.find()) {
    // back-referencing group 1
    System.out.println(m.group(1));
}

Output

http://something2.com
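
To cover both ranks from the question (1 or 2), the same idea can be tied to the column layout instead of a single digit. The following is only a minimal sketch under stated assumptions: the class name RankFilter, the method urlsWithRankOneOrTwo, and the two extra sample lines (ranks 12 and 1) are made up for illustration, and the five columns are assumed to be separated by spaces or tabs. As a side note, in the sample lines shown the columns are separated by a plain space, which the character classes in the regex from the question do not include, and [1,2] is a character class that also matches a comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RankFilter {

    // Assumed layout per line: ID, date, time, rank, URL.
    // Group 1 is the rank (exactly "1" or "2"), group 2 is the URL.
    // MULTILINE lets ^ match at the start of every line of the input.
    private static final Pattern LINE = Pattern.compile(
            "^\\d+[ \\t]+\\S+[ \\t]+\\S+[ \\t]+([12])[ \\t]+(http\\S+)",
            Pattern.MULTILINE);

    // Hypothetical helper: collects the URLs of all lines whose rank is 1 or 2.
    static List<String> urlsWithRankOneOrTwo(String log) {
        List<String> urls = new ArrayList<>();
        Matcher m = LINE.matcher(log);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Illustrative input; the lines with ranks 12 and 1 are not from the original log.
        String log = "763570 2006-03-06 14:09:48 2 http://something2.com\r\n"
                + "763570 2006-03-06 14:09:48 3 http://something3.com\r\n"
                + "763570 2006-03-06 14:09:48 12 http://something12.com\r\n"
                + "763570 2006-03-06 14:09:48 1 http://something1.com";
        for (String url : urlsWithRankOneOrTwo(log)) {
            System.out.println(url);
        }
    }
}
```

Because the rank group must be followed directly by whitespace, a rank such as 12 is not mistaken for rank 1, which a bare "2\\s" style match cannot guarantee.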

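Note

It is generally advised not to use regular expressions to parse log files.

In the long run, you would probably be better off implementing your own parser, tokenizing the items as properties of an object (one per line, I assume), and then manipulating them as needed.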
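
A parser along those lines could look roughly like the sketch below. It is an illustration, not a definitive implementation: the names LogEntryParser, LogEntry, parseLine and filterByMaxRank are hypothetical, and it assumes every valid line has exactly five whitespace-separated fields in the order ID, date, time, rank, URL. Lines that do not fit (including blank lines) are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LogEntryParser {

    // One parsed log line; field names are illustrative.
    static final class LogEntry {
        final String id;
        final String date;
        final String time;
        final int rank;
        final String url;

        LogEntry(String id, String date, String time, int rank, String url) {
            this.id = id;
            this.date = date;
            this.time = time;
            this.rank = rank;
            this.url = url;
        }
    }

    // Returns null when the line does not have five fields or the rank is not a number.
    static LogEntry parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 5) {
            return null;
        }
        try {
            int rank = Integer.parseInt(parts[3]);
            return new LogEntry(parts[0], parts[1], parts[2], rank, parts[4]);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Keeps only the entries whose rank lies between 1 and maxRank, inclusive.
    static List<LogEntry> filterByMaxRank(List<String> lines, int maxRank) {
        List<LogEntry> result = new ArrayList<>();
        for (String line : lines) {
            LogEntry entry = parseLine(line);
            if (entry != null && entry.rank >= 1 && entry.rank <= maxRank) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "763570 2006-03-06 14:09:48 2 http://something.com",
                "",
                "763570 2006-03-06 14:09:48 3 http://something.com");
        for (LogEntry entry : filterByMaxRank(lines, 2)) {
            System.out.println(entry.rank + " " + entry.url);
        }
    }
}
```

With the two sample lines from the question, only the rank 2 entry passes the filter. In a real program the lines would come from the Scanner loop rather than a hard-coded list, and further conditions (for example on the URL) become ordinary field comparisons.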

Regarding java - parsing a log file to extract queries, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23747369/