Problem description

I'm looking for a Java library to help parse user-entered text that represents an 'appointment' for a calendar application. For instance:

Lunch with Mike at 11:30 on Tuesday

5pm Happy hour on Friday

I've found some promising leads like https://github.com/samtingleff/jchronic and http://www.datejs.com/, which can parse dates, but I also need to be able to extract the title of the event, like "Lunch with Mike".

If such an API doesn't exist, I'm also interested in any thoughts on how best to approach the problem from a coding perspective.

Recommended answer

Extending JChronic may be your best bet. Given the responses to this question, I think it's unlikely that a pre-built library for this exists (though it seems like such a thing could be useful; I'm guessing that the major use cases for parsing natural-language dates would be even more useful if they could also extract additional data from user-supplied strings).

Implementation-wise, probably the most straightforward thing to do is to extend JChronic, since it supports quite a significant part of your use case; moreover, as you can see from the unit test, extraneous information should already be ignored by the framework. Fortunately, too, if you look at the main class, it should not be too hard to extend, modify, or wrap the parse() method to support a custom scanner for an event title. (My own preference among these would be to wrap the framework rather than fork and modify it, as that lets you benefit more easily from any improvements to the underlying code.)
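The wrapping idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `DateExpressionFinder` is a hypothetical interface standing in for whatever date parser is wrapped, and it assumes the parser can report the character offsets of the text it recognized. JChronic may not expose offsets directly, in which case the adapter would have to derive them some other way.

```java
import java.util.Optional;

/** Hypothetical stand-in for the wrapped date parser (not a JChronic API). */
interface DateExpressionFinder {
    /** Returns {start, end} offsets of the date expression, or empty if none was found. */
    Optional<int[]> find(String input);
}

/** Wraps a date finder and treats whatever it did not consume as the event title. */
final class AppointmentWrapper {
    private final DateExpressionFinder finder;

    AppointmentWrapper(DateExpressionFinder finder) {
        this.finder = finder;
    }

    /** Returns {title, dateText}; dateText is empty when no date expression was found. */
    String[] split(String input) {
        Optional<int[]> span = finder.find(input);
        if (!span.isPresent()) {
            return new String[] { input.trim(), "" };
        }
        int start = span.get()[0];
        int end = span.get()[1];
        String dateText = input.substring(start, end).trim();
        // Glue the text before and after the date expression back together.
        String title = (input.substring(0, start) + " " + input.substring(end))
                .replaceAll("\\s+", " ")
                .trim();
        return new String[] { title, dateText };
    }
}
```

Because the wrapper only depends on the small interface, the underlying date library can be upgraded or swapped without touching the title logic, which is the benefit of wrapping over forking mentioned above.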

Ultimately, what may prove the most straightforward way of doing this is to write a regex parser that ignores most of what JChronic tries to capture (and this would mean becoming deeply familiar with the JChronic source code).
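As a rough illustration of the regex route, here is a minimal sketch that uses only the JDK. It is not based on JChronic's internals, the class names are made up for the example, and it only recognizes clock times such as "5pm" or "11:30" plus English weekday names; relative dates, ranges, and other formats would need more patterns.

```java
import java.time.DayOfWeek;
import java.time.LocalTime;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Result holder; time and day are null when nothing was recognized. */
final class ParsedAppointment {
    final String title;
    final LocalTime time;
    final DayOfWeek day;

    ParsedAppointment(String title, LocalTime time, DayOfWeek day) {
        this.title = title;
        this.time = time;
        this.day = day;
    }
}

final class RegexAppointmentParser {
    // Optional "at", then either "h:mm" with optional am/pm, or "h" with mandatory am/pm.
    private static final Pattern TIME = Pattern.compile(
            "\\b(?:at\\s+)?(\\d{1,2})(?::(\\d{2})\\s*(am|pm)?|\\s*(am|pm))\\b",
            Pattern.CASE_INSENSITIVE);

    // Optional "on", then a weekday name.
    private static final Pattern DAY = Pattern.compile(
            "\\b(?:on\\s+)?(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\\b",
            Pattern.CASE_INSENSITIVE);

    static ParsedAppointment parse(String input) {
        LocalTime time = null;
        DayOfWeek day = null;
        String rest = input;

        Matcher t = TIME.matcher(rest);
        if (t.find()) {
            int hour = Integer.parseInt(t.group(1));
            int minute = t.group(2) != null ? Integer.parseInt(t.group(2)) : 0;
            String meridiem = t.group(3) != null ? t.group(3) : t.group(4);
            if (meridiem != null) {
                // 12am -> 0, 12pm -> 12, 5pm -> 17
                hour = (hour % 12) + (meridiem.equalsIgnoreCase("pm") ? 12 : 0);
            }
            if (hour < 24 && minute < 60) {
                time = LocalTime.of(hour, minute);
                // Cut the time expression out of the remaining text.
                rest = rest.substring(0, t.start()) + " " + rest.substring(t.end());
            }
        }

        Matcher d = DAY.matcher(rest);
        if (d.find()) {
            day = DayOfWeek.valueOf(d.group(1).toUpperCase(Locale.ROOT));
            rest = rest.substring(0, d.start()) + " " + rest.substring(d.end());
        }

        // Whatever is left over is treated as the title.
        String title = rest.replaceAll("\\s+", " ").trim();
        return new ParsedAppointment(title, time, day);
    }
}
```

With the two inputs from the question, `parse("Lunch with Mike at 11:30 on Tuesday")` yields the title "Lunch with Mike" with 11:30 on TUESDAY, and `parse("5pm Happy hour on Friday")` yields "Happy hour" with 17:00 on FRIDAY. Turning the weekday into a concrete date is left out here.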

The key to successfully implementing this, as with any NLP-type project, is to have as many examples as you can possibly get, preferably as automated unit tests (ultimately, even if the test cases duplicate the same functionality many times, it is better to have more examples than fewer). Fortunately, since we're talking about natural language, such test cases should be particularly easy to get, since even non-programmer friends, family, etc. should be able to provide you with "event descriptions" (or whatever you want to call them). You'll also want to focus especially on edge cases where the date-parsing bit might interfere with the location / title parsing bit (for example, in "sigur rós at 8pm" the "at" is clearly part of the time, whereas in "party at phoebe's saturday" it clearly isn't).
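One way to keep such examples cheap to collect is a table of input and expected-title pairs that is checked in a loop. The sketch below is an assumption-laden illustration: `TitleExtractor` is a deliberately tiny stand-in for the real parser, and in an actual project each row would more likely be a parameterized unit test. The key detail for the "at" edge case is that the pattern only consumes "at" when a clock time follows it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Tiny stand-in extractor: strips clock times and weekday names, keeps the rest. */
final class TitleExtractor {
    private static final Pattern DATE_WORDS = Pattern.compile(
            // "at" is only swallowed when it is followed by something that looks like a time.
            "\\b(?:at\\s+)?\\d{1,2}(?::\\d{2}\\s*(?:am|pm)?|\\s*(?:am|pm))\\b"
            + "|\\b(?:on\\s+)?(?:mon|tues|wednes|thurs|fri|satur|sun)day\\b",
            Pattern.CASE_INSENSITIVE);

    static String title(String input) {
        return DATE_WORDS.matcher(input).replaceAll(" ")
                .replaceAll("\\s+", " ")
                .trim();
    }
}

/** Table of example inputs and the titles we expect to get back. */
final class AppointmentExamples {
    static Map<String, String> examples() {
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("Lunch with Mike at 11:30 on Tuesday", "Lunch with Mike");
        rows.put("5pm Happy hour on Friday", "Happy hour");
        // \u00f3 is the accented o in "sigur ros"; escaped to avoid source-encoding issues.
        rows.put("sigur r\u00f3s at 8pm", "sigur r\u00f3s");
        // Here "at" belongs to the location, so it must stay in the title.
        rows.put("party at phoebe's saturday", "party at phoebe's");
        return rows;
    }

    /** Runs every example and returns a description of each mismatch. */
    static List<String> failures() {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> row : examples().entrySet()) {
            String actual = TitleExtractor.title(row.getKey());
            if (!actual.equals(row.getValue())) {
                failures.add("'" + row.getKey() + "': expected '" + row.getValue()
                        + "' but got '" + actual + "'");
            }
        }
        return failures;
    }
}
```

New phrasings collected from friends or users become one more `rows.put(...)` line, so the table can grow much faster than the parser itself.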

I realize I said quite a bit about JChronic, but I feel that it's a natural choice for your problem, as it already covers much of the "hard part" of parsing natural-language "appointments", i.e., the fuzziness of the language we use about time, and it is already implemented in the language you are targeting.
