This article covers how to handle Elasticsearch and Y10k (years with more than 4 digits).

Problem description

I discovered this issue in connection with Elasticsearch queries, but since the ES date format documentation links to the API documentation for the java.time.format.DateTimeFormatter class, the problem is not really ES-specific.

Short summary: We are having problems with dates beyond year 9999, more exactly, years with more than 4 digits.

The documents stored in ES have a date field that the index descriptor defines with the format "date", which corresponds to "yyyy-MM-dd" in the pattern language of DateTimeFormatter. We take user input, validate it with org.apache.commons.validator.DateValidator.isValid, also using the pattern "yyyy-MM-dd", and, if it is valid, create an ES query from it. This fails with an exception if the user enters something like 20202-12-03. The search term is probably not intentional, but the expected behaviour would be to find nothing rather than have the software cough up an exception.

The problem is that org.apache.commons.validator.DateValidator internally uses the older SimpleDateFormat class to verify that the input conforms to the pattern, and SimpleDateFormat interprets "yyyy" as something like: use at least 4 digits, but allow more if required. A SimpleDateFormat with the pattern "yyyy-MM-dd" will thus parse an input like "20202-07-14" and will likewise format a Date object with a year beyond 9999.
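To see the lenient year handling described above, here is a minimal sketch that uses only the JDK (commons-validator itself is not involved). The class name `LegacyYearDemo` and the helper `roundTrip` are made up for illustration, and the time zone is pinned to UTC as an assumption so that parsing and formatting agree wherever this runs.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class LegacyYearDemo {

    // Parses the text with the legacy formatter and formats the result back.
    static String roundTrip(String text) {
        SimpleDateFormat legacy = new SimpleDateFormat("yyyy-MM-dd");
        legacy.setTimeZone(TimeZone.getTimeZone("UTC"));
        // Non-lenient mode checks field ranges (e.g. month 13),
        // but it does not limit the number of year digits.
        legacy.setLenient(false);
        try {
            Date parsed = legacy.parse(text);
            return legacy.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("2020-07-14"));  // 2020-07-14
        System.out.println(roundTrip("20202-07-14")); // 20202-07-14
    }
}
```

So a five-digit year passes a SimpleDateFormat-based check without complaint, which is how such input can reach the ES query in the first place.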

The newer DateTimeFormatter class is much stricter and takes "yyyy" to mean exactly four digits. It will fail to parse an input string like "20202-07-14" and also fail to format a Temporal object with a year beyond 9999. It is worth noting that DateTimeFormatter is itself capable of handling variable-length fields. The constant DateTimeFormatter.ISO_LOCAL_DATE, for example, is not equivalent to "yyyy-MM-dd": in conformance with ISO 8601 it allows years with more than four digits, but uses at least four. This constant is created programmatically with a DateTimeFormatterBuilder rather than from a pattern string.

ES cannot be configured to use the constants defined in DateTimeFormatter, such as ISO_LOCAL_DATE, but only with a pattern string. ES also knows a list of predefined patterns, and the documentation occasionally refers to the ISO standard, but these references seem to be mistaken and ignore that a valid ISO date string can contain a five-digit year.

I can configure ES with a list of multiple allowed date patterns, e.g. "yyyy-MM-dd||yyyyy-MM-dd". That will allow both four and five digits in the year, but fail for a six-digit year. I can support six-digit years by adding yet another allowed pattern: "yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd", but then it fails for seven-digit years, and so on.
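The effect of such a pattern list can be imitated in plain Java by trying one DateTimeFormatter after another. This is only a sketch of the idea, on the assumption that ES tries the alternatives in a comparable way; `PatternListDemo` and `accepts` are illustrative names, not part of any library.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

class PatternListDemo {

    // One formatter per alternative in "yyyy-MM-dd||yyyyy-MM-dd".
    private static final List<DateTimeFormatter> FORMATTERS = List.of(
            DateTimeFormatter.ofPattern("yyyy-MM-dd"),
            DateTimeFormatter.ofPattern("yyyyy-MM-dd"));

    // Returns true if at least one of the patterns parses the text.
    static boolean accepts(String text) {
        for (DateTimeFormatter formatter : FORMATTERS) {
            try {
                LocalDate.parse(text, formatter);
                return true;
            } catch (DateTimeParseException e) {
                // This pattern did not match; try the next one.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts("2020-07-14"));   // true  (four digits)
        System.out.println(accepts("20202-07-14"));  // true  (five digits)
        System.out.println(accepts("202020-07-14")); // false (six digits)
    }
}
```

Each additional year width needs its own entry in the list, which is exactly the open-ended enumeration described above.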

Am I overlooking something, or is it really not possible to configure ES (or a DateTimeFormatter instance using a pattern string) to have a year field with at least four digits (but potentially more) as used by the ISO standard?

Solution

Edit

ISO 8601

Since your requirement is to conform with ISO 8601, let’s first see what ISO 8601 says (quoted from the link at the bottom):

So 20202-12-03 is not a valid date in ISO 8601. If you explicitly inform your users that you accept, say, up to 6 digit years, then +20202-12-03 and -20202-12-03 are valid, and only with the + or - sign.

Accepting more than 4 digits

The format pattern uuuu-MM-dd formats and parses dates in accordance with ISO 8601, including years with more than four digits. For example:

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern("uuuu-MM-dd");
    LocalDate date = LocalDate.parse("+20202-12-03", dateFormatter);
    System.out.println("Parsed: " + date);
    System.out.println("Formatted back: " + date.format(dateFormatter));

Output:

    Parsed: +20202-12-03
    Formatted back: +20202-12-03

It works quite similarly for a prefixed minus instead of the plus sign.

Accepting more than 4 digits without sign

    yyyy-MM-dd||yyyyy-MM-dd||yyyyyy-MM-dd||yyyyyyy-MM-dd||yyyyyyyy-MM-dd||yyyyyyyyy-MM-dd

As I said, this disagrees with ISO 8601. I also agree with you that it isn’t nice. And obviously it will fail for 10 or more digits, but that would fail for a different reason anyway: java.time handles years in the interval -999 999 999 through +999 999 999. So trying yyyyyyyyyy-MM-dd (10 digit year) would get you into serious trouble except in the corner case where the user enters a year with a leading zero.
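The year limit mentioned above can be checked directly against the constants in java.time.Year. The following is a small sketch; the class `YearRangeDemo` and the helper `canCreate` are names chosen here for illustration.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.Year;

class YearRangeDemo {

    // Returns true if java.time can represent January 1 of the given year.
    static boolean canCreate(int year) {
        try {
            LocalDate.of(year, 1, 1);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Year.MIN_VALUE);           // -999999999
        System.out.println(Year.MAX_VALUE);           // 999999999
        System.out.println(canCreate(999_999_999));   // true
        System.out.println(canCreate(1_000_000_000)); // false: needs 10 digits
    }
}
```

A year needing ten significant digits is therefore out of range before any pattern question arises.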

I am sorry, this is as good as it gets. DateTimeFormatter format patterns do not support all of what you are asking for. There is no (single) pattern that will give you four digit years in the range 0000 through 9999 and more digits for years after that.
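Outside of pattern strings, a DateTimeFormatterBuilder can express "at least four digits, more if needed, no sign". As the question points out, ES cannot be configured this way, so the sketch below is only a possible way to validate user input on the application side before building the query. The class name `UnsignedYearFormatter`, its helpers, the upper width of 9 digits and the strict resolver style are all choices made for this example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

class UnsignedYearFormatter {

    // Year: 4 to 9 digits; SignStyle.NORMAL prints no plus sign and,
    // when parsing strictly, does not accept one either.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4, 9, SignStyle.NORMAL)
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .toFormatter()
            .withResolverStyle(ResolverStyle.STRICT); // rejects e.g. February 30

    static boolean isValid(String text) {
        try {
            LocalDate.parse(text, FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Parses the text and formats it back with the same formatter.
    static String reformat(String text) {
        return LocalDate.parse(text, FORMATTER).format(FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(isValid("2020-07-14"));   // true
        System.out.println(isValid("20202-12-03"));  // true
        System.out.println(isValid("202-01-01"));    // false: fewer than 4 digits
        System.out.println(isValid("2020-02-30"));   // false: no such day
        System.out.println(reformat("20202-12-03")); // 20202-12-03
    }
}
```

Note that this deliberately departs from ISO 8601, which requires the sign on expanded years, so whatever passes this check would still have to match the patterns configured in ES.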

The documentation of DateTimeFormatter says about formatting and parsing years:

So no matter which count of pattern letters you go for, you will be unable to parse years with more digits without sign, and years with fewer digits will be formatted with this many digits with leading zeroes.
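The sign rule can be observed with a few inputs. This is a small sketch using the uuuu-MM-dd pattern from above; `SignRuleDemo`, `parses` and `formatYear` are illustrative names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class SignRuleDemo {

    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    static boolean parses(String text) {
        try {
            LocalDate.parse(text, FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Formats January 2 of the given year with the four-letter pattern.
    static String formatYear(int year) {
        return LocalDate.of(year, 1, 2).format(FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parses("2020-12-03"));   // true
        System.out.println(parses("20202-12-03"));  // false: five digits need a sign
        System.out.println(parses("+20202-12-03")); // true
        System.out.println(parses("+2020-12-03"));  // false: sign not allowed for four digits
        System.out.println(formatYear(987));        // 0987-01-02
        System.out.println(formatYear(20202));      // +20202-01-02
    }
}
```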

Original answer

You can probably get away with the pattern u-MM-dd. Demonstration:

    String formatPattern = "u-MM-dd";

    DateTimeFormatter dateFormatter = DateTimeFormatter.ofPattern(formatPattern);

    LocalDate normalDate = LocalDate.parse("2020-07-14", dateFormatter);
    String formattedAgain = normalDate.format(dateFormatter);
    System.out.format("LocalDate: %s. String: %s.%n", normalDate, formattedAgain);

    LocalDate largeDate = LocalDate.parse("20202-07-14", dateFormatter);
    String largeFormattedAgain = largeDate.format(dateFormatter);
    System.out.format("LocalDate: %s. String: %s.%n", largeDate, largeFormattedAgain);

Output:

    LocalDate: 2020-07-14. String: 2020-07-14.
    LocalDate: +20202-07-14. String: 20202-07-14.

Counter-intuitively but very practically, one pattern letter does not mean 1 digit but rather as many digits as it takes. So the flip side of the above is that years before year 1000 will be formatted with fewer than 4 digits, which, as you say, disagrees with ISO 8601.
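A short sketch of that flip side, with early years formatted through the single-letter pattern (`ShortYearDemo` and `formatYear` are names invented for this example):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class ShortYearDemo {

    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("u-MM-dd");

    // Formats January 2 of the given year with the single-letter year pattern.
    static String formatYear(int year) {
        return LocalDate.of(year, 1, 2).format(FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(formatYear(2020)); // 2020-01-02
        System.out.println(formatYear(987));  // 987-01-02
        System.out.println(formatYear(5));    // 5-01-02
    }
}
```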

For the difference between pattern letter y and u for year see the link at the bottom.

You might also consider one M and/or one d to accept 2020-007-014, but again, this will cause formatting into just 1 digit for numbers less than 10, like 2020-7-14, which probably isn’t what you want and again disagrees with ISO.
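For completeness, this is roughly what the single-letter month and day variant does. It is a sketch only; `SingleLetterDemo` and `reformat` are illustrative names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class SingleLetterDemo {

    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("u-M-d");

    // Parses the text and formats the date back with the same pattern.
    static String reformat(String text) {
        return LocalDate.parse(text, FORMATTER).format(FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(reformat("2020-007-014")); // 2020-7-14
        System.out.println(reformat("2020-07-14"));   // 2020-7-14
    }
}
```

Padded input is accepted, but the padding is lost on the way out, so the formatted value no longer matches the ISO form.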

Links
