This calls _guess_datetime_format_for_array, which takes the first non-null value in the column and gives it to _guess_datetime_format. This tries to build a datetime format string to use for future parsing. (My answer here has more detail about the formats it is able to recognise.)

Fortunately, the YYYY-MM-DD format is one that this function can recognise. Even more fortunately, this particular format has a fast path through the pandas code!

You can see here that pandas discards the inferred format (resetting format to None) when that format is ISO 8601:

```python
if format is not None:
    # There is a special fast-path for iso8601 formatted
    # datetime strings, so in those cases don't use the inferred
    # format because this path makes process slower in this
    # special case
    format_is_iso8601 = _format_is_iso(format)
    if format_is_iso8601:
        require_iso8601 = not infer_datetime_format
        format = None
```

This allows the code to take the same path as above to the parse_iso_8601_datetime function.

We've provided a lambda function to parse the dates with, so pandas executes this code block. However, this raises an exception internally:

```
strptime() argument 1 must be str, not numpy.ndarray
```

This exception is immediately caught, and pandas falls back to using try_parse_dates before calling to_datetime.
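To see that failure in isolation, here's a minimal sketch. It assumes the parser was a strptime-based lambda using the '%Y-%m-%d' format (the error message points to strptime, but the exact lambda isn't shown in this answer), and it uses a small NumPy object array as a stand-in for the column pandas first hands to the parser in one go.

```python
from datetime import datetime

import numpy as np

# Assumed parser: a strptime-based lambda for YYYY-MM-DD strings.
parse_date = lambda x: datetime.strptime(x, '%Y-%m-%d')

# Stand-in for the whole column of date strings passed to the parser at once.
values = np.array(['2014-01-01', '2014-01-02'], dtype=object)

error_message = None
try:
    parse_date(values)  # whole-array call: strptime only accepts a str
except TypeError as exc:
    error_message = str(exc)

# Calling the same lambda once per element succeeds; this per-value
# pattern is what the fallback ends up doing.
parsed = [parse_date(v) for v in values]

print(error_message)
```

The exact wording of the message can vary slightly between Python implementations, but it is a TypeError complaining that the argument must be a str.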
try_parse_dates means that instead of being called on an array, the lambda function is called repeatedly for each value of the array in this loop:

```cython
for i from 0 <= i < n:
    if values[i] == '':
        result[i] = np.nan
    else:
        result[i] = parse_date(values[i])  # parse_date is the lambda function
```

Despite this being compiled code, we pay the penalty of calling back into Python code on every iteration. This makes it very slow in comparison to the other approaches above.

Back in to_datetime, we now have an object array filled with datetime objects. Again we hit array_to_datetime, but this time pandas sees a date object and uses another function (pydate_to_dt64) to convert it into a datetime64 object.

The slowdown is really caused by the repeated calls to the lambda function.

The Series s has date strings in the MM/DD/YYYY format.

This is not an ISO 8601 format. pd.to_datetime(s, infer_datetime_format=False) tries to parse the strings using parse_iso_8601_datetime, but this fails with a ValueError. The error is handled here: pandas falls back to parse_datetime_string instead, which means dateutil.parser.parse is used to convert each string to a datetime. This is why it is slow in this case: a Python function is called repeatedly in a loop.
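To make the per-value cost concrete, here's a pure-Python analogue of the try_parse_dates loop. It is a sketch, not pandas' own code: try_parse_dates_sketch and the CountingParser class are hypothetical names that exist only to count how many times the Python-level parser runs.

```python
import math
from datetime import datetime


class CountingParser:
    """Hypothetical wrapper that counts Python-level parser calls."""

    def __init__(self, fmt):
        self.fmt = fmt
        self.calls = 0

    def __call__(self, value):
        self.calls += 1
        return datetime.strptime(value, self.fmt)


def try_parse_dates_sketch(values, parse_date):
    """Mirror of the Cython loop: empty strings become NaN, and every
    other value costs one call to the Python function parse_date."""
    result = [None] * len(values)
    for i, value in enumerate(values):
        if value == '':
            result[i] = math.nan
        else:
            result[i] = parse_date(value)
    return result


parser = CountingParser('%Y-%m-%d')
parsed = try_parse_dates_sketch(['2014-01-01', '', '2014-01-03'], parser)
print(parser.calls)  # 2: one call per non-empty string
```

The call count grows linearly with the number of non-empty values, and each of those calls carries Python function-call overhead that a compiled array parser never pays.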
There's not much difference in speed between pd.to_datetime(s, format='%m/%d/%Y') and pd.to_datetime(s, infer_datetime_format=True). The latter uses _guess_datetime_format_for_array again to infer the MM/DD/YYYY format. Both then hit array_strptime here:

```python
if format is not None:
    ...
    if result is None:
        try:
            result = array_strptime(arg, format, exact=exact,
                                    errors=errors)
```

array_strptime is a fast Cython function for parsing an array of strings into datetime structs given a specific format.
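Inferring costs about the same as passing the format because the guess happens once, from the first non-null value, and afterwards both routes do the same fixed-format parsing. Below is a rough pure-Python sketch of that shape. guess_format, parse_inferred and the CANDIDATE_FORMATS list are hypothetical and far simpler than pandas' _guess_datetime_format, and datetime.strptime in a loop stands in for the compiled array_strptime.

```python
from datetime import datetime

# Hypothetical candidate list; pandas' real format guessing is more elaborate.
CANDIDATE_FORMATS = ['%Y-%m-%d', '%m/%d/%Y']


def guess_format(values):
    """Guess a format from the first non-empty value only (a one-off cost)."""
    first = next((v for v in values if v), None)
    if first is None:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(first, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_with_format(values, fmt):
    """Parse every value with one known format; empty strings become None."""
    return [datetime.strptime(v, fmt) if v else None for v in values]


def parse_inferred(values):
    """Infer the format once, then reuse the explicit-format path."""
    fmt = guess_format(values)
    if fmt is None:
        raise ValueError('could not infer a datetime format')
    return parse_with_format(values, fmt)


s = ['', '01/31/2014', '12/25/2015']
explicit = parse_with_format(s, '%m/%d/%Y')
inferred = parse_inferred(s)
print(explicit == inferred)  # True
```

Both paths end in the same per-format parsing step, so the only extra work inference adds is the single guess at the start.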