Question
I am using PyMongo to iterate over a MongoDB collection, but I'm struggling to handle very large MongoDB date values.
For example, suppose a collection contains data that looks like this:
"bad_data" : [
{
"id" : "id01",
"label" : "bad_data",
"value" : "exist",
"type" : "String",
"lastModified" : ISODate("2018-06-01T10:04:35.000Z"),
"expires" : Date(9223372036854775000)
}
]
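I would do something like:
from pymongo import MongoClient

client = MongoClient('localhost')
db = client['db1']
# Database.authenticate() was removed in PyMongo 4; there, pass credentials to MongoClient instead
db.authenticate('user', 'pass', source='admin')
collection = db['collection']
# A Collection is not iterable itself; iterate the cursor returned by find()
for i in collection.find():
    pass  # do something with i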
and get the error InvalidBSON: year 292278994 is out of range
Is there any way I can deal with this ridiculous Date() object without bson falling over? I realise that having such a date in MongoDB is crazy, but there is nothing I can do about it because it's not my data.
Recommended answer
There is actually a section in the PyMongo FAQ about this very topic:
PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited to years between datetime.MINYEAR (usually 1) and datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP driver) can store BSON datetimes with year values far outside those supported by datetime.datetime.
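To see the limit concretely, the sketch below converts a raw BSON millisecond value by hand. The helper name `millis_to_datetime` is hypothetical, and this is not how PyMongo decodes dates internally; it only illustrates that the value from the question cannot fit in a `datetime.datetime`.

```python
from datetime import datetime, timedelta

# BSON dates are stored as signed 64-bit milliseconds since the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def millis_to_datetime(millis):
    """Return a naive UTC datetime for `millis`, or None if Python cannot represent it.

    Hypothetical helper for illustration only.
    """
    try:
        return EPOCH + timedelta(milliseconds=millis)
    except OverflowError:
        # Raised both when the timedelta itself is too large and when the
        # resulting year falls outside datetime.MINYEAR..datetime.MAXYEAR.
        return None
```

With this helper, `millis_to_datetime(9223372036854775000)` (the `expires` value from the question) returns `None`, while the `lastModified` value converts normally.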
So the basic constraint here is the datetime.datetime type that the driver maps BSON dates to. The value might look "ridiculous", but it is perfectly valid for drivers in other languages to create such a date.
As the FAQ points out, your general workarounds are:
Deal with the offending BSON date. Although it is valid to store, it was possibly not the "true" intention of whoever or whatever stored it in the first place.
Add a "date range" condition to your query to filter out the "out of range" dates:
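from datetime import datetime

# Only return documents whose 'expires' fits in datetime.datetime
result = db['collection'].find({
    'expires': { '$gte': datetime.min, '$lte': datetime.max }
})
for i in result:
    pass  # do something with i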
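One caveat with a plain range condition is that it also drops documents that have no such date field at all. If that matters, a variant using `$nor` might help. This is only a sketch: the helper name `in_range_or_missing` is hypothetical, and it assumes that the server compares the bounds only against date-typed values, so that documents with a missing or non-date field still match.

```python
from datetime import datetime


def in_range_or_missing(field):
    """Build a filter that rejects only dates Python cannot represent.

    Hypothetical helper: documents where `field` is absent, or holds a date
    within datetime.min..datetime.max, are assumed to still match.
    """
    return {
        "$nor": [
            {field: {"$gt": datetime.max}},  # too far in the future
            {field: {"$lt": datetime.min}},  # too far in the past
        ]
    }


# Usage (requires a live collection, so shown as a comment):
# for doc in db['collection'].find(in_range_or_missing('expires')):
#     ...
```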
Omit the offending date field in the projection if you don't need it in further processing:
# The server never sends 'expires', so the driver never has to decode it
result = db['collection'].find({ }, projection={ 'expires': False })
for i in result:
    pass  # do something with i
The name 'expires' suggests that the original intent was a date so far in the future that it would never arrive, and that whoever wrote the data (quite possibly code that is still writing it) was unaware of Python's date constraint. So it is probably quite safe to lower that value in all existing documents and in any code that still writes it.
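If you do have write access, lowering the value could be done entirely on the server, so the out-of-range date never has to be decoded in Python. The following is a minimal sketch: the function name `clamp_expires` and the `FAR_FUTURE` sentinel are assumptions for illustration, and `collection` is expected to be a PyMongo `Collection` (anything with a compatible `update_many` works).

```python
from datetime import datetime

# Hypothetical sentinel: a "never expires" date that Python can still represent.
FAR_FUTURE = datetime(9999, 12, 31)


def clamp_expires(collection, field="expires", ceiling=FAR_FUTURE):
    """Replace every date in `field` that lies beyond `ceiling` with `ceiling`.

    Returns the number of documents the server reports as modified.
    """
    result = collection.update_many(
        {field: {"$gt": ceiling}},   # match only dates past the ceiling
        {"$set": {field: ceiling}},  # overwrite them with the ceiling
    )
    return result.modified_count


# Usage (requires a live collection, so shown as a comment):
# clamp_expires(db['collection'])
```

Any code that still writes the oversized value would need the same ceiling, otherwise new out-of-range documents will keep appearing.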