I have a MongoDB database containing 3,000,000 documents that I process with pymongo. I want to iterate over all of the documents without updating the collection.
I tried using four threads:
import threading

CURSORS_NUM = 4

# One command cursor per thread, courtesy of parallelCollectionScan
cursors = db[collection].parallel_scan(CURSORS_NUM)
threads = [
    threading.Thread(target=process_cursor, args=(cursor,))
    for cursor in cursors
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
and this process_cursor function:
def process_cursor(cursor):
    for document in cursor:
        dosomething(document)
After processing documents for a while, I get this error:
File "extendDocuments.py", line 133, in process_cursor
for document in cursor:
File "/usr/local/lib/python2.7/dist-packages/pymongo/command_cursor.py", line 165, in next
if len(self.__data) or self._refresh():
File "/usr/local/lib/python2.7/dist-packages/pymongo/command_cursor.py", line 142, in _refresh
self.__batch_size, self.__id))
File "/usr/local/lib/python2.7/dist-packages/pymongo/command_cursor.py", line 110, in __send_message
*self.__decode_opts)
File "/usr/local/lib/python2.7/dist-packages/pymongo/helpers.py", line 97, in _unpack_response
cursor_id)
CursorNotFound: cursor id '116893918402' not valid at server
If I use find() instead, I can set the timeout to False to avoid this. Can I do something similar with the cursors I get from parallel_scan?
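For reference, the find()-based variant looks roughly like this (a sketch using the same db, collection and dosomething names as above; timeout=False is the pymongo 2.x keyword, which pymongo 3.x renamed to no_cursor_timeout=True):

cursor = db[collection].find(timeout=False)
try:
    for document in cursor:
        dosomething(document)
finally:
    # Cursors opened with timeout=False are never reaped by the server,
    # so close them explicitly when done.
    cursor.close()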
Best answer
It isn't currently possible to turn off the idle timeout on the cursors returned by parallelCollectionScan. I've opened a feature request for it:
https://jira.mongodb.org/browse/SERVER-15042
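Until that's available, one possible workaround (only a sketch, assuming the same db, collection and dosomething from the question) is to keep the worker threads but feed them from a single find(timeout=False) cursor through a queue, so no server-side cursor ever sits idle in a thread:

import threading
import Queue  # Python 2.7, as in the traceback above

WORKERS = 4
task_queue = Queue.Queue(maxsize=1000)  # bounded, so the reader can't outrun the workers

def worker():
    while True:
        document = task_queue.get()
        if document is None:  # sentinel: no more documents
            return
        dosomething(document)

threads = [threading.Thread(target=worker) for _ in range(WORKERS)]
for thread in threads:
    thread.start()

cursor = db[collection].find(timeout=False)
try:
    for document in cursor:
        task_queue.put(document)
finally:
    cursor.close()  # no-timeout cursors must be closed explicitly

for _ in threads:
    task_queue.put(None)  # one sentinel per worker
for thread in threads:
    thread.join()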