Problem description
I'd like to read multiple JSON objects from a file/stream in Python, one at a time. Unfortunately json.load() just .read()s until end-of-file; there doesn't seem to be any way to use it to read a single object or to lazily iterate over the objects.
Is there any way to do this? Using the standard library would be ideal, but if there's a third-party library I'd use that instead.
At the moment I'm putting each object on a separate line and using json.loads(f.readline()), but I would really prefer not to need to do this.
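For reference, the one-object-per-line workaround could look roughly like the sketch below. The name iter_json_lines is made up for illustration, and it assumes no JSON value spans more than one line.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one JSON value per non-blank line of a text stream.

    Hypothetical helper: assumes each line holds exactly one complete value.
    """
    for line in stream:
        if line.strip():  # skip blank lines
            yield json.loads(line)


# Usage with an in-memory stream standing in for a file
for value in iter_json_lines(io.StringIO('{"foo": ["bar", "baz"]}\n1\n[]\n')):
    print("Working on a", type(value).__name__)
```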
example.py
import my_json as json  # hypothetical module providing the desired iterload()
import sys

for o in json.iterload(sys.stdin):
    print("Working on a", type(o))
in.txt
{"foo": ["bar", "baz"]} 1 2 [] 4 5 6
Example session
$ python3.2 example.py < in.txt
Working on a dict
Working on a int
Working on a int
Working on a list
Working on a int
Working on a int
Working on a int
Recommended answer
Here's a much, much simpler solution. The secret is to try, fail, and use the information in the exception to parse correctly: when json.load() finds a second value after the first, it raises an "Extra data" error whose position tells you where the first value ends. The only limitation is that the file must be seekable.
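import json

def stream_read_json(fn):
    start_pos = 0
    with open(fn, 'r') as f:
        while True:
            try:
                # Succeeds only when exactly one value is left in the file
                obj = json.load(f)
                yield obj
                return
            except json.JSONDecodeError as e:
                # e.pos is the offset where the extra data starts, so the text
                # before it is the first value. Seeking by character offset
                # assumes one byte per character (e.g. ASCII-only content).
                f.seek(start_pos)
                json_str = f.read(e.pos)
                obj = json.loads(json_str)
                start_pos += e.pos
                yield obj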
Note that this only works on Python >= 3.5, where json.JSONDecodeError was introduced. On earlier versions a failure raises a plain ValueError, and you have to parse the position out of the message string, e.g.
import json
import re

def stream_read_json(fn):
    start_pos = 0
    with open(fn, 'r') as f:
        while True:
            try:
                obj = json.load(f)
                yield obj
                return
            except ValueError as e:
                # Pull the "(char N ...)" offset out of the "Extra data" message
                match = re.match(r'Extra data: line \d+ column \d+ .*\(char (\d+).*\)',
                                 e.args[0])
                if match is None:
                    raise  # some other parse error, not trailing data
                end_pos = int(match.group(1))
                f.seek(start_pos)
                json_str = f.read(end_pos)
                obj = json.loads(json_str)
                start_pos += end_pos
                yield obj
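If seeking is a problem, a different technique is json.JSONDecoder.raw_decode(), which returns a parsed value together with the index at which it ended. The following is only a minimal sketch and not part of the answer above: iterload is a hypothetical name borrowed from the question, and the sketch assumes the whole stream fits in memory.

```python
import io
import json


def iterload(stream):
    """Yield each JSON value in a text stream, one at a time.

    Hypothetical helper: the stream is read into memory in one go,
    then values are decoded lazily from that string.
    """
    decoder = json.JSONDecoder()
    text = stream.read()
    pos = 0
    while True:
        # raw_decode() does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            return
        # Returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Usage with the contents of in.txt from the question
for value in iterload(io.StringIO('{"foo": ["bar", "baz"]} 1 2 [] 4 5 6')):
    print("Working on a", type(value).__name__)
```

Because the text is read once and each value is parsed once, this avoids re-reading the rest of the file for every value, and it works on non-seekable streams such as sys.stdin.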