问题描述
我有一个python脚本,当前正在访问一个返回JSON的API。然后它接受JSON字符串并将其保存为本地文件系统上的文件,然后将其手动移入HDFS。我想改变这个,所以我的python脚本直接保存到HDFS,而不是先打到本地文件系统。我目前正在尝试使用和HDFS DFS命令来保存文件,但我不认为复制命令是正确的方式,因为它不是一个文件,而是一个JSON字符串,当我试图保存它时。 / p>
当前代码
import urllib2
import json
import os
f = urllib2.urlopen('RESTful_API_URL.json')
json_string = json.loads(f.read()。decode('utf-8'))
with open('\home\user\filename.json','w')作为outfile:
json.dump(json_string,outfile)
新代码
f = urllib2.urlopen(' RESTful_API_URL.json')
json_string = json.loads(f.read()。decode('utf-8'))
os.environ ['json_string'] = json.dump(json_string)
os.system('hdfs dfs -cp -f $ json_string hdfs / user / test')
我认为问题与此线程相同。
首先,这个命令可以将stdin重定向到hdfs文件,
hadoop fs -put - / path / to / file / in / hdfs .txt
然后,您可以在python中执行此操作,
os.system('echo%s| hadoop fs -put - /path/to/file/in/hdfs.txt'%(json.dump(json_string)))
I have a python script which currently accesses an API which returns JSON. It then takes the JSON string and saves it off as a file on the local file system, where i then move it into HDFS manually. I would like to change this so my python script is saving directly to HDFS instead of hitting the local file system first. I am currently trying to save the file using and HDFS DFS command but I don't think the copy command is the correct way to do this because it isn't a file but rather a JSON string when I am trying to save it.
Current Code
import urllib2
import json
import os
f = urllib2.urlopen('RESTful_API_URL.json')
json_string = json.loads(f.read().decode('utf-8'))
with open('\home\user\filename.json', 'w') as outfile:
json.dump(json_string,outfile)
New Code
f = urllib2.urlopen('RESTful_API_URL.json')
json_string = json.loads(f.read().decode('utf-8'))
os.environ['json_string'] = json.dump(json_string)
os.system('hdfs dfs -cp -f $json_string hdfs/user/test')
I think the problem is the same with this thread Stream data into hdfs directly without copying.
Firstly, this command can redirect stdin to hdfs file,
hadoop fs -put - /path/to/file/in/hdfs.txt
Then, you can do this in python,
os.system('echo "%s" | hadoop fs -put - /path/to/file/in/hdfs.txt' %(json.dump(json_string)))
这篇关于使用python将JSON保存到HDFS的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!