Saving JSON to HDFS Using Python


This article walks through how to save JSON to HDFS with Python; the question and answer below may be a useful reference for anyone facing the same problem.

Problem Description


I have a Python script that currently calls an API which returns JSON. It takes the JSON string and saves it as a file on the local file system, and I then move that file into HDFS manually. I would like to change this so the script saves directly to HDFS instead of hitting the local file system first. I am currently trying to save the file with an hdfs dfs command, but I don't think a copy command is the right approach, because what I am trying to save is a JSON string rather than a file.

Current Code

import urllib2
import json
import os

# Fetch the JSON from the API and parse it into a Python object.
f = urllib2.urlopen('RESTful_API_URL.json')
json_string = json.loads(f.read().decode('utf-8'))

# Write it back out as a file on the local file system.
with open('/home/user/filename.json', 'w') as outfile:
    json.dump(json_string, outfile)

New Code

f = urllib2.urlopen('RESTful_API_URL.json')
json_string = json.loads(f.read().decode('utf-8'))

# Attempt to pass the serialized JSON through an environment variable
# and copy it with hdfs dfs -cp (this is the part that does not work).
os.environ['json_string'] = json.dumps(json_string)
os.system('hdfs dfs -cp -f $json_string hdfs/user/test')
Solution

I think the problem is the same as in this thread: Stream data into hdfs directly without copying.

First, this command redirects stdin to an HDFS file:

hadoop fs -put - /path/to/file/in/hdfs.txt

Then, you can do the same from Python:

os.system('echo "%s" | hadoop fs -put - /path/to/file/in/hdfs.txt' % json.dumps(json_string))
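Interpolating the string into an echo command works for small payloads, but it can break if the JSON contains quotes or other shell metacharacters, and a very large document can exceed the shell's argument-length limit. Below is a minimal sketch of the same idea that pipes the string to hadoop fs -put - through subprocess instead of the shell; the API URL and the HDFS path /path/to/file/in/hdfs.json are placeholders, not values from the original question.

import json
import subprocess
import urllib2

# Fetch and parse the JSON exactly as in the question.
f = urllib2.urlopen('RESTful_API_URL.json')
json_string = json.loads(f.read().decode('utf-8'))

# Serialize with dumps() and feed it to hadoop fs -put -, which reads
# the file contents from stdin, so nothing is written locally first.
proc = subprocess.Popen(
    ['hadoop', 'fs', '-put', '-', '/path/to/file/in/hdfs.json'],
    stdin=subprocess.PIPE)
proc.communicate(json.dumps(json_string))
if proc.returncode != 0:
    raise RuntimeError('hadoop fs -put exited with code %d' % proc.returncode)

Because the data never passes through a shell command line, no escaping is needed and the size of the JSON is not limited by the argument length.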

This concludes the article on saving JSON to HDFS using Python. We hope the answer above is helpful.
