I have a Python dictionary:
{'609232972': 4, '975151075': 4, '14247572': 4, '2987788788': 4, '3064695250': 2}
How can I load it into an RDD directly, without losing the key-value pairs?
When I load it like this:
usr_group = sc.parallelize(partition)
print(usr_group.take(5))
the key-value pairs are split apart and I only get the keys back:
['609232972', '975151075', '14247572', '2987788788', '3064695250']
I was expecting the RDD to contain
{'609232972': 4, '975151075': 4, '14247572': 4, '2987788788': 4, '3064695250': 2}
so that I can process each key together with its value.
Best answer
I'm not sure what you want each row of the RDD to hold, but here are three options:
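# sc is an existing SparkContext
my_dict = {'609232972': 4, '975151075': 4, '14247572': 4, '2987788788': 4, '3064695250': 2}
# Option 1: the whole dict as a single RDD element
rdd1 = sc.parallelize([my_dict])
# Option 2: one (key, value) tuple per element (items() works on Python 2 and 3)
rdd2 = sc.parallelize(list(my_dict.items()))
# Option 3: one single-entry dict per element
rdd3 = rdd2.map(lambda x: dict([x]))
print(rdd1.collect())
print(rdd2.take(4))
print(rdd3.take(4))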
Output (element order may vary by Python version):
[{'2987788788': 4, '975151075': 4, '3064695250': 2, '14247572': 4, '609232972': 4}]
[('2987788788', 4), ('975151075', 4), ('3064695250', 2), ('14247572', 4)]
[{'2987788788': 4}, {'975151075': 4}, {'3064695250': 2}, {'14247572': 4}]
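
As for why the call in the question returned only the keys: parallelize turns its argument into a list of elements, and iterating over a Python dict yields only its keys. The sketch below shows this in plain Python with no Spark involved. The names keys_only, pairs, and round_trip are made up for illustration.

```python
# Pure-Python illustration; no Spark required.
my_dict = {'609232972': 4, '975151075': 4, '14247572': 4,
           '2987788788': 4, '3064695250': 2}

# Iterating a dict directly yields only its keys. This mirrors what
# happens when the dict itself is handed to sc.parallelize.
keys_only = list(my_dict)

# Iterating over items() yields (key, value) tuples, so each pair
# survives as one element.
pairs = list(my_dict.items())

# The pairs can be rebuilt into the original dict, so nothing is lost.
round_trip = dict(pairs)

print(keys_only)
print(pairs)
```

Passing a list like pairs to parallelize is what the second option in the answer does, so every RDD element carries both the key and its value.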