I'm not sure how to describe this, so here is an example:

This is the original DataFrame:

original = pd.DataFrame({'a':[{1:'hi',2:'there'}],'b':[1]})

   a                      b
0  {1: 'hi', 2: 'there'}  1


This is the expected output, with the columns relabeled:

expected = pd.DataFrame({'numbers':[1,2],'text':['hi','there'],'b':[1,1]})

   b  numbers   text
0  1        1     hi
1  1        2  there


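Edit:

I tried to simplify the problem, and the solution worked on the simplified example, but not when applied to my real data. To avoid any miscommunication, here is data with the structure I am actually using: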

record_1 = {'1': {
                 'url': 'https://www.politico.com/magazine',
                 'title': 'Worst case '},
           '2': {
                 'url': 'https://www.nbcnews.com/pol',
                 'title': 'Bad Night '},
           '3': {
                 'url': 'https://www.usatoday.com/stor',
                 'title': "On the anniversary"
                 }}
record_2 = {'1': {
                 'url': 'https://www.nytimes.com/maga',
                  'title': 'Bad Things Happ '},
            '2': {
                  'url': 'https://www.cnn.com/pols',
                  'title': 'Best Night '}}

original = pd.DataFrame([[1,record_1],[2,record_2]],columns=['position','news_results'])


   position                                       news_results
0         1  {'1': {'title': 'Worst case ', 'url': 'https:/...
1         2  {'1': {'title': 'Bad Things Happ ', 'url': 'ht...


This is the expected result:

data = [[1,1,'https://www.politico.com/magazine','Worst case '],
       [1,2,'https://www.nbcnews.com/pol','Bad Night ',],
       [1,3,'https://www.usatoday.com/stor',"On the anniversary"],
       [2,1,'https://www.nytimes.com/maga','Bad Things Happ '],
       [2,2,'https://www.cnn.com/pols','Best Night ']]

expected = pd.DataFrame(data,columns=['position','sub_rank','url','title'])

   position  sub_rank                                url               title
0         1         1  https://www.politico.com/magazine         Worst case
1         1         2        https://www.nbcnews.com/pol          Bad Night
2         1         3      https://www.usatoday.com/stor  On the anniversary
3         2         1       https://www.nytimes.com/maga    Bad Things Happ
4         2         2           https://www.cnn.com/pols         Best Night

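Best answer

Here is one way. That said, I still think you could reach the expected output more directly by constructing the DataFrame differently in the first place.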

original.set_index('b').a.apply(pd.Series).stack().\
    reset_index(name='text').rename(columns={'level_1':'numbers'})
Out[1623]:
   b  numbers   text
0  1        1     hi
1  1        2  there
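
To make the chain above easier to follow, here is a sketch that splits it into intermediate steps on the same toy data. The variable names `wide`, `long_form`, and `result` are only illustrative and do not come from the original answer.

```python
import pandas as pd

original = pd.DataFrame({'a': [{1: 'hi', 2: 'there'}], 'b': [1]})

# Step 1: index by 'b' and expand each dict in 'a' into columns,
# one column per dict key (here: 1 and 2).
wide = original.set_index('b')['a'].apply(pd.Series)

# Step 2: pivot those key columns into rows. The result is a Series
# whose index has two levels: 'b' and the (unnamed) dict key.
long_form = wide.stack()

# Step 3: turn the index back into columns. The unnamed level comes
# out as 'level_1', so rename it to 'numbers'.
result = long_form.reset_index(name='text').rename(columns={'level_1': 'numbers'})

print(result)
```

The rename is needed because the stacked level has no name of its own, so `reset_index` falls back to a positional label.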


Edit

original.set_index('position')['news_results'].apply(pd.Series).stack().apply(pd.Series).reset_index()
Out[1633]:
   position level_1               title                                url
0         1       1         Worst case   https://www.politico.com/magazine
1         1       2          Bad Night         https://www.nbcnews.com/pol
2         1       3  On the anniversary      https://www.usatoday.com/stor
3         2       1    Bad Things Happ        https://www.nytimes.com/maga
4         2       2         Best Night            https://www.cnn.com/pols
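
Note that the output above labels the rank column `level_1` and keeps its values as strings, while the expected frame has an integer `sub_rank` column. As one possible take on the "different construction" idea, the sketch below builds the rows with a plain list comprehension and passes them to `pd.DataFrame`. This is an alternative to the chained `apply`/`stack` approach, not the answer's own method, and the names `rows` and `flat` are illustrative. It also assumes every inner dict has exactly the keys `url` and `title`, and that the outer keys are numeric strings.

```python
import pandas as pd

record_1 = {'1': {'url': 'https://www.politico.com/magazine', 'title': 'Worst case '},
            '2': {'url': 'https://www.nbcnews.com/pol', 'title': 'Bad Night '},
            '3': {'url': 'https://www.usatoday.com/stor', 'title': 'On the anniversary'}}
record_2 = {'1': {'url': 'https://www.nytimes.com/maga', 'title': 'Bad Things Happ '},
            '2': {'url': 'https://www.cnn.com/pols', 'title': 'Best Night '}}

original = pd.DataFrame([[1, record_1], [2, record_2]],
                        columns=['position', 'news_results'])

# One output row per inner dict: repeat 'position', convert the outer
# key to an integer 'sub_rank', and spread the inner dict's fields.
rows = [
    {'position': position, 'sub_rank': int(sub_rank), **fields}
    for position, results in zip(original['position'], original['news_results'])
    for sub_rank, fields in results.items()
]

# Passing columns= fixes the column order to match the expected frame.
flat = pd.DataFrame(rows, columns=['position', 'sub_rank', 'url', 'title'])

print(flat)
```

Because each record contributes only the keys it actually has, records of different lengths (three results versus two here) need no special handling for missing entries.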

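This question, "python - Expanding a dictionary into a DataFrame and adding it back to the original DataFrame with new columns and duplicated original data", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/47187962/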
