Adding a new column to an existing DataFrame in Python pandas

Question

I have a DataFrame with named columns and rows indexed by non-contiguous integers, as produced by this code:

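import numpy as np
from pandas import DataFrame, Series

df1 = DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
mask = df1 < -0.7                     # element-wise, same result as applymap(lambda x: x < -0.7)
df1 = df1[~mask.any(axis=1)]          # ~ negates the boolean mask; surviving rows keep their original labels
sLength = len(df1['a'])
e = Series(np.random.randn(sLength))  # default index 0..sLength-1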

I would like to add a new column, 'e', to the existing data frame without changing anything else in it. (The series always has the same length as the data frame.) I tried different versions of join, append, and merge, but I could not get the result I wanted; mostly I just got errors.

The series and the data frame are already given; the code above only illustrates them with an example.

I am sure there is an easy way to do that, but I can't figure it out.

Answer

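Use the original df1 index to create the series. Column assignment aligns on index labels, so a Series built with the default 0..n-1 index (like e above) would put NaN in every row whose label has no match: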

df1['e'] = Series(np.random.randn(sLength), index=df1.index)
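To see why the index=df1.index argument matters, here is a minimal sketch with small, fixed data (the values and the column name "naive" are hypothetical, chosen only for illustration). It assigns the same three numbers twice: once as a Series with the default index, once as a Series carrying df1's own index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows labelled 1 and 3 contain values below -0.7.
df1 = pd.DataFrame({'a': [0.1, -0.9, 0.3, -1.2, 0.5],
                    'b': [1.0, 2.0, 3.0, 4.0, 5.0]})
# Keep rows with no value below -0.7; .copy() makes the result independent.
df1 = df1[~(df1 < -0.7).any(axis=1)].copy()   # index is now [0, 2, 4]

values = np.array([10.0, 20.0, 30.0])

# Default index 0, 1, 2: label 0 gets 10.0, label 2 gets 30.0,
# label 4 has no match and becomes NaN; 20.0 (label 1) is dropped.
df1['naive'] = pd.Series(values)

# Reusing df1.index pairs the values with df1's rows in order.
df1['e'] = pd.Series(values, index=df1.index)

print(df1)
```

The "naive" column ends up misaligned and partly NaN, while "e" holds the three values in row order, which is what the question asks for.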

Edit 2015
Some users reported getting a SettingWithCopyWarning with this code.
However, the code still runs perfectly with the current pandas version, 0.16.1 (in the session below, p is the pandas module):

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> p.version.short_version
'0.16.1'

The SettingWithCopyWarning is meant to flag a possibly invalid assignment to a copy of a DataFrame. It doesn't necessarily mean you did something wrong (it can produce false positives), but since 0.13.0 it lets you know that there are more appropriate methods for the same purpose. So if you get the warning, just follow its advice: "Try using .loc[row_index,col_indexer] = value instead":

>>> df1.loc[:,'f'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>>
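A common way to avoid the ambiguity in the first place, assuming you intend the filtered frame to be independent of the frame it came from, is to call .copy() on the result of the boolean filter before adding columns. The sketch below combines that with the .loc assignment shown above; the frame, its values, and the names df and filtered are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the filter drops the row labelled 1.
df = pd.DataFrame({'a': [0.1, -0.9, 0.3], 'b': [1.0, 2.0, 3.0]})

# Boolean filtering may be flagged as a possible copy of df; .copy() makes the
# result an explicitly independent DataFrame, so assigning to it is unambiguous.
filtered = df[df['a'] > -0.7].copy()

# Add a column with .loc, building the Series on filtered's own index.
filtered.loc[:, 'f'] = pd.Series(np.array([7.0, 8.0]), index=filtered.index)

print(filtered)
```

The original df is left untouched; only filtered gains the new column.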

In fact, this is currently the more efficient method, as described in the pandas docs.

Edit 2017

As pointed out in the comments and by @Alexander, currently the best way to add the values of a Series as a new column of a DataFrame may be to use assign:

# .values strips the Series index, so the values are placed by position
df1 = df1.assign(e=p.Series(np.random.randn(sLength)).values)
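The .values in that line is what makes assign work regardless of df1's index. A minimal sketch with hypothetical data (index labels 6, 8, 9, as might be left after filtering) compares passing a plain array with passing the Series itself, which assign aligns on index labels just like df1['e'] = ... does.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index has gaps, as after the filtering in the question.
df1 = pd.DataFrame({'a': [1.0, 2.0, 3.0]}, index=[6, 8, 9])
s = pd.Series([10.0, 20.0, 30.0])  # default index 0, 1, 2

# .values drops the index, so the array is assigned by position.
by_position = df1.assign(e=s.values)

# A Series is aligned on index labels; none of 0, 1, 2 exist in df1, so all NaN.
aligned = df1.assign(e=s)

# assign returns a new frame; df1 itself is unchanged.
print(by_position)
print(aligned)
```

Because assign returns a new DataFrame, the result has to be bound back to a name (df1 = df1.assign(...)). In newer pandas versions, s.to_numpy() is an equivalent, more explicit spelling of s.values here.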