Problem description
I'm trying to fill NAs with "" in 4 specific columns of a data frame that are string/object types. I can assign these columns to a new variable as I fillna(), but when I call fillna() with inplace=True the underlying data doesn't change.
a_n6 = a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("")
a_n6
gives me:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 4 columns):
PROV LAST 1542 non-null values
PROV FIRST 1542 non-null values
PROV MID 1542 non-null values
SPEC NM 1542 non-null values
dtypes: object(4)
but
a_n6[["PROV LAST", "PROV FIRST", "PROV MID", "SPEC NM"]].fillna("", inplace=True)
a_n6
gives me:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1542 entries, 0 to 3611
Data columns (total 7 columns):
NPI 1103 non-null values
PIN 1542 non-null values
PROV FIRST 1541 non-null values
PROV LAST 1542 non-null values
PROV MID 1316 non-null values
SPEC NM 1541 non-null values
flag 439 non-null values
dtypes: float64(2), int64(1), object(4)
It's just one row, but still frustrating. What am I doing wrong?
Recommended answer
Use a dict as the value argument to fillna()
As mentioned in the comment by @rhkarls on @Jeff's answer, calling fillna() with inplace=True on .loc indexed by a list of columns doesn't change the original frame, because that selection returns a copy and the fill is applied to the copy. I find this frustrating too. Here's a workaround.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,3,4,np.nan],
                   'b':[6,7,8,np.nan,np.nan],
                   'x':[11,12,13,np.nan,np.nan],
                   'y':[16,np.nan,np.nan,19,np.nan]})
print(df)
# a b x y
#0 1.0 6.0 11.0 16.0
#1 2.0 7.0 12.0 NaN
#2 3.0 8.0 13.0 NaN
#3 4.0 NaN NaN 19.0
#4 NaN NaN NaN NaN
Let's say we want to fillna for x and y only, not a and b.
I would expect using .loc to work (as it does in an assignment), but it doesn't, as mentioned earlier:
# doesn't work
df.loc[:,['x','y']].fillna(0, inplace=True)
print(df) # nothing changed
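One way to see why this fails: selecting a list of columns with .loc returns a new DataFrame, so fillna(inplace=True) fills that temporary object and not the original. The sketch below illustrates this, assuming a small made-up frame; the names demo and subset are hypothetical and not from the question. It also shows that assigning the filled columns back is an alternative that does update the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame, for illustration only.
demo = pd.DataFrame({'a': [1, np.nan], 'x': [np.nan, 12], 'y': [16, np.nan]})

# Selecting a list of columns builds a new DataFrame (a copy).
subset = demo.loc[:, ['x', 'y']]
subset.fillna(0, inplace=True)  # fills the copy only

# Count the NaNs left in each object after the in-place fill.
nans_in_demo = int(demo[['x', 'y']].isna().sum().sum())
nans_in_subset = int(subset.isna().sum().sum())

# Assigning the filled columns back does update the original frame.
demo[['x', 'y']] = demo[['x', 'y']].fillna(0)
nans_after_assign = int(demo[['x', 'y']].isna().sum().sum())
```

Column a is left alone in both cases, since it is never part of the selection.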
However, the documentation says that the value argument to fillna() can be a scalar, dict, Series, or DataFrame, where a dict specifies which value to use for each column of a DataFrame.
It turns out that using a dict of values will work:
# works
df.fillna({'x':0, 'y':0}, inplace=True)
print(df)
# a b x y
#0 1.0 6.0 11.0 16.0
#1 2.0 7.0 12.0 0.0
#2 3.0 8.0 13.0 0.0
#3 4.0 NaN 0.0 19.0
#4 NaN NaN 0.0 0.0
Also, if you have a lot of columns in your subset, you could use a dict comprehension, as in:
df.fillna({col: 0 for col in ['x','y']}, inplace=True) # also works
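Applied to the original question, the same pattern might look like the sketch below. The four column names come from the question, but the small a_n6 frame here is made-up sample data standing in for the real one, and str_cols is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's frame (real data not shown in the question).
a_n6 = pd.DataFrame({
    'NPI': [1.0, np.nan],
    'PROV LAST': ['Smith', 'Jones'],
    'PROV FIRST': ['Ann', None],
    'PROV MID': [None, 'Q'],
    'SPEC NM': ['Cardiology', None],
})

# Fill only the four string columns with "", leaving NPI untouched.
str_cols = ['PROV LAST', 'PROV FIRST', 'PROV MID', 'SPEC NM']
a_n6.fillna({col: '' for col in str_cols}, inplace=True)
```

Because the dict names only the string columns, missing values in numeric columns such as NPI stay as NaN.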