Suppose I have the following pandas DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(5).tolist()
This produces a df whose cells each hold a Python list (built from a NumPy array):
df
Out[16]:
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
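I want to compute the mean of the DataFrame, but it does not work because each cell holds a list rather than a number (the columns have object dtype). For example,
type(df.loc[0, "A"])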
Out[19]: list
So if I compute the row-wise mean, it returns NaN:
df["Average"] = df.mean(axis=1)
df
Out[21]:
                 A                B                C  Average
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
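My question is: how can I convert this df back into numeric values that I can work with?
Best answer
I think expanding the values into columns is a very good idea, because pandas vectorized functions can then be used:
df1 = pd.concat([pd.DataFrame(df[c].values.tolist()) for c in df.columns],
                axis=1,
                keys=df.columns)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
print(df1)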
   A0  A1  A2  A3  A4  B0  B1  B2  B3  B4  C0  C1  C2  C3  C4
0   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
1   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
2   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
3   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
4   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
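As a rough sketch of why the wide layout helps: once every list element has its own numeric column, the overall row mean and a mean per original column are each a single vectorized call. The name `wide` and the dict-based construction of `df` are illustrative assumptions, not part of the original answer, and the sketch assumes all lists have the same length.

```python
import numpy as np
import pandas as pd

# Assumed example frame: every cell holds the list [0, 1, 2, 3, 4].
df = pd.DataFrame({c: [np.arange(5).tolist() for _ in range(5)] for c in "ABC"})

# Expand each column of lists into a block of numeric columns.
# keys= keeps the original column name as the top level of a MultiIndex.
wide = pd.concat(
    [pd.DataFrame(df[c].tolist(), index=df.index) for c in df.columns],
    axis=1,
    keys=df.columns,
)

# Mean over every list element in the row.
row_mean = wide.mean(axis=1)

# Mean per original column: select each top-level block and average across it.
per_column_mean = pd.DataFrame({c: wide[c].mean(axis=1) for c in df.columns})

print(row_mean)
print(per_column_mean)
```

Keeping the MultiIndex (rather than flattening the names to `A0`, `A1`, ...) is a design choice here: it makes it easy to select all columns that came from one original column.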
But if you need the mean of all the lists in a row taken together (here the lists have different lengths):
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(i + 1).tolist()
print(df)
                 A                B                C
0              [0]              [0]              [0]
1           [0, 1]           [0, 1]           [0, 1]
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
from itertools import chain
from statistics import mean
# Flatten all lists in each row into one list, then average that list.
df['Average'] = [mean(list(chain.from_iterable(x))) for x in df.values.tolist()]
print(df)
                 A                B                C  Average
0              [0]              [0]              [0]      0.0
1           [0, 1]           [0, 1]           [0, 1]      0.5
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]      1.0
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]      1.5
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      2.0
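When the lists differ in length within a row, "the mean" can mean two different things, and it may be worth deciding which one is wanted. The sketch below uses a small hypothetical frame (not the one from the question) to contrast a pooled mean, where every element counts equally, with a mean of per-cell means, where every cell counts equally. The names `pooled`, `cell_means` and `mean_of_means` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with lists of different lengths in the same row.
df = pd.DataFrame({"A": [[0], [1, 2]], "B": [[10, 20, 30], [3]]})

# Pooled mean: concatenate all lists in the row, then average once.
# This matches the chain.from_iterable approach above.
pooled = df.apply(lambda row: np.mean(np.concatenate(row.tolist())), axis=1)

# Mean of means: average each cell first, then average those results.
cell_means = df.apply(lambda col: col.map(np.mean).astype(float))
mean_of_means = cell_means.mean(axis=1)

print(pooled)
print(mean_of_means)
```

In the frame from the question both quantities coincide, because every cell in a row holds the same list; they diverge as soon as the lists in one row differ in length.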
Edit:
If the values are strings:
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(5).tolist()
df = df.astype(str)
print(df)
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
df1 = pd.concat([df[c].str.strip('[]').str.split(', ', expand=True) for c in df.columns],
                axis=1,
                keys=df.columns).astype(float)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
df1["Average"] = df1.mean(axis=1)
print(df1)
    A0   A1   A2   A3   A4   B0   B1   B2   B3   B4   C0   C1   C2   C3   C4  \
0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
1  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
2  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
3  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
4  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0

   Average
0      2.0
1      2.0
2      2.0
3      2.0
4      2.0
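If the strings are valid Python list literals, another option (a sketch, not part of the original answer) is to parse them back into real lists with `ast.literal_eval` from the standard library, which does not depend on the exact `', '` separator. The small frame and the names `parsed` and `average` below are assumptions made for illustration.

```python
import ast

import numpy as np
import pandas as pd

# Assumed example: each cell is the string form of a list, e.g. "[0, 1]".
df = pd.DataFrame({c: [str(list(range(i + 1))) for i in range(3)] for c in "AB"})

# literal_eval only evaluates Python literals, so it is safer than eval().
parsed = df.apply(lambda col: col.map(ast.literal_eval))

# Pool every element in the row and average; lists may differ in length.
average = parsed.apply(lambda row: np.mean(np.concatenate(row.tolist())), axis=1)

print(average)
```

This raises an error on cells that are not valid literals (for example missing values), so real data may need cleaning first.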
Source: "python - Computing the mean of a pandas DataFrame whose cells are lists", a question on Stack Overflow: https://stackoverflow.com/questions/51355151/