Problem description
I'm searching for a function that returns the position of an element in a DataFrame.
- There are duplicates among the values in the DataFrame.
- The DataFrame is about 10*2000.
- The function will be applied to the DataFrame using applymap().
# initial dataframe
import pandas
df = pandas.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})
Example:
get_position(2) is ambiguous, because it could be either "R1" or "R2". I am wondering whether there is another way for Python to know which position the element holds, possibly during the applymap() operation.
Edit:
df.rank(axis=1,pct=True)
EDIT2:
# initial dataframe
df_initial = pandas.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})
step1)
df_rank = df_initial.rank(axis=1,pct=True)
step2)
# Building groups based on the percentage of the respective value
def function103(x):
    if 0.0 <= x <= 0.1:
        P1.append(get_column_name1(x))
        return x
    elif 0.1 < x <= 0.2:
        P2.append(get_column_name1(x))
        return x
    elif 0.2 < x <= 0.3:
        P3.append(get_column_name1(x))
        return x
    elif 0.3 < x <= 0.4:
        P4.append(get_column_name1(x))
        return x
    elif 0.4 < x <= 0.5:
        P5.append(get_column_name1(x))
        return x
    elif 0.5 < x <= 0.6:
        P6.append(get_column_name1(x))
        return x
    elif 0.6 < x <= 0.7:
        P7.append(get_column_name1(x))
        return x
    elif 0.7 < x <= 0.8:
        P8.append(get_column_name1(x))
        return x
    elif 0.8 < x <= 0.9:
        P9.append(get_column_name1(x))
        return x
    elif 0.9 < x <= 1.0:
        P10.append(get_column_name1(x))
        return x
    else:
        return x
step3)
# trying to get the column name of the respective value
# my idea was to determine the position of each value and then write a function
def get_column_name1(x):
    # to return the value's column name (not written yet)
    pass
step4)
# apply the function
P1=[]
P2=[]
P3=[]
P4=[]
P5=[]
P6=[]
P7=[]
P8=[]
P9=[]
P10=[]
P11=[]
df_rank.applymap(function103).head()
If you need the index or column names for a given value in a DataFrame, use numpy.where
to get the positions, then use them to select from the index or column values converted to NumPy arrays:
import numpy as np
import pandas as pd
df = pd.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})
i, c = np.where(df == 2)
print (i, c)
[0 1] [1 0]
print (df.index.values[i])
[0 1]
print (df.columns.values[c])
['R2' 'R1']
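As a minimal sketch of how the numpy.where result could be wrapped into the kind of lookup the question asks for, the helper below returns every (index label, column name) pair for a value. The name find_positions is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def find_positions(frame, value):
    """Return all (index label, column name) pairs where frame equals value.

    Hypothetical helper for illustration; not a pandas or NumPy function.
    """
    # np.where on a 2-D boolean array gives row and column positions in row-major order.
    rows, cols = np.where(frame.to_numpy() == value)
    return list(zip(frame.index[rows], frame.columns[cols]))


df = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
positions = find_positions(df, 2)
print(positions)
```

Because the value 2 occurs twice, the result has two pairs: row 0 with R2 and row 1 with R1. Returning a list rather than a single position is what resolves the ambiguity caused by duplicates.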
EDIT:
i, c = np.where(df == 2)
df1 = df.rank(axis=1,pct=True)
print (df1)
         R1        R2        R3
0  1.000000  0.666667  0.333333
1  0.333333  0.666667  1.000000
2  0.666667  1.000000  0.333333
print (df1.iloc[i, c])
         R2        R1
0  0.666667  1.000000
1  0.666667  0.333333
print (df1.where(df == 2).dropna(how='all').dropna(how='all', axis=1))
         R1        R2
0       NaN  0.666667
1  0.333333       NaN
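Note that df1.iloc[i, c] selects the whole block of matched rows by matched columns, so its output also contains ranks of cells that are not equal to 2. If exactly one rank per match is wanted, one option (a sketch, assuming the same df and df1 as above) is to index the underlying NumPy array, which pairs i[k] with c[k]:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
df1 = df.rank(axis=1, pct=True)

i, c = np.where(df.to_numpy() == 2)

# Integer-array indexing on a 2-D array is elementwise: it picks
# (i[0], c[0]), (i[1], c[1]), ... rather than the rows-by-columns block.
matched_ranks = df1.to_numpy()[i, c]

for row, col, rank in zip(df.index[i], df.columns[c], matched_ranks):
    print(row, col, round(rank, 6))
```

This yields the same two values that survive in the where/dropna output, one for row 0 in R2 and one for row 1 in R1.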
Or:
out = df1.stack()[df.stack() == 2].rename_axis(('idx','cols')).reset_index(name='val')
print (out)
   idx cols       val
0    0   R2  0.666667
1    1   R1  0.333333
EDIT:
Solution for your function: reshape the ranks into a one-column DataFrame, iterate over its rows with apply(axis=1), and read Series.name of each row, which is the same as the original column name:
def get_column_name1(x):
    return x.name
P1=[]
P2=[]
P3=[]
P4=[]
P5=[]
P6=[]
P7=[]
P8=[]
P9=[]
P10=[]
P11=[]
def function103(x):
    if 0.0 <= x[0] <= 0.1:
        P1.append(get_column_name1(x))
        return x
    elif 0.1 < x[0] <= 0.2:
        P2.append(get_column_name1(x))
        return x
    elif 0.2 < x[0] <= 0.3:
        P3.append(get_column_name1(x))
        return x
    elif 0.3 < x[0] <= 0.4:
        P4.append(get_column_name1(x))
        return x
    elif 0.4 < x[0] <= 0.5:
        P5.append(get_column_name1(x))
        return x
    elif 0.5 < x[0] <= 0.6:
        P6.append(get_column_name1(x))
        return x
    elif 0.6 < x[0] <= 0.7:
        P7.append(get_column_name1(x))
        return x
    elif 0.7 < x[0] <= 0.8:
        P8.append(get_column_name1(x))
        return x
    elif 0.8 < x[0] <= 0.9:
        P9.append(get_column_name1(x))
        return x
    elif 0.9 < x[0] <= 1.0:
        P10.append(get_column_name1(x))
        return x
    else:
        return x
a = df_rank.stack().reset_index(level=0, drop=True).to_frame().apply(function103, axis=1)
print (P4)
['R3', 'R1', 'R3']
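For comparison, here is a sketch of the same grouping without the global lists and the long if/elif chain, using pandas.cut on the stacked ranks. The names stacked, labels and groups are illustrative assumptions, and the bin edges come from np.linspace, so a rank lying exactly on an edge could in principle land in a different group than with the literal comparisons above.

```python
import numpy as np
import pandas as pd

df_initial = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
df_rank = df_initial.rank(axis=1, pct=True)

# Long format: one row per cell, with the column name kept next to its rank.
stacked = df_rank.stack().rename_axis(["idx", "cols"]).reset_index(name="pct")

# Ten right-closed bins (0, 0.1], ..., (0.9, 1.0]; include_lowest also admits 0.0.
labels = [f"P{k}" for k in range(1, 11)]
stacked["group"] = pd.cut(
    stacked["pct"], bins=np.linspace(0, 1, 11), labels=labels, include_lowest=True
)

# Collect the column names per group; groups with no members are left out.
groups = stacked.groupby("group", observed=True)["cols"].apply(list).to_dict()
print(groups["P4"])
```

For this data, groups["P4"] holds the same three column names as the P4 list above, and only P4, P7 and P10 are populated.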