python-3.x - How to find whether a column value is undervalued or overvalued based on other columns?

So, I am trying to solve this pandas exercise. I got a dataset of a real estate company from Kaggle, and the dataframe df looks like this.

         id     location   type     price
0     44525  Golden Mile  House   4400000
1     44859     Nagüeles  House   2400000
2     45465     Nagüeles  House   1900000
3     50685     Nagüeles   Plot   4250000
4    130728  Golden Mile  House  32000000
5    130856     Nagüeles   Plot   2900000
6    130857  Golden Mile  House   3900000
7    130897  Golden Mile  House   3148000
8   3484102      Marinha   Plot    478000
9   3484124      Marinha   Plot   2200000
10  3485461      Marinha  House   1980000

Now, based on the columns location and type, I have to find which properties are undervalued or overvalued and which have a genuine price. The desired result should look like this:

         id     location   type     price  Over_val  Under_val  Norm_val
0     44525  Golden Mile  House   4400000         0          0         1
1     44859     Nagüeles  House   2400000         0          0         1
2     45465     Nagüeles  House   1900000         0          0         1
3     50685     Nagüeles   Plot   4250000         0          1         0
4    130728  Golden Mile  House  32000000         1          0         0
5    130856     Nagüeles   Plot   2900000         0          1         0
6    130857  Golden Mile  House   3900000         0          0         1
7    130897  Golden Mile  House   3148000         0          0         1
8   3484102      Marinha   Plot    478000         0          0         1
9   3484124      Marinha   Plot   2200000         0          0         1
10  3485461      Marinha  House   1980000         0          1         0

I have been stuck on this for a while. What logic should I try to solve this problem?
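Best answer

Here is my solution. The explanation is in the inline comments. There may be ways to do this in fewer steps; I would be interested to learn them too.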

import pandas as pd

# Replace this with whatever you have to load your data. This is set up for a sample data file I used
df = pd.read_csv('my_sample_data.csv', encoding='latin-1')

# Mean by location - type
mdf = df.set_index('id').groupby(['location','type'])['price'].mean().rename('mean').to_frame().reset_index()
# StdDev by location - type
sdf = df.set_index('id').groupby(['location','type'])['price'].std().rename('sd').to_frame().reset_index()
# Merge back into the original dataframe
df = df.set_index(['location','type']).join(mdf.set_index(['location','type'])).reset_index()
df = df.set_index(['location','type']).join(sdf.set_index(['location','type'])).reset_index()

# Add the indicator columns
df['Over_val'] = 0
df['Under_val'] = 0
df['Normal_val'] = 0

# Update the indicators
df.loc[df['price'] > df['mean'] + 2 * df['sd'], 'Over_val'] = 1
df.loc[df['price'] < df['mean'] - 2 * df['sd'], 'Under_val'] = 1

df['Normal_val'] = df['Over_val'] + df['Under_val']
df['Normal_val'] = df['Normal_val'].apply(lambda x: 1 if x == 0 else 0)
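
On the point about fewer steps: one possible shortcut is groupby(...).transform, which returns the group mean and standard deviation already aligned to the original rows, so no join is needed. The sketch below is not part of the original answer. It builds the eleven sample rows from the question, the helper name flag_valuation and the threshold parameter k are hypothetical, and the third indicator is named Norm_val to match the question's desired output rather than Normal_val.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "id": [44525, 44859, 45465, 50685, 130728, 130856,
           130857, 130897, 3484102, 3484124, 3485461],
    "location": ["Golden Mile", "Nagüeles", "Nagüeles", "Nagüeles",
                 "Golden Mile", "Nagüeles", "Golden Mile", "Golden Mile",
                 "Marinha", "Marinha", "Marinha"],
    "type": ["House", "House", "House", "Plot", "House", "Plot",
             "House", "House", "Plot", "Plot", "House"],
    "price": [4400000, 2400000, 1900000, 4250000, 32000000, 2900000,
              3900000, 3148000, 478000, 2200000, 1980000],
})


def flag_valuation(frame, k=2.0):
    """Hypothetical helper: flag prices more than k standard deviations
    away from the mean of their (location, type) group."""
    prices = frame.groupby(["location", "type"])["price"]
    mean = prices.transform("mean")  # group mean, aligned to the rows of frame
    sd = prices.transform("std")     # sample std; NaN for single-row groups
    out = frame.copy()
    # Comparisons against NaN are False, so single-row groups are never flagged
    out["Over_val"] = (out["price"] > mean + k * sd).astype(int)
    out["Under_val"] = (out["price"] < mean - k * sd).astype(int)
    out["Norm_val"] = 1 - out["Over_val"] - out["Under_val"]
    return out


result = flag_valuation(df)
print(result)
```

The 2-standard-deviation cutoff is a modelling choice, not something the data dictates. In a group of n rows, no value can lie more than (n - 1) / sqrt(n) sample standard deviations from the group mean, and the largest group in the eleven sample rows has four rows (a limit of 1.5), so with k=2 nothing in this small sample is flagged. On the full Kaggle dataset the groups are presumably larger; for small groups, a lower k or a different rule (for example one based on the group median) may be worth trying.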

Regarding python-3.x - How to find whether a column value is undervalued or overvalued based on other columns?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54469791/