python - Convert a cell to NaN if it is in the top or bottom x%

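I want to trim a DataFrame by removing roughly the top and bottom 5% of the data in specific columns. Erroneous outliers are keeping me from using the data effectively.

The DataFrame has a "name" column and a few other non-numeric columns, so I want to be able to choose which columns to trim.

I think converting a cell to NaN when its value falls within the largest or smallest x% of the column would work, but I'm open to other approaches as long as they do the job.

Here is an example of what I am trying to do: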

for column in df.columns:
    top = column.quantile(0.95)
    bottom = column.quantile(0.05)
    for cell in column:
        if (cell >= top)|(cell <= bottom):
            cell = np.NaN

Best answer

I think you want between. Also, you can pass a list of quantiles to quantile() and get both cut-offs in one call:

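for column in [your_list_of_columns]:
    # quantile() accepts a list and returns both cut-offs at once
    bottom, top = df[column].quantile([0.05, 0.95])
    # where() keeps values where the mask is True and sets the rest to NaN
    df[column] = df[column].where(df[column].between(bottom, top))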
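
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The helper name trim_to_nan, its lower/upper parameters, and the sample data are all made up for illustration; only quantile(), between() and where() come from the answer above.

```python
import numpy as np
import pandas as pd


def trim_to_nan(df, columns, lower=0.05, upper=0.95):
    """Return a copy of df where values outside the [lower, upper]
    quantile range of each listed column are replaced with NaN.

    Hypothetical helper, not part of pandas.
    """
    out = df.copy()
    for column in columns:
        bottom, top = out[column].quantile([lower, upper])
        # between() is inclusive on both ends by default, so values equal
        # to a cut-off are kept; where() turns everything else into NaN.
        out[column] = out[column].where(out[column].between(bottom, top))
    return out


# Made-up sample data: the numbers 0..99 plus a non-numeric "name" column.
df = pd.DataFrame({
    "name": [f"item{i}" for i in range(100)],
    "value": np.arange(100, dtype=float),
})

# Only the columns listed here are trimmed; "name" is left untouched.
trimmed = trim_to_nan(df, ["value"])
print(trimmed["value"].isna().sum())
```

Note that the loop in the question used >= and <=, which would also blank out values exactly equal to a cut-off. If that is what you need, between() takes inclusive="neither" in pandas 1.3 and later.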

Regarding "python - Convert a cell to NaN if it is in the top or bottom x%", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58417575/