python - Pandas - changing the value of a group

If a group does not contain enough rows (points), I need to change the group label of the rows in that group.
For example,

+-----+
|c1|c2|
+-----+
|A |1 |
|A |2 |
|B |1 |
|A |2 |
|E |5 |
|E |6 |
|W |1 |
+-----+

Suppose I group by the values in c2, and each group must contain at least 2 points.

c2:
1 : count(c1) = 3
2 : count(c1) = 2
5 : count(c1) = 1
6 : count(c1) = 1

Clearly, groups 5 and 6 each contain only 1 element, so I want to relabel the c2 value of those rows as -1.

The desired result is shown below.

+-----+
|c1|c2|
+-----+
|A |1 |
|A |2 |
|B |1 |
|A |2 |
|E |-1|
|E |-1|
|W |1 |
+-----+

This is the code I wrote, but it does not update the data frame.

labels = df["c2"].unique()
for l in labels:
    group_size = df[df["c2"]==l].shape[0]
    if group_size<=minPts:
        df[df["c2"]==l]["c2"] = -1
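A likely reason the loop has no effect is chained indexing: the expression first selects the matching rows, which typically yields a copy, and the assignment then writes into that temporary object rather than into df. Note also that `<=` would relabel groups of exactly minPts rows, while the stated requirement is a size of at least 2. A minimal sketch of a corrected loop, assuming the sample data above and a threshold named min_pts (a name chosen here for illustration):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "c1": ["A", "A", "B", "A", "E", "E", "W"],
    "c2": [1, 2, 1, 2, 5, 6, 1],
})
min_pts = 2  # assumed threshold: a group must have at least this many rows

for label in df["c2"].unique():
    group_mask = df["c2"] == label
    if group_mask.sum() < min_pts:
        # One .loc call selects rows and column together,
        # so the assignment is applied to df itself, not to a copy.
        df.loc[group_mask, "c2"] = -1

print(df)
```

The loop is kept only to stay close to the original attempt; it scans the column once per label, so a vectorized approach is usually preferable on large frames.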

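Best answer

You can use value_counts to count each label, keep the labels that occur fewer than 2 times, build a boolean mask with isin, and finally set the values with loc: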

s = df['c2'].value_counts()
s = s.index[s < 2]
print (s)
Int64Index([6, 5], dtype='int64')

df.loc[df['c2'].isin(s), 'c2'] = -1
print (df)
  c1  c2
0  A   1
1  A   2
2  B   1
3  A   2
4  E  -1
5  E  -1
6  W   1
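As an alternative to the value_counts and isin approach above (not part of the original answer), the group size can be broadcast back to every row with groupby and transform, which gives the mask in one step. This is a sketch under the same sample data; min_pts is an illustrative name for the threshold:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "c1": ["A", "A", "B", "A", "E", "E", "W"],
    "c2": [1, 2, 1, 2, 5, 6, 1],
})
min_pts = 2  # assumed threshold: a group must have at least this many rows

# For every row, the number of rows that share its c2 label.
group_sizes = df.groupby("c2")["c2"].transform("size")

# Relabel rows whose group is too small.
df.loc[group_sizes < min_pts, "c2"] = -1

print(df)
```

Both approaches write through a single .loc call, which is what makes the update take effect on df.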

Details:

print (df['c2'].value_counts())
1    3
2    2
6    1
5    1
Name: c2, dtype: int64

print (df['c2'].isin(s))
0    False
1    False
2    False
3    False
4     True
5     True
6    False
Name: c2, dtype: bool
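The two intermediate results above (counts per label, then a per-row boolean mask) can be combined into a small reusable helper. The function below is a hypothetical sketch, not something from the original answer: relabel_small_groups, its parameters, and the fill default are names invented here, and it returns a new frame instead of modifying its input.

```python
import pandas as pd


def relabel_small_groups(frame, column, min_pts, fill=-1):
    """Return a copy of frame where labels in column that occur
    fewer than min_pts times are replaced by fill."""
    # Map each row's label to how often that label occurs.
    counts = frame[column].map(frame[column].value_counts())
    out = frame.copy()
    # Keep labels of large enough groups, replace the rest.
    out[column] = out[column].where(counts >= min_pts, fill)
    return out


df = pd.DataFrame({
    "c1": ["A", "A", "B", "A", "E", "E", "W"],
    "c2": [1, 2, 1, 2, 5, 6, 1],
})
result = relabel_small_groups(df, "c2", min_pts=2)
print(result)
```

Returning a copy leaves the original df untouched, which can be convenient when the same data is relabeled with several thresholds.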

Regarding python - Pandas - changing the value of a group, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47771894/