Problem description
Say I have data like this:
group value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
I want to be able to tell whether all values of value are the same within each group. Another way to look at it: could I create a new variable that contains all unique values of value within each group, like the following?
group value all_values
1 fox fox
1 fox fox
1 fox fox
2 dog dog cat
2 cat dog cat
3 frog frog
3 frog frog
4 dog dog
4 dog dog
As we can see, every group except group 2 has only one distinct entry for value.
One way I thought a similar (though less informative) result could be obtained is the following:
bys group: egen tag = tag(value)
bys group: egen sum = sum(tag)
Then, based on the value of sum, I could determine whether a group has more than one distinct entry.
However, egen's tag() function does not work with bysort. Is there another efficient way to get the information I need?
Recommended answer
There are several ways to do this. One is:
clear
set more off
input ///
group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
end
*-----
* Sort by value within group; if the first and last values match, all values are the same
bysort group (value) : gen onevalue = value[1] == value[_N]
list, sepby(group)
Suppose you have missing values and want to ignore them (not drop them). Then the following works:
clear
set more off
input ///
group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
5 ox
5 ox
5
6 cow
6 goat
6
end
*-----
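* Encode the strings as numbers; empty strings become numeric missing
encode value, gen(value2)
* Missings sort last within group, so fill each one with the preceding value
bysort group (value2) : replace value2 = value2[_n-1] if missing(value2)
* After filling, equal first and last values mean one distinct non-missing value
by group: gen onevalue = value2[1] == value2[_N]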
list, sepby(group)
See also this FAQ, whose technique resembles your original strategy.
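The tag-based strategy from the question can usually be made to work by passing both variables to tag() instead of putting tag() under a by prefix. The following is a minimal sketch under that assumption, using the data from the first example. The variable names tag, ndistinct, and onevalue2 are illustrative, and it is an assumption that this matches the linked FAQ exactly:

```stata
clear
input group str5 value
1 fox
1 fox
1 fox
2 dog
2 cat
3 frog
3 frog
4 dog
4 dog
end

* Tag exactly one observation for each distinct (group, value) pair
egen tag = tag(group value)

* Within each group, count the tagged observations, i.e. the distinct values
bysort group: egen ndistinct = total(tag)

* Indicator: 1 if the group holds a single distinct value, 0 otherwise
gen byte onevalue2 = ndistinct == 1

list, sepby(group)
```

Unlike the sort-and-compare approach, this also gives the number of distinct values per group in ndistinct. By default tag() should not tag observations with missing values, so missings would be ignored in the count unless the missing option is added.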