Problem description
I am trying to find the four largest values of a variable in Stata, because I want to calculate industry concentration for different groups based on sales. I have firm sales for multiple years, and the firms belong to different groups defined by industry and country. So I want to find the four largest sales values within each group.
I have about 10,000 firms for about 10 years:
firms  country  year  industry  sales
a      usa      1     1         300
a      usa      2     1         4000
b      ger      1     1         200
b      ger      2     1         400
c      usa      1     1         100
c      usa      2     1         300
d      usa      1     1         400
d      usa      2     1         200
e      usa      1     1         7000
e      usa      2     1         900
f      ger      1     2         100
f      ger      2     2         700
h      ger      1     2         700
h      ger      2     2         600
I know how to find total sales for each country-industry-year group:
bysort country industry year: egen sum_sales = sum(sales)
Recommended answer
The sum of the four largest values is
bysort country industry year (sales): generate four_biggest_sales = sales[_N] + ///
sales[_N-1] + sales[_N-2] + sales[_N-3]
provided that no values of sales are missing. If a group has only three values, the last term would need to be
max(0, sales[_N-3])
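with similar corrections for the cases of two values, one value or none. This works because a subscript that falls outside the group, such as sales[0], evaluates to missing, and max() ignores missing arguments, so the term contributes 0.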
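The corrections described above could be written out in one command. The following is only a sketch on the sample data from the question: it assumes sales has no missing values, it uses cond() on the group size _N rather than max() so that it does not depend on sales being nonnegative, and the variable name top4_sales is illustrative.

```stata
clear
input str1 firms str3 country year industry sales
a usa 1 1 300
a usa 2 1 4000
b ger 1 1 200
b ger 2 1 400
c usa 1 1 100
c usa 2 1 300
d usa 1 1 400
d usa 2 1 200
e usa 1 1 7000
e usa 2 1 900
f ger 1 2 100
f ger 2 2 700
h ger 1 2 700
h ger 2 2 600
end

* Sort each country-industry-year group by sales, then add the largest value
* and up to three more, substituting 0 where the group has too few firms.
* Assumption: no missing values in sales.
bysort country industry year (sales): generate top4_sales = sales[_N] ///
    + cond(_N >= 2, sales[_N-1], 0) ///
    + cond(_N >= 3, sales[_N-2], 0) ///
    + cond(_N >= 4, sales[_N-3], 0)

list country industry year firms sales top4_sales, sepby(country industry year)
```

On the sample data, only the usa groups in industry 1 contain four firms; the ger groups contain one or two, so their totals are simply the sum of whatever sales are present.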
This all follows from the basic syntax of the by prefix. See this article in the Stata Journal for a tutorial.
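As a rough illustration of that syntax, the sketch below uses a small made-up dataset (the variables group and value are hypothetical, not from the question) to show how _n, _N and subscripts behave under by:.

```stata
clear
input str1 group value
a 5
a 9
a 2
b 7
b 4
end

* Under by:, _n is the observation number within the group and _N is the
* group size, so value[_N] is the largest value once each group is sorted.
bysort group (value): generate position = _n
by group: generate group_size = _N
by group: generate largest = value[_N]
by group: generate second_largest = value[_N-1]

list, sepby(group)
```

The variable in parentheses after bysort controls the sort order within each group without defining the groups themselves, which is what makes subscripts such as [_N-1] meaningful.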
If there are missing values, they can be segregated by
generate isnotmiss = !missing(sales)
bysort isnotmiss country industry year (sales): generate four_biggest_sales = sales[_N] + ///
sales[_N-1] + sales[_N-2] + sales[_N-3]
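An alternative sketch, not the method given in the answer, handles missing values and small groups together and then forms a four-firm concentration ratio. It sorts on negated sales so that the largest values come first and missing values last, flags the first four observations, and relies on egen's total() treating missing values as 0. The toy data and the names negsales, in_top4, top4_sales and cr4 are made up for illustration.

```stata
clear
input str1 firms str3 country year industry sales
a usa 1 1 300
b usa 1 1 .
c usa 1 1 100
d usa 1 1 400
e usa 1 1 7000
f usa 1 1 50
g ger 1 1 200
h ger 1 1 .
end

* Negating sales puts the largest values first; missing values sort last.
generate double negsales = -sales
bysort country industry year (negsales): generate byte in_top4 = _n <= 4

* egen's total() treats missing values as 0, so groups with fewer than four
* nonmissing values are summed over whatever is available.
egen double top4_sales = total(sales * in_top4), by(country industry year)
egen double sum_sales = total(sales), by(country industry year)

* Four-firm concentration ratio: share of group sales held by the top four.
generate double cr4 = top4_sales / sum_sales

list country industry year firms sales top4_sales sum_sales cr4, sepby(country industry year)
```

In this sketch the result is filled in for every observation in the group, including those with missing sales. A group whose sales are all missing would get a ratio of 0/0, which Stata stores as missing.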