
Question

I am trying to find the four largest values of a variable in Stata, because I want to calculate industry concentration for different groups based on sales. I have firm sales for multiple years, and the firms belong to different groups defined by industry and country.

So, within each country-industry-year group, I want to find the four largest sales values.

I have about 10,000 firms over about 10 years:

firms   country   year   industry   sales  
    a       usa      1          1     300  
    a       usa      2          1    4000  
    b       ger      1          1     200  
    b       ger      2          1     400  
    c       usa      1          1     100  
    c       usa      2          1     300  
    d       usa      1          1     400  
    d       usa      2          1     200  
    e       usa      1          1    7000  
    e       usa      2          1     900  
    f       ger      1          2     100  
    f       ger      2          2     700  
    h       ger      1          2     700  
    h       ger      2          2     600   

I know how to find the sum of sales per industry-country-year group:

bysort country industry year: egen sum_sales = total(sales)

Recommended answer

The sum of the four biggest values is

bysort country industry year (sales): generate four_biggest_sales = sales[_N] + ///
                                      sales[_N-1] + sales[_N-2] + sales[_N-3] 

provided that no values of sales are missing. If a group has only three values, then you'd need

max(0, sales[_N-3]) 

with similar corrections for the cases of two values, one value or none.
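As a minimal sketch of those corrections, every term can be wrapped in max(0, ...). This relies on two Stata behaviours: an out-of-range subscript such as sales[0] or sales[-2] evaluates to missing, and max() ignores missing arguments. It also assumes that sales is never negative and never missing. The data are the sample rows from the question.

```stata
clear
input str1 firms str3 country year industry sales
a usa 1 1 300
a usa 2 1 4000
b ger 1 1 200
b ger 2 1 400
c usa 1 1 100
c usa 2 1 300
d usa 1 1 400
d usa 2 1 200
e usa 1 1 7000
e usa 2 1 900
f ger 1 2 100
f ger 2 2 700
h ger 1 2 700
h ger 2 2 600
end

* Sort sales ascending within each country-industry-year group.
* In groups with fewer than four firms, the out-of-range subscripts
* return missing, which max(0, .) turns into 0.
bysort country industry year (sales): generate four_biggest_sales = ///
    max(0, sales[_N])   + max(0, sales[_N-1]) + ///
    max(0, sales[_N-2]) + max(0, sales[_N-3])

list country industry year firms sales four_biggest_sales, sepby(country industry year)
```

With the sample data, the usa, industry 1, year 1 group has four firms and gets 7800, while the single-firm ger, industry 1, year 1 group gets 200.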

This all follows from the basic syntax of the by prefix. See this article in the Stata Journal for a tutorial.

If there are missing values, they can be segregated by

generate isnotmiss = !missing(sales) 
bysort isnotmiss country industry year (sales): generate four_biggest_sales = sales[_N] + ///
                                                sales[_N-1] + sales[_N-2] + sales[_N-3] 
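To get from the top-four sum to a concentration measure, one option is to divide it by the group total. The sketch below assumes that the measure wanted is the four-firm concentration ratio, that is, the share of group sales held by the four largest firms. The variable name cr4 and the extra firms g and i (one with missing sales) are hypothetical and used only for illustration.

```stata
clear
input str1 firms str3 country year industry sales
a usa 1 1 300
c usa 1 1 100
d usa 1 1 400
e usa 1 1 7000
g usa 1 1 .
i usa 1 1 200
end

* Keep observations with missing sales in separate by-groups.
generate isnotmiss = !missing(sales)

* Sum of the (up to) four largest nonmissing sales per group;
* observations with missing sales are left missing.
bysort isnotmiss country industry year (sales): generate four_biggest_sales = ///
    max(0, sales[_N])   + max(0, sales[_N-1]) + ///
    max(0, sales[_N-2]) + max(0, sales[_N-3]) if isnotmiss

* Group total over the same nonmissing observations.
bysort isnotmiss country industry year: egen sum_sales = total(sales)

* Assumed measure: four-firm concentration ratio (hypothetical name cr4).
generate cr4 = four_biggest_sales / sum_sales if isnotmiss

list firms sales four_biggest_sales sum_sales cr4
```

Here the four largest nonmissing values sum to 7900 out of a group total of 8000, so cr4 is 0.9875 for the five firms with sales data and missing for firm g.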
