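I am trying to analyze the relationship between the probability of a call and the distance of a vehicle.
A sample dataset (CSV here) looks like this: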

id  day         time    called  d
1   2009-06-24  1700    0       1037.6
1   2009-06-24  1710    1       1191.9
1   2009-06-24  1720    0       165.5

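The real dataset has 10 million rows. Each id represents a location that either called or did not call in each 10-minute time window.
First, I want to drop all rows of any id that never called on any day during the whole period.
The remaining rows then represent ids that called on some day of the analysis period at a given time.
I want to create a variable that equals 0 in the row where the call occurred, -1 at the same time on the day before the call (or hour, week, month, whatever, but here it is days), +1 on the day after, and so on. Later I will use this variable together with called and distance as inputs to analyze and compare different locations.
I have looked through other answered questions but have not found a suitable answer, so please either answer or point me to an existing question. I am using Stata 13, but solutions in Postgres 9.3 or R are also welcome.
I need to repeat this process many times on several datasets, so ideally I would like to automate it as much as possible.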
Update:
Here is an example of the desired result:
id  day         time    called  d  newvar   newvar2
1   2009-06-24  1700    0   1037.6  null
1   2009-06-24  1710    1   1191.9  0        -2
1   2009-06-24  1720    0   165.5   -1
1   2009-06-25  1700    0   526.7   null
1   2009-06-25  1710    0   342.5   1        -1
1   2009-06-25  1720    1   416.1   0
1   2009-06-26  1700    0   428.3   null
1   2009-06-26  1710    1   240.7   2        0
1   2009-06-26  1720    0   228.7   1
1   2009-06-27  1700    0   282.5   null
1   2009-06-27  1710    0   182.1   3        1
1   2009-06-27  1720    0   195.5   2
2   2009-06-24  1700    0   198.0   -1
2   2009-06-24  1710    0   157.4   null
2   2009-06-24  1720    0   234.9   null
2   2009-06-25  1700    1   247.0   0

I added newvar2 because some locations may call more than once in the same time window (on different days).

Best answer

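When asking for a Stata solution, it is best to provide a data example using dataex (from SSC).
The problem is hard to visualize until the data are sorted by id and time (and then by day). I did not convert the day variable to a Stata numeric date because, as constructed, the string sort order matches the natural date order.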
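Keeping `day` as a string works for sorting, but any offset computed from observation positions counts rows, not calendar days. If the real data can skip a day for some id and time, one option is to convert `day` to a Stata daily date with `daily()` and take differences of that instead. A minimal sketch on hypothetical toy data (the names `ddate` and `gap` are illustrative, not from the answer):

```stata
* Hypothetical toy data: one id/time slot with 2009-06-26 missing
clear
input str10 day
"2009-06-24"
"2009-06-25"
"2009-06-27"
end

* convert the YYYY-MM-DD string to a Stata daily date (days since 01jan1960)
gen ddate = daily(day, "YMD")
format ddate %td

* difference in calendar days from the previous row; the missing 2009-06-26
* shows up as a gap of 2, whereas the row index _n only advances by 1
gen gap = ddate - ddate[_n-1]
list, noobs
```

Whether the offset should count rows or calendar days depends on whether every id has a row for every day and time slot; in the example data it does, so both give the same answer.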
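For each call within an id time group, you seem to want the day offset relative to the day of the call. This can be done by generating an order variable that tracks the index of the current observation within each id time group and then subtracting the index of the observation where the call was made.
Because there can be more than one call per time slot, the code has to loop up to the maximum number of calls in any single time slot in the data.
The results from this solution differ from yours in one respect: you seem to have ignored the call for id == 2 at 1710 on 2009-06-27.
In the example below, the original data are sorted by id, time and day so that readers can better see what is going on.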

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 day int time byte called float distance str4 newvar byte newvar2
1 "2009-06-24" 1700 0 1037.6 "null"  .
1 "2009-06-25" 1700 0  526.7 "null"  .
1 "2009-06-26" 1700 0  428.3 "null"  .
1 "2009-06-27" 1700 0  282.5 "null"  .
1 "2009-06-24" 1710 1 1191.9 "0"    -2
1 "2009-06-25" 1710 0  342.5 "1"    -1
1 "2009-06-26" 1710 1  240.7 "2"     0
1 "2009-06-27" 1710 0  182.1 "3"     1
1 "2009-06-24" 1720 0  165.5 "-1"    .
1 "2009-06-25" 1720 1  416.1 "0"     .
1 "2009-06-26" 1720 0  228.7 "1"     .
1 "2009-06-27" 1720 0  195.5 "2"     .
2 "2009-06-24" 1700 0    198 "-1"    .
2 "2009-06-25" 1700 1    247 "0"     .
2 "2009-06-26" 1700 0  188.7 "1"     .
2 "2009-06-27" 1700 0  203.5 "2"     .
2 "2009-06-24" 1710 0  157.4 "null"  .
2 "2009-06-25" 1710 0  221.3 "null"  .
2 "2009-06-26" 1710 0  283.8 "null"  .
2 "2009-06-27" 1710 1   91.7 "null"  .
2 "2009-06-24" 1720 0  234.9 "null"  .
2 "2009-06-25" 1720 0  249.6 "null"  .
2 "2009-06-26" 1720 0  279.7 "null"  .
2 "2009-06-27" 1720 0  198.2 "null"  .
3 "2009-06-24" 1700 0  156.1 "-1"    .
3 "2009-06-25" 1700 1   19.9 "0"     .
3 "2009-06-26" 1700 0  195.2 "1"     .
3 "2009-06-27" 1700 0  306.2 "2"     .
3 "2009-06-24" 1710 0  150.1 "null"  .
3 "2009-06-25" 1710 0  163.7 "null"  .
3 "2009-06-26" 1710 0  288.2 "null"  .
3 "2009-06-27" 1710 0  311.7 "null"  .
3 "2009-06-24" 1720 0  135.1 "-2"    .
3 "2009-06-25" 1720 0    186 "-1"    .
3 "2009-06-26" 1720 1  297.2 "0"     .
3 "2009-06-27" 1720 0  375.9 "1"     .
end

* order observations by date within an id time group
sort id time day
by id time: gen order = _n

* number of calls at any given time
by id time: gen call = sum(called)

* repeat enough to cover the max number of calls per time
sum call, meanonly
local n = r(max)
forvalues i = 1/`n' {
    // the index of the called observation in the id time group
    by id time: gen index = order if called & call == `i'

    // replicate the index for all observations in the id time group
    by id time: egen gindex = total(index)

    // the relative position of each obs in groups with a call
    gen wanted`i' = order - gindex if gindex > 0

    drop index gindex
}

list, sepby(id time) noobs compress

And the results:
. list, sepby(id time) noobs compress

  +----------------------------------------------------------------------------------------+
  | id          day   time   cal~d   dist~e   new~r   new~2   order   call   wan~1   wan~2 |
  |----------------------------------------------------------------------------------------|
  |  1   2009-06-24   1700       0   1037.6    null       .       1      0       .       . |
  |  1   2009-06-25   1700       0    526.7    null       .       2      0       .       . |
  |  1   2009-06-26   1700       0    428.3    null       .       3      0       .       . |
  |  1   2009-06-27   1700       0    282.5    null       .       4      0       .       . |
  |----------------------------------------------------------------------------------------|
  |  1   2009-06-24   1710       1   1191.9       0      -2       1      1       0      -2 |
  |  1   2009-06-25   1710       0    342.5       1      -1       2      1       1      -1 |
  |  1   2009-06-26   1710       1    240.7       2       0       3      2       2       0 |
  |  1   2009-06-27   1710       0    182.1       3       1       4      2       3       1 |
  |----------------------------------------------------------------------------------------|
  |  1   2009-06-24   1720       0    165.5      -1       .       1      0      -1       . |
  |  1   2009-06-25   1720       1    416.1       0       .       2      1       0       . |
  |  1   2009-06-26   1720       0    228.7       1       .       3      1       1       . |
  |  1   2009-06-27   1720       0    195.5       2       .       4      1       2       . |
  |----------------------------------------------------------------------------------------|
  |  2   2009-06-24   1700       0      198      -1       .       1      0      -1       . |
  |  2   2009-06-25   1700       1      247       0       .       2      1       0       . |
  |  2   2009-06-26   1700       0    188.7       1       .       3      1       1       . |
  |  2   2009-06-27   1700       0    203.5       2       .       4      1       2       . |
  |----------------------------------------------------------------------------------------|
  |  2   2009-06-24   1710       0    157.4    null       .       1      0      -3       . |
  |  2   2009-06-25   1710       0    221.3    null       .       2      0      -2       . |
  |  2   2009-06-26   1710       0    283.8    null       .       3      0      -1       . |
  |  2   2009-06-27   1710       1     91.7    null       .       4      1       0       . |
  |----------------------------------------------------------------------------------------|
  |  2   2009-06-24   1720       0    234.9    null       .       1      0       .       . |
  |  2   2009-06-25   1720       0    249.6    null       .       2      0       .       . |
  |  2   2009-06-26   1720       0    279.7    null       .       3      0       .       . |
  |  2   2009-06-27   1720       0    198.2    null       .       4      0       .       . |
  |----------------------------------------------------------------------------------------|
  |  3   2009-06-24   1700       0    156.1      -1       .       1      0      -1       . |
  |  3   2009-06-25   1700       1     19.9       0       .       2      1       0       . |
  |  3   2009-06-26   1700       0    195.2       1       .       3      1       1       . |
  |  3   2009-06-27   1700       0    306.2       2       .       4      1       2       . |
  |----------------------------------------------------------------------------------------|
  |  3   2009-06-24   1710       0    150.1    null       .       1      0       .       . |
  |  3   2009-06-25   1710       0    163.7    null       .       2      0       .       . |
  |  3   2009-06-26   1710       0    288.2    null       .       3      0       .       . |
  |  3   2009-06-27   1710       0    311.7    null       .       4      0       .       . |
  |----------------------------------------------------------------------------------------|
  |  3   2009-06-24   1720       0    135.1      -2       .       1      0      -2       . |
  |  3   2009-06-25   1720       0      186      -1       .       2      0      -1       . |
  |  3   2009-06-26   1720       1    297.2       0       .       3      1       0       . |
  |  3   2009-06-27   1720       0    375.9       1       .       4      1       1       . |
  +----------------------------------------------------------------------------------------+
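The code above does not carry out the question's first step, dropping every id that never called during the whole period (id time groups without a call simply get missing values). A minimal sketch of that filter, assuming the `id` and `called` variables from the example; `ever_called` is an illustrative name. On the example data nothing would be dropped, because ids 1, 2 and 3 all have at least one call, so the hypothetical toy input below includes an id that never calls.

```stata
* Hypothetical toy data: id 2 never calls
clear
input byte id byte called
1 0
1 1
2 0
2 0
3 1
end

* ever_called = 1 on every row of an id with at least one call, else 0
bysort id: egen byte ever_called = max(called)

* keep only ids that called at least once, then drop the helper variable
keep if ever_called
drop ever_called
list, noobs
```

Run before the answer's `sort id time day`, this leaves the rest of the code unchanged.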

Regarding postgresql - Create conditional variable by group and by time and date on panel data, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43705646/

10-16 17:14