Problem description
This is something that has puzzled me for some time and I have yet to find an answer.
I am applying a standardized data cleaning process to (supposedly) similarly structured files, one file for each year. The code contains statements such as the following:
replace field="Plant" if field=="Plant & Machinery"
This line was written against the data file for year 1. I then generalized the code to loop through all the years of data. The problem arises if, in year 3, the analogous value of that variable is coded as "Plant and MachInery ": because the text strings differ, the line above does not make the intended change, yet no error alerts me that nothing was changed.
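The failure mode can be reproduced with a couple of toy observations (the data below are invented for illustration; only the two spellings come from the question). A replace whose condition matches nothing still finishes with return code 0, so a loop carries on as if the cleaning step had worked.

```stata
* Toy data standing in for the year-3 file (values invented for illustration)
clear
input str30 field
"Plant and MachInery "
"Land"
end

* The condition matches no observation, so nothing is changed...
capture noisily replace field = "Plant" if field == "Plant & Machinery"
* ...but the return code is 0: Stata treats this as success, not an error
display _rc
```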
What I am after is some sort of confirmation that more than 0 observations actually satisfied the condition each time the code is executed in the loop, with an error returned otherwise. Trimming, removing spaces, and standardizing text case, in any combination, are not workable options. At the same time, I don't want to add a count if followed by an assert statement before every conditional replace, as that becomes quite bulky.
Aside from going back to the raw files to ensure the variable values are standardized, is there any way to do this validation "on the fly", as I have tried to describe? Maybe just write a custom program that combines count if, assert, and replace?
Recommended answer
The idea that replace should return the number of observations changed has surfaced occasionally, but there are good reasons why it does not: notably, it is not an r-class or e-class command anyway, and it is quite important not to change the way it works, because that could break innumerable programs and do-files.
So I think the essence of any answer is that you have to set up your own monitoring process that counts how many values have been (or would be) changed.
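Counting the values that would be changed, before replacing, is essentially the custom program suggested in the question. The following is a minimal sketch, assuming Stata 13 or later; replace_assert is a hypothetical name, not a built-in Stata command. It uses syntax and marksample to honour the if/in qualifiers, counts the matching observations, and exits with error 2000 ("no observations") when there are none.

```stata
* Hypothetical helper (not part of Stata): replace_assert
* Same form as replace, but stops with error 2000 if the if/in
* condition matches no observations.
capture program drop replace_assert
program define replace_assert
    version 13
    * varname = variable to change; exp = new value (without the = sign)
    syntax varname =/exp [if] [in]
    * touse = 1 for observations meeting if/in; novarlist keeps
    * observations where the target variable itself is missing
    marksample touse, novarlist
    quietly count if `touse'
    if r(N) == 0 {
        display as error "replace_assert: condition matched 0 observations"
        exit 2000
    }
    replace `varlist' = `exp' if `touse'
end
```

Usage mirrors the original line: replace_assert field = "Plant" if field == "Plant & Machinery". Because it counts observations that match the condition rather than values that actually change, a condition matching only rows that already hold the new value still passes; counting real changes requires comparing against a saved copy of the variable.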
One pattern, when working on a variable called current, is:
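* copy of the variable; clonevar keeps its type, so this also works for strings
clonevar was = current
foreach ... {
    ...
    * snapshot the values before this change
    replace was = current
    replace current = ...
    * count stores the number of values actually changed in r(N)
    qui count if was != current
    * use the result, e.g. assert r(N) > 0
}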
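Here is the same snapshot-and-count idea applied once to toy data (invented for illustration) where the year-3 spelling defeats the condition. In a real do-file a plain assert would simply stop execution; capture is used here only so the example can show the resulting return code.

```stata
* Toy data (invented for illustration): year 3 spells the value differently
clear
input str30 field
"Plant and MachInery "
"Land"
end

* Snapshot the variable, make the change, then count what actually changed
clonevar was = field
replace field = "Plant" if field == "Plant & Machinery"
quietly count if was != field
display r(N) " value(s) changed"

* In a do-file, a plain "assert r(N) > 0" would stop execution here;
* capture is used only so this demonstration can report the return code
capture noisily assert r(N) > 0
display _rc
```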