Question
I am looking for the most efficient way to calculate the distance in miles between two columns of US zip codes in R.
I have heard of the geosphere package for computing the distance between zip codes, but I do not fully understand it and would like to know whether there are alternative methods as well.
For example, say I have a data frame that looks like this:
ZIP_START ZIP_END
95051 98053
94534 94128
60193 60666
94591 73344
94128 94128
94015 73344
94553 94128
10994 7105
95008 94128
I want to create a new data frame that looks like this:
ZIP_START ZIP_END MILES_DIFFERENCE
95051 98053 x
94534 94128 x
60193 60666 x
94591 73344 x
94128 94128 x
94015 73344 x
94553 94128 x
10994 7105 x
95008 94128 x
Here x is the distance in miles between the two zip codes.
What is the best method of calculating this distance?
Here is the R code to create the example data frame:
df <- data.frame("ZIP_START" = c(95051, 94534, 60193, 94591, 94128, 94015, 94553, 10994, 95008), "ZIP_END" = c(98053, 94128, 60666, 73344, 94128, 73344, 94128, 7105, 94128))
Please let me know if you have any questions.
Any advice is appreciated.
Thank you for your help.
Recommended answer
There is a handy R package named "zipcode" that provides a table of zip codes with city, state, latitude, and longitude. Once you have that information, the "geosphere" package can calculate the distance between points.
library(zipcode)    # provides the zip code lookup table
library(geosphere)  # provides distGeo()

# Zip code columns need to be character vectors, or else leading zeros are dropped, causing lookup errors
df <- data.frame("ZIP_START" = c("95051", "94534", "60193", "94591", "94128", "94015", "94553", "10994", "95008"),
                 "ZIP_END" = c("98053", "94128", "60666", "73344", "94128", "73344", "94128", "07105", "94128"),
                 stringsAsFactors = FALSE)

data("zipcode")  # loads the lookup table with zip, city, state, latitude and longitude

# For each row, find the start and end zip codes in the table and compute the distance in meters
df$distance_meters <- apply(df, 1, function(x) {
  startindex <- which(x[["ZIP_START"]] == zipcode$zip)
  endindex <- which(x[["ZIP_END"]] == zipcode$zip)
  distGeo(p1 = c(zipcode[startindex, "longitude"], zipcode[startindex, "latitude"]),
          p2 = c(zipcode[endindex, "longitude"], zipcode[endindex, "latitude"]))
})
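As a possible alternative to the row-by-row lookup, the same idea can be sketched in a vectorized form with match(). This is only a sketch under stated assumptions: zip_table is a hypothetical stand-in for the lookup table, filled with placeholder codes and coordinates rather than real zip data, and haversine_m and lookup_distance_m are illustrative helpers, not part of any package. The haversine formula on a sphere is swapped in for distGeo so that the example runs in base R; distGeo uses an ellipsoid, so its results will differ slightly.

```r
# Hypothetical lookup table: placeholder codes and coordinates, not real zip data
zip_table <- data.frame(zip = c("00001", "00002"),
                        latitude = c(0, 0),
                        longitude = c(0, 1),
                        stringsAsFactors = FALSE)

# Great-circle (haversine) distance in meters on a sphere; inputs in decimal degrees
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# match() finds the table row for every zip code at once; unknown codes give NA
lookup_distance_m <- function(zip_start, zip_end, table) {
  i <- match(zip_start, table$zip)
  j <- match(zip_end, table$zip)
  haversine_m(table$longitude[i], table$latitude[i],
              table$longitude[j], table$latitude[j])
}

trips <- data.frame(ZIP_START = c("00001", "00001", "00002"),
                    ZIP_END = c("00002", "00001", "99999"),
                    stringsAsFactors = FALSE)
trips$distance_meters <- lookup_distance_m(trips$ZIP_START, trips$ZIP_END, zip_table)
```

A zip code that is missing from the table produces NA here instead of an error, which makes bad inputs easier to spot afterwards.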
A warning about the column classes in your input data frame: zip codes should be character, not numeric; otherwise leading zeros are dropped, which causes lookup errors.
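If the zip codes have already been read in as numbers, one way to restore the leading zeros is to pad them back to five digits. A minimal sketch, assuming every value is a whole-number five-digit US zip code; pad_zip is an illustrative helper name, not a function from any package:

```r
# Convert numeric zip codes to zero-padded five-character strings
pad_zip <- function(zip) {
  sprintf("%05d", as.integer(zip))
}

zip_end <- c(98053, 7105, 94128)  # 7105 has lost its leading zero
zip_end_chr <- pad_zip(zip_end)
```

Reading the column as character in the first place avoids the need for this repair.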
The distance returned by distGeo is in meters; I leave the conversion to miles to the reader.
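For reference, the conversion could look like the sketch below. It relies on the international mile being exactly 1609.344 meters; meters_to_miles and the sample values are illustrative, not output from distGeo.

```r
# One international mile is exactly 1609.344 meters
METERS_PER_MILE <- 1609.344

meters_to_miles <- function(meters) {
  meters / METERS_PER_MILE
}

# Illustrative distances in meters (not real distGeo output)
distance_meters <- c(0, 1609.344, 16093.44)
distance_miles <- meters_to_miles(distance_meters)
```

Applied to a data frame, this would be something like df$MILES_DIFFERENCE <- meters_to_miles(df$distance_meters).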
Update
The zipcode package appears to have been archived. There is a replacement package, "zipcodeR", which provides the longitude and latitude data along with additional information.