R: finding the distance between two US zip code columns


Question

What is the most efficient way in R to calculate the distance in miles between two columns of US zip codes?

I have heard of the geosphere package for computing the distance between zip codes, but I do not fully understand it and would like to know whether there are alternative methods as well.

For example, say I have a data frame that looks like this:

 ZIP_START     ZIP_END
 95051         98053
 94534         94128
 60193         60666
 94591         73344
 94128         94128
 94015         73344
 94553         94128
 10994         7105
 95008         94128

I want to create a new data frame that looks like this:

 ZIP_START     ZIP_END     MILES_DIFFERENCE
 95051         98053       x
 94534         94128       x
 60193         60666       x
 94591         73344       x
 94128         94128       x
 94015         73344       x
 94553         94128       x
 10994         7105        x
 95008         94128       x

Here x is the distance in miles between the two zip codes.

What is the best method of calculating this distance?

Here is the R code to create the example data frame:

df <- data.frame("ZIP_START" = c(95051, 94534, 60193, 94591, 94128, 94015, 94553, 10994, 95008), "ZIP_END" = c(98053, 94128, 60666, 73344, 94128, 73344, 94128, 7105, 94128))

Please let me know if you have any questions.

Any advice is appreciated.

Thank you for your help.

Recommended answer

There is a handy R package named "zipcode" that provides a table of zip code, city, state, latitude, and longitude. Once you have that information, the "geosphere" package can calculate the distance between points.

library(zipcode)
library(geosphere)

# Zip codes must be character strings; stored as numbers, leading zeros
# (e.g. "07105") are dropped and the lookup fails.
df <- data.frame("ZIP_START" = c("95051", "94534", "60193", "94591", "94128", "94015", "94553", "10994", "95008"),
       "ZIP_END" = c("98053", "94128", "60666", "73344", "94128", "73344", "94128", "07105", "94128"),
       stringsAsFactors = FALSE)

data("zipcode")

df$distance_meters<-apply(df, 1, function(x){
  startindex<-which(x[["ZIP_START"]]==zipcode$zip)
  endindex<-which(x[["ZIP_END"]]==zipcode$zip)
  distGeo(p1=c(zipcode[startindex, "longitude"], zipcode[startindex, "latitude"]), p2=c(zipcode[endindex, "longitude"], zipcode[endindex, "latitude"]))
})

A warning about the column classes of your input data frame: zip codes should be character, not numeric. Otherwise leading zeros are dropped (07105 becomes 7105), and the lookup against the zip code table fails.
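If the zip codes have already been read in as numbers, one way to restore the leading zeros is to zero-pad them back to five characters. This is a minimal sketch using base R's sprintf; the variable names zips_numeric and zips_char are only for illustration.

```r
# Zip codes as they look after being stored as numbers (leading zero lost)
zips_numeric <- c(7105, 95051)

# Zero-pad to five digits and convert to character
zips_char <- sprintf("%05d", as.integer(zips_numeric))

print(zips_char)
```

Where possible, it is simpler to read the column as character in the first place so the zeros are never lost.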

distGeo returns the distance in meters; converting it to miles is left to the reader.
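For the conversion itself, one international mile is exactly 1609.344 meters, so dividing by that constant is enough. A small sketch follows; the helper name meters_to_miles is hypothetical, and the commented line assumes the distance_meters column from the code above.

```r
# Hypothetical helper: convert meters to international miles
meters_to_miles <- function(meters) {
  meters / 1609.344
}

# Example with plain numbers
print(meters_to_miles(c(1609.344, 5000)))

# Applied to the data frame from the answer (assumes distance_meters exists):
# df$MILES_DIFFERENCE <- meters_to_miles(df$distance_meters)
```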

Update
The zipcode package appears to have been archived. A replacement package, "zipcodeR", provides the longitude and latitude data along with additional information.
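As a rough sketch of how the replacement package could be used, the code below assumes that zipcodeR exposes a table named zip_code_db with columns zipcode, lat and lng; please check the package documentation before relying on that. The helpers haversine_miles and add_zip_distance are hypothetical names. The distance uses the haversine formula on a sphere of radius 3958.8 miles rather than distGeo's ellipsoid model, so the results will differ slightly from the answer above.

```r
# Hypothetical helper: great-circle (haversine) distance in miles on a sphere
haversine_miles <- function(lon1, lat1, lon2, lat2, radius_miles = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_miles * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: add MILES_DIFFERENCE using a lookup table with
# columns zipcode, lat, lng. Zip codes missing from the table yield NA.
add_zip_distance <- function(df, lookup) {
  s <- match(df$ZIP_START, lookup$zipcode)
  e <- match(df$ZIP_END, lookup$zipcode)
  df$MILES_DIFFERENCE <- haversine_miles(lookup$lng[s], lookup$lat[s],
                                         lookup$lng[e], lookup$lat[e])
  df
}

# Two rows from the example data, stored as character
df <- data.frame(ZIP_START = c("95051", "10994"),
                 ZIP_END   = c("98053", "07105"),
                 stringsAsFactors = FALSE)

# Assumption: zipcodeR is installed and provides zip_code_db as described
if (requireNamespace("zipcodeR", quietly = TRUE)) {
  result <- add_zip_distance(df, zipcodeR::zip_code_db)
  print(result)
}
```

Because the lookup uses match() and the distance function is vectorised, no row-by-row apply is needed, and a zip code that is absent from the table produces NA instead of an error.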

