Problem description
I have a data frame of 59,720 obs. that looks like the one below. I want to assign a MARKETNAME to each observation from a lookup table.
> data (a)
     DAY  HOUR LEAD Row.Count     DATE    ITIME   HOMEPHONE            CITY STATE ZIPCODE ZONENAME
1 Monday 13:00    1      9430 7/1/2013 13:42:51             FORT LAUDERDALE    FL   33315       68
2 Monday 13:00    1      9432 7/1/2013 13:43:50  xxxxx9802x  PLEASANT GROVE    AL   35127       82
3 Monday 13:00    1      9434 7/1/2013 13:46:18  5xxxx85x10      ORO VALLEY    AZ   85737       54
4 Monday  0:00    1      9435 7/1/2013  0:04:34 50xxxx1x364         SPOKANE    WA   99204      211
5 Monday 11:00    1      9436 7/1/2013 11:45:43  951xxxxx20       RIVERSIDE    CA   92507       31
6 Monday 11:00    1      9437 7/1/2013 11:46:26 760xxxxx679           VISTA    CA   92081      539
I have a lookup table of zip codes with 43,126 unique zip codes that looks like this:
> data (b)
MARKETNAME          ZIPCODE
NEW YORK            00501
NEW YORK            00544
SPRINGFIELD-HOLYOKE 01001
SPRINGFIELD-HOLYOKE 01002
SPRINGFIELD-HOLYOKE 01003
SPRINGFIELD-HOLYOKE 01004
I wanted to simply assign the MARKETNAME to my dataset "a" by comparing the ZIPCODE in "b", so I used:
> c <- merge(a, b, by="ZIPCODE")
It returned 58,972 obs., which meant I lost 748 obs. I did not want to lose any records from a, so I changed my code as follows:
> c <- merge(a, b, by="ZIPCODE", all.x=TRUE)
Strangely, this returned 61,652 obs. instead of the 59,720 obs. I expected, that is, the original a data frame with some NAs in MARKETNAME.
My interpretation of the documentation is definitely wrong. Can someone please explain what I am doing wrong and how I can accomplish this simple task?
I looked at "How to merge data frames and change element values based on certain conditions?", "Subsetting and Merging from 2 Related Data Frames in r" and "how to merge two unequal size data frame in R", but none of them match my problem.
Recommended answer
I prefer join from plyr, which by default is a left join returning all matches of records in the first data frame.
library(plyr)
c <- join(a, b, by="ZIPCODE")
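One caveat, resting on an assumption the question does not confirm: a left join that returns more rows than a usually means some ZIPCODE values occur more than once in b, so each matching row of a is repeated once per match. plyr's join defaults to match = "all" and would repeat rows the same way merge does. The base R sketch below uses made-up toy data (the market names and the unmatched zip code are hypothetical) to show the effect and two possible ways around it:

```r
# Toy data, hypothetical values only (not the asker's real tables)
a <- data.frame(
  Row.Count = c(9430, 9432, 9434),
  ZIPCODE   = c("33315", "35127", "99999"),  # "99999" has no match in b
  stringsAsFactors = FALSE
)
b <- data.frame(
  MARKETNAME = c("MARKET A", "MARKET A", "MARKET B"),
  ZIPCODE    = c("33315", "33315", "35127"),  # "33315" is listed twice
  stringsAsFactors = FALSE
)

# A left merge repeats a row of `a` once per matching row of `b`,
# so the result has more rows than `a`
left <- merge(a, b, by = "ZIPCODE", all.x = TRUE)
nrow(left)

# Count lookup keys that appear more than once
sum(duplicated(b$ZIPCODE))

# Option 1: keep the first row per ZIPCODE in the lookup table, then merge
b_unique <- b[!duplicated(b$ZIPCODE), ]
left_unique <- merge(a, b_unique, by = "ZIPCODE", all.x = TRUE)

# Option 2: look up by position with match(); this keeps every row of `a`
# in its original order and leaves NA where no zip code matches
a$MARKETNAME <- b$MARKETNAME[match(a$ZIPCODE, b$ZIPCODE)]
```

With plyr, join(a, b, by = "ZIPCODE", match = "first") should give a result comparable to option 2. If the duplicated zip codes map to different market names, picking the first one is an arbitrary choice, so it is worth inspecting those rows of b before deduplicating.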