Merging a data frame with a lookup table in R

Question

I have a data frame of 59,720 obs. that looks like the one below. I want to assign a MARKETNAME to each observation from a lookup table.

> data (a)

     DAY  HOUR LEAD Row.Count     DATE    ITIME  HOMEPHONE            CITY  STATE ZIPCODE     ZONENAME
1 Monday 13:00    1      9430 7/1/2013 13:42:51            FORT LAUDERDALE     FL  33315       68
2 Monday 13:00    1      9432 7/1/2013 13:43:50 xxxxx9802x  PLEASANT GROVE     AL  35127       82
3 Monday 13:00    1      9434 7/1/2013 13:46:18 5xxxx85x10      ORO VALLEY     AZ  85737       54
4 Monday  0:00    1      9435 7/1/2013  0:04:34 50xxxx1x364          SPOKANE    WA  99204      211
5 Monday 11:00    1      9436 7/1/2013 11:45:43 951xxxxx20        RIVERSIDE    CA  92507       31
6 Monday 11:00    1      9437 7/1/2013 11:46:26 760xxxxx679            VISTA    CA  92081      539

I have a lookup table with 43,126 unique zip codes that looks like this:

> data (b)

MARKETNAME            ZIPCODE
NEW YORK              00501
NEW YORK              00544
SPRINGFIELD-HOLYOKE   01001
SPRINGFIELD-HOLYOKE   01002
SPRINGFIELD-HOLYOKE   01003
SPRINGFIELD-HOLYOKE   01004

I simply wanted to assign the MARKETNAME to my dataset "a" by matching on the ZIPCODE in "b". So I used:

> c <- merge(a, b, by = "ZIPCODE")

That returned 58,972 obs., which meant I lost 748 obs. I did not want to lose any records from a, so I changed my code as follows:

> c <- merge(a, b, by = "ZIPCODE", all.x = TRUE)

Strangely, this returned 61,652 obs. instead of the 59,720 obs. I expected, that is, the original data frame a with some NAs.

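According to the documentation for merge, all.x = TRUE adds one extra row to the output for each row in x that has no matching row in y, with NAs in the columns that would normally be filled from y.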

My interpretation of this is definitely wrong. Can someone please explain what I am doing wrong and how I can accomplish this simple task?

I referred to: "How to merge data frames and change element values based on certain conditions?", "Subsetting and Merging from 2 Related Data Frames in r", and "how to merge two unequal size data frame in R", but none of them is akin to my problem.

Recommended answer

I prefer join from plyr, which by default performs a left join: it keeps every record of the first data frame and attaches the matching records from the second.

library(plyr)  # provides join()
c <- join(a, b, by = "ZIPCODE")  # type = "left" is the default
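
One likely reason a left join returns more rows than a is that some ZIPCODE values occur more than once in b. This cannot be confirmed from the data shown, so treat it as an assumption. The sketch below uses only base R and small made-up data frames (the zip codes and market names are hypothetical stand-ins for the asker's a and b). It reproduces the pattern from the question and shows a lookup that keeps exactly one row per record of a.

```r
# Toy data standing in for the asker's `a` and `b`; all values are made up.
a <- data.frame(
  CITY    = c("FORT LAUDERDALE", "SPOKANE", "VISTA", "NOWHERE", "ELSEWHERE"),
  ZIPCODE = c("33315", "99204", "92081", "00000", "11111"),
  stringsAsFactors = FALSE
)

# The lookup deliberately lists 99204 twice to mimic a duplicated key.
b <- data.frame(
  MARKETNAME = c("MIAMI", "SPOKANE", "SPOKANE-ALT", "SAN DIEGO"),
  ZIPCODE    = c("33315", "99204", "99204", "92081"),
  stringsAsFactors = FALSE
)

# Inner join: rows of `a` without a match in `b` are dropped.
inner <- merge(a, b, by = "ZIPCODE")

# Left join: unmatched rows are kept, but the duplicated key repeats a row.
left <- merge(a, b, by = "ZIPCODE", all.x = TRUE)

# Diagnose: which keys appear more than once in the lookup table?
dup_zips <- unique(b$ZIPCODE[duplicated(b$ZIPCODE)])

# Row-preserving lookup: match() returns the position of the FIRST match
# in b$ZIPCODE (NA when there is none), so the result has nrow(a) rows.
a_named <- a
a_named$MARKETNAME <- b$MARKETNAME[match(a$ZIPCODE, b$ZIPCODE)]

print(c(a = nrow(a), inner = nrow(inner), left = nrow(left),
        lookup = nrow(a_named)))
print(dup_zips)
```

Here the inner join has fewer rows than a, the left join has more, and the match() lookup has exactly as many. If duplicated keys do turn out to be the cause, the same effect should be available in plyr through join's match = "first" argument, or by de-duplicating b on ZIPCODE before merging. Which MARKETNAME to keep for a repeated zip code is a decision about the data that the code cannot make for you.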
