Is there an R function that mirrors Excel COUNTIFS with a date range as a condition?

Problem description

I currently face the following issue.

I want to write R code that creates a new column called, e.g., reviews_last30days in my main dataframe listings, which should count all reviews for each unique listings$ID.

The individual reviews per ID are listed in another dataframe like this:

REVIEWS
   ID   review_date
   1    2015-12-30
   1    2015-12-31
   1    2016-10-27
   2    2014-05-10
   2    2016-10-19
   2    2016-10-22
   2    2016-10-23

I also need to add a date condition, e.g. such that only the last 30 days counted back from last_scrape are considered.

So my result should look somewhat like the third column here (UPDATE: see the edit for a better description of the expected result):

LISTINGS
   ID   last_scrape   reviews_last30days
   1    2016-11-15    1
   2    2016-11-15    3

So finally, the column reviews_last30days should count the review_date entries for each ID that fall within the 30 days before last_scrape.

I have already converted both date columns with as.Date using the format "%Y-%m-%d".

Sorry if my problem is not formulated clearly enough; it is quite hard to explain or visualize, but in terms of code it hopefully shouldn't be that complicated after all.

Besides the input REVIEWS shown above, I have a second input dataframe, call it OVERVIEW, which currently looks somewhat like this in simplified form:

OVERVIEW
   ID   last_scrape
   1    2016-11-15
   2    2016-11-15
   3    2016-11-15
   4    2017-01-15
   5    2017-01-15
   6    2017-01-15
   7    2017-01-15
etc

So what I actually need is code that counts all entries of review_date for which the ID in OVERVIEW matches the ID in REVIEWS and the review_date in REVIEWS is at most 30 days before the last_scrape in OVERVIEW.

The code should then ideally assign this newly calculated value as a new column in OVERVIEW, like this:

OVERVIEW
   ID   last_scrape   rev_last30days
   1    2016-11-15    1
   2    2016-11-15    3
   3    2016-11-15    ..
   4    2017-01-15    ..
   5    2017-01-15    ..
   6    2017-01-15    ..
   7    2017-01-15    ..
etc



#2 EDIT - hopefully my last ;)

Thanks for your help so far @mfidino! Running your latest code still results in one minor error, namely the following:

TOTALREV$review_date <- ymd(TOTALREV$review_date)
TOTALLISTINGS$last_scraped.calc <- ymd(TOTALLISTINGS$last_scraped.calc)

gen_listings <- function(review = NULL, overview = NULL){
  # tibble to return
  to_return <- review %>%
    inner_join(., overview, by = 'listing_id') %>%
    group_by(listing_id) %>%
    summarise(last_scraped.calc = unique(last_scraped.calc),
              reviews_last30days = sum(review_date >= (last_scraped.calc-30)))
  return(to_return)
}

REVIEWCOUNT <- gen_listings(TOTALREV, TOTALLISTINGS)

Error: Column `last_scraped.calc` must be length 1 (a summary value), not 2

Do you have any idea how to fix this error?

NOTE: I used the names as in my original file; the code should otherwise be the same.

If it helps, here are some properties of the vector last_scraped.calc:

$ last_scraped.calc   : Date, format: "2018-08-07" "2018-08-07" ...

typeof(TOTALLISTINGS$last_scraped.calc)
[1] "double"

length(TOTALLISTINGS$last_scraped.calc)
[1] 549281

unique(TOTALLISTINGS$last_scraped.calc)
 [1] "2018-08-07" "2019-01-13" "2018-08-15" "2019-01-16" "2018-08-14" "2019-01-15" "2019-01-14" "2019-01-22"
 [9] "2018-08-22" "2018-08-21" "2019-01-28" "2018-08-20" "2019-01-23" "2019-01-31" "2018-08-09" "2018-08-10"
[17] "2018-08-08" "2018-08-16"

Any further help is much appreciated - thanks in advance!

Recommended answer

You can do this pretty easily with dplyr. I am also using lubridate::ymd() here instead of as.Date().

library(lubridate)
library(dplyr)

REVIEWS <- data.frame(ID = c(1,1,1,2,2,2,2),
             review_date = c("2015-12-30",
                             "2015-12-31",
                             "2016-10-27",
                             "2014-05-10",
                             "2016-10-19",
                             "2016-10-22",
                             "2016-10-23"))

REVIEWS$review_date <- ymd(REVIEWS$review_date)

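# Per ID: treat the latest review date as last_scrape and count the
# reviews dated within the 30 days up to it.
LISTINGS <- REVIEWS %>%
  group_by(ID) %>%
  summarise(last_scrape = max(review_date),
            reviews_last30days = sum(review_date >= (max(review_date) - 30)))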

The output of LISTINGS is your expected output:

# A tibble: 2 x 3
     ID last_scrape reviews_last30days
  <dbl> <date>                   <int>
1     1 2016-10-27                   1
2     2 2016-10-23                   3
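
The count above relies on two base R behaviours: subtracting a number from a Date shifts it back by that many days, and sum() over a logical vector counts its TRUE values. A minimal sketch using only the review dates of ID 2 from the question (the variable names cutoff, in_window and n_recent are illustrative, not part of the answer):

```r
# Review dates of ID 2, copied from the question's REVIEWS table.
review_date <- as.Date(c("2014-05-10", "2016-10-19", "2016-10-22", "2016-10-23"))

# Date minus a number gives the date that many days earlier.
cutoff <- max(review_date) - 30

# Logical vector: TRUE where a review falls on or after the cutoff.
in_window <- review_date >= cutoff

# Summing a logical vector counts the TRUE entries.
n_recent <- sum(in_window)
```

Here cutoff is 2016-09-23 and n_recent is 3, matching the reviews_last30days value for ID 2.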

EDIT:

If, instead, you want last_scrape to be an input rather than the latest review date per group, you can modify the code as follows, assuming that there can be multiple last_scrape values per ID:

library(lubridate)
library(dplyr)

REVIEWS <- data.frame(ID = c(1,1,1,2,2,2,2),
             review_date = c("2015-12-30",
                             "2015-12-31",
                             "2016-10-27",
                             "2014-05-10",
                             "2016-10-19",
                             "2016-10-22",
                             "2016-10-23"))

REVIEWS$review_date <- ymd(REVIEWS$review_date)

OVERVIEW <- data.frame(ID = rep(1:7, 2),
                       last_scrape = c("2016-11-15",
                                       "2016-11-15",
                                       "2016-11-15",
                                       "2017-01-15",
                                       "2017-01-15",
                                       "2017-01-15",
                                       "2017-01-15",
                                       "2016-11-20",
                                       "2016-11-20",
                                       "2016-11-20",
                                       "2017-01-20",
                                       "2017-01-20",
                                       "2017-01-20",
                                       "2017-01-20"))

OVERVIEW$last_scrape <- ymd(OVERVIEW$last_scrape)

gen_listings <- function(review = NULL, overview = NULL){
  # tibble to return
  to_return <- review %>%
    inner_join(., overview, by ='ID') %>%
    group_by(ID, last_scrape) %>%
    summarise(
    reviews_last30days = sum(review_date >= (last_scrape-30)))
  return(to_return)
}

LISTINGS <- gen_listings(REVIEWS, OVERVIEW)

This LISTINGS object is:

     ID last_scrape reviews_last30days
  <dbl> <date>                   <int>
1     1 2016-11-15                   1
2     1 2016-11-20                   1
3     2 2016-11-15                   3
4     2 2016-11-20                   2
