本文介绍了根据行条件过滤数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想了一个例子来说明我的问题。

假设有5个球:

  • 红色
  • 蓝色
  • 绿色
  • 黄色
  • 橙色

通常,这些球有5!=120种组织方式。我可以在下面列举所有这些组合:

library(combinat)
library(dplyr)

my_list = c("Red", "Blue", "Green", "Yellow", "Orange")

d = permn(my_list)

all_combinations  = as.data.frame(matrix(unlist(d), ncol = 120)) %>%
  setNames(paste0("col", 1:120))


all_combinations[,1:5]

    col1   col2   col3   col4   col5
1    Red    Red    Red    Red Orange
2   Blue   Blue   Blue Orange    Red
3  Green  Green Orange   Blue   Blue
4 Yellow Orange  Green  Green  Green
5 Orange Yellow Yellow Yellow Yellow

我的问题:

假设我想按以下条件筛选此列表:

  • 红色球可以位于第一个位置,也可以位于第二个位置(从左到右)
  • 蓝色球和绿色球之间必须至少有2个位置
  • 黄色球不能在最后

然后我尝试根据以下3个条件筛选上述数据:

# attempt to write first condition
    cond_1 <- all_combinations[which(all_combinations[1,]== "Red" || all_combinations[2,] == "Red"), ]

#not sure how to write the second condition
    
 # attempt to write the third condition   
    cond_3 <- data_frame_version[which(data_frame_version[5,] !== "Yellow" ), ]

# if everything worked, an "anti join" style statement could be written to remove "cond_1, cond_2, cond_3" from the original data?

但这根本不起作用-第一个和第三个条件返回的数据框仅包含所有列的4行。

谁能告诉我如何使用以上3个筛选器正确筛选&All_Companies&Quot;?

注意:

以下代码可以调换原始数据:

 library(data.table)

    tpose = transpose(all_combinations)

    df = tpose
    
#group every 5 rows by the same id to identify unique combinations

    bloc_len <- 5
    
    df$bloc <- 
        rep(seq(1, 1 + nrow(df) %/% bloc_len), each = bloc_len, length.out = nrow(df))
    
   
 head(df)

      V1     V2     V3     V4     V5 bloc
1    Red   Blue  Green Yellow Orange    1
2    Red   Blue  Green Orange Yellow    1
3    Red   Blue Orange  Green Yellow    1
4    Red Orange   Blue  Green Yellow    1
5 Orange    Red   Blue  Green Yellow    1
6 Orange    Red   Blue Yellow  Green    2

推荐答案

您可以:

library(tidyverse)
tpose %>%
  mutate(blue_delete = case_when(V1 == "Blue" & V2 == "Green" ~ TRUE,
                                 V1 == "Blue" & V3 == "Green" ~ TRUE,
                                 V2 == "Blue" & V3 == "Green" ~ TRUE,
                                 V3 == "Blue" & V4 == "Green" ~ TRUE,
                                 V4 == "Blue" & V5 == "Green" ~ TRUE,
                                 TRUE ~ FALSE)) %>%
  filter(V3 != "Red" & V4 != "Red" & V5 != "Red",
         V5 != "Yellow",
         blue_delete == FALSE) %>%
  select(-blue_delete)

这篇关于根据行条件过滤数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-28 17:09