R: Replicate each row of a data.frame and specify the number of replications for each row?

Problem description

I am programming in R and I got the following problem:

I have a data frame jb that is quite long. Here's a simplified version of it:

jb:    a     b     frequency               jb.expanded: a    b
       5     3        2                                 5    3
       5     7        1                                 5    3
       9     1        40                                5    7
       12    4        5                                 9    1
       12    5        13                                9    1
                                                        ...  ...

I want to replicate the rows, with the number of copies of each row given by the frequency column. That means the first row is replicated two times, the second row one time, and so on. I already solved that problem with this code:

jb.expanded <- jb[rep(row.names(jb), jb$frequency), 1:2]

Now here is the problem:

Whenever a number in the frequency column is greater than 10, the number of replicated rows is wrong. For example:

Frequency: 43 --> 14 rows
           40 --> 13 rows
           13 --> 11 rows
           14 --> 12 rows

Can you help me? I have no idea how to fix that, I also cannot find anything on the internet.

Thanks for your help!

Solution

Update

Upon revisiting this question, I have a feeling that @Codoremifa was correct in their assumption that your "frequency" column might be a factor.

Here's an example if that were the case. It won't match your actual data since I don't know what other levels are in your dataset.

mydf$F2 <- factor(as.character(mydf$frequency))
## expandRows(mydf, "F2")
mydf[rep(rownames(mydf), mydf$F2), ]
#      a b frequency F2
# 1    5 3         2  2
# 1.1  5 3         2  2
# 1.2  5 3         2  2
# 2    5 7         1  1
# 3    9 1        40 40
# 3.1  9 1        40 40
# 3.2  9 1        40 40
# 3.3  9 1        40 40
# 4   12 4         5  5
# 4.1 12 4         5  5
# 4.2 12 4         5  5
# 4.3 12 4         5  5
# 4.4 12 4         5  5
# 5   12 5        13 13
# 5.1 12 5        13 13

Hmmm. That doesn't look like 61 rows to me. Why not? Because rep uses the integer codes underlying the factor, which in this case are quite different from the displayed values:

as.numeric(mydf$F2)
# [1] 3 1 4 5 2

To properly convert it, you would need:

as.numeric(as.character(mydf$F2))
# [1]  2  1 40  5 13
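
The factor pitfall described above can be reproduced end to end with a small sketch. The sample data frame is reconstructed from the question and is an assumption (the real jb may hold other values and levels), and indexing by row position with seq_len is a substitution for the row-name indexing used in the original code, not the original author's method.

```r
# Sample data reconstructed from the question (assumption: numeric columns)
mydf <- data.frame(a = c(5, 5, 9, 12, 12),
                   b = c(3, 7, 1, 4, 5),
                   frequency = c(2, 1, 40, 5, 13))

# Simulate the suspected problem: the counts stored as a factor
f <- factor(as.character(mydf$frequency))

levels(f)      # levels are sorted as text, not as numbers
as.integer(f)  # internal codes, i.e. positions within levels(f)

# Recover the displayed numbers before using them as repeat counts
counts <- as.numeric(as.character(f))

# An equivalent conversion that looks up each code in the numeric levels
counts_alt <- as.numeric(levels(f))[f]

# Index by row position so the result does not depend on row names
expanded <- mydf[rep(seq_len(nrow(mydf)), counts), c("a", "b")]
nrow(expanded)  # equals sum(counts)
```

Checking class(jb$frequency) or str(jb) first would show whether the column really is a factor before any conversion is applied.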

Original answer

A while ago I wrote a function that is a bit more of a generalization of @Simono101's answer. The function looks like this:

expandRows <- function(dataset, count, count.is.col = TRUE) {
  if (!isTRUE(count.is.col)) {
    if (length(count) == 1) {
      dataset[rep(rownames(dataset), each = count), ]
    } else {
      if (length(count) != nrow(dataset)) {
        stop("Expand vector does not match number of rows in data.frame")
      }
      dataset[rep(rownames(dataset), count), ]
    }
  } else {
    dataset[rep(rownames(dataset), dataset[[count]]),
            setdiff(names(dataset), names(dataset[count]))]
  }
}

For your purposes, you could just use expandRows(mydf, "frequency"):

head(expandRows(mydf, "frequency"))
#     a b
# 1   5 3
# 1.1 5 3
# 2   5 7
# 3   9 1
# 3.1 9 1
# 3.2 9 1
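
One way to confirm that an expansion like this produced the intended number of copies is to count the rows back. The following is a minimal sketch, assuming the sample data from the question and assuming every (a, b) pair is unique in the input; the variable names are illustrative only.

```r
mydf <- data.frame(a = c(5, 5, 9, 12, 12),
                   b = c(3, 7, 1, 4, 5),
                   frequency = c(2, 1, 40, 5, 13))

# Expand by row position, keeping only columns a and b
expanded <- mydf[rep(seq_len(nrow(mydf)), mydf$frequency), c("a", "b")]

# Check 1: the total number of rows equals the sum of the requested counts
total_ok <- nrow(expanded) == sum(mydf$frequency)

# Check 2: count the copies of each (a, b) pair and compare with frequency
recount <- aggregate(n ~ a + b, data = cbind(expanded, n = 1), FUN = sum)
merged <- merge(mydf, recount, by = c("a", "b"))
counts_ok <- all(merged$frequency == merged$n)
```

If the counts had been taken from factor codes instead of the displayed numbers, both checks would come out FALSE for this data.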

Other options are to repeat each row the same number of times:

expandRows(mydf, 2, count.is.col=FALSE)
#      a b frequency
# 1    5 3         2
# 1.1  5 3         2
# 2    5 7         1
# 2.1  5 7         1
# 3    9 1        40
# 3.1  9 1        40
# 4   12 4         5
# 4.1 12 4         5
# 5   12 5        13
# 5.1 12 5        13

Or to specify a vector of how many times to repeat each row.

expandRows(mydf, c(1, 2, 1, 0, 2), count.is.col=FALSE)
#      a b frequency
# 1    5 3         2
# 2    5 7         1
# 2.1  5 7         1
# 3    9 1        40
# 5   12 5        13
# 5.1 12 5        13

Note the required count.is.col = FALSE argument in those last two options.
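
Since the update above suggests the count column may arrive as a factor, a more defensive variant could coerce the counts before repeating the rows. The helper below is a hypothetical sketch, not part of the original answer: the name expand_by_column, the validation rule, and the use of drop = FALSE are all assumptions made for illustration.

```r
# Hypothetical helper (not from the original answer): repeats rows by a count
# column, coercing factor or character counts to the numbers they display.
expand_by_column <- function(dataset, count_col) {
  counts <- dataset[[count_col]]
  if (is.factor(counts)) {
    counts <- as.character(counts)  # use the labels, not the internal codes
  }
  counts <- as.numeric(counts)
  if (anyNA(counts) || any(counts < 0)) {
    stop("Counts must be non-negative numbers")
  }
  keep <- setdiff(names(dataset), count_col)
  # drop = FALSE keeps a data.frame even when only one column remains
  dataset[rep(seq_len(nrow(dataset)), counts), keep, drop = FALSE]
}

# Sample data with the counts deliberately stored as a factor
mydf <- data.frame(a = c(5, 5, 9, 12, 12),
                   b = c(3, 7, 1, 4, 5),
                   frequency = factor(c("2", "1", "40", "5", "13")))

out <- expand_by_column(mydf, "frequency")
dim(out)
```

Rows with a count of 0 are dropped, matching the behaviour of the vector example above.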
