Problem description
I am programming in R and ran into the following problem:
I have a data frame jb that is quite long. Here's a simplified version of it:
jb:  a   b   frequency      jb.expanded:  a   b
     5   3   2                            5   3
     5   7   1                            5   3
     9   1   40                           5   7
     12  4   5                            9   1
     12  5   13                           9   1
     ...                                  ...
I want to replicate each row as many times as the value in its frequency column. That means the first row is replicated two times, the second row one time, and so on. I already solved that problem with this code:
jb.expanded <- jb[rep(row.names(jb), jb$frequency), 1:2]
Now here is the problem:
Whenever a number in the frequency column is greater than 10, the number of replicated rows is wrong. For example:
Frequency: 43 --> 14 rows
           40 --> 13 rows
           13 --> 11 rows
           14 --> 12 rows
Can you help me? I have no idea how to fix that, I also cannot find anything on the internet.
Thanks for your help!
Update
Upon revisiting this question, I have a feeling that @Codoremifa was correct in their assumption that your "frequency" column might be a factor.
Here's an example if that were the case. It won't match your actual data since I don't know what other levels are in your dataset.
mydf$F2 <- factor(as.character(mydf$frequency))
## expandRows(mydf, "F2")
mydf[rep(rownames(mydf), mydf$F2), ]
# a b frequency F2
# 1 5 3 2 2
# 1.1 5 3 2 2
# 1.2 5 3 2 2
# 2 5 7 1 1
# 3 9 1 40 40
# 3.1 9 1 40 40
# 3.2 9 1 40 40
# 3.3 9 1 40 40
# 4 12 4 5 5
# 4.1 12 4 5 5
# 4.2 12 4 5 5
# 4.3 12 4 5 5
# 4.4 12 4 5 5
# 5 12 5 13 13
# 5.1 12 5 13 13
Hmmm. That doesn't look like 61 rows to me. Why not? Because rep uses the integer codes underlying the factor, which in this case are quite different from the displayed values:
as.numeric(mydf$F2)
# [1] 3 1 4 5 2
To properly convert it, you would need:
as.numeric(as.character(mydf$F2))
# [1] 2 1 40 5 13
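To make the difference concrete, here is a minimal, self-contained sketch. It assumes mydf looks like the five sample rows from the question (the real data are not shown in full, so this is only a reconstruction), and the names codes, counts, wrong and right are introduced purely for illustration.

```r
# Assumed reconstruction of the sample data from the question
mydf <- data.frame(a = c(5, 5, 9, 12, 12),
                   b = c(3, 7, 1, 4, 5),
                   frequency = c(2, 1, 40, 5, 13))

# Simulate the suspected situation: the counts are stored as a factor
mydf$F2 <- factor(as.character(mydf$frequency))

# Internal integer codes (positions in the sorted levels), not the labels
codes <- as.integer(mydf$F2)

# The numbers that are actually displayed, recovered via the labels
counts <- as.numeric(as.character(mydf$F2))

# Expanding with the codes gives too few rows; the labels give the intended result
wrong <- mydf[rep(seq_len(nrow(mydf)), codes), ]
right <- mydf[rep(seq_len(nrow(mydf)), counts), ]

nrow(wrong)  # sum of the codes
nrow(right)  # sum of the displayed frequencies
```

If is.factor(jb$frequency) returns TRUE for your own data, the same as.numeric(as.character(...)) conversion should be applied before calling rep.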
Original answer
A while ago I wrote a function that is a bit more of a generalization of @Simono101's answer. The function looks like this:
expandRows <- function(dataset, count, count.is.col = TRUE) {
  if (!isTRUE(count.is.col)) {
    if (length(count) == 1) {
      # A single number: repeat every row that many times
      dataset[rep(rownames(dataset), each = count), ]
    } else {
      # A vector: one repeat count per row
      if (length(count) != nrow(dataset)) {
        stop("Expand vector does not match number of rows in data.frame")
      }
      dataset[rep(rownames(dataset), count), ]
    }
  } else {
    # A column name: use that column as the counts and drop it from the result
    dataset[rep(rownames(dataset), dataset[[count]]),
            setdiff(names(dataset), names(dataset[count]))]
  }
}
For your purposes, you could just use expandRows(mydf, "frequency"):
head(expandRows(mydf, "frequency"))
# a b
# 1 5 3
# 1.1 5 3
# 2 5 7
# 3 9 1
# 3.1 9 1
# 3.2 9 1
Other options are to repeat each row the same number of times:
expandRows(mydf, 2, count.is.col=FALSE)
# a b frequency
# 1 5 3 2
# 1.1 5 3 2
# 2 5 7 1
# 2.1 5 7 1
# 3 9 1 40
# 3.1 9 1 40
# 4 12 4 5
# 4.1 12 4 5
# 5 12 5 13
# 5.1 12 5 13
Or to specify a vector of how many times to repeat each row.
expandRows(mydf, c(1, 2, 1, 0, 2), count.is.col=FALSE)
# a b frequency
# 1 5 3 2
# 2 5 7 1
# 2.1 5 7 1
# 3 9 1 40
# 5 12 5 13
# 5.1 12 5 13
Note the required count.is.col = FALSE argument in those last two options.
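Since the count column may turn out to be a factor, a guarded variant of the column-based case might look like the sketch below. expandByColumn is a hypothetical helper written for this illustration, not part of the function above, and the sample data are again an assumed reconstruction of the rows shown in the question.

```r
# Hypothetical helper: expand rows by a count column, converting a factor
# to the numbers it displays before the counts are used.
expandByColumn <- function(dataset, count) {
  n <- dataset[[count]]
  if (is.factor(n)) {
    n <- as.numeric(as.character(n))  # use the labels, not the integer codes
  }
  if (anyNA(n) || any(n < 0)) {
    stop("Counts must be non-negative numbers")
  }
  # Repeat row positions and drop the count column from the result
  dataset[rep(seq_len(nrow(dataset)), n),
          setdiff(names(dataset), count), drop = FALSE]
}

# Assumed sample data, with the counts deliberately stored as a factor
mydf <- data.frame(a = c(5, 5, 9, 12, 12),
                   b = c(3, 7, 1, 4, 5),
                   frequency = factor(c(2, 1, 40, 5, 13)))

expanded <- expandByColumn(mydf, "frequency")
nrow(expanded)  # total of the displayed frequencies
```

Indexing by seq_len(nrow(dataset)) rather than by row names is a design choice in this sketch; it behaves the same for default row names and avoids surprises if the row names have been changed.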