Question
I have a dataframe df that contains a couple of columns, but the only relevant ones are given below.
node | precedingWord
-------------------------
A-bom de
A-bom die
A-bom de
A-bom een
A-bom n
A-bom de
acroniem het
acroniem t
acroniem het
acroniem n
acroniem een
act de
act het
act die
act dat
act t
act n
I'd like to use these values to count the precedingWords per node, split into subcategories: one column titled neuter, another titled non-neuter, and a last one titled rest. neuter would count all rows for which precedingWord is one of these values: t, het, dat. non-neuter would count de and die, and rest would count everything that belongs to neither neuter nor non-neuter. (It would be nice if this could be dynamic, in other words if rest were derived as the complement of the variables used for neuter and non-neuter, or simply by subtracting the neuter and non-neuter counts from the number of rows with that node.)
Example output (in a new dataframe, let's say freqDf) would look like this:
node | neuter | nonNeuter | rest
-----------------------------------------
A-bom 0 4 2
acroniem 3 0 2
act 3 2 1
To create freqDf$node, I can do this:
freqDf<- data.frame(node = unique(df$node), stringsAsFactors = FALSE)
But that's all I've got; I don't know how to continue. I figured I could do something like this, but unfortunately the ++ operator doesn't work as I had hoped.
freqDf$neuter[grep("dat|het|t", df$precedingWord, perl=TRUE)] <- ++
freqDf$nonNeuter[grep("de|die", df$precedingWord, perl=TRUE)] <- ++
e <- table(df$Node)
freqDf$rest <- as.numeric(e - freqDf$neuter - freqDf$nonNeuter)
Also, this won't work for each node individually. I need some sort of loop that automatically runs for each distinct value in freqDf$node.
Recommended answer
One way is to replace the values with their categories and then use the table function to generate the frequencies.
neuter <- c("t", "het", "dat")
non.neuter <- c("de", "die")
df$precedingWord[df$precedingWord %in% neuter] <- "neuter"
df$precedingWord[df$precedingWord %in% non.neuter] <- "non.neuter"
# The original words were already replaced above, so test against the new labels here
df$precedingWord[!df$precedingWord %in% c("neuter", "non.neuter")] <- "rest"
table(df)
precedingWord
node neuter non.neuter rest
A-bom 0 4 2
acroniem 3 0 2
act 3 2 1
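The question also asked whether rest could be obtained by subtraction instead of being labelled explicitly. Here is a minimal base R sketch of that idea, assuming node and precedingWord are character columns. The sample data is rebuilt so the block runs on its own, and total, nodes, neuterN and nonNeuterN are illustrative names, not part of the original code.

```r
# Rebuild the sample data from the question (assumed to be character columns).
df <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  precedingWord = c("de", "die", "de", "een", "n", "de",
                    "het", "t", "het", "n", "een",
                    "de", "het", "die", "dat", "t", "n"),
  stringsAsFactors = FALSE
)

neuter <- c("t", "het", "dat")
non.neuter <- c("de", "die")

# Number of rows per node, and per-node counts for the two known categories.
total      <- table(df$node)
nodes      <- names(total)
neuterN    <- tapply(df$precedingWord %in% neuter, df$node, sum)
nonNeuterN <- tapply(df$precedingWord %in% non.neuter, df$node, sum)

freqDf <- data.frame(
  node      = nodes,
  neuter    = as.integer(neuterN[nodes]),
  nonNeuter = as.integer(nonNeuterN[nodes]),
  stringsAsFactors = FALSE
)

# rest is whatever remains once the two known categories are subtracted.
freqDf$rest <- as.integer(total[nodes]) - freqDf$neuter - freqDf$nonNeuter
freqDf
```

Because rest is computed from the row totals, adding a word to neuter or non.neuter updates it automatically, and df itself is left unmodified.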
But I'm sure there is a better solution, with the dplyr package for example.
EDIT: Maybe something like this (it doesn't overwrite your "precedingWord" column but adds a new "gender" one):
library(dplyr)
df %>%
  mutate(gender = ifelse(!precedingWord %in% c(neuter, non.neuter), "rest",
                  ifelse(precedingWord %in% neuter, "neuter", "non.neuter"))) %>%
  count(node, gender)
Source: local data frame [7 x 3]
Groups: node
node gender n
1 A-bom non.neuter 4
2 A-bom rest 2
3 acroniem neuter 3
4 acroniem rest 2
5 act neuter 3
6 act non.neuter 2
7 act rest 1
# And if you want the same output you put in your question, you can use table
df2 <- mutate(df, gender = ifelse(!precedingWord %in% c(neuter, non.neuter), "rest",
ifelse(precedingWord %in% neuter, "neuter", "non.neuter")))
table(df2$node, df2$gender)
neuter non.neuter rest
A-bom 0 4 2
acroniem 3 0 2
act 3 2 1
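If you would rather avoid the nested ifelse calls, the same classification can be sketched in base R with a named lookup vector. This is an illustrative variant, not part of the original answer: lookup, gender and counts are assumed names, and the sample data is rebuilt so the block is self-contained.

```r
# Rebuild the sample data from the question (assumed to be character columns).
df <- data.frame(
  node = rep(c("A-bom", "acroniem", "act"), times = c(6, 5, 6)),
  precedingWord = c("de", "die", "de", "een", "n", "de",
                    "het", "t", "het", "n", "een",
                    "de", "het", "die", "dat", "t", "n"),
  stringsAsFactors = FALSE
)

# Map each known word to its category; names are the words, values the labels.
lookup <- c(t = "neuter", het = "neuter", dat = "neuter",
            de = "non.neuter", die = "non.neuter")

# Words missing from the lookup come back as NA and are relabelled "rest".
gender <- unname(lookup[df$precedingWord])
gender[is.na(gender)] <- "rest"

# Fixed factor levels keep all three columns even when a category never occurs.
gender <- factor(gender, levels = c("neuter", "non.neuter", "rest"))

counts <- table(node = df$node, gender = gender)
counts
```

Adding a new word only requires one more entry in lookup; anything not listed falls into rest.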
Edit: Convert the table to a data frame that you can manipulate
myTable <- table(df2$node, df2$gender) %>%
as.data.frame.matrix %>%
mutate(node = row.names(.))
> myTable
neuter non.neuter rest node
1 0 4 2 A-bom
2 3 0 2 acroniem
3 3 2 1 act
> str(myTable)
'data.frame': 3 obs. of 4 variables:
$ neuter : int 0 3 3
$ non.neuter: int 4 0 2
$ rest : int 2 2 1
$ node : chr "A-bom" "acroniem" "act"
# And here is a more understandable way if you are not familiar with piping
# To learn more about forward piping : https://github.com/smbache/magrittr
myTable <- table(df2$node, df2$gender)
myTable2 <- as.data.frame.matrix(myTable)
myTable3 <- mutate(myTable2, node = row.names(myTable2))