Problem description
I have data from an open-ended survey. I have a comments table and a codes table. The codes table is a set of themes, each with a list of strings.
What I am trying to do: check whether any word / string from the relevant column in the codes table appears in an open-ended comment, then add a new column to the comments table for that specific theme, with a binary 1 or 0 to denote which records have been tagged.
The codes table has quite a number of columns. They are live and ever-changing, so the column order and the number of columns are subject to change.
I am currently doing this in a rather convoluted way: I check each column individually with multiple lines of code, and I reckon there is likely a much better way of doing it.
I can't figure out how to get lapply() to work with the stringi function.
Any help would be greatly appreciated.
Here is an example set of code so you can see what I am trying to do:
#Two tables codes and comments
#codes table
codes <- structure(
list(
Support = structure(
c(2L, 3L, NA),
.Label = c("",
"help", "questions"),
class = "factor"
),
Online = structure(
c(1L,
3L, 2L),
.Label = c("activities", "discussion board", "quiz"),
class = "factor"
),
Resources = structure(
c(3L, 2L, NA),
.Label = c("", "pdf",
"textbook"),
class = "factor"
)
),
row.names = c(NA,-3L),
class = "data.frame"
)
#comments table
comments <- structure(
list(
SurveyID = structure(
1:5,
.Label = c("ID_1", "ID_2",
"ID_3", "ID_4", "ID_5"),
class = "factor"
),
Open_comments = structure(
c(2L,
4L, 3L, 5L, 1L),
.Label = c(
"I could never get the pdf to download",
"I didn’t get the help I needed on time",
"my questions went unanswered",
"staying motivated to get through the textbook",
"there wasn’t enough engagement in the discussion board"
),
class = "factor"
)
),
class = "data.frame",
row.names = c(NA,-5L)
)
# check if any words from the columns in the codes table match the comments
# here I am looking for a match column by column, but I am after a better way - lapply?
library(stringi)
support = paste(codes$Support, collapse = "|")
supp_stringi = stri_detect_regex(comments$Open_comments, support)
supp_grepl = grepl(pattern = support, x = comments$Open_comments)
identical(supp_stringi, supp_grepl)
comments$Support = ifelse(supp_grepl == TRUE, 1, 0)
# What I would like to do is loop through all columns in codes rather than outlining the above code for each column in codes
Recommended answer
Here is an approach that uses stringi::stri_detect_regex() with lapply() to create vectors of 1 (TRUE) and 0 (FALSE), depending on whether any of the words in the Support, Online, or Resources columns appear in each comment, and then merges these vectors back with the comments.
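# assumes the codes and comments data frames from the question already exist
library(stringi)
resultsList <- lapply(seq_along(codes), function(x) {
  # drop NA entries so the literal text "NA" does not end up in the pattern
  pattern <- paste(na.omit(as.character(codes[[x]])), collapse = "|")
  y <- stri_detect_regex(as.character(comments$Open_comments), pattern)
  as.integer(y)  # TRUE -> 1, FALSE -> 0
})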
results <- as.data.frame(do.call(cbind,resultsList))
colnames(results) <- colnames(codes)
mergedData <- cbind(comments,results)
mergedData
...and the results.
> mergedData
  SurveyID                                          Open_comments Support Online Resources
1     ID_1                 I didn’t get the help I needed on time       1      0         0
2     ID_2          staying motivated to get through the textbook       0      0         1
3     ID_3                           my questions went unanswered       1      0         0
4     ID_4 there wasn’t enough engagement in the discussion board       0      1         0
5     ID_5                  I could never get the pdf to download       0      0         1
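As a variation on the approach above, the same loop can be wrapped in a small helper that uses only base R. This is a minimal sketch, not part of the original answer: the name tag_comments and its text_col argument are hypothetical, and the data below is a stand-in for the question's tables that uses character columns and plain apostrophes. Because it iterates over whatever columns codes currently has, it should keep working when themes are added, removed, or reordered. It also drops NA and empty entries before building the pattern, since paste() would otherwise turn an NA into the literal text "NA".

```r
# Hypothetical helper (an assumption, not from the original answer):
# adds one 0/1 column per theme in `codes` to the `comments` table.
tag_comments <- function(comments, codes, text_col = "Open_comments") {
  text <- as.character(comments[[text_col]])
  flags <- lapply(codes, function(col) {
    terms <- as.character(col)
    terms <- terms[!is.na(terms) & nzchar(terms)]  # drop NA and "" entries
    if (length(terms) == 0) return(rep(0L, length(text)))
    # one alternation pattern per theme, e.g. "help|questions"
    as.integer(grepl(paste(terms, collapse = "|"), text))
  })
  # lapply() over a data frame keeps the column names, so the new
  # columns are named after the themes automatically
  cbind(comments, data.frame(flags, check.names = FALSE))
}

# Stand-in data mirroring the question's tables
codes <- data.frame(
  Support   = c("help", "questions", NA),
  Online    = c("activities", "quiz", "discussion board"),
  Resources = c("textbook", "pdf", NA),
  stringsAsFactors = FALSE
)
comments <- data.frame(
  SurveyID = paste0("ID_", 1:5),
  Open_comments = c(
    "I didn't get the help I needed on time",
    "staying motivated to get through the textbook",
    "my questions went unanswered",
    "there wasn't enough engagement in the discussion board",
    "I could never get the pdf to download"
  ),
  stringsAsFactors = FALSE
)

tagged <- tag_comments(comments, codes)
print(tagged[, c("SurveyID", names(codes))])
```

Note that the terms are treated as regular expressions and matched anywhere in the text, so a code such as "pdf" would also match inside a longer word. If that matters for your themes, passing fixed = TRUE to grepl() for a single term, or wrapping terms in \\b word boundaries, are options to consider.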