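I am trying to use mutate in dplyr to process strings, but I am not getting the output I want (see below). Instead of operating row by row, mutate takes the result for the first element and fills it down the column. Can anyone help me understand what I am doing wrong and how to adjust this code so that it works?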

short.idfun = function(longid)
{
    x      = strsplit(longid,"_")
    y      = x[[1]]
    study  = substr(y[1],8,nchar(y[1]))
    subj   = y[length(y)]
    subj   = substr(subj,regexpr("[^0]",subj),nchar(subj)) #remove leading zeros
    shortid= paste(study,subj,sep="-")
    return(shortid)
}

library(dplyr)

data = data.frame(test=c("1234567Andy_003_003003","1234567Beth_004_003004","1234567Char_003_003005"),stringsAsFactors=FALSE)
data = mutate(data,shortid=short.idfun(test))
print(data)

#### Below is my output
#                       test   shortid
#1    1234567Andy_003_003003 Andy-3003
#2    1234567Beth_004_003004 Andy-3003
#3    1234567Char_003_003005 Andy-3003

#### This is the behavior I was hoping for
#                       test   shortid
#1    1234567Andy_003_003003 Andy-3003
#2    1234567Beth_004_003004 Beth-3004
#3    1234567Char_003_003005 Char-3005

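Best answer

mutate() passes the whole test column to short.idfun() in a single call, and the function only looks at x[[1]], the split of the first string, so one value is computed and recycled down the column. Another option is to use rowwise(), which makes mutate() call the function once per row: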

data %>%
  rowwise() %>%
  mutate(shortid = short.idfun(test))


This gives:

#Source: local data frame [3 x 2]
#Groups: <by row>
#
#                    test   shortid
#                   (chr)     (chr)
#1 1234567Andy_003_003003 Andy-3003
#2 1234567Beth_004_003004 Beth-3004
#3 1234567Char_003_003005 Char-3005
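
As a possible alternative to rowwise(), the helper itself could be vectorized so that it accepts a whole column at once. The sketch below uses base R only. The name short_idfun_vec is hypothetical (it is not from the original post), and the code assumes every ID has the same shape as the examples: a 7-character prefix followed by underscore-separated fields.

```r
# Hypothetical vectorized variant of short.idfun (the name is illustrative only).
short_idfun_vec <- function(longid) {
  # strsplit returns a list with one character vector per input string
  parts <- strsplit(longid, "_", fixed = TRUE)
  # First field minus its 7-character prefix, e.g. "1234567Andy" -> "Andy"
  study <- vapply(parts, function(p) substr(p[1], 8, nchar(p[1])), character(1))
  # Last field of each ID, e.g. "003003"
  subj <- vapply(parts, function(p) p[length(p)], character(1))
  # Remove leading zeros, using the same regexpr approach as the original function
  subj <- substr(subj, regexpr("[^0]", subj), nchar(subj))
  paste(study, subj, sep = "-")
}

test <- c("1234567Andy_003_003003",
          "1234567Beth_004_003004",
          "1234567Char_003_003005")
ids <- short_idfun_vec(test)
print(ids)
```

Because this version returns one value per input string, it should also work in a plain mutate(data, shortid = short_idfun_vec(test)) call without rowwise().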

Regarding "r - mutate for string processing in R - not getting the behavior I was hoping for", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34643632/
