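For a text-mining project, I need to examine how a word list develops over time. To do that, I need to split the row names so that I have one column containing the company name and one containing the year. Here is an excerpt from my data frame: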

                    abs  access   allow     analysis application approach base big business challenge company
Adidas_2010.txt     13    25       26          11       41        132   1      266        13     115       1
Adidas_2011.txt      1     3        1           0        0         8   0       11         2      10       0
Adidas_2012.txt     29    35       37          22      110        181   7      384        31     136       3
Adidas_2013.txt     28    47       38          32      180        184   4      451        30     129       3
Adidas_2014.txt     12    42       38          27      159        207   6      921        32     128       6
Adidas_2016.txt     30    47       50          47      162        251   9     1061        32     171      13
Nike_2009.txt       16    15       17          12       33        177   9      346        93     196       1
Nike_2011.txt       10    30        0           3        0         0    0       81         7      31       0
Nike_2012.txt       21    22       12          57      199        300   7      214        11     107       3
Nike_2013.txt       20    32       30          11      123        321   4      331        90     239       3
Nike_2014.txt       33    43       30          33      119        137   6      441        67     318       6
Nike_2015.txt       51    42       41          27      102        151   9     1061        32     221      13

This is my code:
dtm <- DocumentTermMatrix(corpus, control=list(dictionary = word_list))
df1 <- data.frame(as.matrix(dtm), row.names = filenames_annualreports)

I tried this:
 names_plus_year <- rownames(df1)
 names_plus_year_split <- strsplit(names_plus_year, "_")
 rownames(df1) <- sapply(names_plus_year_split, "[", 1)

But I get the following error:
Error in `.rowNamesDF<-`(x, value = value) :
  double 'row.names' not allowed

Is there another way to split the row names? Many thanks! :)

Best answer

You can split the row names, bind the pieces together by row, and then bind the result to your data frame by column, i.e.

 cbind.data.frame(df, do.call(rbind, strsplit(sub('\\..*','' ,rownames(df)), '_')))

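This gives the original data frame with two additional columns, one holding the company name and the other the year.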

You can rename the new columns as usual.
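
The attempt in the question fails because the row names of a data frame must be unique, and dropping the year leaves repeated company names. As a minimal sketch of the same splitting idea with the new columns named directly, the snippet below uses a three-row stand-in for the data frame; the names `parts`, `result`, `company` and `year` are illustrative choices, not part of the original answer.

```r
# Small stand-in for the data frame in the question
# (values copied from three of its rows; only two word columns kept)
df <- data.frame(
  abs    = c(13L, 1L, 16L),
  access = c(25L, 3L, 15L),
  row.names = c("Adidas_2010.txt", "Adidas_2011.txt", "Nike_2009.txt")
)

# Drop everything from the first dot onward (the file extension),
# then split each remaining name at the underscore and stack the
# pieces into a two-column character matrix
parts <- do.call(rbind, strsplit(sub("\\..*", "", rownames(df)), "_"))

# Store the pieces as ordinary columns; the original (unique) row names
# are left untouched, so no duplicate-row-name error can occur
result <- df
result$company <- parts[, 1]
result$year    <- as.integer(parts[, 2])

print(result)
```

Keeping the company and year as regular columns also makes it straightforward to group or plot word counts by year later on.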

Data
dput(df)
structure(list(abs = c(13L, 1L, 29L, 28L, 12L, 30L, 16L, 10L,
21L, 20L, 33L, 51L), access = c(25L, 3L, 35L, 47L, 42L, 47L,
15L, 30L, 22L, 32L, 43L, 42L), allow = c(26L, 1L, 37L, 38L, 38L,
50L, 17L, 0L, 12L, 30L, 30L, 41L), analysis = c(11L, 0L, 22L,
32L, 27L, 47L, 12L, 3L, 57L, 11L, 33L, 27L), application = c(41L,
0L, 110L, 180L, 159L, 162L, 33L, 0L, 199L, 123L, 119L, 102L),
    approach = c(132L, 8L, 181L, 184L, 207L, 251L, 177L, 0L,
    300L, 321L, 137L, 151L), base = c(1L, 0L, 7L, 4L, 6L, 9L,
    9L, 0L, 7L, 4L, 6L, 9L), big = c(266L, 11L, 384L, 451L, 921L,
    1061L, 346L, 81L, 214L, 331L, 441L, 1061L), business = c(13L,
    2L, 31L, 30L, 32L, 32L, 93L, 7L, 11L, 90L, 67L, 32L), challenge = c(115L,
    10L, 136L, 129L, 128L, 171L, 196L, 31L, 107L, 239L, 318L,
    221L), company = c(1L, 0L, 3L, 3L, 6L, 13L, 1L, 0L, 3L, 3L,
    6L, 13L)), row.names = c("Adidas_2010.txt", "Adidas_2011.txt",
"Adidas_2012.txt", "Adidas_2013.txt", "Adidas_2014.txt", "Adidas_2016.txt",
"Nike_2009.txt", "Nike_2011.txt", "Nike_2012.txt", "Nike_2013.txt",
"Nike_2014.txt", "Nike_2015.txt"), class = "data.frame")

Regarding R - splitting row names from a data frame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59531791/
