Problem description
How can one, given a data.table with mostly numeric values, transform just a subset of columns and put them back into the original data table? Generally, I don't want to add any summary statistic as a separate column; I just want to replace the original columns with the transformed ones.
Assume we have a data.table DT with 1 column of names and 10 columns of numeric values. I am interested in applying base R's scale function to each row of that data table, but only across those 10 numeric columns.
To expand on this: what if I have a data table with more columns and need to use column names to tell the scale function which data points to apply it to?
With a regular data.frame I would just do:
df[,grep("keyword",colnames(df))] <- t(apply(df[,grep("keyword",colnames(df))],1,scale))
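For illustration, here is a minimal sketch of that data.frame idiom on made-up data. The column names (name, keyword_1, other, and so on) and the values are assumptions chosen only to make the example runnable. After the update, each row of the matched columns has mean 0 and standard deviation 1, and the unmatched columns are untouched.

```r
# Hypothetical toy data: three columns match "keyword", two do not.
df <- data.frame(
  name      = c("a", "b"),
  keyword_1 = c(1, 10),
  keyword_2 = c(2, 20),
  other     = c(5, 5),
  keyword_3 = c(3, 60)
)

# Positions of the columns whose names contain "keyword".
idx <- grep("keyword", colnames(df))

# apply() over rows returns one column per input row, so transpose the
# result back to rows x columns before assigning it into the data.frame.
df[, idx] <- t(apply(df[, idx], 1, scale))

# Row-wise mean and sd of the scaled columns (should be ~0 and ~1).
row_means <- rowMeans(df[, idx])
row_sds   <- apply(df[, idx], 1, sd)
```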
I know this looks cumbersome, but it has always worked for me. However, I can't figure out a simple way to do it with data.table.
I would imagine something like this working for a data.table:
dt[,grep("keyword",colnames(dt)) := scale(grep("keyword",colnames(dt)),center=F)]
But it doesn't.
Edit:
Another example that updates the columns with their per-row-scaled versions:
dt = data.table object
dt[,grep("keyword",colnames(dt),value=T) := as.data.table(t(apply(dt[,grep("keyword",colnames(dt)),with=F],1,scale)))]
Too bad it needs the as.data.table part inside, since the transposed value from the apply function is a matrix. Maybe data.table should automatically coerce matrices into data.tables when updating columns?
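As a sketch, the same update can be written a little more compactly by resolving the column names once and passing them through .SDcols, so that grep is not repeated. The toy data and the helper variable cols are assumptions for illustration; the as.data.table call is still needed because `:=` expects a list of columns rather than a matrix.

```r
library(data.table)

# Hypothetical toy data: one name column plus three "keyword" columns.
dt <- data.table(
  name      = c("a", "b"),
  keyword_1 = c(1, 10),
  keyword_2 = c(2, 20),
  keyword_3 = c(3, 60)
)

# Resolve the target column names once.
cols <- grep("keyword", colnames(dt), value = TRUE)

# .SD holds only the columns listed in .SDcols. apply() over its rows
# returns one column per row, so transpose back; as.data.table() turns
# the matrix into a list of columns that `:=` can assign.
dt[, (cols) := as.data.table(t(apply(.SD, 1, scale))), .SDcols = cols]
```

Wrapping cols in parentheses on the left-hand side of `:=` tells data.table to use the names stored in the variable rather than create a column literally called cols.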
Recommended answer
If what you really need is to scale by row, you can try doing it in two steps:
# compute mean/sd:
mean_sd <- DT[, .(mean(unlist(.SD)), sd(unlist(.SD))), by=1:nrow(DT), .SDcols=grep("keyword",colnames(DT))]
# scale
DT[, grep("keyword",colnames(DT), value=TRUE) := lapply(.SD, function(x) (x-mean_sd$V1)/mean_sd$V2), .SDcols=grep("keyword",colnames(DT))]
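A self-contained sketch of these two steps on made-up data, checked against the base-R t(apply(..., 1, scale)) result, might look like the following. The data values, the variable cols, and the column names m and s for the per-row statistics are assumptions introduced here for readability (the snippet above uses the default names V1 and V2).

```r
library(data.table)

# Hypothetical toy data: one name column and three "keyword" columns.
DT <- data.table(
  name      = c("a", "b", "c"),
  keyword_1 = c(1, 4, 7),
  keyword_2 = c(2, 6, 7.5),
  keyword_3 = c(3, 11, 11)
)
cols <- grep("keyword", colnames(DT), value = TRUE)

# Reference result from the base-R approach, computed before DT is modified.
expected <- unname(t(apply(as.matrix(DT[, ..cols]), 1, scale)))

# Step 1: per-row mean and standard deviation over the selected columns.
mean_sd <- DT[, .(m = mean(unlist(.SD)), s = sd(unlist(.SD))),
              by = seq_len(nrow(DT)), .SDcols = cols]

# Step 2: overwrite each selected column in place. Each column x has one
# value per row, so subtracting and dividing by the per-row vectors scales
# every row by its own mean and sd.
DT[, (cols) := lapply(.SD, function(x) (x - mean_sd$m) / mean_sd$s),
   .SDcols = cols]

# Scaled values as a plain matrix, for comparison with the reference.
result <- unname(as.matrix(DT[, ..cols]))
```

Because step 2 works column by column on whole vectors, it avoids building an intermediate matrix and the as.data.table conversion mentioned in the question.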