data.table: transform a subset of columns row-wise using a function

Question

Given a data.table with mostly numeric values, how can one transform just a subset of columns and put them back into the original data table? I don't want to add any summary statistic as a separate column, just replace the selected columns with their transformed versions.

Assume we have a data.table DT. It has 1 column with names and 10 columns with numeric values. I want to apply base R's scale function to each row of that data table, but only to those 10 numeric columns.

To expand on this: what if I have a data table with more columns and need to use column names to tell scale which data points to operate on?

With a regular data.frame I would just do:

df[,grep("keyword",colnames(df))] <- t(apply(df[,grep("keyword",colnames(df))],1,scale))

I know this looks cumbersome, but it has always worked for me. However, I can't figure out a simple way to do it with a data.table.

I would imagine something like this to work for a data.table:

dt[,grep("keyword",colnames(dt)) := scale(grep("keyword",colnames(dt)),center=F)]

But it does not.
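One likely reason the attempt above fails: without value=TRUE, grep returns the integer positions of the matching column names, so scale ends up being applied to those positions rather than to the column contents. A minimal sketch, using hypothetical column names that are not from the original post:

```r
# Hypothetical column names, for illustration only
col_names <- c("name", "keyword_1", "keyword_2", "keyword_3")

# grep() without value = TRUE returns integer positions, not data
idx <- grep("keyword", col_names)
print(idx)  # 2 3 4

# scale() therefore rescales the positions themselves and
# returns a one-column matrix with one row per matched name
scaled_idx <- scale(idx, center = FALSE)
print(dim(scaled_idx))  # 3 1
```

The right-hand side of := never sees the values stored in the keyword columns, which is why the later attempts read them explicitly.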

Edit:

Another example, updating columns with their per-row-scaled version:

Here dt is a data.table object:

dt[,grep("keyword",colnames(dt),value=T) := as.data.table(t(apply(dt[,grep("keyword",colnames(dt)),with=F],1,scale)))]

Unfortunately it needs the as.data.table part inside, because the transposed value from apply is a matrix. Maybe data.table should automatically coerce matrices into data.tables when updating columns?
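The apply-based update can be tried on a small table to see its effect. The sketch below assumes the data.table package is installed; the table contents and the column names (name, keyword_1, and so on) are made up for illustration.

```r
library(data.table)

# Hypothetical toy data: one name column plus three "keyword" columns
dt <- data.table(
  name      = c("a", "b", "c"),
  keyword_1 = c(1, 4, 7),
  keyword_2 = c(2, 5, 9),
  keyword_3 = c(3, 9, 8)
)

cols <- grep("keyword", colnames(dt), value = TRUE)

# apply() over rows returns one column per input row,
# so t() restores the original row/column orientation
scaled <- t(apply(dt[, cols, with = FALSE], 1, scale))

# Convert the matrix to a data.table (a list of columns) before assigning
dt[, (cols) := as.data.table(scaled)]

print(dt)
```

After the update each row of the keyword columns should have mean 0 and standard deviation 1, while the name column is left untouched.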

Recommended answer

If what you really need is to scale by row, you can try doing it in two steps:

# columns to transform, selected by name
cols <- grep("keyword", colnames(DT), value=TRUE)

# compute the per-row mean (V1) and sd (V2) over those columns:
mean_sd <- DT[, .(mean(unlist(.SD)), sd(unlist(.SD))), by=1:nrow(DT), .SDcols=cols]

# scale each selected column in place using the per-row mean and sd
DT[, (cols) := lapply(.SD, function(x) (x-mean_sd$V1)/mean_sd$V2), .SDcols=cols]
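As a sanity check, the two-step approach can be compared against the apply-based result on a small table. This is only a sketch: it assumes the data.table package is installed, and the data and column names are invented for the comparison.

```r
library(data.table)

# Hypothetical toy data, for illustration only
DT <- data.table(
  name      = c("a", "b", "c", "d"),
  keyword_1 = c(1, 4, 7, 2),
  keyword_2 = c(2, 5, 9, 6),
  keyword_3 = c(3, 9, 8, 1)
)
cols <- grep("keyword", colnames(DT), value = TRUE)

# Reference result: row-wise scale() via apply(), computed before DT is modified
expected <- t(apply(as.matrix(DT[, cols, with = FALSE]), 1, scale))

# Step 1: per-row mean (V1) and standard deviation (V2)
mean_sd <- DT[, .(mean(unlist(.SD)), sd(unlist(.SD))),
              by = 1:nrow(DT), .SDcols = cols]

# Step 2: column-wise update, vectorised over rows
DT[, (cols) := lapply(.SD, function(x) (x - mean_sd$V1) / mean_sd$V2),
   .SDcols = cols]

result <- as.matrix(DT[, cols, with = FALSE])
print(all.equal(result, expected, check.attributes = FALSE))
```

Because the second step works on whole columns at once, it avoids building and transposing an intermediate matrix; the per-row cost is confined to computing the mean and standard deviation.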
