问题描述
我有一个数据表,形式为
ID区域INCOME_BAND RESIDENCY_YEARS
1个软件在5,000以下10-15
2威尔士超过70,000 1-5
3中心15,000-19,999 6-9
4 SE 15,000-19,999 15-19
5北15,000-19,999 10-15
6北15,000-19,999 6-9
创建者
exp = data.table(
ID = c(1,2,3,4,5,6),
REGION = c( SW, Wales, Center, SE, North, North),
INCOME_BAND = c(?5,000以下, 70,000以上, 15,000-?19,999 ,?15,000-?19,999,?15,000-?19,999,?15,000-?19,999),
RESIDENCY_YEARS = c( 10-15, 1-5, 6- 9, 15-19, 10-15, 6-9)))
我想将其转换为
我已经设法通过dcast执行了大部分工作:
exp.dcast = dcast(exp,ID〜REGION + INCOME_BAND + RESIDENCY_YEARS,fun = length,
value.var = c('REGION','INCOME_BAND','RESIDENCY_YEARS'))
但是我需要一些帮助来创建合理的列标题。
当前我有
我希望列标题为
ID SW威尔士中心东南部5,000以下70,000以上15,000-19,999 1-5 6-9 10-15 15-19
有人可以建议吗?
这个看似简单的问题并不容易回答。因此,我们将逐步进行。
首先,OP尝试同时重塑多个值列,这会造成不必要的交叉所有可用组合的乘积。
为了以相同方式处理所有值,我们需要 melt()
重整前先排所有值列:
melt(exp,id.vars = ID)[,dcast(.SD ,ID〜值,长度)]
现在,结果有13列而不是19列,并且这些列按要求由各自的值命名。
不幸的是,这些列显示顺序错误,因为它们是按字母顺序排列的。更改顺序有两种方法:
重塑后更改列的顺序
setcolorder()
函数对 data.table
的列进行重新排序,例如无需复制:
#定义列顺序=值的顺序
col_order<-c( North,威尔士,中锋, SW, SE,低于5,000欧元, 15,000--19,999英镑,超过70,000欧元, 1-5, 6-9, 10- 15, 15-19)
melt(exp,id.vars = ID)[,dcast(.SD,ID〜值,长度)] [
#重新排序列
,setcolorder(.SD,c( ID,col_order))]
现在,所有 REGION
列首先出现,然后依次是 INCOME_BAND
和 RESIDENCY_YEARS
列按指定顺序。
在重塑之前设置因子水平
如果将值
转换为具有适当排序的因子水平的因子,则 dcast()
将使用因子水平排序列:
melt(exp,id.vars = ID)[,value:= factor(value,col_order )] [
,dcast(.SD,ID〜值,长度)]
在重塑之前设置因子水平-惰性版本
如果具有足够的列由 REGION
, INCOME_BAND
和 RESIDENCY_YEARS
分组使用快捷方式以避免在 col_order
中指定每个值。 forcats
包中的 fct_inorder()
函数通过它们在向量中的首次出现来重新排序因子水平:
melt(exp,id.vars = ID)[,value:= factor(value,col_order)] [
, dcast(.SD,ID〜值,长度)]
之所以有效,是因为输出 melt()
由变量
排序:
melt(exp,id.vars = ID)
And I would like the column headings to be
ID SW Wales Center SE North Under 5,000 Over 70,000 15,000-19,999 1-5 6-9 10-15 15-19
Could anybody advise?
This apparently simple question is not easy to answer. So, we will go forward step-by step.
First, the OP has tried to reshape multiple value columns simultaneously which creates an unwanted cross product of all available combinations.
In order to treat all values in the same way, we need to melt()
all value columns first before reshaping:
melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)]
Now, the result has 13 columns instead of 19 and the columns are named by the respective value as requested.
Unfortunately, the columns appear in the wrong order because they alphabetically ordered. There are two approaches to change the order:
Change order of columns after reshaping
The setcolorder()
function reorders the columns of a data.table
in place, e.g. without copying:
# define column order = order of values
col_order <- c("North", "Wales", "Center", "SW", "SE", "Under ?5,000", "?15,000-?19,999", "Over ?70,000", "1-5", "6-9", "10-15", "15-19")
melt(exp, id.vars = "ID")[, dcast(.SD, ID ~ value, length)][
# reorder columns
, setcolorder(.SD, c("ID", col_order))]
Now, all REGION
columns appear first, followed by INCOME_BAND
and RESIDENCY_YEARS
columns in the specified order.
Set factor levels before reshaping
If value
is turned into a factor with appropriately ordered factor levels dcast()
will use the factor levels for ordering the columns:
melt(exp, id.vars = "ID")[, value := factor(value, col_order)][
, dcast(.SD, ID ~ value, length)]
Set factor levels before reshaping - lazy version
If it is sufficient to have the columns grouped by REGION
, INCOME_BAND
, and RESIDENCY_YEARS
then we can use a short cut to avoid specifying each value in col_order
. The fct_inorder()
function from the forcats
package reorders factor levels by their first appearance in a vector:
melt(exp, id.vars = "ID")[, value := factor(value, col_order)][
, dcast(.SD, ID ~ value, length)]
This works because the output of melt()
is ordered by variable
:
melt(exp, id.vars = "ID")
这篇关于数据表dcast列标题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!