Wednesday, 7 August 2013

how to integrate properties defined on multiple rows using a data.frame or data.table long format approach

I have recently started using the data.table package in R. I find
it very convenient for transforming and aggregating data. One thing I
have not worked out is how to transform data that are defined across
multiple rows. Do I need to reshape the data.frame/data.table into wide
format first?
Say you have the following data table:
library(data.table)
dt <- data.table(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))
and for each group you want to calculate the relative frequency of each
subgroup (e.g. c1/(c1+c2+c3) for f1) and other properties as functions of
c1, c2, c3 (where c1, c2, c3 are the counts associated with f1, f2 and f3).
I can see how to transform the data table into wide format and then apply
the transformation. Is there any way to calculate this directly in long
format (ideally using data.table)?
In general, the group and subgroup could each be defined by multiple factors.
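
One possible way to stay in long format is sketched below. It is only an illustration, not necessarily the best approach: it assumes each group contains exactly one row per subgroup, and the names rel_freq, ratios and f1_over_f3 are made up for this example. Grouping with by lets sum(counts) act as c1+c2+c3 within each group, and indexing counts by the subgroup label picks out an individual c1, c2 or c3.

```r
library(data.table)

dt <- data.table(
  group  = c("a", "a", "a", "b", "b", "b"),
  subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
  counts = c(3, 4, 5, 8, 9, 10)
)

# Relative frequency of each subgroup within its group.
# `:=` adds the column by reference; `by = group` makes sum(counts)
# the per-group total (c1 + c2 + c3).
dt[, rel_freq := counts / sum(counts), by = group]

# A property that combines specific subgroups: look up each count by its
# subgroup label inside the group. Assumes one row per subgroup per group.
# f1_over_f3 is a hypothetical property chosen for illustration (c1 / c3).
ratios <- dt[, .(f1_over_f3 = counts[subg == "f1"] / counts[subg == "f3"]),
             by = group]

print(dt)
print(ratios)
```

If the group is defined by several factors, by also accepts more than one column, for example by = .(factor1, factor2), where factor1 and factor2 stand in for whatever grouping columns the real data has.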
