将汇总函数应用于多个变量
# example data
DT = data.table(iris)
DT[, Bin := cut(Sepal.Length, c(4,6,8))]
要按组分别对每个列应用相同的汇总函数,我们可以使用 lapply
和 .SD
DT[, lapply(.SD, median), by=.(Species, Bin)]
# Species Bin Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa (4,6] 5.0 3.4 1.50 0.2
# 2: versicolor (6,8] 6.4 2.9 4.60 1.4
# 3: versicolor (4,6] 5.6 2.7 4.05 1.3
# 4: virginica (6,8] 6.7 3.0 5.60 2.1
# 5: virginica (4,6] 5.8 2.7 5.00 1.9
我们可以使用 .SDcols
参数过滤 .SD
中的列:
DT[, lapply(.SD, median), by=.(Species, Bin), .SDcols="Petal.Length"]
# Species Bin Petal.Length
# 1: setosa (4,6] 1.50
# 2: versicolor (6,8] 4.60
# 3: versicolor (4,6] 4.05
# 4: virginica (6,8] 5.60
# 5: virginica (4,6] 5.00
多个汇总功能
目前,对多个函数的最简单扩展可能是:
DT[, unlist(recursive=FALSE, lapply(
.(med = median, iqr = IQR),
function(f) lapply(.SD, f)
)), by=.(Species, Bin), .SDcols=Petal.Length:Petal.Width]
# Species Bin med.Petal.Length med.Petal.Width iqr.Petal.Length iqr.Petal.Width
# 1: setosa (4,6] 1.50 0.2 0.175 0.100
# 2: versicolor (6,8] 4.60 1.4 0.300 0.200
# 3: versicolor (4,6] 4.05 1.3 0.525 0.275
# 4: virginica (6,8] 5.60 2.1 0.700 0.500
# 5: virginica (4,6] 5.00 1.9 0.200 0.200
如果你希望名称类似于 Petal.Length.med
而不是 med.Petal.Length
,请更改顺序:
DT[, unlist(recursive=FALSE, lapply(
.SD,
function(x) lapply(.(med = median, iqr = IQR), function(f) f(x))
)), by=.(Species, Bin), .SDcols=Petal.Length:Petal.Width]
# Species Bin Petal.Length.med Petal.Length.iqr Petal.Width.med Petal.Width.iqr
# 1: setosa (4,6] 1.50 0.175 0.2 0.100
# 2: versicolor (6,8] 4.60 0.300 1.4 0.200
# 3: versicolor (4,6] 4.05 0.525 1.3 0.275
# 4: virginica (6,8] 5.60 0.700 2.1 0.500
# 5: virginica (4,6] 5.00 0.200 1.9 0.200