Remove closely correlated features
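Closely correlated features can inflate the variance of a model's estimates, and dropping one member of each correlated pair may help reduce that variance. There are many ways to detect correlation; here is one: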
library(purrr) # for keep()
# select only the numeric columns, which are the ones cor() can use
toCorrelate <- keep(mtcars, is.numeric)
# calculate the correlation matrix
correlationMatrix <- cor(toCorrelate)
# the matrix is symmetric, so zero the upper triangle to keep only one copy of each pair
correlationMatrix[upper.tri(correlationMatrix)] <- 0
# zero the diagonal too, so no variable is flagged for correlating with itself (r = 1)
diag(correlationMatrix) <- 0
# flag each variable (column) that correlates with another variable at |r| >= 0.85
apply(correlationMatrix, 2, function(x) any(abs(x) >= 0.85))
  mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
 TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
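The logical vector only says that a variable has at least one partner above the cutoff, not which partner. A minimal sketch for listing the pairs themselves, assuming the same 0.85 cutoff; it uses base R `vapply()` in place of `purrr::keep()` so it runs without extra packages, and the names `threshold`, `idx` and `highPairs` are illustrative:

```r
# Select the numeric columns of mtcars with base R (equivalent to purrr::keep(mtcars, is.numeric))
toCorrelate <- mtcars[vapply(mtcars, is.numeric, logical(1))]
correlationMatrix <- cor(toCorrelate)
threshold <- 0.85  # assumed cutoff, same as above

# Strict lower triangle only (diagonal excluded), so each pair appears once
idx <- which(abs(correlationMatrix) >= threshold &
               lower.tri(correlationMatrix), arr.ind = TRUE)

# One row per highly correlated pair, with its correlation coefficient
highPairs <- data.frame(
  var1 = rownames(correlationMatrix)[idx[, "row"]],
  var2 = colnames(correlationMatrix)[idx[, "col"]],
  r    = correlationMatrix[idx]
)

# Strongest correlations first
highPairs <- highPairs[order(-abs(highPairs$r)), ]
print(highPairs, row.names = FALSE)
```

Reading the `var2` column for mpg, cyl and disp shows which partner triggered each TRUE above.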
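Next I'd look at what mpg is correlated with and decide what to keep and what to toss, and do the same for cyl and disp. Alternatively, I might combine some of the strongly correlated features into one.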
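For the combining option, one common technique (not necessarily what this answer had in mind) is to replace a correlated group with its first principal component. A sketch assuming the group cyl, disp, wt, chosen because those variables are linked by the pairs above 0.85; the grouping and the new column name `sizePC1` are hypothetical choices:

```r
# Hypothetical group of strongly correlated predictors to collapse into one feature
group <- c("cyl", "disp", "wt")

# PCA on the standardised group (center and scale so no variable dominates by units)
pca <- prcomp(mtcars[, group], center = TRUE, scale. = TRUE)

# Share of the group's standardised variance captured by the first component
pc1Share <- summary(pca)$importance["Proportion of Variance", "PC1"]
print(pc1Share)

# Drop the original columns and add PC1 as a single combined feature
reduced <- mtcars[, setdiff(names(mtcars), group)]
reduced$sizePC1 <- pca$x[, "PC1"]  # illustrative name

str(reduced)
```

mpg is left out of the group on purpose, since it is often the response in mtcars examples; if it is a predictor in your model it could be included. The sign of a principal component is arbitrary, so interpret `sizePC1` through `pca$rotation` rather than its sign.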