改變和重新排序因素

Created: November-22, 2018

當使用預設值建立因子時，levels 由應用於輸入的 as.character 形成，並按字母順序排序。

charvar <- rep(c("W", "n", "c"), times=c(17,20,14))
f <- factor(charvar)
levels(f)
# [1] "c" "n" "W"

在某些情況下，對 levels（字母/詞彙順序）的預設排序的處理是可以接受的。例如，如果一個人只想要頻率，那將是結果：

plot(f,col=1:length(levels(f)))

StackOverflow 文件

但是如果我們想要 levels 的不同排序，我們需要在 levels 或 labels 引數中指定它（注意這裡 order 的含義與有序因子不同，見下文）。根據具體情況，有許多替代方案可以完成該任務。

1.重新定義因素

如果可能，我們可以使用 levels 引數和我們想要的順序重新建立因子。

ff <- factor(charvar, levels = c("n", "W", "c"))
levels(ff)
# [1] "n" "W" "c"
 
gg <- factor(charvar, levels = c("W", "c", "n"))
levels(gg)
# [1] "W" "c" "n"

當輸入電平不同於所需的輸出電平時，我們使用 labels 引數使得 levels 引數成為可接受輸入值的濾波器，但將因子向量的電平的最終值留作引數。labels：

fm <- factor(as.numeric(f),levels = c(2,3,1),
             labels = c("nn", "WW", "cc"))
levels(fm)
# [1] "nn" "WW" "cc"

fm <- factor(LETTERS[1:6], levels = LETTERS[1:4],  # only 'A'-'D' as input
                 labels = letters[1:4])            # but assigned to 'a'-'d'
fm
#[1] a    b    c    d    <NA> <NA>
#Levels: a b c d

2.使用 `relevel` 功能

當有一個特定的 level 需要是第一個我們可以使用 relevel。例如，在統計分析的背景下，當測試假設需要 base 類別時，就會發生這種情況。

g<-relevel(f, "n") # moves n to be the first level
levels(g)
# [1] "n" "c" "W"

可以證實 f 和 g 是相同的

all.equal(f, g)
# [1] "Attributes: < Component `levels`: 2 string mismatches >"
all.equal(f, g, check.attributes = F)
# [1] TRUE

3.重新排序因素

有些情況下，我們需要根據數字，部分結果，計算統計資料或先前的計算來計算 levels。讓我們根據 levels 的頻率重新排序

table(g)
# g
#  n  c  W 
# 20 14 17

reorder 函式是通用的（參見 help(reorder)），但在這種情況下需要：x，在這種情況下是因子; X，與 x 相同長度的數值; 和 FUN，一個應用於 X 的函式，由 x 的級別計算，它確定 levels 的順序，預設增加。結果與其重新排序的級別相同。

g.ord <- reorder(g,rep(1,length(g)), FUN=sum) #increasing
levels(g.ord)
# [1] "c" "W" "n"

為了得到降序，我們考慮負值（-1）

g.ord.d <- reorder(g,rep(-1,length(g)), FUN=sum)
levels(g.ord.d)
# [1] "n" "W" "c"

該因素再次與其他因素相同。

data.frame(f,g,g.ord,g.ord.d)[seq(1,length(g),by=5),] #just same lines
#    f g g.ord g.ord.d
# 1  W W     W       W
# 6  W W     W       W
# 11 W W     W       W
# 16 W W     W       W
# 21 n n     n       n
# 26 n n     n       n
# 31 n n     n       n
# 36 n n     n       n
# 41 c c     c       c
# 46 c c     c       c
# 51 c c     c       c

當存在與因子變數相關的定量變數時，我們可以使用其他函式對 levels 進行重新排序。讓我們看一下 iris 資料（help("iris") 獲取更多資訊），通過使用它的平均值來重新排序 Species 因子 35。

miris <- iris  #help("iris") # copy the data
with(miris, tapply(Sepal.Width,Species,mean))
#    setosa versicolor  virginica 
#     3.428      2.770      2.974 

miris$Species.o<-with(miris,reorder(Species,-Sepal.Width))
levels(miris$Species.o)
# [1] "setosa"     "virginica"  "versicolor"

通常的 boxplot（比如說：with(miris, boxplot(Petal.Width~Species)）會按順序顯示 especies： setosa ， versicolor 和 virginica 。但是使用有序因子我們得到的物種按其平均值排序 38：

boxplot(Petal.Width~Species.o, data = miris,
        xlab = "Species", ylab = "Petal Width",
        main = "Iris Data, ordered by mean sepal width", varwidth = TRUE,
        col = 2:4)

StackOverflow 文件

此外，還可以更改 levels 的名稱，將它們組合成組，或新增新的 levels。為此，我們使用相同名稱 levels 的功能。

f1<-f
levels(f1)
# [1] "c" "n" "W"
levels(f1) <- c("upper","upper","CAP") #rename and grouping
levels(f1)
# [1] "upper" "CAP" 

f2<-f1
levels(f2) <- c("upper","CAP", "Number") #add Number level, which is empty
levels(f2)
# [1] "upper"  "CAP"    "Number"
f2[length(f2):(length(f2)+5)]<-"Number" # add cases for the new level
table(f2)
# f2
#  upper    CAP Number 
#     33     17      6 

f3<-f1
levels(f3) <- list(G1 = "upper", G2 = "CAP", G3 = "Number") # The same using list
levels(f3)
# [1] "G1" "G2" "G3"
f3[length(f3):(length(f3)+6)]<-"G3" ## add cases for the new level
table(f3)
# f3
# G1 G2 G3 
# 33 17  7

- 有序因素

最後，我們知道 ordered 因子與 factors 不同，第一個用於表示序資料，第二個用於標稱資料。首先，改變 levels 對有序因子的順序是沒有意義的，但我們可以改變它的 labels。

ordvar<-rep(c("Low", "Medium", "High"), times=c(7,2,4))

of<-ordered(ordvar,levels=c("Low", "Medium", "High"))
levels(of)
# [1] "Low"    "Medium" "High" 

of1<-of
levels(of1)<- c("LOW", "MEDIUM", "HIGH")
levels(of1)
# [1] "LOW"    "MEDIUM" "HIGH"
is.ordered(of1)
# [1] TRUE
of1
# [1] LOW    LOW    LOW    LOW    LOW    LOW    LOW    MEDIUM MEDIUM HIGH   HIGH   HIGH   HIGH  
# Levels: LOW < MEDIUM < HIGH

1.重新定義因素

2.使用 relevel 功能

3.重新排序因素

- 有序因素

2.使用 `relevel` 功能