基本创造因素
因子是在 R 中表示分类变量的一种方式。因子在内部存储为整数向量。提供的字符向量的唯一元素称为因子的级别。默认情况下,如果用户未提供级别,则 R 将在向量中生成一组唯一值,按字母数字顺序对这些值进行排序,并将它们用作级别。
charvar <- rep(c("n", "c"), each = 3)
f <- factor(charvar)
f
levels(f)
> f
[1] n n n c c c
Levels: c n
> levels(f)
[1] "c" "n"
如果要更改级别的顺序,则可以选择手动指定级别:
levels(factor(charvar, levels = c("n","c")))
> levels(factor(charvar, levels = c("n","c")))
[1] "n" "c"
因素有很多属性。例如,级别可以给标签:
> f <- factor(charvar, levels=c("n", "c"), labels=c("Newt", "Capybara"))
> f
[1] Newt Newt Newt Capybara Capybara Capybara
Levels: Newt Capybara
可以分配的另一个属性是该因子是否有序:
> Weekdays <- factor(c("Monday", "Wednesday", "Thursday", "Tuesday", "Friday", "Sunday", "Saturday"))
> Weekdays
[1] Monday Wednesday Thursday Tuesday Friday Sunday Saturday
Levels: Friday Monday Saturday Sunday Thursday Tuesday Wednesday
> Weekdays <- factor(Weekdays, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), ordered=TRUE)
> Weekdays
[1] Monday Wednesday Thursday Tuesday Friday Sunday Saturday
Levels: Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday
当不再使用因子级别时,可以使用 droplevels()
函数将其删除:
> Weekend <- subset(Weekdays, Weekdays == "Saturday" | Weekdays == "Sunday")
> Weekend
[1] Sunday Saturday
Levels: Monday < Tuesday < Wednesday < Thursday < Friday < Saturday < Sunday
> Weekend <- droplevels(Weekend)
> Weekend
[1] Sunday Saturday
Levels: Saturday < Sunday