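Helper Functions
Helper functions are used together with select to identify the variables to return. Unless otherwise noted, these functions expect a string as the first parameter, match. With the dplyr version used for these examples, passing a vector or another object generates an error.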
library(dplyr)
library(nycflights13)
starts_with
starts_with allows us to identify variables whose names begin with a given string.
Return all variables that begin with the letter e.
planes %>% select(starts_with("e"))
## # A tibble: 3,322 × 2
## engines engine
## <int> <chr>
## 1 2 Turbo-fan
## 2 2 Turbo-fan
## 3 2 Turbo-fan
## 4 2 Turbo-fan
## 5 2 Turbo-fan
## 6 2 Turbo-fan
## 7 2 Turbo-fan
## 8 2 Turbo-fan
## 9 2 Turbo-fan
## 10 2 Turbo-fan
## # ... with 3,312 more rows
Matching ignores case by default. For strict case matching, set the ignore.case parameter to FALSE.
planes %>% select(starts_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0
ends_with
Return all variables that end with the letter e.
planes %>% select(ends_with("e"))
## # A tibble: 3,322 × 2
## type engine
## <chr> <chr>
## 1 Fixed wing multi engine Turbo-fan
## 2 Fixed wing multi engine Turbo-fan
## 3 Fixed wing multi engine Turbo-fan
## 4 Fixed wing multi engine Turbo-fan
## 5 Fixed wing multi engine Turbo-fan
## 6 Fixed wing multi engine Turbo-fan
## 7 Fixed wing multi engine Turbo-fan
## 8 Fixed wing multi engine Turbo-fan
## 9 Fixed wing multi engine Turbo-fan
## 10 Fixed wing multi engine Turbo-fan
## # ... with 3,312 more rows
For strict case matching, set the ignore.case parameter to FALSE.
planes %>% select(ends_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0
contains
contains allows you to find any variable whose name contains a given string.
planes %>% select(contains("ea"))
## # A tibble: 3,322 × 2
## year seats
## <int> <int>
## 1 2004 55
## 2 1998 182
## 3 1999 182
## 4 1999 182
## 5 2002 55
## 6 1999 182
## 7 1999 182
## 8 1999 182
## 9 1999 182
## 10 1999 182
## # ... with 3,312 more rows
For strict case matching, set the ignore.case parameter to FALSE.
planes %>% select(contains("EA", ignore.case = FALSE))
## # A tibble: 3,322 × 0
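Because contains treats its argument as a literal string rather than a pattern, characters such as "." carry no special meaning. The following is a minimal sketch of that difference; the data frame toy and its column names are hypothetical and are not part of nycflights13.

```r
library(dplyr)

# Hypothetical toy data frame; one column name includes a literal dot
toy <- data.frame(a.b = 1:2, ab = 3:4, b = 5:6)

# contains() looks for the literal character ".", so only a.b qualifies
literal <- toy %>% select(contains("."))

# matches() reads "." as a regular expression (any character),
# so every column name qualifies
pattern <- toy %>% select(matches("."))

names(literal)
names(pattern)
```

When a name fragment includes punctuation, contains is usually the safer choice because nothing needs to be escaped.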
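matches
matches is the only helper function that allows the use of regular expressions.
Return all variables whose names contain at least six consecutive alphabetic characters: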
planes %>% select(matches("[[:alpha:]]{6,}"))
## # A tibble: 3,322 × 4
## tailnum manufacturer engines engine
## <chr> <chr> <int> <chr>
## 1 N10156 EMBRAER 2 Turbo-fan
## 2 N102UW AIRBUS INDUSTRIE 2 Turbo-fan
## 3 N103US AIRBUS INDUSTRIE 2 Turbo-fan
## 4 N104UW AIRBUS INDUSTRIE 2 Turbo-fan
## 5 N10575 EMBRAER 2 Turbo-fan
## 6 N105UW AIRBUS INDUSTRIE 2 Turbo-fan
## 7 N107US AIRBUS INDUSTRIE 2 Turbo-fan
## 8 N108UW AIRBUS INDUSTRIE 2 Turbo-fan
## 9 N109UW AIRBUS INDUSTRIE 2 Turbo-fan
## 10 N110UW AIRBUS INDUSTRIE 2 Turbo-fan
## # ... with 3,312 more rows
For strict case matching, set the ignore.case parameter to FALSE.
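The matches section has no strict-case example, so here is a small sketch of one. The data frame toy and its mixed-case column names are assumptions made up for illustration.

```r
library(dplyr)

# Hypothetical toy data frame with mixed-case column names
toy <- data.frame(ID = 1:3,
                  id_code = c("a", "b", "c"),
                  Value = c(1.5, 2.5, 3.5))

# Default behaviour ignores case: "^id" matches both ID and id_code
insensitive <- toy %>% select(matches("^id"))

# Strict case: only the lowercase name id_code matches
sensitive <- toy %>% select(matches("^id", ignore.case = FALSE))

names(insensitive)
names(sensitive)
```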
num_range
For this example, I will generate a dummy data frame with random values and sequentially numbered variable names.
set.seed(1)
df <- data.frame(x1 = runif(10),
                 x2 = runif(10),
                 x3 = runif(10),
                 x4 = runif(10),
                 x5 = runif(10))
num_range can be used to select a sequence of variables that share a consistent prefix.
Select variables x2 through x4 (range 2:4) from df:
df %>% select(num_range('x', range = 2:4))
## x2 x3 x4
## 1 0.2059746 0.93470523 0.4820801
## 2 0.1765568 0.21214252 0.5995658
## 3 0.6870228 0.65167377 0.4935413
## 4 0.3841037 0.12555510 0.1862176
## 5 0.7698414 0.26722067 0.8273733
## 6 0.4976992 0.38611409 0.6684667
## 7 0.7176185 0.01339033 0.7942399
## 8 0.9919061 0.38238796 0.1079436
## 9 0.3800352 0.86969085 0.7237109
## 10 0.7774452 0.34034900 0.4112744
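When the numeric suffixes are zero-padded, the width argument of num_range can describe the padding. The sketch below assumes a hypothetical data frame padded with names such as x01 and x02; it is not part of the original example.

```r
library(dplyr)

# Hypothetical data frame whose column names use two-digit, zero-padded suffixes
padded <- data.frame(x01 = 1:2, x02 = 3:4, x03 = 5:6, x10 = 7:8)

# width = 2 pads each number in the range to two digits,
# so the range 2:3 is looked up as x02 and x03
result <- padded %>% select(num_range("x", 2:3, width = 2))

names(result)
```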
one_of
one_of can take a character vector of variable names and returns each of the named variables.
planes %>% select(one_of(c("tailnum", "model")))
## # A tibble: 3,322 × 2
## tailnum model
## <chr> <chr>
## 1 N10156 EMB-145XR
## 2 N102UW A320-214
## 3 N103US A320-214
## 4 N104UW A320-214
## 5 N10575 EMB-145LR
## 6 N105UW A320-214
## 7 N107US A320-214
## 8 N108UW A320-214
## 9 N109UW A320-214
## 10 N110UW A320-214
## # ... with 3,312 more rows
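A common use for one_of is selecting columns whose names are held in a variable. The following sketch uses a hypothetical data frame toy and a hypothetical vector wanted to show the idea.

```r
library(dplyr)

# Hypothetical toy data frame standing in for planes
toy <- data.frame(tailnum = c("N10156", "N102UW"),
                  year = c(2004L, 1998L),
                  model = c("EMB-145XR", "A320-214"))

# Column names stored ahead of time, e.g. built elsewhere in a script
wanted <- c("tailnum", "model")

# one_of() selects the columns named in the character vector
result <- toy %>% select(one_of(wanted))

names(result)
```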
everything
everything can be used to reposition variables within a data frame.
Make manufacturer the first variable, followed by all remaining variables.
planes %>% select(manufacturer, everything())
## # A tibble: 3,322 × 9
## manufacturer tailnum year type model
## <chr> <chr> <int> <chr> <chr>
## 1 EMBRAER N10156 2004 Fixed wing multi engine EMB-145XR
## 2 AIRBUS INDUSTRIE N102UW 1998 Fixed wing multi engine A320-214
## 3 AIRBUS INDUSTRIE N103US 1999 Fixed wing multi engine A320-214
## 4 AIRBUS INDUSTRIE N104UW 1999 Fixed wing multi engine A320-214
## 5 EMBRAER N10575 2002 Fixed wing multi engine EMB-145LR
## 6 AIRBUS INDUSTRIE N105UW 1999 Fixed wing multi engine A320-214
## 7 AIRBUS INDUSTRIE N107US 1999 Fixed wing multi engine A320-214
## 8 AIRBUS INDUSTRIE N108UW 1999 Fixed wing multi engine A320-214
## 9 AIRBUS INDUSTRIE N109UW 1999 Fixed wing multi engine A320-214
## 10 AIRBUS INDUSTRIE N110UW 1999 Fixed wing multi engine A320-214
## # ... with 3,312 more rows, and 4 more variables: engines <int>,
## # seats <int>, speed <int>, engine <chr>
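The same idea extends to moving several variables to the front at once. This is a minimal sketch on a hypothetical data frame toy, not on the planes data.

```r
library(dplyr)

# Hypothetical toy data frame with four columns
toy <- data.frame(tailnum = c("N10156", "N102UW"),
                  year = c(2004L, 1998L),
                  manufacturer = c("EMBRAER", "AIRBUS INDUSTRIE"),
                  seats = c(55L, 182L))

# List the columns to move first, then let everything() supply the rest
# in their original order
result <- toy %>% select(seats, manufacturer, everything())

names(result)
```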
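Other Helpers
Although the : and - operators are not part of the dplyr package, we can still use them to identify the variables to return.
:
The : operator defines an inclusive range of variables to return.
Return every variable from year through manufacturer: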
planes %>% select(year:manufacturer)
## # A tibble: 3,322 × 3
## year type manufacturer
## <int> <chr> <chr>
## 1 2004 Fixed wing multi engine EMBRAER
## 2 1998 Fixed wing multi engine AIRBUS INDUSTRIE
## 3 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 4 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 5 2002 Fixed wing multi engine EMBRAER
## 6 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 7 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 8 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 9 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 10 1999 Fixed wing multi engine AIRBUS INDUSTRIE
## # ... with 3,312 more rows
Return multiple ranges of variables:
planes %>% select(c(year:manufacturer, seats:engine))
## # A tibble: 3,322 × 6
## year type manufacturer seats speed engine
## <int> <chr> <chr> <int> <int> <chr>
## 1 2004 Fixed wing multi engine EMBRAER 55 NA Turbo-fan
## 2 1998 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 3 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 4 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 5 2002 Fixed wing multi engine EMBRAER 55 NA Turbo-fan
## 6 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 7 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 8 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 9 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## 10 1999 Fixed wing multi engine AIRBUS INDUSTRIE 182 NA Turbo-fan
## # ... with 3,312 more rows
-
The - operator removes variables from the result set.
Return all variables except type:
planes %>% select(-type)
## # A tibble: 3,322 × 8
## tailnum year manufacturer model engines seats speed engine
## <chr> <int> <chr> <chr> <int> <int> <int> <chr>
## 1 N10156 2004 EMBRAER EMB-145XR 2 55 NA Turbo-fan
## 2 N102UW 1998 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 3 N103US 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 4 N104UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 5 N10575 2002 EMBRAER EMB-145LR 2 55 NA Turbo-fan
## 6 N105UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 7 N107US 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 8 N108UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 9 N109UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## 10 N110UW 1999 AIRBUS INDUSTRIE A320-214 2 182 NA Turbo-fan
## # ... with 3,312 more rows
You can also pass a vector of variable names to exclude from the result set.
planes %>% select(-c(type, engines:engine))
## # A tibble: 3,322 × 4
## tailnum year manufacturer model
## <chr> <int> <chr> <chr>
## 1 N10156 2004 EMBRAER EMB-145XR
## 2 N102UW 1998 AIRBUS INDUSTRIE A320-214
## 3 N103US 1999 AIRBUS INDUSTRIE A320-214
## 4 N104UW 1999 AIRBUS INDUSTRIE A320-214
## 5 N10575 2002 EMBRAER EMB-145LR
## 6 N105UW 1999 AIRBUS INDUSTRIE A320-214
## 7 N107US 1999 AIRBUS INDUSTRIE A320-214
## 8 N108UW 1999 AIRBUS INDUSTRIE A320-214
## 9 N109UW 1999 AIRBUS INDUSTRIE A320-214
## 10 N110UW 1999 AIRBUS INDUSTRIE A320-214
## # ... with 3,312 more rows
Any Combination of Helper Functions
Select all variables between type and speed (inclusive), excluding manufacturer.
planes %>% select(type:speed, -manufacturer)
## # A tibble: 3,322 × 5
## type model engines seats speed
## <chr> <chr> <int> <int> <int>
## 1 Fixed wing multi engine EMB-145XR 2 55 NA
## 2 Fixed wing multi engine A320-214 2 182 NA
## 3 Fixed wing multi engine A320-214 2 182 NA
## 4 Fixed wing multi engine A320-214 2 182 NA
## 5 Fixed wing multi engine EMB-145LR 2 55 NA
## 6 Fixed wing multi engine A320-214 2 182 NA
## 7 Fixed wing multi engine A320-214 2 182 NA
## 8 Fixed wing multi engine A320-214 2 182 NA
## 9 Fixed wing multi engine A320-214 2 182 NA
## 10 Fixed wing multi engine A320-214 2 182 NA
## # ... with 3,312 more rows
Modify the previous statement to exclude both manufacturer and model.
planes %>% select(type:speed, -c(manufacturer, model))
## # A tibble: 3,322 × 4
## type engines seats speed
## <chr> <int> <int> <int>
## 1 Fixed wing multi engine 2 55 NA
## 2 Fixed wing multi engine 2 182 NA
## 3 Fixed wing multi engine 2 182 NA
## 4 Fixed wing multi engine 2 182 NA
## 5 Fixed wing multi engine 2 55 NA
## 6 Fixed wing multi engine 2 182 NA
## 7 Fixed wing multi engine 2 182 NA
## 8 Fixed wing multi engine 2 182 NA
## 9 Fixed wing multi engine 2 182 NA
## 10 Fixed wing multi engine 2 182 NA
## # ... with 3,312 more rows
You can use the same helper function more than once.
planes %>% select(starts_with("m"), starts_with("s"))
## # A tibble: 3,322 × 4
## manufacturer model seats speed
## <chr> <chr> <int> <int>
## 1 EMBRAER EMB-145XR 55 NA
## 2 AIRBUS INDUSTRIE A320-214 182 NA
## 3 AIRBUS INDUSTRIE A320-214 182 NA
## 4 AIRBUS INDUSTRIE A320-214 182 NA
## 5 EMBRAER EMB-145LR 55 NA
## 6 AIRBUS INDUSTRIE A320-214 182 NA
## 7 AIRBUS INDUSTRIE A320-214 182 NA
## 8 AIRBUS INDUSTRIE A320-214 182 NA
## 9 AIRBUS INDUSTRIE A320-214 182 NA
## 10 AIRBUS INDUSTRIE A320-214 182 NA
## # ... with 3,312 more rows
You can use several different helper functions together:
planes %>% select(starts_with("m"), ends_with("l"))
## # A tibble: 3,322 × 2
## manufacturer model
## <chr> <chr>
## 1 EMBRAER EMB-145XR
## 2 AIRBUS INDUSTRIE A320-214
## 3 AIRBUS INDUSTRIE A320-214
## 4 AIRBUS INDUSTRIE A320-214
## 5 EMBRAER EMB-145LR
## 6 AIRBUS INDUSTRIE A320-214
## 7 AIRBUS INDUSTRIE A320-214
## 8 AIRBUS INDUSTRIE A320-214
## 9 AIRBUS INDUSTRIE A320-214
## 10 AIRBUS INDUSTRIE A320-214
## # ... with 3,312 more rows
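Helper functions can also be combined with the - operator to remove whatever a helper matches. The sketch below assumes a hypothetical data frame toy that borrows four column names from planes.

```r
library(dplyr)

# Hypothetical toy data frame reusing a few planes column names
toy <- data.frame(manufacturer = c("EMBRAER", "AIRBUS INDUSTRIE"),
                  model = c("EMB-145XR", "A320-214"),
                  seats = c(55L, 182L),
                  speed = c(NA_integer_, NA_integer_))

# Keep the variables starting with "m", then drop those ending with "l":
# model is selected by the first helper and removed by the second
result <- toy %>% select(starts_with("m"), -ends_with("l"))

names(result)
```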