助手功能

辅助函数与 select 一起使用以识别要返回的变量。除非另有说明,否则这些函数需要一个字符串作为第一个参数 match。传递矢量或其他对象将产生错误。

library(dplyr)
library(nycflights13)

以。。开始

starts_with 允许我们识别名称以字符串开头的变量。

返回以字母 e 开头的所有变量。

planes %>% select(starts_with("e"))
## # A tibble: 3,322 × 2
##    engines    engine
##      <int>     <chr>
## 1        2 Turbo-fan
## 2        2 Turbo-fan
## 3        2 Turbo-fan
## 4        2 Turbo-fan
## 5        2 Turbo-fan
## 6        2 Turbo-fan
## 7        2 Turbo-fan
## 8        2 Turbo-fan
## 9        2 Turbo-fan
## 10       2 Turbo-fan
## # ... with 3,312 more rows

对于严格套管,将 ignore.case 参数设置为 FALSE。

planes %>% select(starts_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0

以。。结束

返回以字母 e 结尾的所有变量。

planes %>% select(ends_with("e"))
## # A tibble: 3,322 × 2
##                       type    engine
##                      <chr>     <chr>
## 1  Fixed wing multi engine Turbo-fan
## 2  Fixed wing multi engine Turbo-fan
## 3  Fixed wing multi engine Turbo-fan
## 4  Fixed wing multi engine Turbo-fan
## 5  Fixed wing multi engine Turbo-fan
## 6  Fixed wing multi engine Turbo-fan
## 7  Fixed wing multi engine Turbo-fan
## 8  Fixed wing multi engine Turbo-fan
## 9  Fixed wing multi engine Turbo-fan
## 10 Fixed wing multi engine Turbo-fan
## # ... with 3,312 more rows

对于严格的套管,将 ignore.case 参数设置为 FALSE。

planes %>% select(ends_with("E", ignore.case = FALSE))
## # A tibble: 3,322 × 0

包含

contains 允许你查找包含给定字符串的任何变量。

planes %>% select(contains("ea"))
## # A tibble: 3,322 × 2
##     year seats
##    <int> <int>
## 1   2004    55
## 2   1998   182
## 3   1999   182
## 4   1999   182
## 5   2002    55
## 6   1999   182
## 7   1999   182
## 8   1999   182
## 9   1999   182
## 10  1999   182
## # ... with 3,312 more rows

对于严格套管,将 ignore.case 参数设置为 FALSE。

planes %>% select(contains("EA", ignore.case = FALSE))
## # A tibble: 3,322 × 0

匹配

matches 是唯一允许使用正则表达式的辅助函数。

返回名称至少为六个字母字符的所有变量:

planes %>% select(matches("[[:alpha:]]{6,}"))
## # A tibble: 3,322 × 4
##    tailnum     manufacturer engines    engine
##      <chr>            <chr>   <int>     <chr>
## 1   N10156          EMBRAER       2 Turbo-fan
## 2   N102UW AIRBUS INDUSTRIE       2 Turbo-fan
## 3   N103US AIRBUS INDUSTRIE       2 Turbo-fan
## 4   N104UW AIRBUS INDUSTRIE       2 Turbo-fan
## 5   N10575          EMBRAER       2 Turbo-fan
## 6   N105UW AIRBUS INDUSTRIE       2 Turbo-fan
## 7   N107US AIRBUS INDUSTRIE       2 Turbo-fan
## 8   N108UW AIRBUS INDUSTRIE       2 Turbo-fan
## 9   N109UW AIRBUS INDUSTRIE       2 Turbo-fan
## 10  N110UW AIRBUS INDUSTRIE       2 Turbo-fan
## # ... with 3,312 more rows

对于严格套管,将 ignore.case 参数设置为 FALSE。

num_range

对于此示例,我将生成具有随机值和顺序变量名称的虚拟数据帧。

set.seed(1)
df <- data.frame(x1 = runif(10), 
                 x2 = runif(10), 
                 x3 = runif(10), 
                 x4 = runif(10), 
                 x5 = runif(10))

num_range 可用于选择一系列的变量,给定一致的 prefix

df 中选择变量 2:4:

df %>% select(num_range('x', range = 2:4))
##           x2         x3        x4
## 1  0.2059746 0.93470523 0.4820801
## 2  0.1765568 0.21214252 0.5995658
## 3  0.6870228 0.65167377 0.4935413
## 4  0.3841037 0.12555510 0.1862176
## 5  0.7698414 0.26722067 0.8273733
## 6  0.4976992 0.38611409 0.6684667
## 7  0.7176185 0.01339033 0.7942399
## 8  0.9919061 0.38238796 0.1079436
## 9  0.3800352 0.86969085 0.7237109
## 10 0.7774452 0.34034900 0.4112744

one_of

one_of 可以将矢量作为 match 参数并返回每个变量。

planes %>% select(one_of(c("tailnum", "model")))
## # A tibble: 3,322 × 2
##    tailnum     model
##      <chr>     <chr>
## 1   N10156 EMB-145XR
## 2   N102UW  A320-214
## 3   N103US  A320-214
## 4   N104UW  A320-214
## 5   N10575 EMB-145LR
## 6   N105UW  A320-214
## 7   N107US  A320-214
## 8   N108UW  A320-214
## 9   N109UW  A320-214
## 10  N110UW  A320-214
## # ... with 3,312 more rows

一切

everything 可用于重新定位数据框中的变量。

manufacturer 设为第一个变量,然后是所有剩余变量。

planes %>% select(manufacturer, everything())
## # A tibble: 3,322 × 9
##        manufacturer tailnum  year                    type     model
##               <chr>   <chr> <int>                   <chr>     <chr>
## 1           EMBRAER  N10156  2004 Fixed wing multi engine EMB-145XR
## 2  AIRBUS INDUSTRIE  N102UW  1998 Fixed wing multi engine  A320-214
## 3  AIRBUS INDUSTRIE  N103US  1999 Fixed wing multi engine  A320-214
## 4  AIRBUS INDUSTRIE  N104UW  1999 Fixed wing multi engine  A320-214
## 5           EMBRAER  N10575  2002 Fixed wing multi engine EMB-145LR
## 6  AIRBUS INDUSTRIE  N105UW  1999 Fixed wing multi engine  A320-214
## 7  AIRBUS INDUSTRIE  N107US  1999 Fixed wing multi engine  A320-214
## 8  AIRBUS INDUSTRIE  N108UW  1999 Fixed wing multi engine  A320-214
## 9  AIRBUS INDUSTRIE  N109UW  1999 Fixed wing multi engine  A320-214
## 10 AIRBUS INDUSTRIE  N110UW  1999 Fixed wing multi engine  A320-214
## # ... with 3,312 more rows, and 4 more variables: engines <int>,
## #   seats <int>, speed <int>, engine <chr>

其他助手

虽然:- 运算符不属于 dplyr 包,但我们仍然可以使用它们来识别要返回的变量。

定义要返回的包含范围的变量。

将每个变量从 year 返回到 manufacturer

planes %>% select(year:manufacturer)
## # A tibble: 3,322 × 3
##     year                    type     manufacturer
##    <int>                   <chr>            <chr>
## 1   2004 Fixed wing multi engine          EMBRAER
## 2   1998 Fixed wing multi engine AIRBUS INDUSTRIE
## 3   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 4   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 5   2002 Fixed wing multi engine          EMBRAER
## 6   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 7   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 8   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 9   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 10  1999 Fixed wing multi engine AIRBUS INDUSTRIE
## # ... with 3,312 more rows

返回多个变量范围:

planes %>% select(c(year:manufacturer, seats:engine))
## # A tibble: 3,322 × 6
##     year                    type     manufacturer seats speed    engine
##    <int>                   <chr>            <chr> <int> <int>     <chr>
## 1   2004 Fixed wing multi engine          EMBRAER    55    NA Turbo-fan
## 2   1998 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 3   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 4   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 5   2002 Fixed wing multi engine          EMBRAER    55    NA Turbo-fan
## 6   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 7   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 8   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 9   1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## 10  1999 Fixed wing multi engine AIRBUS INDUSTRIE   182    NA Turbo-fan
## # ... with 3,312 more rows

-

- 运算符将从结果集中删除变量。

返回除 type 之外的所有变量:

planes %>% select(-type)
## # A tibble: 3,322 × 8
##    tailnum  year     manufacturer     model engines seats speed    engine
##      <chr> <int>            <chr>     <chr>   <int> <int> <int>     <chr>
## 1   N10156  2004          EMBRAER EMB-145XR       2    55    NA Turbo-fan
## 2   N102UW  1998 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 3   N103US  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 4   N104UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 5   N10575  2002          EMBRAER EMB-145LR       2    55    NA Turbo-fan
## 6   N105UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 7   N107US  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 8   N108UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 9   N109UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## 10  N110UW  1999 AIRBUS INDUSTRIE  A320-214       2   182    NA Turbo-fan
## # ... with 3,312 more rows

你还可以传递变量名称向量以从结果集中排除。

planes %>% select(-c(type, engines:engine))
## # A tibble: 3,322 × 4
##    tailnum  year     manufacturer     model
##      <chr> <int>            <chr>     <chr>
## 1   N10156  2004          EMBRAER EMB-145XR
## 2   N102UW  1998 AIRBUS INDUSTRIE  A320-214
## 3   N103US  1999 AIRBUS INDUSTRIE  A320-214
## 4   N104UW  1999 AIRBUS INDUSTRIE  A320-214
## 5   N10575  2002          EMBRAER EMB-145LR
## 6   N105UW  1999 AIRBUS INDUSTRIE  A320-214
## 7   N107US  1999 AIRBUS INDUSTRIE  A320-214
## 8   N108UW  1999 AIRBUS INDUSTRIE  A320-214
## 9   N109UW  1999 AIRBUS INDUSTRIE  A320-214
## 10  N110UW  1999 AIRBUS INDUSTRIE  A320-214
## # ... with 3,312 more rows

辅助函数的任意组合

选择 typespeed(包括)之间的所有变量并排除 manufacturer

planes %>% select(type:speed, -manufacturer)
## # A tibble: 3,322 × 5
##                       type     model engines seats speed
##                      <chr>     <chr>   <int> <int> <int>
## 1  Fixed wing multi engine EMB-145XR       2    55    NA
## 2  Fixed wing multi engine  A320-214       2   182    NA
## 3  Fixed wing multi engine  A320-214       2   182    NA
## 4  Fixed wing multi engine  A320-214       2   182    NA
## 5  Fixed wing multi engine EMB-145LR       2    55    NA
## 6  Fixed wing multi engine  A320-214       2   182    NA
## 7  Fixed wing multi engine  A320-214       2   182    NA
## 8  Fixed wing multi engine  A320-214       2   182    NA
## 9  Fixed wing multi engine  A320-214       2   182    NA
## 10 Fixed wing multi engine  A320-214       2   182    NA
## # ... with 3,312 more rows

修改前一个语句以排除 manufacturermodel

planes %>% select(type:speed, -c(manufacturer, model))
## # A tibble: 3,322 × 4
##                       type engines seats speed
##                      <chr>   <int> <int> <int>
## 1  Fixed wing multi engine       2    55    NA
## 2  Fixed wing multi engine       2   182    NA
## 3  Fixed wing multi engine       2   182    NA
## 4  Fixed wing multi engine       2   182    NA
## 5  Fixed wing multi engine       2    55    NA
## 6  Fixed wing multi engine       2   182    NA
## 7  Fixed wing multi engine       2   182    NA
## 8  Fixed wing multi engine       2   182    NA
## 9  Fixed wing multi engine       2   182    NA
## 10 Fixed wing multi engine       2   182    NA
## # ... with 3,312 more rows

你可以多次使用相同的辅助函数。

planes %>% select(starts_with("m"), starts_with("s"))
## # A tibble: 3,322 × 4
##        manufacturer     model seats speed
##               <chr>     <chr> <int> <int>
## 1           EMBRAER EMB-145XR    55    NA
## 2  AIRBUS INDUSTRIE  A320-214   182    NA
## 3  AIRBUS INDUSTRIE  A320-214   182    NA
## 4  AIRBUS INDUSTRIE  A320-214   182    NA
## 5           EMBRAER EMB-145LR    55    NA
## 6  AIRBUS INDUSTRIE  A320-214   182    NA
## 7  AIRBUS INDUSTRIE  A320-214   182    NA
## 8  AIRBUS INDUSTRIE  A320-214   182    NA
## 9  AIRBUS INDUSTRIE  A320-214   182    NA
## 10 AIRBUS INDUSTRIE  A320-214   182    NA
## # ... with 3,312 more rows

你可以一起使用多个辅助函数:

planes %>% select(starts_with("m"), ends_with("l"))
## # A tibble: 3,322 × 2
##        manufacturer     model
##               <chr>     <chr>
## 1           EMBRAER EMB-145XR
## 2  AIRBUS INDUSTRIE  A320-214
## 3  AIRBUS INDUSTRIE  A320-214
## 4  AIRBUS INDUSTRIE  A320-214
## 5           EMBRAER EMB-145LR
## 6  AIRBUS INDUSTRIE  A320-214
## 7  AIRBUS INDUSTRIE  A320-214
## 8  AIRBUS INDUSTRIE  A320-214
## 9  AIRBUS INDUSTRIE  A320-214
## 10 AIRBUS INDUSTRIE  A320-214
## # ... with 3,312 more rows