Finding matches

Created: November-22, 2018

# example data
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

Is there a match?

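grepl() checks whether a word or regular expression occurs in each element of a string or character vector. It returns a logical (TRUE/FALSE) vector with one value per element.

For example, we can check each string for the word fox and get a logical vector back: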

grepl("fox", test_sentences)
#[1]  TRUE FALSE
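The pattern does not have to be a plain word. The following is a small sketch that reuses the example data. It assumes we want the sentences that *start* with "the" in any case. `ignore.case` is a standard argument of `grepl()`.

```r
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

# "^" anchors the pattern to the start of each string;
# ignore.case = TRUE lets "the" also match "The"
starts_with_the <- grepl("^the", test_sentences, ignore.case = TRUE)
starts_with_the
#[1]  TRUE FALSE

# Without ignore.case, "^the" matches neither sentence
grepl("^the", test_sentences)
#[1] FALSE FALSE
```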

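Match positions

grep() takes a pattern (a regular expression) and a character vector, and returns an integer vector with the indices of the matching elements. Here, it tells us which sentence contains the word fox: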

grep("fox", test_sentences)
#[1] 1
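A pattern can match several elements, or none. The sketch below uses the same example data to show both cases. It also compares `grep()` with `which(grepl(...))`, which, as far as this example shows, returns the same indices.

```r
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

# "o" occurs in both sentences, so both indices are returned
grep("o", test_sentences)
#[1] 1 2

# No element matches: grep() returns a zero-length integer vector
grep("cat", test_sentences)
#integer(0)

# The same indices via a logical vector
which(grepl("o", test_sentences))
#[1] 1 2
```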

Matching values

Select the sentences that match the pattern:

# each of the following lines does the job:
test_sentences[grep("fox", test_sentences)]
test_sentences[grepl("fox", test_sentences)]
grep("fox", test_sentences, value = TRUE)
# [1] "The quick brown fox"
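To confirm that the three approaches above agree, you can compare their results directly. The variable names below are only for illustration.

```r
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

by_index   <- test_sentences[grep("fox", test_sentences)]   # subset by integer index
by_logical <- test_sentences[grepl("fox", test_sentences)]  # subset by logical index
by_value   <- grep("fox", test_sentences, value = TRUE)     # return matching values directly

# All three produce the same character vector
stopifnot(identical(by_index, by_logical), identical(by_index, by_value))
by_value
#[1] "The quick brown fox"
```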

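Details

Because fox is a plain word rather than a regular expression, we can improve performance (with either grep or grepl) by specifying fixed = TRUE, which matches the pattern as a literal string: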

grep("fox", test_sentences, fixed = TRUE)
#[1] 1
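fixed = TRUE also affects the results whenever the pattern contains regular-expression metacharacters such as ".". The following sketch illustrates this. The `files` vector is a hypothetical example and does not come from the original text.

```r
# Hypothetical example vector
files <- c("report.csv", "report_csv")

# As a regular expression, "." matches any single character,
# so "report.csv" also matches "report_csv"
grepl("report.csv", files)
#[1] TRUE TRUE

# With fixed = TRUE, the dot is treated as a literal "."
grepl("report.csv", files, fixed = TRUE)
#[1]  TRUE FALSE
```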

To select the sentences that do not match the pattern, use grep with invert = TRUE, or apply the usual subsetting rules with -grep(...) or !grepl(...).
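The following sketch shows all three options on the example data. It also shows a pitfall of the `-grep(...)` form: when nothing matches, `grep()` returns `integer(0)`, and indexing with `-integer(0)` selects no elements rather than all of them.

```r
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

# Three ways to keep the sentences that do NOT contain "fox";
# each returns "jumps over the lazy dog"
grep("fox", test_sentences, invert = TRUE, value = TRUE)
test_sentences[-grep("fox", test_sentences)]
test_sentences[!grepl("fox", test_sentences)]

# Pitfall: "cat" matches nothing, so -grep(...) is -integer(0)
test_sentences[-grep("cat", test_sentences)]
#character(0)

# The logical form behaves as expected and keeps both sentences
test_sentences[!grepl("cat", test_sentences)]
```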

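In grepl(pattern, x) and grep(pattern, x), the x argument is vectorized but the pattern argument is not: if pattern has more than one element, only the first is used, with a warning. So you cannot use these functions directly to match pattern[1] against x[1], pattern[2] against x[2], and so on.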
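One workaround in base R is to apply `grepl()` pairwise with `mapply()`. This is one option among several, not the only way to do it. In the sketch below, the `patterns` vector is a hypothetical example with one pattern per sentence.

```r
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")
patterns <- c("fox", "dog")  # hypothetical: one pattern per sentence

# grepl(patterns, test_sentences) would use only "fox" (with a warning).
# mapply() instead calls grepl(patterns[i], test_sentences[i]) for each i;
# USE.NAMES = FALSE drops the names mapply would otherwise take from patterns
pairwise <- mapply(grepl, patterns, test_sentences, USE.NAMES = FALSE)
pairwise
#[1] TRUE TRUE
```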

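Summarizing matches

After running a command such as grepl, you may want to know how many results are TRUE and how many are FALSE. This is especially useful with large datasets. To find out, run summary: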

# example data
test_sentences <- c("The quick brown fox", "jumps over the lazy dog") 

# find matches
matches <- grepl("fox", test_sentences)

# overview
summary(matches)
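If you only need the number itself, note that R treats TRUE as 1 and FALSE as 0 in arithmetic. This means base functions such as `sum()`, `mean()` and `table()` also work on the logical vector, as the following sketch shows.

```r
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")
matches <- grepl("fox", test_sentences)

# sum() counts the TRUE values, i.e. the number of matches
sum(matches)
#[1] 1

# mean() gives the proportion of elements that match
mean(matches)
#[1] 0.5

# table() gives the FALSE and TRUE counts together
table(matches)
```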