Finding matches

Created: November-22, 2018

# example data
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

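Is there a match?

grepl() checks whether a word or regular expression occurs in a string or in each element of a character vector. The function returns a logical (TRUE/FALSE) vector.

Note that we can check each string for the word fox and get a logical vector back.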

grepl("fox", test_sentences)
#[1]  TRUE FALSE
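Since grepl() also accepts regular expressions, a minimal sketch with two anchored patterns may help show the difference from a plain word. The patterns "^The" and "o[gx]$" and the variable names are illustrative choices, not part of the original example.

```r
# example data (same as above)
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

# "^The" matches "The" only at the start of a string
starts_with_the <- grepl("^The", test_sentences)
starts_with_the
#[1]  TRUE FALSE

# "o[gx]$" matches strings that end in "og" or "ox"
ends_in_og_or_ox <- grepl("o[gx]$", test_sentences)
ends_in_og_or_ox
#[1] TRUE TRUE
```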

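Match location

grep() takes a regular expression and a character vector and returns an integer vector of the indices of the matching elements. Here it reports which sentence contains the word fox.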

grep("fox", test_sentences)
#[1] 1

Matched values

Select the sentences that match the pattern:

# each of the following lines does the job:
test_sentences[grep("fox", test_sentences)]
test_sentences[grepl("fox", test_sentences)]
grep("fox", test_sentences, value = TRUE)
# [1] "The quick brown fox"

Details

Because the pattern fox is a plain word rather than a regular expression, we can improve performance by specifying fixed = TRUE (with either grep or grepl).

grep("fox", test_sentences, fixed = TRUE)
#[1] 1
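Beyond speed, fixed = TRUE also changes how the pattern is read when it contains regex metacharacters. The sketch below uses a hypothetical vector x and the pattern "a.b" to illustrate this; neither appears in the original example.

```r
# hypothetical data: only the first element contains a literal dot
x <- c("a.b", "acb")

# as a regular expression, "." matches any single character
regex_match <- grepl("a.b", x)
regex_match
#[1] TRUE TRUE

# with fixed = TRUE, "." is matched literally
fixed_match <- grepl("a.b", x, fixed = TRUE)
fixed_match
#[1]  TRUE FALSE
```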

To select the sentences that do not match the pattern, use grep with invert = TRUE, or apply the usual subsetting rules with -grep(...) or !grepl(...).
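The three approaches can be sketched as follows. The variable names and the pattern "cat" are illustrative assumptions. One caveat worth noting: when nothing matches, grep() returns integer(0), so negative indexing selects nothing instead of everything, while !grepl(...) behaves as expected.

```r
# example data (same as above)
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

# three ways to keep the sentences that do NOT contain "fox"
no_fox_invert <- grep("fox", test_sentences, invert = TRUE, value = TRUE)
no_fox_logical <- test_sentences[!grepl("fox", test_sentences)]
no_fox_negative <- test_sentences[-grep("fox", test_sentences)]
# each of these is "jumps over the lazy dog"

# caveat: "cat" matches nothing, so grep() returns integer(0)
# and the negative index drops every element
no_cat_negative <- test_sentences[-grep("cat", test_sentences)]
length(no_cat_negative)
#[1] 0

# the logical version keeps both sentences
no_cat_logical <- test_sentences[!grepl("cat", test_sentences)]
length(no_cat_logical)
#[1] 2
```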

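In grepl(pattern, x) and grep(pattern, x), the x argument is vectorized but the pattern argument is not: if pattern has more than one element, only the first is used, with a warning. As a result, you cannot use these functions directly to match pattern[1] against x[1], pattern[2] against x[2], and so on.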
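One possible workaround for pairwise matching is to loop over both vectors with mapply(). This is a minimal sketch, assuming a hypothetical patterns vector of the same length as test_sentences; it is not part of the original example.

```r
# example data (same as above)
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")

# hypothetical patterns, one per sentence
patterns <- c("fox", "cat")

# mapply() calls grepl(patterns[i], test_sentences[i]) for each i
pairwise <- mapply(grepl, patterns, test_sentences, USE.NAMES = FALSE)
pairwise
#[1]  TRUE FALSE
```

USE.NAMES = FALSE keeps the result an unnamed logical vector; without it, the elements would be named after the patterns.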

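Summarizing matches

After running a command such as grepl, you may want an overview of how many elements are TRUE and how many are FALSE. This is useful, for example, with large data sets. To get it, run the summary command: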

# example data
test_sentences <- c("The quick brown fox", "jumps over the lazy dog") 

# find matches
matches <- grepl("fox", test_sentences)

# overview
summary(matches)
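If only the counts are needed, they can also be computed directly, because TRUE counts as 1 and FALSE as 0 in arithmetic. The sketch below is an alternative to summary(), using illustrative variable names.

```r
# example data (same as above)
test_sentences <- c("The quick brown fox", "jumps over the lazy dog")
matches <- grepl("fox", test_sentences)

# number of matching elements
n_match <- sum(matches)
n_match
#[1] 1

# number of non-matching elements
n_no_match <- sum(!matches)
n_no_match
#[1] 1

# proportion of elements that match
share_match <- mean(matches)
share_match
#[1] 0.5
```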