Matching the Beginning of a String

Created: November-22, 2018

The first argument of re.match() is the regular expression; the second is the string to match against:

import re

pattern = r"123"
string = "123zzb"

re.match(pattern, string)
# Out: <re.Match object; span=(0, 3), match='123'>  (Python 3.6 and earlier print _sre.SRE_Match)

match = re.match(pattern, string)

match.group()
# Out: '123'
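
The span shown in the output above can also be read from the match object directly. The following is a minimal sketch using the same pattern and string; the variable name rest is only illustrative.

```python
import re

match = re.match(r"123", "123zzb")

# span() returns the (start, end) indices of the matched text
print(match.span())                 # (0, 3)
print(match.start(), match.end())   # 0 3

# Slicing from match.end() gives whatever follows the match
rest = "123zzb"[match.end():]
print(rest)                         # zzb
```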

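You may notice that the pattern variable is a string prefixed with r, which marks it as a raw string literal.

Raw string literals differ slightly from ordinary string literals: in a raw string literal a backslash \ is just a backslash, so there is no need to double it to keep it from starting an escape sequence such as a newline (\n), tab (\t), backspace (\b), form feed (\f), or carriage return (\r). In an ordinary string literal, a backslash that should stay a literal backslash must be doubled so that it is not read as the start of an escape sequence.

Hence r"\n" is a string of 2 characters: \ and n. Regular expression patterns also use backslashes; for example, \d matches any digit character. Using a raw string (r"\d") avoids having to double-escape the pattern ("\\d").

For example: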

string = "\\t123zzb" # here the backslash is escaped, so there's no tab, just '\' and 't'
pattern = "\\t123"   # this will match \t (escaping the backslash) followed by 123
re.match(pattern, string)   # None: no match (calling .group() on it would raise AttributeError)
re.match(pattern, "\t123zzb").group()  # matches '\t123'

pattern = r"\\t123"  
re.match(pattern, string).group()   # matches '\\t123'
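
To make the raw-string rule concrete, here is a small sketch that compares the raw and doubled-backslash spellings. The sample string "7up" and the variable names are only illustrative.

```python
import re

# A raw literal keeps the backslash; an ordinary literal interprets the escape
print(len(r"\n"))   # 2: a backslash and the letter n
print(len("\n"))    # 1: a single newline character

# Doubling the backslash in an ordinary literal yields the same string as the raw form
print("\\d" == r"\d")   # True

# Both spellings therefore give the regex engine the same pattern: one digit
raw_match = re.match(r"\d", "7up")
escaped_match = re.match("\\d", "7up")
print(raw_match.group(), escaped_match.group())   # 7 7
```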

Matching is attempted only at the beginning of the string. If you want to match anywhere in the string, use re.search instead:

match = re.match(r"(123)", "a123zzb")

match is None
# Out: True

match = re.search(r"(123)", "a123zzb")

match.group()
# Out: '123'
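
Because both functions return None when nothing matches, it is usually safer to check the result before calling group(). The sketch below wraps that check in a helper; find_text and its anywhere parameter are hypothetical names used for illustration and are not part of the re module.

```python
import re

def find_text(pattern, text, anywhere=False):
    """Return the matched text, or None if nothing matched.

    Hypothetical helper: uses re.search when anywhere is True,
    otherwise re.match (which only tries the start of the string).
    """
    matcher = re.search if anywhere else re.match
    m = matcher(pattern, text)
    return m.group() if m is not None else None

start_only = find_text(r"123", "a123zzb")
anywhere = find_text(r"123", "a123zzb", anywhere=True)

print(start_only)   # None
print(anywhere)     # 123
```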