Using the NLTK library
You can find more information about the sentence-level tokenizer in the Python Natural Language Toolkit (NLTK) on their wiki.
From your command line:
$ python
>>> import nltk
>>> sent_tokenizer = nltk.tokenize.PunktSentenceTokenizer()
>>> text = "This is a sentence. This is another sentence. More sentences are better!"
>>> sent_tokenizer.tokenize(text)
['This is a sentence.', 'This is another sentence.', 'More sentences are better!']
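The same steps can also be put in a script. The following is a minimal sketch, assuming NLTK is installed; the helper name split_sentences is made up for illustration and is not part of NLTK. It uses an untrained PunktSentenceTokenizer, exactly as in the session above, so no model download is needed.

```python
from nltk.tokenize import PunktSentenceTokenizer


def split_sentences(text):
    """Split text into sentences with an untrained Punkt tokenizer.

    split_sentences is a hypothetical helper name used for illustration.
    """
    # A tokenizer built without training text uses default parameters
    # and has no learned abbreviation list.
    tokenizer = PunktSentenceTokenizer()
    return tokenizer.tokenize(text)


if __name__ == "__main__":
    sample = "This is a sentence. This is another sentence. More sentences are better!"
    for sentence in split_sentences(sample):
        print(sentence)
```

Because the tokenizer here is untrained, it may split incorrectly after abbreviations. For real-world text, a pretrained model loaded through nltk.sent_tokenize (which requires downloading the Punkt data first) is likely to give better results.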