Stanford CoreNLP
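Stanford CoreNLP is a popular natural language processing toolkit that supports many core NLP tasks.
To download and install it, either download a release package and include the necessary *.jar
files on your classpath, or add the dependency from Maven Central. See the download page for details. For example: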
curl -L http://nlp.stanford.edu/software/stanford-corenlp-full-2015-12-09.zip -o corenlp.zip
unzip corenlp.zip
cd stanford-corenlp-full-2015-12-09
export CLASSPATH="$CLASSPATH:`pwd`/*"
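There are three supported ways to run the CoreNLP tools: (1) the fully customizable base API, (2) the Simple CoreNLP API, or (3) the CoreNLP server. A simple usage example of each is given below. As a motivating use case, each example predicts the syntactic parse of a sentence.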
CoreNLP API
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class CoreNLPDemo {
  public static void main(String[] args) {
    // 1. Set up a CoreNLP pipeline. This should be done once per type of
    //    annotation, as it's fairly slow to initialize.
    // Creates a StanfordCoreNLP object with tokenization, sentence
    // splitting, and parsing.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // 2. Run the pipeline on some text.
    String text = "the quick brown fox jumped over the lazy dog"; // Add your text here!
    // Create an empty Annotation just with the given text.
    Annotation document = new Annotation(text);
    // Run all annotators on this text.
    pipeline.annotate(document);

    // 3. Read off the result.
    // Get the list of sentences in the document.
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
      // Get the parse tree for each sentence.
      Tree parseTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      // Do something interesting with the parse tree!
      System.out.println(parseTree);
    }
  }
}
Simple CoreNLP
import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;
import edu.stanford.nlp.trees.Tree;

public class SimpleCoreNLPDemo {
  public static void main(String[] args) {
    String text = "The quick brown fox jumped over the lazy dog"; // your text here!
    Document document = new Document(text);
    for (Sentence sentence : document.sentences()) { // implicitly runs tokenizer
      Tree parseTree = sentence.parse(); // implicitly runs parser
      // Do something with your parse tree!
      System.out.println(parseTree);
    }
  }
}
CoreNLP Server
Start the server with the following command (with the classpath set appropriately):
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer [port] [timeout]
Fetch JSON-formatted output for a given set of annotators and print it to standard output:
wget --post-data 'The quick brown fox jumped over the lazy dog.' 'localhost:9000/?properties={"annotators":"tokenize,ssplit,parse","outputFormat":"json"}' -O -
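The same request can also be sent from Java. The following is a minimal sketch, not part of CoreNLP itself: it assumes Java 11 or later (for java.net.http.HttpClient) and a server listening on localhost:9000 as in the wget example. The class name CoreNLPServerClientSketch and the helpers buildUri and annotate are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; only JDK classes are used.
public class CoreNLPServerClientSketch {

    // Builds the request URI, percent-encoding the JSON properties so that
    // braces and quotes are legal inside the query string.
    static URI buildUri(String host, int port, String annotators) {
        String props = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"json\"}";
        String encoded = URLEncoder.encode(props, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port + "/?properties=" + encoded);
    }

    // Sends the text as the POST body and returns the raw response body.
    static String annotate(URI uri, String text) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed host and port, matching the wget example.
        URI uri = buildUri("localhost", 9000, "tokenize,ssplit,parse");
        try {
            System.out.println(annotate(uri, "The quick brown fox jumped over the lazy dog."));
        } catch (IOException e) {
            // Raised, for example, when no server is listening on the port.
            System.err.println("Could not reach the server at " + uri + ": " + e.getMessage());
        }
    }
}
```

Encoding the properties explicitly avoids relying on the HTTP client to tolerate raw braces and quotes in the URL, which wget accepts but URI.create would reject.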
To get the parse tree from the JSON, navigate to sentences[i].parse, where i is the index of the sentence.
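Once the parse value has been read out of the JSON, it can be processed as ordinary text. The sketch below assumes the value is a Penn Treebank-style bracketed string, such as "(ROOT (NP (DT the) (NN dog)))", and recovers the leaf tokens from it using only the JDK. ParseLeavesSketch and leaves are hypothetical names, and the sample string in main is hand-written for illustration, not actual server output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of CoreNLP.
public class ParseLeavesSketch {

    // Returns the leaf tokens of a bracketed parse. A leaf is any
    // non-parenthesis token directly followed by a closing parenthesis.
    static List<String> leaves(String parse) {
        List<String> result = new ArrayList<>();
        // Pad parentheses with spaces so a whitespace split yields clean tokens.
        String[] tokens = parse.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            boolean isWord = !tokens[i].equals("(") && !tokens[i].equals(")");
            if (isWord && tokens[i + 1].equals(")")) {
                result.add(tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample input (an assumption, not captured server output).
        String parse = "(ROOT (NP (DT the) (JJ lazy) (NN dog)))";
        System.out.println(leaves(parse)); // prints [the, lazy, dog]
    }
}
```

For anything beyond simple token recovery, a real JSON parser and the Tree class from the API examples are likely a better fit than string handling.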