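Creating a custom analyzer
Most analysis customization happens in the createComponents method, which defines the Tokenizer and the TokenFilters.
CharFilters, which preprocess the raw character stream before it reaches the Tokenizer, can be added in the initReader method.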
// Note: this targets the older Lucene API. Newer releases no longer ship
// StandardFilter and expose the English stop-word set on EnglishAnalyzer.
Analyzer analyzer = new Analyzer() {
    // Runs before tokenization: strips HTML markup from the raw input.
    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        return new HTMLStripCharFilter(reader);
    }

    // Builds the tokenizer and the chain of token filters applied to its output.
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer tokenizer = new StandardTokenizer();
        TokenStream stream = new StandardFilter(tokenizer);
        // Order matters! If LowerCaseFilter and StopFilter were swapped here, StopFilter's
        // matching would be case sensitive, so "the" would be eliminated, but not "The".
        stream = new LowerCaseFilter(stream);
        stream = new StopFilter(stream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return new TokenStreamComponents(tokenizer, stream);
    }
};
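
The ordering remark in the comment above can be checked without Lucene. The following is only a plain-Java sketch of the idea, not Lucene code: the class FilterOrderDemo, its whitespace tokenize helper, and the three-word STOP_WORDS set are all hypothetical stand-ins for the tokenizer, LowerCaseFilter, and StopFilter. It applies the two steps in both orders to the same input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class FilterOrderDemo {
    // Hypothetical, tiny stop-word set; every entry is lower case.
    static final Set<String> STOP_WORDS = Set.of("the", "and", "a");

    // Stand-in for a tokenizer: splits on whitespace only.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Lowercase first, then drop stop words (the order used in the analyzer above).
    static List<String> lowerThenStop(String text) {
        return tokenize(text).stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .filter(t -> !STOP_WORDS.contains(t))
                .collect(Collectors.toList());
    }

    // Swapped order: stop-word matching sees the original case, so "The" slips through.
    static List<String> stopThenLower(String text) {
        return tokenize(text).stream()
                .filter(t -> !STOP_WORDS.contains(t))
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "The quick fox and the dog";
        System.out.println(lowerThenStop(text)); // [quick, fox, dog]
        System.out.println(stopThenLower(text)); // [the, quick, fox, dog]
    }
}
```

With the swapped order, the capitalized "The" is not matched against the lower-case stop set and survives into the output, which is the behavior the comment in the analyzer warns about.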