org.apache.lucene.analysis
Class LetterTokenizer
public class LetterTokenizer
extends CharTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters. That is,
it defines tokens as maximal strings of adjacent letters, where a letter is
any character accepted by the java.lang.Character.isLetter() predicate.
Note: this works reasonably well for most European languages, but performs
poorly for some Asian languages, where words are not separated by spaces.
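The splitting rule described above can be illustrated with a minimal stand-alone sketch. It does not use Lucene and is not the library's implementation: the class LetterSplitSketch and the method letterTokens are hypothetical names introduced only for this example, and the sketch ignores token offsets and any other bookkeeping a real tokenizer performs. It only shows which substrings count as tokens when text is divided at every character for which Character.isLetter(char) returns false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
public class LetterSplitSketch {

    // Returns the maximal runs of adjacent letters in the given text.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letters extend the token being collected.
                current.append(c);
            } else if (current.length() > 0) {
                // Any non-letter ends the current token, if there is one.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Emit a token that runs to the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Apostrophes, punctuation, digits and spaces all act as separators.
        System.out.println(letterTokens("Don't stop: it's 2007!"));
        // prints [Don, t, stop, it, s]
    }
}
```

Note that digits are dropped entirely and contractions are split at the apostrophe. Text written without spaces between words, where every character is itself a letter, would come back as a single long token under this rule, which is the limitation the note above refers to.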
public LetterTokenizer(Reader in)
Construct a new LetterTokenizer.
protected boolean isTokenChar(char c)
Collects only characters which satisfy
Character.isLetter(char).
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.