public class HyphenationCompoundWordTokenFilter extends CompoundWordTokenFilterBase
Fields inherited from class CompoundWordTokenFilterBase: DEFAULT_MAX_SUBWORD_SIZE, DEFAULT_MIN_SUBWORD_SIZE, DEFAULT_MIN_WORD_SIZE, dictionary, maxSubwordSize, minSubwordSize, minWordSize, onlyLongestMatch, tokens
Fields inherited from class TokenFilter: input
Constructor and Description |
---|
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.util.Set dictionary) |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.util.Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.lang.String[] dictionary) |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.lang.String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
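The `java.util.Set` constructors document that a `CharArraySet` dictionary must use `ignoreCase=false` and contain only lower-case strings. A reasonable reading is that lookups are case-sensitive and are made against a lower-cased copy of the token (the inherited `makeLowerCaseCopy` and `addAllLowerCase` method names point the same way), so a dictionary you build yourself should be lower-cased up front. The following is a minimal sketch of that convention only, not Lucene's code: `DictionaryNormalisation`, `lowerCaseDictionary` and `containsPart` are hypothetical names, and a plain `HashSet<String>` stands in for `CharArraySet`.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Sketch only: java.util collections stand in for Lucene's CharArraySet.
class DictionaryNormalisation {

    // Hypothetical helper: copy a String[] dictionary into a set, lower-casing
    // every entry so that later lookups can be plain case-sensitive lookups.
    static Set<String> lowerCaseDictionary(String[] words) {
        Set<String> dict = new HashSet<>();
        for (String w : words) {
            dict.add(w.toLowerCase(Locale.ROOT));
        }
        return dict;
    }

    // Hypothetical helper: is lowerToken[start, start + length) a dictionary word?
    // The caller lower-cases the token once, before any lookups.
    static boolean containsPart(Set<String> dict, String lowerToken, int start, int length) {
        return dict.contains(lowerToken.substring(start, start + length));
    }

    public static void main(String[] args) {
        Set<String> dict = lowerCaseDictionary(new String[] {"Donau", "Dampf", "Schiff"});
        String token = "Donaudampfschiff".toLowerCase(Locale.ROOT);
        System.out.println(containsPart(dict, token, 0, 5));   // true  ("donau")
        System.out.println(containsPart(dict, token, 10, 6));  // true  ("schiff")
        System.out.println(containsPart(dict, token, 2, 3));   // false ("nau")
    }
}
```

Lower-casing both sides once, instead of comparing case-insensitively on every lookup, keeps the per-substring check a simple hash lookup, which matters because a single compound word produces many candidate substrings.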
Modifier and Type | Method and Description |
---|---|
protected void | decomposeInternal(Token token) |
static HyphenationTree | getHyphenationTree(java.io.File hyphenationFile): Create a hyphenator tree from an XML grammar file. |
static HyphenationTree | getHyphenationTree(java.io.Reader hyphenationReader): Create a hyphenator tree from an XML grammar read from a Reader. |
static HyphenationTree | getHyphenationTree(java.lang.String hyphenationFilename): Create a hyphenator tree from the XML grammar file with the given name. |
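Methods inherited from class CompoundWordTokenFilterBase: addAllLowerCase, createToken, decompose, makeDictionary, makeLowerCaseCopy, next
Methods inherited from class TokenFilter: close, reset
Methods inherited from class TokenStream: next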
public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.lang.String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.lang.String[] dictionary)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against

public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.util.Set dictionary)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase set to false and contain only lower-case strings.

public HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, java.util.Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Parameters:
input - the token stream to process
hyphenator - the hyphenation pattern tree to use for hyphenation
dictionary - the word dictionary to match against. If this is a CharArraySet, it must have ignoreCase set to false and contain only lower-case strings.
minWordSize - only words longer than this get processed
minSubwordSize - only subwords longer than this get to the output stream
maxSubwordSize - only subwords shorter than this get to the output stream
onlyLongestMatch - add only the longest matching subword to the stream

public static HyphenationTree getHyphenationTree(java.lang.String hyphenationFilename) throws java.lang.Exception
Parameters:
hyphenationFilename - the filename of the XML grammar to load
Throws:
java.lang.Exception

public static HyphenationTree getHyphenationTree(java.io.File hyphenationFile) throws java.lang.Exception
Parameters:
hyphenationFile - the file of the XML grammar to load
Throws:
java.lang.Exception

public static HyphenationTree getHyphenationTree(java.io.Reader hyphenationReader) throws java.lang.Exception
Parameters:
hyphenationReader - the reader of the XML grammar to load from
Throws:
java.lang.Exception

protected void decomposeInternal(Token token)
Specified by:
decomposeInternal in class CompoundWordTokenFilterBase
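`decomposeInternal` is where a single token gets split. Judging from the constructor parameters documented above, the idea is roughly: skip words below `minWordSize`, obtain hyphenation points from the `HyphenationTree`, look up each span between two hyphenation points in the dictionary, keep only spans within the subword size limits, and, with `onlyLongestMatch`, keep only the longest match. The following is a minimal sketch of that idea, not Lucene's implementation. Assumptions, all hypothetical: the hyphenation points are supplied by hand (a real filter gets them from a `HyphenationTree`) and include `0` and `word.length()`; the size limits are treated as inclusive bounds; `onlyLongestMatch` is applied per starting point; subwords are returned as strings rather than `Token`s; `HyphenationDecompositionSketch` and `decompose` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch of hyphenation-based compound splitting; NOT Lucene's implementation.
// Assumes: points are given by hand, sorted, and include 0 and word.length();
// size limits are inclusive; dictionary entries are lower case; lower-casing
// does not change the word's length (true for the ASCII example below).
class HyphenationDecompositionSketch {

    static List<String> decompose(String word, int[] points, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize, int maxSubwordSize,
                                  boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        if (word.length() < minWordSize) {
            return subwords; // too short to be treated as a compound
        }
        String lower = word.toLowerCase(Locale.ROOT); // dictionary is lower case
        for (int i = 0; i < points.length; i++) {
            int start = points[i];
            int longest = -1; // length of the longest match starting at 'start'
            for (int j = i + 1; j < points.length; j++) {
                int length = points[j] - start;
                if (length > maxSubwordSize) {
                    break; // later points only produce longer spans
                }
                if (length < minSubwordSize) {
                    continue; // too short, but a later point may still fit
                }
                if (!dictionary.contains(lower.substring(start, start + length))) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = Math.max(longest, length);
                } else {
                    subwords.add(word.substring(start, start + length)); // original case
                }
            }
            if (onlyLongestMatch && longest != -1) {
                subwords.add(word.substring(start, start + longest));
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("donau", "dampf", "schiff"));
        int[] points = {0, 2, 5, 10, 16}; // Do-nau-dampf-schiff
        // prints [Donau, dampf, schiff]
        System.out.println(decompose("Donaudampfschiff", points, dict, 5, 2, 15, false));
    }
}
```

The inner loop can `break` on `maxSubwordSize` because hyphenation points are increasing, so every later point yields a longer span; it only `continue`s on `minSubwordSize`, since a later point may still produce a long-enough span. Adding `"dampfschiff"` to the dictionary and passing `onlyLongestMatch = true` changes the result to `[Donau, dampfschiff, schiff]`.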
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.