Class ShingleAnalyzerWrapper
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.AnalyzerWrapper
org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper
- All Implemented Interfaces:
Closeable, AutoCloseable
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A shingle is another name for a token-based n-gram.
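To make the term concrete, the following is a minimal sketch of token-based n-grams in plain Java. It does not use any Lucene classes; the class name `ShingleSketch` and the method `shingles` are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, no Lucene classes involved.
final class ShingleSketch {

    // Returns every run of `size` adjacent tokens, joined with `separator`.
    static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(separator, tokens.subList(start, start + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("please", "divide", "this", "sentence");
        System.out.println(shingles(tokens, 2, " "));
        // prints [please divide, divide this, this sentence]
    }
}
```

An input with fewer tokens than the requested size yields no shingles at all, which is the case the outputUnigramsIfNoShingles option addresses.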
- Since:
- 3.1
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
Field Summary
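Fields
Modifier and Type, Field, Description
private final Analyzer delegate
private final String fillerToken
private final int maxShingleSize
private final int minShingleSize
private final boolean outputUnigrams
private final boolean outputUnigramsIfNoShingles
private final String tokenSeparator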
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
Constructors
Constructor, Description
ShingleAnalyzerWrapper() - Wraps StandardAnalyzer.
ShingleAnalyzerWrapper(int minShingleSize, int maxShingleSize) - Wraps StandardAnalyzer.
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
ShingleAnalyzerWrapper(Analyzer delegate, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles, String fillerToken) - Creates a new ShingleAnalyzerWrapper
-
Method Summary
Modifier and Type, Method, Description
String getFillerToken()
int getMaxShingleSize() - The max shingle (token ngram) size
int getMinShingleSize() - The min shingle (token ngram) size
String getTokenSeparator()
final Analyzer getWrappedAnalyzer(String fieldName) - Retrieves the wrapped Analyzer appropriate for analyzing the field with the given name
boolean isOutputUnigrams()
boolean isOutputUnigramsIfNoShingles()
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components) - Wraps / alters the given TokenStreamComponents, taken from the wrapped Analyzer, to form new components.
Methods inherited from class org.apache.lucene.analysis.AnalyzerWrapper
attributeFactory, createComponents, getOffsetGap, getPositionIncrementGap, initReader, initReaderForNormalization, normalize, wrapReader, wrapReaderForNormalization, wrapTokenStreamForNormalization
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getReuseStrategy, normalize, tokenStream, tokenStream
-
Field Details
-
delegate
private final Analyzer delegate
-
maxShingleSize
private final int maxShingleSize
-
minShingleSize
private final int minShingleSize
-
tokenSeparator
private final String tokenSeparator
-
outputUnigrams
private final boolean outputUnigrams
-
outputUnigramsIfNoShingles
private final boolean outputUnigramsIfNoShingles
-
fillerToken
private final String fillerToken
-
-
Constructor Details
-
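ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
-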
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(Analyzer delegate, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles, String fillerToken)
Creates a new ShingleAnalyzerWrapper
- Parameters:
delegate - Analyzer whose TokenStream is to be filtered
minShingleSize - Min shingle (token ngram) size
maxShingleSize - Max shingle (token ngram) size
tokenSeparator - Used to separate input stream tokens in output shingles
outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false for those times when no shingles are available (because there are fewer than minShingleSize tokens in the input stream). Note that if outputUnigrams==true, then unigrams are always output, regardless of whether any shingles are available.
fillerToken - Filler token to use when positionIncrement is more than 1
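The interaction between these parameters can be sketched in plain Java. The model below is a simplification built only from the parameter descriptions above, not Lucene's ShingleFilter: the class `ShingleOptionsSketch` and method `analyze` are hypothetical, a `null` list entry is an assumed way to represent a position gap (positionIncrement greater than 1), a minShingleSize of at least 2 is assumed, and the output order (unigrams first, then shingles) is a convenience of the sketch rather than the order of a real token stream.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the documented parameters; not Lucene's ShingleFilter.
final class ShingleOptionsSketch {

    // A null entry in `tokens` models a position gap (positionIncrement > 1),
    // for example where an upstream filter removed a stop word.
    static List<String> analyze(List<String> tokens, int minShingleSize, int maxShingleSize,
                                String tokenSeparator, boolean outputUnigrams,
                                boolean outputUnigramsIfNoShingles, String fillerToken) {
        // Substitute the filler for every gap so shingles can span it.
        List<String> filled = new ArrayList<>();
        for (String t : tokens) {
            filled.add(t == null ? fillerToken : t);
        }

        // Build shingles of every size from minShingleSize to maxShingleSize.
        List<String> shingles = new ArrayList<>();
        for (int start = 0; start < filled.size(); start++) {
            for (int size = minShingleSize;
                 size <= maxShingleSize && start + size <= filled.size(); size++) {
                shingles.add(String.join(tokenSeparator, filled.subList(start, start + size)));
            }
        }

        // Unigrams pass through when requested, or as a fallback when the
        // input was too short to form any shingle.
        boolean emitUnigrams = outputUnigrams
                || (outputUnigramsIfNoShingles && shingles.isEmpty());

        List<String> out = new ArrayList<>();
        if (emitUnigrams) {
            for (String t : tokens) {
                if (t != null) {
                    out.add(t); // gaps never become unigrams in this sketch
                }
            }
        }
        out.addAll(shingles);
        return out;
    }
}
```

Under these assumptions, a single-token input with minShingleSize 2, outputUnigrams false, and outputUnigramsIfNoShingles true returns just that token, while the same input with both flags false returns nothing. An input of "please", "divide", a gap, "sentence" with size 2 and filler "_" returns "please divide", "divide _", and "_ sentence".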
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper()
Wraps StandardAnalyzer.
-
ShingleAnalyzerWrapper
public ShingleAnalyzerWrapper(int minShingleSize, int maxShingleSize)
Wraps StandardAnalyzer.
-
-
Method Details
-
getMaxShingleSize
public int getMaxShingleSize()
The max shingle (token ngram) size
- Returns:
- The max shingle (token ngram) size
-
getMinShingleSize
public int getMinShingleSize()
The min shingle (token ngram) size
- Returns:
- The min shingle (token ngram) size
-
getTokenSeparator
public String getTokenSeparator()
-
isOutputUnigrams
public boolean isOutputUnigrams()
-
isOutputUnigramsIfNoShingles
public boolean isOutputUnigramsIfNoShingles()
-
getFillerToken
public String getFillerToken()
-
getWrappedAnalyzer
public final Analyzer getWrappedAnalyzer(String fieldName)
Description copied from class: AnalyzerWrapper
Retrieves the wrapped Analyzer appropriate for analyzing the field with the given name
- Specified by:
getWrappedAnalyzer in class AnalyzerWrapper
- Parameters:
fieldName - Name of the field which is to be analyzed
- Returns:
- Analyzer for the field with the given name. Assumed to be non-null
-
wrapComponents
protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
Description copied from class: AnalyzerWrapper
Wraps / alters the given TokenStreamComponents, taken from the wrapped Analyzer, to form new components. It is through this method that new TokenFilters can be added by AnalyzerWrappers. By default, the given components are returned.
- Overrides:
wrapComponents in class AnalyzerWrapper
- Parameters:
fieldName - Name of the field which is to be analyzed
components - TokenStreamComponents taken from the wrapped Analyzer
- Returns:
- Wrapped / altered TokenStreamComponents.
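The division of labor between getWrappedAnalyzer and wrapComponents follows a template-method pattern, which the sketch below imitates in plain Java. Every type here (`SimpleAnalyzer`, `SimpleWrapper`, `BigramWrapper`, `WHITESPACE`) is a hypothetical stand-in, not a Lucene class, and token lists stand in for TokenStreamComponents; the choice to return the same delegate for every field is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; none of these are Lucene types.
final class WrapperSketch {

    /** Stand-in for an analyzer: text in, tokens out. */
    interface SimpleAnalyzer {
        List<String> analyze(String fieldName, String text);
    }

    /** Stand-in for a wrapper exposing the two hooks described above. */
    abstract static class SimpleWrapper implements SimpleAnalyzer {
        /** Chooses the delegate for a field; assumed to be non-null. */
        protected abstract SimpleAnalyzer getWrapped(String fieldName);

        /** By default the delegate's tokens are returned unchanged. */
        protected List<String> wrapTokens(String fieldName, List<String> tokens) {
            return tokens;
        }

        @Override
        public final List<String> analyze(String fieldName, String text) {
            List<String> tokens = getWrapped(fieldName).analyze(fieldName, text);
            return wrapTokens(fieldName, tokens);
        }
    }

    /** Appends bigrams after the delegate's own tokens. */
    static final class BigramWrapper extends SimpleWrapper {
        private final SimpleAnalyzer delegate;

        BigramWrapper(SimpleAnalyzer delegate) {
            this.delegate = delegate;
        }

        @Override
        protected SimpleAnalyzer getWrapped(String fieldName) {
            return delegate; // this sketch uses one delegate for every field
        }

        @Override
        protected List<String> wrapTokens(String fieldName, List<String> tokens) {
            List<String> out = new ArrayList<>(tokens);
            for (int i = 0; i + 1 < tokens.size(); i++) {
                out.add(tokens.get(i) + " " + tokens.get(i + 1));
            }
            return out;
        }
    }

    /** Splits on whitespace; stands in for a real tokenizing analyzer. */
    static final SimpleAnalyzer WHITESPACE =
            (fieldName, text) -> Arrays.asList(text.trim().split("\\s+"));
}
```

With these stand-ins, `new WrapperSketch.BigramWrapper(WrapperSketch.WHITESPACE).analyze("body", "new york city")` returns the three original tokens followed by "new york" and "york city". The wrapper never tokenizes text itself; it only post-processes what the delegate produced, which mirrors the role the description above assigns to wrapComponents.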
-