Class ShingleAnalyzerWrapper

All Implemented Interfaces:
Closeable, AutoCloseable

public final class ShingleAnalyzerWrapper extends AnalyzerWrapper
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

A shingle is another name for a token-based n-gram: a sequence of n consecutive tokens joined into a single token.

Since:
3.1
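
To make the definition of a shingle concrete, here is a minimal plain-Java sketch of token n-grams. It does not use Lucene and is not the library's implementation; the class name `ShingleSketch` and the method `shingles` are hypothetical names chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: builds token n-grams ("shingles") from a token list. */
final class ShingleSketch {

    /**
     * Returns every run of n consecutive tokens, for each n in
     * [minSize, maxSize], joined with the given separator.
     */
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the window from minSize up to maxSize, stopping at the end of the input.
            for (int n = minSize; n <= maxSize && start + n <= tokens.size(); n++) {
                out.add(String.join(separator, tokens.subList(start, start + n)));
            }
        }
        return out;
    }
}
```

With the tokens `please`, `divide`, `this`, a minimum size of 2, a maximum size of 3, and a single space as separator, the sketch yields `please divide`, `please divide this`, and `divide this`.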
  • Field Details

    • delegate

      private final Analyzer delegate
    • maxShingleSize

      private final int maxShingleSize
    • minShingleSize

      private final int minShingleSize
    • tokenSeparator

      private final String tokenSeparator
    • outputUnigrams

      private final boolean outputUnigrams
    • outputUnigramsIfNoShingles

      private final boolean outputUnigramsIfNoShingles
    • fillerToken

      private final String fillerToken
  • Constructor Details

    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int minShingleSize, int maxShingleSize)
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(Analyzer delegate, int minShingleSize, int maxShingleSize, String tokenSeparator, boolean outputUnigrams, boolean outputUnigramsIfNoShingles, String fillerToken)
      Creates a new ShingleAnalyzerWrapper with every shingling option specified explicitly.
      Parameters:
      delegate - Analyzer whose TokenStream is to be filtered
      minShingleSize - Min shingle (token ngram) size
      maxShingleSize - Max shingle size
      tokenSeparator - Used to separate input stream tokens in output shingles
      outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
      outputUnigramsIfNoShingles - Overrides the behavior of outputUnigrams==false when no shingles are available (because the input stream contains fewer than minShingleSize tokens). Note that if outputUnigrams==true, unigrams are always output, regardless of whether any shingles are available.
      fillerToken - Token inserted in place of each skipped position when a token's positionIncrement is greater than 1
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper()
    • ShingleAnalyzerWrapper

      public ShingleAnalyzerWrapper(int minShingleSize, int maxShingleSize)
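
The interaction between the outputUnigrams and outputUnigramsIfNoShingles parameters of the seven-argument constructor can be sketched in plain Java. This is an illustration of the documented rules only, not Lucene code: `UnigramFlagsSketch` and `analyze` are hypothetical names, and the sketch ignores fillerToken and position increments.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: models when unigrams are emitted alongside shingles. */
final class UnigramFlagsSketch {

    static List<String> analyze(List<String> tokens, int minSize, int maxSize, String separator,
                                boolean outputUnigrams, boolean outputUnigramsIfNoShingles) {
        // No shingle can be formed when the input has fewer than minSize tokens.
        boolean noShingles = tokens.size() < minSize;
        // outputUnigrams == true always wins; the fallback flag matters only
        // when outputUnigrams == false and no shingles are available.
        boolean emitUnigrams = outputUnigrams || (outputUnigramsIfNoShingles && noShingles);

        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            if (emitUnigrams) {
                out.add(tokens.get(start));
            }
            for (int n = minSize; n <= maxSize && start + n <= tokens.size(); n++) {
                out.add(String.join(separator, tokens.subList(start, start + n)));
            }
        }
        return out;
    }
}
```

For a single-token input with a minimum shingle size of 2, the sketch returns that token when outputUnigramsIfNoShingles is true and an empty list when both flags are false.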
  • Method Details

    • getMaxShingleSize

      public int getMaxShingleSize()
      The max shingle (token ngram) size
      Returns:
      The max shingle (token ngram) size
    • getMinShingleSize

      public int getMinShingleSize()
      The min shingle (token ngram) size
      Returns:
      The min shingle (token ngram) size
    • getTokenSeparator

      public String getTokenSeparator()
    • isOutputUnigrams

      public boolean isOutputUnigrams()
    • isOutputUnigramsIfNoShingles

      public boolean isOutputUnigramsIfNoShingles()
    • getFillerToken

      public String getFillerToken()
    • getWrappedAnalyzer

      public final Analyzer getWrappedAnalyzer(String fieldName)
      Description copied from class: AnalyzerWrapper
      Retrieves the wrapped Analyzer appropriate for analyzing the field with the given name
      Specified by:
      getWrappedAnalyzer in class AnalyzerWrapper
      Parameters:
      fieldName - Name of the field which is to be analyzed
      Returns:
      Analyzer for the field with the given name. Assumed to be non-null
    • wrapComponents

      protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
      Description copied from class: AnalyzerWrapper
      Wraps / alters the given TokenStreamComponents, taken from the wrapped Analyzer, to form new components. It is through this method that new TokenFilters can be added by AnalyzerWrappers. By default, the given components are returned.
      Overrides:
      wrapComponents in class AnalyzerWrapper
      Parameters:
      fieldName - Name of the field which is to be analyzed
      components - TokenStreamComponents taken from the wrapped Analyzer
      Returns:
      Wrapped / altered TokenStreamComponents.
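
The delegate-then-wrap pattern described for getWrappedAnalyzer and wrapComponents can be sketched without Lucene. In the sketch below, `TokenSource` and `ShinglingTokenSource` are hypothetical stand-ins for an analyzer and its wrapper; they are not Lucene types, and the fixed bigram logic is a simplification of what a shingle filter does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an analyzer: turns text into a list of tokens. */
interface TokenSource {
    List<String> tokens(String text);
}

/** Hypothetical wrapper: delegates tokenization, then adds bigram shingles to the result. */
final class ShinglingTokenSource implements TokenSource {
    private final TokenSource delegate;
    private final String separator;

    ShinglingTokenSource(TokenSource delegate, String separator) {
        this.delegate = delegate;
        this.separator = separator;
    }

    @Override
    public List<String> tokens(String text) {
        // Step 1: let the wrapped source produce its tokens unchanged.
        List<String> base = delegate.tokens(text);
        // Step 2: alter the delegate's output (the role wrapComponents plays).
        return wrap(base);
    }

    private List<String> wrap(List<String> base) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < base.size(); i++) {
            out.add(base.get(i));
            if (i + 1 < base.size()) {
                out.add(base.get(i) + separator + base.get(i + 1));
            }
        }
        return out;
    }
}
```

Wrapping a whitespace-splitting `TokenSource` this way turns `please divide this` into `please`, `please divide`, `divide`, `divide this`, `this`. The wrapper never needs to know how the delegate tokenizes, which is the point of the pattern.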