Class BrazilianStemmer

java.lang.Object
org.apache.lucene.analysis.br.BrazilianStemmer

class BrazilianStemmer extends Object
A stemmer for Brazilian Portuguese words.
  • Constructor Details

    • BrazilianStemmer

      public BrazilianStemmer()
  • Method Details

    • stem

      protected String stem(String term)
      Stems the given term to a unique discriminator.
      Parameters:
      term - The term that should be stemmed.
      Returns:
      Discriminator for term
    • isStemmable

      private boolean isStemmable(String term)
      Checks whether a term can be processed correctly.
      Returns:
      true if, and only if, the given term consists only of letters.
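      The letters-only check described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the class name StemmableSketch is hypothetical, and the use of Character.isLetter (which also accepts accented letters) is an assumption.

```java
// Hypothetical sketch of a letters-only check; not the Lucene implementation.
final class StemmableSketch {
    private StemmableSketch() {}

    /** Returns true if every character of the term is a letter. */
    static boolean isStemmable(String term) {
        for (int i = 0; i < term.length(); i++) {
            // Character.isLetter accepts Unicode letters, including accented ones.
            if (!Character.isLetter(term.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

      Under these assumptions, a term such as "palavra" passes the check, while "abc1" or "bem-vindo" does not.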
    • isIndexable

      private boolean isIndexable(String term)
      Checks whether a term can be indexed.
      Returns:
      true if it can be indexed
    • isVowel

      private boolean isVowel(char value)
      Checks whether the character is one of 'a', 'e', 'i', 'o', 'u'.
      Returns:
      true if the character is a vowel
    • getR1

      private String getR1(String value)
      Gets R1

      R1 is the region after the first non-vowel following a vowel, or the null region at the end of the word if there is no such non-vowel.

      Returns:
      null or a string representing R1
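      The R1 definition above can be sketched as a scan for the first vowel followed by the first non-vowel. The class name R1Sketch is hypothetical, and two choices are assumptions: only the five unaccented vowels count as vowels, and null is returned only when no qualifying non-vowel exists (an empty string is returned when that non-vowel is the last letter).

```java
// Hypothetical sketch of the R1 region; not the Lucene implementation.
final class R1Sketch {
    private R1Sketch() {}

    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    /** Returns the region after the first non-vowel that follows a vowel, or null. */
    static String r1(String word) {
        if (word == null) {
            return null;
        }
        int i = 0;
        // Skip forward to the first vowel.
        while (i < word.length() && !isVowel(word.charAt(i))) {
            i++;
        }
        // Skip the run of vowels to reach the first non-vowel after it.
        while (i < word.length() && isVowel(word.charAt(i))) {
            i++;
        }
        if (i >= word.length()) {
            return null; // no non-vowel follows a vowel
        }
        return word.substring(i + 1);
    }
}
```

      For instance, with this sketch r1("casa") yields "a", and r1("beau") yields null because no non-vowel follows the vowels.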
    • getRV

      private String getRV(String value)
      Gets RV

      RV - IF the second letter is a consonant, RV is the region after the next following vowel,

      OR if the first two letters are vowels, RV is the region after the next consonant,

      AND otherwise (consonant-vowel case) RV is the region after the third letter.

      BUT RV is the end of the word if these positions cannot be found.

      Returns:
      null or a string representing RV
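      The three RV cases above can be sketched as below. The class name RvSketch is hypothetical, input is assumed to be lowercase with accents already removed, and returning null to stand for "the end of the word" is an assumption based on the documented return value.

```java
// Hypothetical sketch of the RV region; not the Lucene implementation.
final class RvSketch {
    private RvSketch() {}

    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    /** Returns the RV region of the word, or null if it cannot be found. */
    static String rv(String word) {
        if (word == null || word.length() < 2) {
            return null;
        }
        int n = word.length();
        if (!isVowel(word.charAt(1))) {
            // Second letter is a consonant: region after the next vowel.
            for (int i = 2; i < n; i++) {
                if (isVowel(word.charAt(i))) {
                    return word.substring(i + 1);
                }
            }
            return null;
        }
        if (isVowel(word.charAt(0))) {
            // First two letters are vowels: region after the next consonant.
            for (int i = 2; i < n; i++) {
                if (!isVowel(word.charAt(i))) {
                    return word.substring(i + 1);
                }
            }
            return null;
        }
        // Consonant-vowel case: region after the third letter.
        return n > 3 ? word.substring(3) : null;
    }
}
```

      With this sketch, rv("macho") is "ho" (consonant-vowel case), rv("oliva") is "va" (second letter is a consonant), and rv("aureo") is "eo" (first two letters are vowels).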
    • changeTerm

      private String changeTerm(String value)
      Normalizes the given value:
      1) Turn to lowercase
      2) Remove accents
      3) ã -> a ; õ -> o
      4) ç -> c
      Returns:
      null or the transformed string
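      One way to obtain the same four transformations is Unicode decomposition followed by removal of combining marks. This is a swapped-in technique for illustration, not necessarily how this class does it; the class name ChangeTermSketch is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of the normalization; not the Lucene implementation.
final class ChangeTermSketch {
    private ChangeTermSketch() {}

    /** Lowercases the value and strips diacritics (accents, tilde, cedilla). */
    static String changeTerm(String value) {
        if (value == null) {
            return null;
        }
        String lower = value.toLowerCase(Locale.ROOT);
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Dropping the combining marks leaves only the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

      Because ã, õ and ç all decompose into a base letter plus a combining mark, steps 2 to 4 collapse into a single pass in this sketch.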
    • suffix

      private boolean suffix(String value, String suffix)
      Check if a string ends with a suffix
      Returns:
      true if the string ends with the specified suffix
    • replaceSuffix

      private String replaceSuffix(String value, String toReplace, String changeTo)
      Replaces a suffix of a string with another suffix.
      Returns:
      the replaced String
    • removeSuffix

      private String removeSuffix(String value, String toRemove)
      Remove a string suffix
      Returns:
      the String without the suffix
    • suffixPreceded

      private boolean suffixPreceded(String value, String suffix, String preceded)
      Checks whether a suffix is preceded by a given string.
      Returns:
      true if the suffix is preceded by the given string
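      The four suffix helpers (suffix, replaceSuffix, removeSuffix, suffixPreceded) can be sketched on top of String.endsWith. The class name SuffixSketch is hypothetical, and returning the value unchanged when the suffix is absent is an assumption; the documentation above does not say what happens in that case.

```java
// Hypothetical sketch of the suffix helpers; not the Lucene implementation.
final class SuffixSketch {
    private SuffixSketch() {}

    /** True if value ends with the given suffix. */
    static boolean suffix(String value, String suffix) {
        return value != null && suffix != null && value.endsWith(suffix);
    }

    /** Removes the suffix if present; otherwise returns value unchanged (assumption). */
    static String removeSuffix(String value, String toRemove) {
        if (!suffix(value, toRemove)) {
            return value;
        }
        return value.substring(0, value.length() - toRemove.length());
    }

    /** Replaces the suffix if present; changeTo is assumed to be non-null. */
    static String replaceSuffix(String value, String toReplace, String changeTo) {
        if (!suffix(value, toReplace)) {
            return value;
        }
        return removeSuffix(value, toReplace) + changeTo;
    }

    /** True if value ends with suffix and the text before it ends with preceded. */
    static boolean suffixPreceded(String value, String suffix, String preceded) {
        return suffix(value, suffix) && suffix(removeSuffix(value, suffix), preceded);
    }
}
```

      For example, removeSuffix("falamos", "amos") gives "fal", and suffixPreceded("felicidade", "idade", "c") is true because "felic" ends with "c".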
    • createCT

      private void createCT(String term)
      Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
    • step1

      private boolean step1()
      Standard suffix removal. Search for the longest among the following suffixes, and perform the following actions:
      Returns:
      false if no ending was removed
    • step2

      private boolean step2()
      Verb suffixes.

      Search for the longest among the following suffixes in RV, and if found, delete.

      Returns:
      false if no ending was removed
    • step3

      private void step3()
      Delete suffix 'i' if in RV and preceded by 'c'
    • step4

      private void step4()
      Residual suffix

      If the word ends with one of the suffixes (os a i o á í ó) in RV, delete it
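      The residual-suffix rule above can be sketched as a function of the changed term and its RV region. The actual method takes no arguments and works on internal state; the class name Step4Sketch, the explicit parameters, and the assumption that rv is a tail of ct are all illustrative.

```java
// Hypothetical sketch of the residual suffix step; not the Lucene implementation.
final class Step4Sketch {
    // os, a, i, o, and the accented forms a-acute, i-acute, o-acute.
    private static final String[] SUFFIXES = {
        "os", "a", "i", "o", "\u00e1", "\u00ed", "\u00f3"
    };

    private Step4Sketch() {}

    /** Deletes a residual suffix from ct if that suffix lies in rv. */
    static String step4(String ct, String rv) {
        if (ct == null || rv == null) {
            return ct;
        }
        for (String s : SUFFIXES) {
            // rv is assumed to be a tail of ct, so ct ends with s as well.
            if (rv.endsWith(s)) {
                return ct.substring(0, ct.length() - s.length());
            }
        }
        return ct;
    }
}
```

      With RV computed as "inos" for "meninos", this sketch returns "menin".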

    • step5

      private void step5()
      If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i').

      Or, if the word ends with ç, remove the cedilla.
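      The rule above can be sketched as a function of the changed term and its RV region. The actual method takes no arguments and works on internal state; the class name Step5Sketch, the explicit parameters, and the assumption that rv is a tail of ct are all illustrative.

```java
// Hypothetical sketch of the final step; not the Lucene implementation.
final class Step5Sketch {
    private Step5Sketch() {}

    static String step5(String ct, String rv) {
        if (ct == null) {
            return null;
        }
        // e, e-acute (\u00e9) or e-circumflex (\u00ea) at the end of RV.
        if (rv != null
                && (rv.endsWith("e") || rv.endsWith("\u00e9") || rv.endsWith("\u00ea"))) {
            ct = ct.substring(0, ct.length() - 1);
            rv = rv.substring(0, rv.length() - 1);
            // Drop the 'u' of "gu" or the 'i' of "ci" when that letter is in RV.
            boolean gu = ct.endsWith("gu") && rv.endsWith("u");
            boolean ci = ct.endsWith("ci") && rv.endsWith("i");
            if (gu || ci) {
                ct = ct.substring(0, ct.length() - 1);
            }
            return ct;
        }
        // Otherwise, a final c-cedilla (\u00e7) loses its cedilla.
        if (ct.endsWith("\u00e7")) {
            return ct.substring(0, ct.length() - 1) + "c";
        }
        return ct;
    }
}
```

      With RV computed as "ue" for "segue", this sketch removes the final 'e' and then the 'u', giving "seg".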

    • log

      String log()
      For logging and debugging purposes.
      Returns:
      TERM, CT, RV, R1 and R2