java.lang.Object
org.apache.lucene.analysis.ar.ArabicStemmer
Stemmer for Arabic.
Stemming is done in place for efficiency, operating on a term buffer.
Stemming is defined as:
- Removal of the attached definite article, conjunction, and prepositions.
- Stemming of common suffixes.
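Working in place means the term occupies s[0..len) of a buffer that may be longer. Each stemming step returns a new length instead of allocating a new array. The standalone sketch below shows that mechanism. It uses a hypothetical helper, deleteN (the name is ours and is not taken from this page), which shifts the tail of the buffer left over the removed characters.

```java
// Standalone sketch of in-place deletion on a term buffer.
// deleteN is a hypothetical helper written for illustration only.
class InPlaceDeleteSketch {
    // Removes nChars characters starting at pos from the term s[0..len)
    // by shifting the remaining tail left, and returns the new logical length.
    // Characters at or beyond the returned length are stale and must be ignored.
    static int deleteN(char[] s, int pos, int len, int nChars) {
        System.arraycopy(s, pos + nChars, s, pos, len - pos - nChars);
        return len - nChars;
    }

    public static void main(String[] args) {
        // al-kitab ("the book"): ALEF, LAM, KAF, TEH, ALEF, BEH
        char[] buf = "\u0627\u0644\u0643\u062A\u0627\u0628".toCharArray();
        int len = deleteN(buf, 0, buf.length, 2); // drop the two-letter article al-
        System.out.println(len + " " + new String(buf, 0, len));
    }
}
```

Removing a suffix is the cheap case. With pos = len - nChars the copy length is zero, so only the returned length changes.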
-
Field Summary
Modifier and Type | Field
(package private) static final char | ALEF
(package private) static final char | BEH
(package private) static final char | FEH
(package private) static final char | HEH
(package private) static final char | KAF
(package private) static final char | LAM
(package private) static final char | NOON
(package private) static final char[][] | prefixes
(package private) static final char[][] | suffixes
(package private) static final char | TEH
(package private) static final char | TEH_MARBUTA
(package private) static final char | WAW
(package private) static final char | YEH
-
Constructor Summary
ArabicStemmer()
-
Method Summary
Modifier and Type | Method | Description
(package private) boolean | endsWithCheckLength(char[] s, int len, char[] suffix) | Returns true if the suffix matches and can be stemmed.
(package private) boolean | startsWithCheckLength(char[] s, int len, char[] prefix) | Returns true if the prefix matches and can be stemmed.
(package private) int | stem(char[] s, int len) | Stem an input buffer of Arabic text.
(package private) int | stemPrefix(char[] s, int len) | Stem a prefix off an Arabic word.
(package private) int | stemSuffix(char[] s, int len) | Stem suffix(es) off an Arabic word.
-
Field Details
-
ALEF
static final char ALEF
-
BEH
static final char BEH
-
TEH_MARBUTA
static final char TEH_MARBUTA
-
TEH
static final char TEH
-
FEH
static final char FEH
-
KAF
static final char KAF
-
LAM
static final char LAM
-
NOON
static final char NOON
-
HEH
static final char HEH
-
WAW
static final char WAW
-
YEH
static final char YEH
-
prefixes
static final char[][] prefixes
-
suffixes
static final char[][] suffixes
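The page lists the letter constants and the two affix tables without their values. The sketch below is an illustration only. It assumes each constant holds the matching code point from the Unicode Arabic block, and it fills the tables with the affixes we recall from Lucene's source. Verify both against the version you use. Each affix is a char[], so a table is a char[][] that a stemmer can scan in order. The class name ArabicAffixTables is hypothetical.

```java
// Illustrative sketch: letter constants and affix tables.
// Values and table contents are assumptions, not taken from this page.
class ArabicAffixTables {
    static final char ALEF = '\u0627';
    static final char BEH = '\u0628';
    static final char TEH_MARBUTA = '\u0629';
    static final char TEH = '\u062A';
    static final char FEH = '\u0641';
    static final char KAF = '\u0643';
    static final char LAM = '\u0644';
    static final char NOON = '\u0646';
    static final char HEH = '\u0647';
    static final char WAW = '\u0648';
    static final char YEH = '\u064A';

    // Definite article alone or fused with a conjunction/preposition, then bare wa-.
    static final char[][] prefixes = {
        {ALEF, LAM},      // al-
        {WAW, ALEF, LAM}, // wa- + al-
        {BEH, ALEF, LAM}, // bi- + al-
        {KAF, ALEF, LAM}, // ka- + al-
        {FEH, ALEF, LAM}, // fa- + al-
        {LAM, LAM},       // li- + al-
        {WAW},            // wa-
    };

    // Mostly pronoun, dual/plural, and feminine endings.
    static final char[][] suffixes = {
        {HEH, ALEF}, {ALEF, NOON}, {ALEF, TEH}, {WAW, NOON}, {YEH, NOON},
        {YEH, HEH}, {YEH, TEH_MARBUTA}, {HEH}, {TEH_MARBUTA}, {YEH},
    };

    public static void main(String[] args) {
        System.out.println(prefixes.length + " prefixes, " + suffixes.length + " suffixes");
    }
}
```

If the stemmer stops at the first matching prefix, as the later sketches assume, table order matters. Because wa- + al- is listed before bare wa-, a word that starts with both loses all three letters instead of only the conjunction.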
-
Constructor Details
-
ArabicStemmer
ArabicStemmer()
-
Method Details
-
stem
int stem(char[] s, int len)
Stem an input buffer of Arabic text.
Parameters:
  s - input buffer
  len - length of input buffer
Returns:
  length of input buffer after stemming
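You can read stem as a two-phase pipeline. It strips at most one prefix, then strips suffixes from what remains, both in place. The standalone sketch below mirrors that shape, but it uses only a small subset of the affixes. The length thresholds in the two check methods are assumptions based on our reading of Lucene's source, not something this page states. One-letter prefixes require at least four letters, and every other removal must leave at least two. stemString is a hypothetical convenience wrapper.

```java
// End-to-end sketch of stem(): strip one prefix, then strip suffixes, in place.
// Affix subsets and length thresholds are illustrative assumptions.
class StemSketch {
    static final char ALEF = '\u0627', LAM = '\u0644', WAW = '\u0648',
            TEH = '\u062A', TEH_MARBUTA = '\u0629', YEH = '\u064A';

    // Subsets of the affix tables, checked in order.
    static final char[][] PREFIXES = {{ALEF, LAM}, {WAW, ALEF, LAM}, {WAW}};
    static final char[][] SUFFIXES = {{ALEF, TEH}, {TEH_MARBUTA}, {YEH}};

    static int stem(char[] s, int len) {
        len = stemPrefix(s, len);
        return stemSuffix(s, len);
    }

    static int stemPrefix(char[] s, int len) {
        for (char[] p : PREFIXES) {
            if (startsWithCheckLength(s, len, p)) {
                // Shift the remainder left over the prefix; only the first match is removed.
                System.arraycopy(s, p.length, s, 0, len - p.length);
                return len - p.length;
            }
        }
        return len;
    }

    static int stemSuffix(char[] s, int len) {
        for (char[] x : SUFFIXES) {
            if (endsWithCheckLength(s, len, x)) {
                len -= x.length; // a trailing suffix only shortens the logical length
            }
        }
        return len;
    }

    static boolean startsWithCheckLength(char[] s, int len, char[] p) {
        // Assumed guard: one-letter prefixes need len >= 4, longer ones len >= p.length + 2.
        if ((p.length == 1) ? len < 4 : len < p.length + 2) return false;
        for (int i = 0; i < p.length; i++) {
            if (s[i] != p[i]) return false;
        }
        return true;
    }

    static boolean endsWithCheckLength(char[] s, int len, char[] x) {
        // Assumed guard: at least two letters must remain after removal.
        if (len < x.length + 2) return false;
        for (int i = 0; i < x.length; i++) {
            if (s[len - x.length + i] != x[i]) return false;
        }
        return true;
    }

    static String stemString(String word) {
        char[] buf = word.toCharArray();
        return new String(buf, 0, stem(buf, buf.length));
    }

    public static void main(String[] args) {
        // wal-maktabat ("and the libraries"): wa- + al- comes off, then the plural -at
        System.out.println(stemString("\u0648\u0627\u0644\u0645\u0643\u062A\u0628\u0627\u062A"));
    }
}
```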
-
stemPrefix
int stemPrefix(char[] s, int len)
Stem a prefix off an Arabic word.
Parameters:
  s - input buffer
  len - length of input buffer
Returns:
  new length of input buffer after stemming
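The next sketch covers the prefix phase and rests on the same assumptions. It scans the table in order and removes the first prefix that passes the check by shifting the buffer left. The method then returns. Because wa- + al- is listed before bare wa-, the word wal-walad ("and the boy") loses all three prefix letters and keeps the root-initial WAW of walad. The table subset, the thresholds, and the helper names are illustrative.

```java
// Sketch of stemPrefix(): the first matching prefix (in table order) that passes
// the length check is removed, and the method returns immediately.
class PrefixSketch {
    static final char ALEF = '\u0627', LAM = '\u0644', WAW = '\u0648';
    static final char[][] PREFIXES = {{ALEF, LAM}, {WAW, ALEF, LAM}, {WAW}};

    static int stemPrefix(char[] s, int len) {
        for (char[] p : PREFIXES) {
            // Assumed thresholds: len >= 4 for one-letter prefixes, else len >= p.length + 2.
            boolean longEnough = (p.length == 1) ? len >= 4 : len >= p.length + 2;
            if (longEnough && startsWith(s, len, p)) {
                System.arraycopy(s, p.length, s, 0, len - p.length);
                return len - p.length;
            }
        }
        return len;
    }

    static boolean startsWith(char[] s, int len, char[] p) {
        if (len < p.length) return false;
        for (int i = 0; i < p.length; i++) {
            if (s[i] != p[i]) return false;
        }
        return true;
    }

    static String strip(String word) {
        char[] buf = word.toCharArray();
        return new String(buf, 0, stemPrefix(buf, buf.length));
    }

    public static void main(String[] args) {
        System.out.println(strip("\u0648\u0627\u0644\u0648\u0644\u062F")); // wal-walad
        System.out.println(strip("\u0648\u0642\u0627\u0644"));             // wa-qala
        System.out.println(strip("\u0648\u0644\u062F"));                   // walad, too short
    }
}
```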
-
stemSuffix
int stemSuffix(char[] s, int len)
Stem suffix(es) off an Arabic word.
Parameters:
  s - input buffer
  len - length of input buffer
Returns:
  new length of input buffer after stemming
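The "(es)" in the description hints at how the suffix phase differs from the prefix phase: the loop does not stop at the first match. The sketch below assumes each table entry is tried once, in order, against the current end of the word, so one call can remove more than one suffix. It uses a two-entry subset of the table. The input is a synthetic letter sequence chosen to show two removals, not a real word.

```java
// Sketch of stemSuffix(): entries are tried once each, in order, against the
// current end of the word, so several suffixes can be removed in one call.
class SuffixSketch {
    static final char ALEF = '\u0627', HEH = '\u0647', YEH = '\u064A';
    static final char[][] SUFFIXES = {{HEH, ALEF}, {YEH}};

    static int stemSuffix(char[] s, int len) {
        for (char[] x : SUFFIXES) {
            // Assumed guard: at least two letters must remain after removal.
            if (len >= x.length + 2 && endsWith(s, len, x)) {
                len -= x.length; // nothing moves; the logical length shrinks
            }
        }
        return len;
    }

    static boolean endsWith(char[] s, int len, char[] x) {
        for (int i = 0; i < x.length; i++) {
            if (s[len - x.length + i] != x[i]) return false;
        }
        return true;
    }

    static String strip(String word) {
        char[] buf = word.toCharArray();
        return new String(buf, 0, stemSuffix(buf, buf.length));
    }

    public static void main(String[] args) {
        // Synthetic KAF, TEH, BEH, YEH, HEH, ALEF: HEH+ALEF comes off, then YEH.
        System.out.println(strip("\u0643\u062A\u0628\u064A\u0647\u0627"));
    }
}
```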
-
startsWithCheckLength
boolean startsWithCheckLength(char[] s, int len, char[] prefix)
Returns true if the prefix matches and the word is long enough for the prefix to be stemmed.
Parameters:
  s - input buffer
  len - length of input buffer
  prefix - prefix to check
Returns:
  true if the prefix matches and can be stemmed
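"Can be stemmed" refers to a length guard: a matching prefix is rejected if stripping it would leave too little of the word. The sketch below assumes two thresholds based on our reading of Lucene's source, which this page does not state. A one-letter prefix (wa-) requires at least four letters. Any longer prefix must leave at least two letters behind. The check helper is hypothetical.

```java
// Sketch of the guard in startsWithCheckLength(). Thresholds are assumptions.
class PrefixGuardSketch {
    static boolean startsWithCheckLength(char[] s, int len, char[] prefix) {
        if (prefix.length == 1 && len < 4) {
            return false; // wa- must leave at least three letters
        } else if (len < prefix.length + 2) {
            return false; // other prefixes must leave at least two letters
        }
        for (int i = 0; i < prefix.length; i++) {
            if (s[i] != prefix[i]) return false;
        }
        return true;
    }

    static boolean check(String word, String prefix) {
        char[] s = word.toCharArray();
        return startsWithCheckLength(s, s.length, prefix.toCharArray());
    }

    public static void main(String[] args) {
        String al = "\u0627\u0644"; // al-
        String wa = "\u0648";       // wa-
        System.out.println(check("\u0627\u0644\u0628\u0627\u0628", al)); // al-bab ("the door")
        System.out.println(check("\u0627\u0644\u0628", al));             // three letters only
        System.out.println(check("\u0648\u0644\u062F", wa));             // walad, three letters
    }
}
```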
-
endsWithCheckLength
boolean endsWithCheckLength(char[] s, int len, char[] suffix)
Returns true if the suffix matches and the word is long enough for the suffix to be stemmed.
Parameters:
  s - input buffer
  len - length of input buffer
  suffix - suffix to check
Returns:
  true if the suffix matches and can be stemmed
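The suffix check applies a similar guard. The sketch below assumes that at least two letters must remain after the suffix is removed, so the word needs at least suffix.length + 2 letters. This threshold comes from our reading of Lucene's source and is not stated on this page. The check helper is hypothetical.

```java
// Sketch of the guard in endsWithCheckLength(). The "+ 2" threshold is an assumption.
class SuffixGuardSketch {
    static boolean endsWithCheckLength(char[] s, int len, char[] suffix) {
        if (len < suffix.length + 2) {
            return false; // removing the suffix would leave fewer than two letters
        }
        for (int i = 0; i < suffix.length; i++) {
            if (s[len - suffix.length + i] != suffix[i]) return false;
        }
        return true;
    }

    static boolean check(String word, String suffix) {
        char[] s = word.toCharArray();
        return endsWithCheckLength(s, s.length, suffix.toCharArray());
    }

    public static void main(String[] args) {
        // madrasa ends in TEH_MARBUTA and keeps four letters after removal
        System.out.println(check("\u0645\u062F\u0631\u0633\u0629", "\u0629"));
        // laha ("for her") ends in HEH+ALEF but would keep only one letter
        System.out.println(check("\u0644\u0647\u0627", "\u0647\u0627"));
    }
}
```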
-