java.lang.Object
org.apache.lucene.util.automaton.StringsToAutomaton
Builds a minimal, deterministic
Automaton
that accepts a set of strings using the
algorithm described in Incremental Construction
of Minimal Acyclic Finite-State Automata by Daciuk, Mihov, Watson and Watson. This requires
sorted input data, but is very fast (nearly linear with the input size). Also offers the ability
to directly build a binary Automaton
representation. Users should access this
functionality through Automata
static methods.- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static final class
DFSA state withchar
labels on transitions. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate BytesRefBuilder
Used for input order checking (only through assertions right now)private final StringsToAutomaton.State
Root automaton state.A "registry" for state interning. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprivate void
(package private) static Automaton
Build a minimal, deterministic automaton from a sorted list ofBytesRef
representing strings in UTF-8.(package private) static Automaton
build
(BytesRefIterator input, boolean asBinary) Build a minimal, deterministic automaton from a sorted list ofBytesRef
representing strings in UTF-8.private Automaton
Called after adding all terms.private static int
convert
(Automaton.Builder a, StringsToAutomaton.State s, IdentityHashMap<StringsToAutomaton.State, Integer> visited) Internal recursive traversal for conversion.private void
Replace last child ofstate
with an already registered state or stateRegistry the last child state.private boolean
setPrevious
(BytesRef current) Copycurrent
into an internal buffer.
-
Field Details
-
stateRegistry
A "registry" for state interning. -
root
Root automaton state. -
previous
Used for input order checking (only through assertions right now)
-
-
Constructor Details
-
StringsToAutomaton
private StringsToAutomaton()The default constructor is private. Use static methods directly.
-
-
Method Details
-
setPrevious
Copycurrent
into an internal buffer. -
convert
private static int convert(Automaton.Builder a, StringsToAutomaton.State s, IdentityHashMap<StringsToAutomaton.State, Integer> visited) Internal recursive traversal for conversion. -
completeAndConvert
Called after adding all terms. Performs final minimization and converts to a standardAutomaton
instance. -
build
-
build
Build a minimal, deterministic automaton from a sorted list ofBytesRef
representing strings in UTF-8. These strings must be binary-sorted. Creates anAutomaton
with either UTF-8 codepoints as transition labels or binary (compiled) transition labels based onasBinary
.- Throws:
IOException
-
add
-
replaceOrRegister
Replace last child ofstate
with an already registered state or stateRegistry the last child state.
-