Class STUniformSplitTermsWriter
- All Implemented Interfaces:
Closeable, AutoCloseable
Extends UniformSplitTermsWriter by sharing the terms of all fields in the same dictionary and by writing all the fields of a term in the same block line.
The block file contains the term blocks for all fields. Each block line, for a single term, may hold the TermState of several fields. The block file also contains the fields metadata at the end of the file.
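To make the block-line layout above concrete, the following sketch groups per-field term entries by term, so each distinct term yields one line that carries the state of every field containing it. It is a minimal model, not the codec's encoding: `ToyTermState`, `BlockLine` and `SharedBlockLines.buildBlockLines` are hypothetical names, and the toy state holds only a docFreq where a real TermState would hold postings pointers and statistics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a per-field TermState: only a docFreq here.
record ToyTermState(String field, int docFreq) {}

// Hypothetical block line: one term plus the states of all fields containing it.
record BlockLine(String term, List<ToyTermState> fieldStates) {}

class SharedBlockLines {
  // perField maps field name -> (term -> docFreq). Terms of all fields are
  // gathered in one sorted map, so each distinct term produces a single line.
  static List<BlockLine> buildBlockLines(Map<String, Map<String, Integer>> perField) {
    TreeMap<String, List<ToyTermState>> byTerm = new TreeMap<>();
    // Visit fields in name order so the states inside a line are field-sorted.
    for (Map.Entry<String, Map<String, Integer>> field : new TreeMap<>(perField).entrySet()) {
      for (Map.Entry<String, Integer> term : field.getValue().entrySet()) {
        byTerm.computeIfAbsent(term.getKey(), k -> new ArrayList<>())
            .add(new ToyTermState(field.getKey(), term.getValue()));
      }
    }
    List<BlockLine> lines = new ArrayList<>();
    for (Map.Entry<String, List<ToyTermState>> e : byTerm.entrySet()) {
      lines.add(new BlockLine(e.getKey(), e.getValue()));
    }
    return lines;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Integer>> perField = Map.of(
        "title", Map.of("lucene", 3, "search", 1),
        "body", Map.of("lucene", 7, "index", 2));
    for (BlockLine line : buildBlockLines(perField)) {
      System.out.println(line);
    }
  }
}
```

Here the term `lucene` appears in both fields, so it produces a single line holding two states instead of one line in each field's own blocks.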
The dictionary file contains a single trie (FST bytes) for all fields.
This structure is well suited to indexes with many fields: in that case the shared-terms dictionary trie is much smaller than separate per-field tries would be.
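The size argument can be illustrated by counting entries under a simplifying assumption: that dictionary size grows with the number of term entries it must cover. This ignores how the real trie indexes blocks of terms and shares prefixes, so treat it as a proxy only; `DictionaryEntryCount` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DictionaryEntryCount {
  // One dictionary per field: a term present in k fields is counted k times.
  static int perFieldEntries(List<Set<String>> termsPerField) {
    int total = 0;
    for (Set<String> terms : termsPerField) {
      total += terms.size();
    }
    return total;
  }

  // One shared dictionary: each distinct term is counted once, whatever the number of fields.
  static int sharedEntries(List<Set<String>> termsPerField) {
    Set<String> distinct = new HashSet<>();
    for (Set<String> terms : termsPerField) {
      distinct.addAll(terms);
    }
    return distinct.size();
  }

  public static void main(String[] args) {
    // Hypothetical index: 100 fields that all use the same 1,000 terms.
    Set<String> vocabulary = new HashSet<>();
    for (int i = 0; i < 1_000; i++) {
      vocabulary.add("term" + i);
    }
    List<Set<String>> fields = Collections.nCopies(100, vocabulary);
    System.out.println("per-field entries: " + perFieldEntries(fields));
    System.out.println("shared entries: " + sharedEntries(fields));
  }
}
```

With 100 fields sharing the same 1,000 terms, the per-field count is 100,000 while the shared count stays at 1,000.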
This FieldsConsumer requires a custom merge(MergeState, NormsProducer) method for efficiency. The regular merge would scan all the fields sequentially, which internally would scan the whole shared-terms dictionary as many times as there are fields. The custom merge instead directly scans the internal shared-terms dictionary of each segment being merged, so it scans it only once, regardless of the number of fields.
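The single-scan merge described above can be sketched as a k-way merge over each segment's sorted shared-terms list, using a priority queue and grouping the cursors that sit on the same term. This is an illustrative sketch with hypothetical names (`SharedTermsMerge`, `Cursor`); the method summary's `TermIteratorQueue`, `groupByTerm` and `nextTermForIterators` suggest a similar design, but the sketch does not reproduce their code, and the per-field postings that real term iterators carry are omitted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SharedTermsMerge {
  // Hypothetical cursor over one segment's sorted shared-terms list.
  static final class Cursor {
    final int segment;
    final List<String> terms;
    int pos;

    Cursor(int segment, List<String> terms) {
      this.segment = segment;
      this.terms = terms;
    }

    String term() {
      return terms.get(pos);
    }

    // Moves to the next term; returns false once the segment is exhausted.
    boolean advance() {
      return ++pos < terms.size();
    }
  }

  // k-way merge: each segment's term list is read exactly once, and the
  // cursors positioned on the same term are grouped together.
  static List<String> merge(List<List<String>> segmentTerms) {
    PriorityQueue<Cursor> queue = new PriorityQueue<>(
        Comparator.comparing(Cursor::term).thenComparingInt(c -> c.segment));
    for (int i = 0; i < segmentTerms.size(); i++) {
      if (!segmentTerms.get(i).isEmpty()) {
        queue.add(new Cursor(i, segmentTerms.get(i)));
      }
    }
    List<String> merged = new ArrayList<>();
    List<Cursor> group = new ArrayList<>();
    while (!queue.isEmpty()) {
      // Pop every cursor on the smallest term; together they form one group.
      String term = queue.peek().term();
      group.clear();
      while (!queue.isEmpty() && queue.peek().term().equals(term)) {
        group.add(queue.poll());
      }
      StringBuilder line = new StringBuilder(term).append(" <- segments");
      for (Cursor c : group) {
        line.append(' ').append(c.segment);
        // Re-insert the cursor only if its segment still has terms.
        if (c.advance()) {
          queue.add(c);
        }
      }
      merged.add(line.toString());
    }
    return merged;
  }

  public static void main(String[] args) {
    List<List<String>> segments = List.of(
        List.of("apple", "kiwi"), List.of("apple", "banana"), List.of("banana"));
    merge(segments).forEach(System.out::println);
  }
}
```

For the three segments in `main`, the output is `apple <- segments 0 1`, `banana <- segments 1 2` and `kiwi <- segments 0`: one line per distinct term, with every segment list read once no matter how many fields its terms belong to.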
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
private static class
private class
private class
(package private) final class
private class
private static interface
private class
private class
-
Field Summary
Fields inherited from class org.apache.lucene.codecs.uniformsplit.UniformSplitTermsWriter
blockEncoder, blockOutput, DEFAULT_DELTA_NUM_LINES, DEFAULT_TARGET_NUM_BLOCK_LINES, deltaNumLines, dictionaryOutput, fieldInfos, fieldMetadataWriter, MAX_NUM_BLOCK_LINES, maxDoc, postingsWriter, targetNumBlockLines
-
Constructor Summary
Constructors (Modifier, Constructor, Description)
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder)
protected STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder, FieldMetadata.Serializer fieldMetadataWriter, String codecName, int versionCurrent, String termsBlocksExtension, String dictionaryExtension)
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, BlockEncoder blockEncoder)
-
Method Summary
Modifier and Type, Method, Description
private void combinePostingsPerField(BytesRef term, Map<String, STUniformSplitTermsWriter.MergingFieldTerms> fieldTermsMap, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap, List<STUniformSplitTermsWriter.MergingFieldTerms> groupedFieldTerms)
private void combineSegmentsFields(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> groupedSegmentTerms, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap)
private List<FieldMetadata> createFieldMetadataList(Iterator<FieldInfo> fieldInfos, int maxDoc)
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.FieldTerms> createFieldTermsQueue(Fields fields, List<FieldMetadata> fieldMetadataList)
private Map<String, STUniformSplitTermsWriter.MergingFieldTerms> createMergingFieldTermsMap(List<FieldMetadata> fieldMetadataList, int numSegments)
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.SegmentTerms> createSegmentTermsQueue(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList)
private <T> void groupByTerm(STUniformSplitTermsWriter.TermIteratorQueue<T> termIteratorQueue, STUniformSplitTermsWriter.TermIterator<T> topTermIterator, List<STUniformSplitTermsWriter.TermIterator<T>> groupedTermIterators)
void merge(MergeState mergeState, NormsProducer normsProducer): Merges in the fields from the readers in mergeState.
private Collection<FieldMetadata> mergeSegments(MergeState mergeState, NormsProducer normsProducer, List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList, STBlockWriter blockWriter, IndexDictionary.Builder dictionaryBuilder)
private <T> void nextTermForIterators(List<? extends STUniformSplitTermsWriter.TermIterator<T>> termIterators, STUniformSplitTermsWriter.TermIteratorQueue<T> termIteratorQueue)
void write(Fields fields, NormsProducer normsProducer): Write all fields, terms and postings.
protected void writeDictionary(int fieldsNumber, IndexDictionary.Builder dictionaryBuilder)
private int writeFieldMetadataList(Collection<FieldMetadata> fieldMetadataList)
private void writePostingLines(BytesRef term, List<? extends STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.FieldTerms>> groupedFieldTerms, NormsProducer normsProducer, List<FieldMetadataTermState> termStates)
private void writeSegment(STUniformSplitTermsWriter.SharedTermsWriter termsWriter): Writes the new segment with the provided STUniformSplitTermsWriter.SharedTermsWriter, which can be either a single-segment writer or a multiple-segment merging writer.
private Collection<FieldMetadata> writeSingleSegment(Fields fields, NormsProducer normsProducer, STBlockWriter blockWriter, IndexDictionary.Builder dictionaryBuilder)
Methods inherited from class org.apache.lucene.codecs.uniformsplit.UniformSplitTermsWriter
close, validateSettings, writeDictionary, writeEncodedFieldsMetadata, writeFieldsMetadata, writeFieldTerms, writePostingLine, writeUnencodedFieldsMetadata
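The `writeSegment` entry describes one write path that is fed either by a single-segment writer (flush) or by a multiple-segment merging writer. A minimal sketch of that split, assuming a hypothetical `SharedTermsSource` interface in place of `SharedTermsWriter`, whose actual signature is not shown in this summary; the merging source uses a `TreeSet` union for brevity rather than a streaming merge.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class WriteSegmentSketch {
  // Hypothetical, simplified stand-in for SharedTermsWriter: supplies the
  // sorted terms that the shared write path must emit.
  interface SharedTermsSource {
    Collection<String> sortedTerms();
  }

  // Flush case: the terms of a single segment being written.
  record SingleSegment(Collection<String> terms) implements SharedTermsSource {
    public Collection<String> sortedTerms() {
      return new TreeSet<>(terms);
    }
  }

  // Merge case: the terms of several segments, deduplicated and sorted.
  record MergingSegments(List<? extends Collection<String>> segments) implements SharedTermsSource {
    public Collection<String> sortedTerms() {
      TreeSet<String> union = new TreeSet<>();
      for (Collection<String> segment : segments) {
        union.addAll(segment);
      }
      return union;
    }
  }

  // Single write path shared by flush and merge; only the source differs.
  static List<String> writeSegment(SharedTermsSource source) {
    List<String> written = new ArrayList<>();
    for (String term : source.sortedTerms()) {
      written.add(term); // stand-in for writing a block line and a dictionary entry
    }
    return written;
  }

  public static void main(String[] args) {
    System.out.println(writeSegment(new SingleSegment(List.of("b", "a"))));
    System.out.println(writeSegment(new MergingSegments(List.of(List.of("b", "c"), List.of("a", "b")))));
  }
}
```

Keeping the writing logic in one place means the block and dictionary encoding cannot drift between the flush and merge paths; only the way the sorted terms are produced differs.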
-
Constructor Details
-
STUniformSplitTermsWriter
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, BlockEncoder blockEncoder) throws IOException
- Throws:
IOException
-
STUniformSplitTermsWriter
public STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder) throws IOException
- Throws:
IOException
-
STUniformSplitTermsWriter
protected STUniformSplitTermsWriter(PostingsWriterBase postingsWriter, SegmentWriteState state, int targetNumBlockLines, int deltaNumLines, BlockEncoder blockEncoder, FieldMetadata.Serializer fieldMetadataWriter, String codecName, int versionCurrent, String termsBlocksExtension, String dictionaryExtension) throws IOException
- Throws:
IOException
-
-
Method Details
-
write
public void write(Fields fields, NormsProducer normsProducer) throws IOException
Description copied from class: FieldsConsumer
Write all fields, terms and postings. This is the "pull" API, allowing you to iterate more than once over the postings, somewhat analogous to using a DOM API to traverse an XML tree.
Notes:
- You must compute index statistics, including each Term's docFreq and totalTermFreq, as well as the summary statistics sumTotalTermFreq, sumDocFreq and docCount.
- You must skip terms that have no docs and fields that have no terms, even though the provided Fields API will expose them; this typically requires lazily writing the field or term until you've actually seen the first term or document.
- The provided Fields instance is limited: you cannot call any methods that return statistics/counts; you cannot pass a non-null live docs when pulling docs/positions enums.
- Overrides:
write in class UniformSplitTermsWriter
- Throws:
IOException
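The first two notes can be illustrated with a minimal sketch that uses plain maps in place of Lucene's `Fields`, `TermsEnum` and `PostingsEnum`. `PullWriteSketch` and its string output are hypothetical; the point is that terms without documents are skipped, a field is started only once its first real term is seen, and per-term and per-field statistics are accumulated along the way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PullWriteSketch {
  // Hypothetical input: field -> term -> (docId -> freq). An empty doc map
  // models a term that the Fields API exposes but that has no documents.
  static List<String> write(Map<String, Map<String, Map<Integer, Integer>>> postings) {
    List<String> out = new ArrayList<>();
    for (var field : new TreeMap<>(postings).entrySet()) {
      boolean fieldStarted = false;
      long sumDocFreq = 0;
      long sumTotalTermFreq = 0;
      Set<Integer> docsWithField = new HashSet<>();
      for (var term : new TreeMap<>(field.getValue()).entrySet()) {
        Map<Integer, Integer> docs = term.getValue();
        if (docs.isEmpty()) {
          continue; // skip terms that have no docs
        }
        if (!fieldStarted) {
          out.add("field " + field.getKey()); // start the field lazily, on its first real term
          fieldStarted = true;
        }
        int docFreq = docs.size();
        long totalTermFreq = 0;
        for (int freq : docs.values()) {
          totalTermFreq += freq;
        }
        sumDocFreq += docFreq;
        sumTotalTermFreq += totalTermFreq;
        docsWithField.addAll(docs.keySet()); // docCount = docs with at least one term
        out.add("  term " + term.getKey() + " docFreq=" + docFreq + " totalTermFreq=" + totalTermFreq);
      }
      if (fieldStarted) { // a field with no written term leaves no trace
        out.add("  sumTotalTermFreq=" + sumTotalTermFreq + " sumDocFreq=" + sumDocFreq
            + " docCount=" + docsWithField.size());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Map<Integer, Integer>>> postings = Map.of(
        "body", Map.of("lucene", Map.of(1, 2, 3, 1), "ghost", Map.<Integer, Integer>of()),
        "empty", Map.of("nothing", Map.<Integer, Integer>of()));
    write(postings).forEach(System.out::println);
  }
}
```

In `main`, the term `ghost` is skipped and the field `empty` produces no output at all, because none of its terms have documents.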
-
createFieldMetadataList
private List<FieldMetadata> createFieldMetadataList(Iterator<FieldInfo> fieldInfos, int maxDoc)
-
createFieldTermsQueue
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.FieldTerms> createFieldTermsQueue(Fields fields, List<FieldMetadata> fieldMetadataList) throws IOException
- Throws:
IOException
-
writePostingLines
private void writePostingLines(BytesRef term, List<? extends STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.FieldTerms>> groupedFieldTerms, NormsProducer normsProducer, List<FieldMetadataTermState> termStates) throws IOException
- Throws:
IOException
-
writeFieldMetadataList
private int writeFieldMetadataList(Collection<FieldMetadata> fieldMetadataList) throws IOException
- Throws:
IOException
-
writeDictionary
protected void writeDictionary(int fieldsNumber, IndexDictionary.Builder dictionaryBuilder) throws IOException
- Throws:
IOException
-
merge
public void merge(MergeState mergeState, NormsProducer normsProducer) throws IOException
Description copied from class: FieldsConsumer
Merges in the fields from the readers in mergeState. The default implementation skips and maps around deleted documents, and calls FieldsConsumer.write(Fields, NormsProducer). Implementations can override this method for more sophisticated merging (bulk-byte copying, etc.).
- Overrides:
merge in class FieldsConsumer
- Throws:
IOException
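The phrase "skips and maps around deleted documents" refers to renumbering: the live documents of the merged segments receive consecutive new IDs, and deleted ones receive none. A minimal sketch with hypothetical names (`DeletedDocRemap.buildDocMaps`), assuming no index sort so that segments are simply concatenated in order; Lucene exposes the equivalent per-segment mapping through `MergeState.docMaps`.

```java
import java.util.Arrays;

class DeletedDocRemap {
  // live[s][d] is true when document d of segment s has not been deleted.
  // Returns, per segment, a map from old doc ID to new doc ID in the merged
  // segment, with -1 for deleted documents. Assumes no index sort.
  static int[][] buildDocMaps(boolean[][] live) {
    int[][] maps = new int[live.length][];
    int nextDoc = 0; // merged doc IDs are assigned consecutively across segments
    for (int s = 0; s < live.length; s++) {
      maps[s] = new int[live[s].length];
      for (int d = 0; d < live[s].length; d++) {
        maps[s][d] = live[s][d] ? nextDoc++ : -1;
      }
    }
    return maps;
  }

  public static void main(String[] args) {
    boolean[][] live = {{true, false, true}, {false, true}};
    System.out.println(Arrays.deepToString(buildDocMaps(live)));
  }
}
```

For `live = {{true, false, true}, {false, true}}`, `main` prints `[[0, -1, 1], [-1, 2]]`: three live documents are renumbered 0 to 2 and both deleted ones are dropped.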
-
createMergingFieldTermsMap
private Map<String,STUniformSplitTermsWriter.MergingFieldTerms> createMergingFieldTermsMap(List<FieldMetadata> fieldMetadataList, int numSegments)
-
createSegmentTermsQueue
private STUniformSplitTermsWriter.TermIteratorQueue<STUniformSplitTermsWriter.SegmentTerms> createSegmentTermsQueue(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> segmentTermsList) throws IOException
- Throws:
IOException
-
combineSegmentsFields
private void combineSegmentsFields(List<STUniformSplitTermsWriter.TermIterator<STUniformSplitTermsWriter.SegmentTerms>> groupedSegmentTerms, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap)
-
combinePostingsPerField
private void combinePostingsPerField(BytesRef term, Map<String, STUniformSplitTermsWriter.MergingFieldTerms> fieldTermsMap, Map<String, List<STUniformSplitTermsWriter.SegmentPostings>> fieldPostingsMap, List<STUniformSplitTermsWriter.MergingFieldTerms> groupedFieldTerms)
-