Class MatchRegionRetriever
java.lang.Object
org.apache.lucene.search.matchhighlight.MatchRegionRetriever
Utility class to compute a list of "match regions" for a given query, searcher and document(s) using the Matches API.
Nested Class Summary
Modifier and Type | Class | Description
private static final record
static interface | MatchRegionRetriever.FieldValueProvider | Access to field values of the highlighted document.
private static class
static interface | MatchRegionRetriever.MatchOffsetsConsumer | A callback invoked for each document selected by the query.
private static class
Field Summary
Modifier and Type | Field
private final List<LeafReaderContext> | leaves
private final Map<String, OffsetsRetrievalStrategy> | offsetStrategies
private final IndexSearcher | searcher
private final Weight | weight
Constructor Summary
Constructor | Description
MatchRegionRetriever(IndexSearcher searcher, Query query, Analyzer analyzer, Predicate<String> fieldsToLoadUnconditionally, Predicate<String> fieldsToLoadIfWithHits) | This constructor uses the default offset strategy supplier from computeOffsetRetrievalStrategies(IndexReader, Analyzer).
MatchRegionRetriever(IndexSearcher searcher, Query query, OffsetsRetrievalStrategySupplier fieldOffsetStrategySupplier, Predicate<String> fieldsToLoadUnconditionally, Predicate<String> fieldsToLoadIfWithHits)
Method Summary
Modifier and Type | Method | Description
private boolean | checkOrderConsistency(List<LeafReaderContext> leaves)
static OffsetsRetrievalStrategySupplier | computeOffsetRetrievalStrategies(IndexReader reader, Analyzer analyzer) | Compute default strategies for retrieving offsets from MatchesIterator instances for a set of given fields.
void | highlightDocument(LeafReaderContext leafReaderContext, int contextDocId, MatchRegionRetriever.FieldValueProvider doc, ToIntFunction<String> maxHitsPerField, Map<String, List<OffsetRange>> outputHighlights) | Low-level method for retrieving hit ranges for a single document.
void | highlightDocuments(PrimitiveIterator.OfInt docIds, MatchRegionRetriever.MatchOffsetsConsumer consumer, ToIntFunction<String> maxHitsPerField) | Low-level, high-efficiency method for highlighting large numbers of documents at once.
void | highlightDocuments(PrimitiveIterator.OfInt docIds, MatchRegionRetriever.MatchOffsetsConsumer consumer, ToIntFunction<String> maxHitsPerField, int maxBlockSize, int maxBlocksProcessedInParallel) | Low-level, high-efficiency method for highlighting large numbers of documents at once.
void | highlightDocuments(TopDocs topDocs, MatchRegionRetriever.MatchOffsetsConsumer consumer) | Processes TopDocs with reasonable defaults.
private MatchRegionRetriever.DocHighlightData[] | prepareBlock(int[] idBlock, ToIntFunction<String> maxHitsPerField)
private void | processBlock(MatchRegionRetriever.DocHighlightData[] docHighlightData, MatchRegionRetriever.MatchOffsetsConsumer consumer)
Field Details
leaves
weight
offsetStrategies
queryAffectedHighlightedFields
shouldLoadStoredField
searcher
Constructor Details
MatchRegionRetriever
public MatchRegionRetriever(IndexSearcher searcher, Query query, Analyzer analyzer, Predicate<String> fieldsToLoadUnconditionally, Predicate<String> fieldsToLoadIfWithHits) throws IOException
This constructor uses the default offset strategy supplier from computeOffsetRetrievalStrategies(IndexReader, Analyzer).
Parameters:
searcher - The IndexSearcher used to execute the query. The index searcher's task executor is also used for computing highlights concurrently.
query - The query for which highlights should be returned.
analyzer - An analyzer that may be used to reprocess (retokenize) document fields in the absence of position offsets in the index. Note that the analyzer must return tokens (positions and offsets) identical to the ones stored in the index.
fieldsToLoadUnconditionally - A custom predicate that should return true for any field that should be preloaded and made available through MatchRegionRetriever.FieldValueProvider, regardless of whether the query affected the field or not. This predicate can be used to load additional fields during field highlighting, making them available to MatchRegionRetriever.MatchOffsetsConsumers.
fieldsToLoadIfWithHits - A custom predicate that should return true for fields that should be highlighted. Typically, this would always return true, indicating that any field affected by the query should be highlighted. However, sometimes highlights may not be needed: for example, if they affect fields that are only used for filtering purposes. Returning false for such fields saves the cost of loading those fields into memory and scanning through field matches.
Throws:
IOException
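The two field predicates can be sketched with plain java.util.function.Predicate instances. This is only an illustration: the field names (id, title, acl) and the FieldPredicates class are hypothetical, and the commented-out constructor call merely mirrors the signature documented above.

```java
import java.util.Set;
import java.util.function.Predicate;

final class FieldPredicates {
  // Hypothetical field names, chosen only for this illustration.
  static final Set<String> ALWAYS_LOADED = Set.of("id", "title");
  static final Set<String> FILTER_ONLY = Set.of("acl");

  // Fields to preload whether or not the query affected them.
  static Predicate<String> fieldsToLoadUnconditionally() {
    return ALWAYS_LOADED::contains;
  }

  // Highlight every field affected by the query, except filter-only fields.
  static Predicate<String> fieldsToLoadIfWithHits() {
    return field -> !FILTER_ONLY.contains(field);
  }

  public static void main(String[] args) {
    Predicate<String> always = fieldsToLoadUnconditionally();
    Predicate<String> ifHit = fieldsToLoadIfWithHits();
    System.out.println("title preloaded: " + always.test("title"));
    System.out.println("acl highlighted: " + ifHit.test("acl"));
    // The predicates would be passed as the last two constructor arguments:
    // new MatchRegionRetriever(searcher, query, analyzer, always, ifHit);
  }
}
```

Excluding filter-only fields in the second predicate follows the cost argument in the parameter description: such fields are then neither loaded nor scanned for matches.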
MatchRegionRetriever
public MatchRegionRetriever(IndexSearcher searcher, Query query, OffsetsRetrievalStrategySupplier fieldOffsetStrategySupplier, Predicate<String> fieldsToLoadUnconditionally, Predicate<String> fieldsToLoadIfWithHits) throws IOException
Parameters:
searcher - The IndexSearcher used to execute the query. The index searcher's task executor is also used for computing highlights concurrently.
query - The query for which matches should be retrieved. The query should be rewritten against the provided searcher.
fieldOffsetStrategySupplier - A custom supplier of per-field OffsetsRetrievalStrategy instances.
fieldsToLoadUnconditionally - A custom predicate that should return true for any field that should be preloaded and made available through MatchRegionRetriever.FieldValueProvider, regardless of whether the query affected the field or not. This predicate can be used to load additional fields during field highlighting, making them available to MatchRegionRetriever.MatchOffsetsConsumers.
fieldsToLoadIfWithHits - A custom predicate that should return true for fields that should be highlighted. Typically, this would always return true, indicating that any field affected by the query should be highlighted. However, sometimes highlights may not be needed: for example, if they affect fields that are only used for filtering purposes. Returning false for such fields saves the cost of loading those fields into memory and scanning through field matches.
Throws:
IOException
Method Details
highlightDocuments
public void highlightDocuments(TopDocs topDocs, MatchRegionRetriever.MatchOffsetsConsumer consumer) throws IOException
Processes TopDocs with reasonable defaults. See variants of this method for low-level tuning parameters.
Parameters:
topDocs - Search results.
consumer - A streaming consumer for document-hits pairs.
Throws:
IOException
highlightDocuments
public void highlightDocuments(PrimitiveIterator.OfInt docIds, MatchRegionRetriever.MatchOffsetsConsumer consumer, ToIntFunction<String> maxHitsPerField) throws IOException
Low-level, high-efficiency method for highlighting large numbers of documents at once.
Parameters:
docIds - A stream of sorted document identifiers for which hit ranges should be returned.
consumer - A streaming consumer for document-hits pairs.
maxHitsPerField - A function that should, for the provided field, return the maximum number of hit regions to consider when scoring passages. The function should return Integer.MAX_VALUE for all hits to be considered, although typically 3-10 hits are sufficient and lead to performance savings in long fields with large numbers of hit ranges.
Throws:
IOException
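A minimal sketch of preparing the docIds and maxHitsPerField arguments using only java.util types. The HighlightInputs class, the "body" field name and the cap of 5 are assumptions made for illustration; the commented-out call only mirrors the signature documented above.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.function.ToIntFunction;

final class HighlightInputs {
  // Cap hits for a hypothetical long "body" field; consider all hits elsewhere.
  static ToIntFunction<String> maxHitsPerField() {
    return field -> "body".equals(field) ? 5 : Integer.MAX_VALUE;
  }

  // The method expects sorted document identifiers, so sort before iterating.
  static PrimitiveIterator.OfInt sortedDocIds(int... docIds) {
    return Arrays.stream(docIds).sorted().iterator();
  }

  public static void main(String[] args) {
    PrimitiveIterator.OfInt ids = sortedDocIds(42, 7, 19);
    ToIntFunction<String> maxHits = maxHitsPerField();
    while (ids.hasNext()) {
      System.out.println("doc " + ids.nextInt());
    }
    System.out.println("body cap: " + maxHits.applyAsInt("body"));
    // With a retriever and consumer at hand, the call would look like:
    // retriever.highlightDocuments(sortedDocIds(42, 7, 19), consumer, maxHits);
  }
}
```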
highlightDocuments
public void highlightDocuments(PrimitiveIterator.OfInt docIds, MatchRegionRetriever.MatchOffsetsConsumer consumer, ToIntFunction<String> maxHitsPerField, int maxBlockSize, int maxBlocksProcessedInParallel) throws IOException
Low-level, high-efficiency method for highlighting large numbers of documents at once.
Document IDs are grouped into sequential "blocks". For each block, highlights are computed (this can use parallel threads if IndexSearcher.getTaskExecutor() can execute tasks in parallel). Finally, processed highlights are passed to the consumer.
Parameters:
docIds - A stream of sorted document identifiers for which hit ranges should be returned.
consumer - A streaming consumer for document-query hits pairs. This consumer will be called sequentially, with document ordering corresponding to that of the query results.
maxHitsPerField - A function that should, for the provided field, return the maximum number of hit regions to consider when scoring passages. The function should return Integer.MAX_VALUE for all hits to be considered, although typically 3-10 hits are sufficient and lead to performance savings in long fields with large numbers of hit ranges.
maxBlockSize - The maximum size of a single contiguous "block" of documents. Each block can be processed in parallel, using the index searcher's task executor.
maxBlocksProcessedInParallel - Maximum number of queued document "blocks"; when reached, the queue is processed (possibly concurrently) and then passed to the consumer. Set this value to 1 to process blocks sequentially.
Throws:
IOException
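The blocking behaviour described for maxBlockSize and maxBlocksProcessedInParallel can be pictured with the following sketch. It is not Lucene's implementation: the BlockingSketch class and its methods are hypothetical, the "processing" step is a plain loop rather than concurrent work, and the consumer here receives bare document IDs instead of highlights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;

final class BlockingSketch {
  // Groups a stream of IDs into consecutive blocks of at most maxBlockSize IDs.
  static List<int[]> toBlocks(PrimitiveIterator.OfInt docIds, int maxBlockSize) {
    if (maxBlockSize <= 0) {
      throw new IllegalArgumentException("maxBlockSize must be positive");
    }
    List<int[]> blocks = new ArrayList<>();
    int[] buffer = new int[maxBlockSize];
    int size = 0;
    while (docIds.hasNext()) {
      buffer[size++] = docIds.nextInt();
      if (size == maxBlockSize) {
        blocks.add(Arrays.copyOf(buffer, size));
        size = 0;
      }
    }
    if (size > 0) {
      blocks.add(Arrays.copyOf(buffer, size));
    }
    return blocks;
  }

  // Queues blocks and flushes whenever maxBlocksQueued is reached.
  // Returns the number of flushes that took place.
  static int drain(List<int[]> blocks, int maxBlocksQueued, IntConsumer consumer) {
    int flushes = 0;
    List<int[]> queue = new ArrayList<>();
    for (int[] block : blocks) {
      queue.add(block);
      if (queue.size() == maxBlocksQueued) {
        flush(queue, consumer);
        flushes++;
      }
    }
    if (!queue.isEmpty()) {
      flush(queue, consumer);
      flushes++;
    }
    return flushes;
  }

  // A real implementation could compute queued blocks concurrently at this point;
  // this sketch only hands the IDs to the consumer sequentially, in order.
  private static void flush(List<int[]> queue, IntConsumer consumer) {
    for (int[] block : queue) {
      for (int id : block) {
        consumer.accept(id);
      }
    }
    queue.clear();
  }

  public static void main(String[] args) {
    List<int[]> blocks = toBlocks(IntStream.range(0, 7).iterator(), 3);
    int flushes = drain(blocks, 2, id -> System.out.println("doc " + id));
    System.out.println(blocks.size() + " blocks, " + flushes + " flushes");
  }
}
```

With a queue limit of 1, every block is flushed as soon as it is formed, which corresponds to the sequential mode mentioned in the parameter description.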
prepareBlock
private MatchRegionRetriever.DocHighlightData[] prepareBlock(int[] idBlock, ToIntFunction<String> maxHitsPerField) throws IOException
Throws:
IOException

processBlock
private void processBlock(MatchRegionRetriever.DocHighlightData[] docHighlightData, MatchRegionRetriever.MatchOffsetsConsumer consumer) throws IOException
Throws:
IOException
highlightDocument
public void highlightDocument(LeafReaderContext leafReaderContext, int contextDocId, MatchRegionRetriever.FieldValueProvider doc, ToIntFunction<String> maxHitsPerField, Map<String, List<OffsetRange>> outputHighlights) throws IOException
Low-level method for retrieving hit ranges for a single document. This method can be used with a custom document MatchRegionRetriever.FieldValueProvider.
Throws:
IOException
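Once hit ranges for a field are available, a caller would typically apply them to the field's text. The sketch below shows one way to do that under stated assumptions: Range is a hypothetical stand-in for an offset range with half-open [from, to) character offsets, and ranges are assumed to be sorted and non-overlapping. None of these names come from the Lucene API.

```java
import java.util.List;
import java.util.Map;

final class MarkupSketch {
  // Hypothetical stand-in for an offset range: [from, to) character offsets.
  record Range(int from, int to) {}

  // Wraps each range of the value in open/close markers.
  // Assumes ranges are sorted by offset and do not overlap.
  static String mark(String value, List<Range> ranges, String open, String close) {
    StringBuilder out = new StringBuilder();
    int last = 0;
    for (Range r : ranges) {
      out.append(value, last, r.from());
      out.append(open).append(value, r.from(), r.to()).append(close);
      last = r.to();
    }
    out.append(value, last, value.length());
    return out.toString();
  }

  public static void main(String[] args) {
    // Field name -> hit ranges, shaped like the outputHighlights map.
    Map<String, List<Range>> highlights = Map.of("title", List.of(new Range(6, 11)));
    String marked = mark("quick brown fox", highlights.get("title"), "[", "]");
    System.out.println(marked); // prints "quick [brown] fox"
  }
}
```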
checkOrderConsistency

computeOffsetRetrievalStrategies
public static OffsetsRetrievalStrategySupplier computeOffsetRetrievalStrategies(IndexReader reader, Analyzer analyzer)
Compute default strategies for retrieving offsets from MatchesIterator instances for a set of given fields.