olaf.pipeline.pipeline_component.candidate_term_enrichment package

Submodules

olaf.pipeline.pipeline_component.candidate_term_enrichment.knowledge_based_enrichment module

class olaf.pipeline.pipeline_component.candidate_term_enrichment.knowledge_based_enrichment.KnowledgeBasedCTermEnrichment(knowledge_source: KnowledgeSource, use_synonyms: bool | None = True, enrichment_kinds: Set[str] | None = {'synonyms'})[source]

Bases: PipelineComponent

Pipeline component to enrich candidate terms based on an external source of knowledge, e.g., a KG.

Attributes

knowledge_sourceKnowledgeSource

The source of knowledge to use for enrichment.

use_synonyms: bool, optional

Wether to use the existing candidate terms synonyms, by default True.

enrichment_kinds: Set[str], optional

The kinds of enrichments to perform. Accepted values are: ‘synonyms’ (default), ‘antonyms’, ‘hypernyms’, and ‘hyponyms’. Other values will be ignored.

check_resources() None[source]

Method to check that the component has access to all its required resources.

get_performance_report() Dict[str, Any][source]
A getter for the pipeline component performance report.

If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.

Returns

Dict[str, Any]

The pipeline component performance report.

optimise() None[source]

A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) None[source]

Method that is responsible for the execution of the component.

Parameters

pipelinePipeline

The pipeline running.

olaf.pipeline.pipeline_component.candidate_term_enrichment.llm_based_enrichment module

class olaf.pipeline.pipeline_component.candidate_term_enrichment.llm_based_enrichment.LLMBasedTermEnrichment(prompt_template: Callable[[str], List[Dict[str, str]]] | None = None, llm_generator: LLMGenerator | None = None)[source]

Bases: PipelineComponent

Enrich candidate terms using LLM knowledge.

Attributes

prompt_template: Callable[[str], List[Dict[str, str]]]

Prompt template used to give instructions and context to the LLM.

llm_generator: LLMGenerator

The LLM model used to enrich the candidate terms. By default, the zephyr-7b-beta HuggingFace model is used.

check_resources() None[source]

Method to check that the component has access to all its required resources.

get_performance_report() Dict[str, Any][source]

A getter for the pipeline component performance report. If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.

Returns

Dict[str, Any]

The pipeline component performance report.

optimise(validation_terms: Set[str], option_values_map: Set[float]) None[source]

A method to optimise the pipeline component by tuning the configuration.

run(pipeline: Pipeline) None[source]

Method that is responsible for the execution of the component.

Parameters

pipelinePipeline

The pipeline running.

olaf.pipeline.pipeline_component.candidate_term_enrichment.semantic_based_enrichment module

class olaf.pipeline.pipeline_component.candidate_term_enrichment.semantic_based_enrichment.SemanticBasedEnrichment(threshold: float | None = None)[source]

Bases: PipelineComponent

Pipeline component to enrich candidate terms based on semantic meaning computed from embeddings similarity. The most similar words in the vocabulary are added as synonyms.

Attributes

thresholdfloat, optional

The threshold defines the minimum similarity score required to be synonymous. By default the threshold is set to 0.9.

check_resources() None[source]

Method to check that the component has access to all its required resources.

enrich_term(c_term: CandidateTerm, spacy_model: Language) None[source]

Enrich candidate term synonyms based on most similar words in the vocabulary. Similarity is computed based on vectors cosine similarity measure.

get_performance_report() Dict[str, Any][source]

A getter for the pipeline component performance report. If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.

Returns

Dict[str, Any]

The pipeline component performance report.

optimise() None[source]

A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) None[source]

Method responsible for the component execution.

Parameters

pipelinePipeline

The pipeline running.

Module contents