olaf.pipeline.pipeline_component.candidate_term_enrichment package¶
Submodules¶
olaf.pipeline.pipeline_component.candidate_term_enrichment.knowledge_based_enrichment module¶
- class olaf.pipeline.pipeline_component.candidate_term_enrichment.knowledge_based_enrichment.KnowledgeBasedCTermEnrichment(knowledge_source: KnowledgeSource, use_synonyms: bool | None = True, enrichment_kinds: Set[str] | None = {'synonyms'})[source]¶
Bases:
PipelineComponent
Pipeline component to enrich candidate terms based on an external source of knowledge, e.g., a KG.
Attributes¶
- knowledge_sourceKnowledgeSource
The source of knowledge to use for enrichment.
- use_synonyms: bool, optional
Wether to use the existing candidate terms synonyms, by default True.
- enrichment_kinds: Set[str], optional
The kinds of enrichments to perform. Accepted values are: ‘synonyms’ (default), ‘antonyms’, ‘hypernyms’, and ‘hyponyms’. Other values will be ignored.
- check_resources() None [source]¶
Method to check that the component has access to all its required resources.
- get_performance_report() Dict[str, Any] [source]¶
- A getter for the pipeline component performance report.
If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.
Returns¶
- Dict[str, Any]
The pipeline component performance report.
olaf.pipeline.pipeline_component.candidate_term_enrichment.llm_based_enrichment module¶
- class olaf.pipeline.pipeline_component.candidate_term_enrichment.llm_based_enrichment.LLMBasedTermEnrichment(prompt_template: Callable[[str], List[Dict[str, str]]] | None = None, llm_generator: LLMGenerator | None = None)[source]¶
Bases:
PipelineComponent
Enrich candidate terms using LLM knowledge.
Attributes¶
- prompt_template: Callable[[str], List[Dict[str, str]]]
Prompt template used to give instructions and context to the LLM.
- llm_generator: LLMGenerator
The LLM model used to enrich the candidate terms. By default, the zephyr-7b-beta HuggingFace model is used.
- check_resources() None [source]¶
Method to check that the component has access to all its required resources.
- get_performance_report() Dict[str, Any] [source]¶
A getter for the pipeline component performance report. If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.
Returns¶
- Dict[str, Any]
The pipeline component performance report.
olaf.pipeline.pipeline_component.candidate_term_enrichment.semantic_based_enrichment module¶
- class olaf.pipeline.pipeline_component.candidate_term_enrichment.semantic_based_enrichment.SemanticBasedEnrichment(threshold: float | None = None)[source]¶
Bases:
PipelineComponent
Pipeline component to enrich candidate terms based on semantic meaning computed from embeddings similarity. The most similar words in the vocabulary are added as synonyms.
Attributes¶
- thresholdfloat, optional
The threshold defines the minimum similarity score required to be synonymous. By default the threshold is set to 0.9.
- check_resources() None [source]¶
Method to check that the component has access to all its required resources.
- enrich_term(c_term: CandidateTerm, spacy_model: Language) None [source]¶
Enrich candidate term synonyms based on most similar words in the vocabulary. Similarity is computed based on vectors cosine similarity measure.
- get_performance_report() Dict[str, Any] [source]¶
A getter for the pipeline component performance report. If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.
Returns¶
- Dict[str, Any]
The pipeline component performance report.