olaf.pipeline.pipeline_component.concept_relation_extraction package¶

Submodules¶

olaf.pipeline.pipeline_component.concept_relation_extraction.agglomerative_clustering_concept_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.agglomerative_clustering_concept_extraction.AgglomerativeClusteringConceptExtraction(nb_clusters: int | None = None, metric: str | None = None, linkage: str | None = 'average', distance_threshold: float | None = None, embedding_model: str | None = None)[source]¶

Bases: PipelineComponent

Extract concept based candidate terms with agglomerative clustering.

Attributes¶

candidate_terms: List[CandidateTerm]: List of candidate terms to extract concepts from.
nb_clusters: int, optional: Number of clusters to find with the agglomerative clustering algorithm. It must be None if distance_threshold is not None, by default 2.
metric: str, optional: Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed”, by default “cosine”.
linkage: str, optional: Distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion. Can be “ward”, “complete”, “average”, “single”, by default “average”.
distance_threshold: float, optional: The linkage distance threshold at or above which clusters will not be merged. If not None, n_clusters must be None, by default None.
embedding_model: str, optional: Name of the embedding model to use. The list of available models can be found here : https://www.sbert.net/docs/pretrained_models.html, by default None.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the parameters set.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Execution of the agglomerative clustering algorithm on candidate terms embedded. Concepts creation and candidate terms purge.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.agglomerative_clustering_relation_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.agglomerative_clustering_relation_extraction.AgglomerativeClusteringRelationExtraction(nb_clusters: int | None = None, metric: str | None = None, linkage: str | None = 'average', distance_threshold: float | None = None, embedding_model: str | None = None, concept_max_distance: int | None = None, scope: str | None = 'doc')[source]¶

Bases: PipelineComponent

Extract relation based on candidate terms with agglomerative clustering.

Attributes¶

candidate_relations: List[CandidateRelations], optional: List of candidate relations to extract relations from, by default None.
nb_clusters: int, optional: Number of clusters to find with the agglomerative clustering algorithm. It must be None if distance_threshold is not None, by default 2.
metric: str, optional: Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed”, by default cosine.
linkage: str, optional: Distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion. Can be “ward”, “complete”, “average”, “single”, by default “average”.
distance_threshold: float, optional: The linkage distance threshold at or above which clusters will not be merged. If not None, n_clusters must be None, by default None.
embedding_model: str, optional: Name of the embedding model to use. The list of available models can be found here : https://www.sbert.net/docs/pretrained_models.html, by default None.
concept_max_distance: int, optional: The maximum distance between the candidate term and the concept sought, by defautl 5.
scope: str, optional: Scope used to search concepts. Can be “doc” for the entire document or “sent” for the candidate term “sentence”, by default “doc”.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the parameters set.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Execution of the agglomerative clustering algorithm on candidate terms embedded. Relations creation and candidate terms purge.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.candidate_terms_to_concepts module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.candidate_terms_to_concepts.CTsToConceptExtraction[source]¶

Bases: PipelineComponent

A pipeline component to create concepts directly from the candidate terms.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the parameters set.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Execution of the concept extraction directly from existing candidate terms. The pipeline candidate terms are consumed.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.candidate_terms_to_relations module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.candidate_terms_to_relations.CTsToRelationExtraction(concept_max_distance: int | None = None, scope: str | None = 'doc')[source]¶

Bases: PipelineComponent

A pipeline component to create relations directly from the candidate terms.

Attributes¶

concept_max_distance: int, optional: The maximum distance between the candidate term and the concept sought, by default 5.
scope: str, optional: Scope used to search concepts. Can be “doc” for the entire document or “sent” for the candidate term “sentence”, by default “doc”.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the parameters set.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Execution of the relation extraction directly from existing candidate terms. Candidate terms are first converted into candidate relations. Then the candidate relations are converted into relations. The pipeline candidate terms are consumed.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.concept_cooc_metarelation_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.concept_cooc_metarelation_extraction.ConceptCoocMetarelationExtraction(custom_metarelation_creation_metric: Callable[[int], bool] | None = None, window_size: int | None = None, threshold: int | None = None, scope: str | None = 'doc', metarelation_label: str | None = 'RELATED_TO', create_symmetric_metarelation: bool | None = False)[source]¶

Bases: PipelineComponent

A pipeline component to extract metarelations based on concept co-occurrence.

Attributes¶

metarelation_creation_metric: Callable[[int], bool], optional: The function to define based on the concept co-occurrence count whether or not to create a metarelation, by default co-occurrence count > self.threshold.
window_size: int, optional: The token window size to consider for concept co-occurrence. Minimum is 2, by default None.
threshold: int, optional: The co-occurrence minimum count threshold for metarelation construction, by default 0.
scope: str, optional: The corpus scope to consider. Either ‘doc’ or ‘sent’, by default ‘doc’.
metarelation_label: str, optional: The metarelation label to use, by default ‘RELATED_TO’.
create_symmetric_metarelation: bool, optional: Whether to create the symmetric metarelation, by default False. WARNING! this option can create a lot of metarelation that can easily be created in a later process.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the parameters set.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Execution of the metarelation extraction based on concept co-occurrence. Metarelations are created and added to the pipeline knowledge representation.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.knowledge_based_concept_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.knowledge_based_concept_extraction.KnowledgeBasedConceptExtraction(knowledge_source: KnowledgeSource, group_ct_on_synonyms: bool | None = True)[source]¶

Bases: PipelineComponent

Pipeline component to extract concepts based on an external source of knowledge, e.g., a KG.

Attributes¶

knowledge_sourceKnowledgeSource: The source of knowledge to use for concept matching.
group_ct_on_synonyms: bool, optional: Wether or not to group the candidate terms on synonyms before proceeding to the concept matching with the external source of knowledge, by default True.

c_terms_texts_to_match(ct_group: Set[CandidateTerm]) → Set[str][source]¶

Extract from a set of candidate terms the strings to use for concept matching.

Parameters¶

ct_groupSet[CandidateTerm]: The set of candidate terms.

Returns¶

Set[str]: The set of strings to use for concept matching.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Method that is responsible for the execution of the component.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.knowledge_based_relation_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.knowledge_based_relation_extraction.KnowledgeBasedRelationExtraction(knowledge_source: KnowledgeSource, group_ct_on_synonyms: bool | None = True, concept_max_distance: int | None = None, scope: str | None = 'doc')[source]¶

Bases: PipelineComponent

Pipeline component to extract relations based on an external source of knowledge, e.g., a KG. Candidate terms are converted into candidate relations. Then, candidate relations are validated as relations if their labels match the external source of knowledge.

Attributes¶

knowledge_sourceKnowledgeSource: The source of knowledge to use for relation matching.
group_ct_on_synonyms: bool, optional: Whether or not to group the candidate terms on synonyms before proceeding to the relation matching with the external source of knowledge, by default True.
concept_max_distance: int, optional: The maximum distance between the candidate term and the concept sought, by default 5.
scope: str: Scope used to search concepts. Can be “doc” for the entire document or “sent” for the candidate term “sentence”, by default “doc”.

c_terms_texts_to_match(cr_group: Set[CandidateRelation]) → Set[str][source]¶

Extract from a set of candidate relations the strings to use for concept matching.

Parameters¶

cr_groupSet[CandidateRelation]: The set of candidate relations.

Returns¶

Set[str]: The set of strings to use for relation matching.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Method that is responsible for the execution of the component.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.llm_based_concept_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.llm_based_concept_extraction.LLMBasedConceptExtraction(prompt_template: Callable[[str], List[Dict[str, str]]] | None = None, llm_generator: LLMGenerator | None = None, doc_context_max_len: int | None = 4000)[source]¶

Bases: PipelineComponent

LLM based concept extraction.

Attributes¶

prompt_template: Callable[[str], List[Dict[str, str]]]: Prompt template used to give instructions and context to the LLM.
llm_generator: LLMGenerator: The LLM model used to generate the concepts.
doc_context_max_len: int: Maximum number of characters for the document context in the prompt.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report. If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise(validation_terms: Set[str], option_values_map: Set[float]) → None[source]¶: A method to optimise the pipeline component by tuning the configuration.

run(pipeline: Pipeline) → None[source]¶

Method that is responsible for the execution of the component. Concepts are created and candidate terms are purged.

Parameters¶

pipeline: Pipeline: The pipeline to run the component with.

olaf.pipeline.pipeline_component.concept_relation_extraction.llm_based_relation_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.llm_based_relation_extraction.LLMBasedRelationExtraction(prompt_template: Callable[[str], List[Dict[str, str]]] | None = None, llm_generator: LLMGenerator | None = None, doc_context_max_len: int | None = 4000, concept_max_distance: int | None = None, scope: str | None = 'doc')[source]¶

Bases: PipelineComponent

LLM based relation extraction.

Attributes¶

prompt_template: Callable[[str], List[Dict[str, str]]], optional: Prompt template used to give instructions and context to the LLM, by default None.
llm_generator: LLMGenerator, optional: The LLM model used to generate the relation, by default None.
doc_context_max_len: int, optional: Maximum number of characters for the document context in the prompt, by default 4000.
concept_max_distance: int, optional: The maximum distance between the candidate term and the concept sought, by default 5.
scope: str, optional: Scope used to search concepts. Can be “doc” for the entire document or “sent” for the candidate term “sentence”, by default “doc”.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report. If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the set parameters.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise(validation_terms: Set[str], option_values_map: Set[float]) → None[source]¶: A method to optimise the pipeline component by tuning the configuration.

run(pipeline: Pipeline) → None[source]¶

Method that is responsible for the execution of the component. Relations are created and candidate terms are purged.

Parameters¶

pipeline: Pipeline: The pipeline to run the component with.

olaf.pipeline.pipeline_component.concept_relation_extraction.synonym_concept_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.synonym_concept_extraction.SynonymConceptExtraction[source]¶

Bases: PipelineComponent

Extract concepts based on synonyms grouping.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the parameters set.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Execution of the synonyms grouping for concept extraction on candidate terms. Concepts are created and candidate terms are purged.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction.synonym_relation_extraction module¶

class olaf.pipeline.pipeline_component.concept_relation_extraction.synonym_relation_extraction.SynonymRelationExtraction(concept_max_distance: int | None = None, scope: str | None = 'doc')[source]¶

Bases: PipelineComponent

Extract relations based on synonyms grouping.

Attributes¶

concept_max_distance: int, optional: The maximum distance between the candidate term and the concept sought, by default 5.
scope: str: Scope used to search concepts. Can be “doc” for the entire document or “sent” for the candidate term “sentence”, by default “doc”.

check_resources() → None[source]¶: Method to check that the component has access to all its required resources.

get_performance_report() → Dict[str, Any][source]¶

A getter for the pipeline component performance report.: If the component has been optimised, it only returns the best performance. Otherwise, it returns the results obtained with the parameters set.

Returns¶

Dict[str, Any]: The pipeline component performance report.

optimise() → None[source]¶: A method to optimise the pipeline component by tuning the options.

run(pipeline: Pipeline) → None[source]¶

Execution of the synonyms grouping for relation extraction on candidate terms. Candidate terms are converted into candidate relations. Candidate relations with same synonyms, source and destination concepts are grouped together as a new relation. Candidate terms are purged.

Parameters¶

pipelinePipeline: The pipeline running.

olaf.pipeline.pipeline_component.concept_relation_extraction package¶

Submodules¶

olaf.pipeline.pipeline_component.concept_relation_extraction.agglomerative_clustering_concept_extraction module¶

Attributes¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.agglomerative_clustering_relation_extraction module¶

Attributes¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.candidate_terms_to_concepts module¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.candidate_terms_to_relations module¶

Attributes¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.concept_cooc_metarelation_extraction module¶

Attributes¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.knowledge_based_concept_extraction module¶

Attributes¶

Parameters¶

Returns¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.knowledge_based_relation_extraction module¶

Attributes¶

Parameters¶

Returns¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.llm_based_concept_extraction module¶

Attributes¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.llm_based_relation_extraction module¶

Attributes¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.synonym_concept_extraction module¶

Returns¶

Parameters¶

olaf.pipeline.pipeline_component.concept_relation_extraction.synonym_relation_extraction module¶

Attributes¶

Returns¶

Parameters¶

Module contents¶