olaf.commons package¶
Submodules¶
olaf.commons.candidate_term_tools module¶
- olaf.commons.candidate_term_tools.build_cts_from_strings(ct_label_strings: Set[str], spacy_model: Language, docs: List[Doc]) Set[CandidateTerm] [source]¶
Create candidate terms from a set of strings label.
Parameters¶
- ct_label_stringsSet[str]
The set of strings to use for candidate terms labels.
- spacy_modelspacy.language.Language
The spaCy model to retrieve the corpus occurrences.
- docsList[spacy.tokens.Doc]
The corpus in which to find the corpus occurrences.
Returns¶
- Set[CandidateTerm]
The set of created candidate terms.
- olaf.commons.candidate_term_tools.check_ct_belongs_to_group(candidate_term: CandidateTerm, ct_labels: Set[str], group_cts: Tuple[CandidateTerm], group_label: Set[str]) bool [source]¶
Check if a candidate term belongs to a group of candidate terms. Candidate must have common label or synonyms with the group. If the candidate is a candidate relation, it should have the same source and destination concept as well.
Parameters¶
- candidate_term: CandidateTerm
Candidate term to check.
- ct_labels: Set[str]
Candidate term label and synonyms.
- group_cts: Tuple[CandidateTerm]
Tuple of candidate terms to compare with.
- group_label: Set[str]
Group of candidate terms labels and synonyms.
Returns¶
- bool
True if the candidate term belongs to the group, False otherwise.
- olaf.commons.candidate_term_tools.cts_have_common_synonyms(c_term_1: CandidateTerm, c_term_2: CandidateTerm) bool [source]¶
Check if two terms have common synonyms.
Parameters¶
- c_term_1CandidateTerm
First candidate term to compare.
- c_term_2CandidateTerm
Second candidate term to compare.
Returns¶
- bool
True if the two candidate terms have common synonyms, False otherwise.
- olaf.commons.candidate_term_tools.cts_to_concept(concept_candidates: Set[CandidateTerm]) Concept [source]¶
Create a concept out of a set of candidate terms.
Parameters¶
- concept_candidatesSet[CandidateTerm]
Set of candidate terms to be merged in a same concept.
Returns¶
- Concept
The created concept.
- olaf.commons.candidate_term_tools.filter_cts_on_first_token_in_term(candidate_terms: Set[CandidateTerm], filtering_tokens: Set[str]) Set[CandidateTerm] [source]¶
Filter a set of candidate terms based on their first token.
Note: this function acts only at the candidate term label level.
Parameters¶
- candidate_terms: Set[CandidateTerm]
Set of candidate terms to filter.
- filtering_tokens: Set[str]
The set of token strings to use for filtering the candidate terms.
Returns¶
- Set[CandidateTerm]
The set of filtered candidate terms.
- olaf.commons.candidate_term_tools.filter_cts_on_last_token_in_term(candidate_terms: Set[CandidateTerm], filtering_tokens: Set[str]) Set[CandidateTerm] [source]¶
Filter a set of candidate terms based on their last token.
Note: this function acts only at the candidate term label level.
Parameters¶
- candidate_terms: Set[CandidateTerm]
Set of candidate terms to filter.
- filtering_tokens: Set[str]
The set of token strings to use for filtering the candidate terms.
Returns¶
- Set[CandidateTerm]
The set of filtered candidate terms.
- olaf.commons.candidate_term_tools.filter_cts_on_token_in_term(candidate_terms: Set[CandidateTerm], filtering_tokens: Set[str]) Set[CandidateTerm] [source]¶
Filter a set of candidate terms based on tokens appearing in them.
Note: this function acts only at the candidate term label level.
Parameters¶
- candidate_terms: Set[CandidateTerm]
Set of candidate terms to filter.
- filtering_tokens: Set[str]
The set of token strings to use for filtering the candidate terms.
Returns¶
- Set[CandidateTerm]
The set of filtered candidate terms.
- olaf.commons.candidate_term_tools.group_cts_on_synonyms(candidate_terms: Set[CandidateTerm]) List[Set[CandidateTerm]] [source]¶
Group candidate terms with commons labels or synonyms.
Parameters¶
- candidate_terms: Set[CandidateTerm]
Candidate terms to group by commons labels and synonyms.
Returns¶
- List[Set[CandidateTerm]]
Candidate terms grouped.
- olaf.commons.candidate_term_tools.split_cts_on_token(candidate_terms: Set[CandidateTerm], splitting_tokens: Set[str], spacy_model: Language, docs: List[Doc]) Set[CandidateTerm] [source]¶
Split candidate terms based on a set of token strings.
Note: this function acts only at the candidate term label level.
Parameters¶
- candidate_terms: Set[CandidateTerm]
The set of candidate terms to split.
- splitting_tokens: Set[str]
The token strings to split candidate terms on.
- spacy_modelspacy.language.Language
The spaCy model to retrieve the candidate terms’ corpus occurrences.
- docsList[spacy.tokens.Doc]
The corpus in which to find the candidate terms’ corpus occurrences.
Returns¶
- Set[CandidateTerm]
The new set of candidate terms.
olaf.commons.embedding_tools module¶
olaf.commons.errors module¶
- exception olaf.commons.errors.EmptyCorpusError[source]¶
Bases:
Exception
Exception raised when the text corpus represented as spacy documents is empty.
- exception olaf.commons.errors.FileOrDirectoryNotFoundError(path: str)[source]¶
Bases:
Exception
Exception raised when the corpus path is not a directory or a file.
- exception olaf.commons.errors.MissingEnvironmentVariable(component_name: str, env_var_name: str)[source]¶
Bases:
Exception
Exception raised when an environment variable is missing.
- exception olaf.commons.errors.NotCallableError(function_name: str)[source]¶
Bases:
Exception
Exception raised when the argument passed as a function is not callable.
- exception olaf.commons.errors.OptionError(component_name: str, option_name: str, error_type: str)[source]¶
Bases:
Exception
Exception raised when a required option is missing for a pipeline component to function.
- exception olaf.commons.errors.ParameterError(component_name: str, param_name: str, error_type: str)[source]¶
Bases:
Exception
Exception raised when a required parameter is missing for a pipeline component to function.
olaf.commons.kr_to_rdf_tools module¶
- olaf.commons.kr_to_rdf_tools.all_individuals_different(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
Create the RDF triples corresponding to making each KR concepts linguistic representation an OWL named instance and making each instance different.
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts.
- base_uriURIRef
The base URI to use when creating the class URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.concept_lrs_to_owl_individuals(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
Create the RDF triples corresponding to making each KR concepts an OWL class with each concept linguistic representations instances of the concept class.
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts.
- base_uriURIRef
The base URI to use when creating the class URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.create_obj_prop_all_restriction_triples(rel_uri: URIRef, dest_concept_uri: URIRef) Tuple[URIRef, Graph] [source]¶
Create the triples corresponding to an universal OWL property restriction part of the graph.
Parameters¶
- rel_uriURIRef
The URI or the relation the OWL property restriction is focusing on.
- dest_concept_uriURIRef
The URI of the concept (i.e., OWL class) involved in the OWL property restriction.
Returns¶
- Tuple[URIRef, Graph]
The blank node ID origin of the OWL property restriction and the corresponding graph.
- olaf.commons.kr_to_rdf_tools.create_obj_prop_some_restriction_triples(rel_uri: URIRef, dest_concept_uri: URIRef) Tuple[URIRef, Graph] [source]¶
Create the triples corresponding to an existential OWL property restriction part of the graph.
Parameters¶
- rel_uriURIRef
The URI or the relation the OWL property restriction is focusing on.
- dest_concept_uriURIRef
The URI of the concept (i.e., OWL class) involved in the OWL property restriction.
Returns¶
- Tuple[URIRef, Graph]
The blank node ID origin of the OWL property restriction and the corresponding graph.
- olaf.commons.kr_to_rdf_tools.kr_concepts_to_disjoint_classes(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
Create the RDF triples corresponding to making each KR concepts an OWL class and making each classes disjoint.
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts.
- base_uriURIRef
The base URI to use when creating the class URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.kr_concepts_to_owl_classes(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
Create the RDF triples corresponding to making each KR concepts an OWL class.
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts.
- base_uriURIRef
The base URI to use when creating the class URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.kr_metarelations_to_owl(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
Create the RDF triples corresponding to mapping the KR metarelations with OWL vocabulary.
The mapping depends on the providing dictionary. The KR metarelations not matching any keys in the mapping dictionary is created as an OWL object property.
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts.
- base_uriURIRef
The base URI to use when creating the class URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.kr_relations_to_anonymous_only_parent(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
- Create RDF triples corresponding to said in plain english:
‘Each source concept is A SUBSET OF the set of all the things that are related to ONLY instances of the destination concept by the relation.’
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts and relations.
- base_uriURIRef
The base URI to use when creating the URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.kr_relations_to_anonymous_some_equivalent(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
- Create RDF triples corresponding to said in plain english:
‘Each source concept is EQUIVALENT TO the set of all the things that are related to SOME instances of the destination concept by the relation.’
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts and relations.
- base_uriURIRef
The base URI to use when creating the URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.kr_relations_to_anonymous_some_parent(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
- Create RDF triples corresponding to saying in plain english:
‘Each source concept is A SUBSET OF the set of all the things that are related to SOME instances of the destination concept by the relation.’
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts and relations.
- base_uriURIRef
The base URI to use when creating the URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.kr_relations_to_domain_range_obj_props(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
Create the RDF triples corresponding to making each KR relations OWL object properties with domain and range their source and destination concepts.
Source and destination concepts will be created as OWL classes.
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts.
- base_uriURIRef
The base URI to use when creating the class URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.kr_relations_to_owl_obj_props(kr: KnowledgeRepresentation, base_uri: URIRef) Graph [source]¶
Create the RDF triples corresponding to making each KR relations an OWL object property.
Parameters¶
- krKnowledgeRepresentation
The Knowledge Representation containing the concepts.
- base_uriURIRef
The base URI to use when creating the class URIs.
Returns¶
- Graph
The constructed RDF triples.
- olaf.commons.kr_to_rdf_tools.owl_class_uri(label: str, base_uri: URIRef) URIRef [source]¶
Build an OWL class URI.
Parameters¶
- labelstr
The label to use in the URI.
- base_uriURIRef
The base URI to use.
Returns¶
- URIRef
The OWL class URI.
olaf.commons.llm_tools module¶
- class olaf.commons.llm_tools.HuggingFaceGenerator(api_url: str | None = 'https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta')[source]¶
Bases:
LLMGenerator
Text generator base on Hugging Face inference API.
- class olaf.commons.llm_tools.LLMGenerator[source]¶
Bases:
ABC
Text generator based on LLM.
- class olaf.commons.llm_tools.MistralAIGenerator(model_name: str | None = 'mistral-tiny')[source]¶
Bases:
LLMGenerator
Text generator based on MiastralAI models.
- class olaf.commons.llm_tools.OpenAIGenerator[source]¶
Bases:
LLMGenerator
Text generator based on OpenAI gpt-3.5-turbo model.
olaf.commons.logging_config module¶
olaf.commons.prompts module¶
- olaf.commons.prompts.hf_prompt_concept_extraction(doc_context: str, ct_labels: str) str [source]¶
Prompt template for concept extraction with Hugging Face inference API.
Parameters¶
- doc_context: str
Extract of document contents to use as context.
- ct_labels: str
The candidate terms to group into concepts.
Returns¶
- str
Completion prompt template.
- olaf.commons.prompts.hf_prompt_concept_term_extraction(context: str) str [source]¶
Prompt template for concept term extraction with Hugging Face inference API.
Parameters¶
- context: str
The context to add in the prompt template.
Returns¶
- str
Completion prompt template.
- olaf.commons.prompts.hf_prompt_hierarchisation(doc_context: str, concepts_description: str) str [source]¶
Prompt template for hierarchisation with Hugging Face inference API.
Parameters¶
- doc_context: str
Extract of document contents where concepts appear to use as context.
- concepts_description: str
Textual description of the concepts.
Returns¶
- str
Completion prompt template.
- olaf.commons.prompts.hf_prompt_owl_axiom_extraction(kr_description: str, namespace: str) str [source]¶
Prompt template for axiom extraction with Hugging Face inference API.
Parameters¶
- kr_description: str
Textual description of the knowledge representation.
- namespace: str
The name space used for axiom generation.
Returns¶
- str
Completion prompt template.
- olaf.commons.prompts.hf_prompt_relation_extraction(doc_context: str, ct_labels: str) str [source]¶
Prompt template for relation extraction with Hugging Face inference API.
Parameters¶
- doc_context: str
Extract of document contents to use as context.
- ct_labels: str
The candidate terms to group into relations.
Returns¶
- str
Completion prompt template.
- olaf.commons.prompts.hf_prompt_relation_term_extraction(context: str) str [source]¶
Prompt template for relation term extraction with Hugging Face inference API.
Parameters¶
- context: str
The context to add in the prompt template.
Returns¶
- str
Completion prompt template.
- olaf.commons.prompts.hf_prompt_term_enrichment(context: str) str [source]¶
Prompt template for term enrichment with Hugging Face inference API.
Parameters¶
- context: str
The context to add in the prompt template.
Returns¶
- str
Completion prompt template.
- olaf.commons.prompts.openai_prompt_concept_extraction(doc_context: str, ct_labels: str) List[Dict[str, str]] [source]¶
Prompt template for concept extraction with ChatCompletion OpenAI model.
Parameters¶
- doc_context: str
Extract of document contents to use as context.
- ct_labels: str
The candidate terms to group into concepts.
Returns¶
- List[Dict[str, str]]
ChatCompletion prompt template.
- olaf.commons.prompts.openai_prompt_concept_term_extraction(context: str) List[Dict[str, str]] [source]¶
Prompt template for concept term extraction with ChatCompletion OpenAI model.
Parameters¶
- context: str
The context to add in the prompt template.
Returns¶
- List[Dict[str, str]]
ChatCompletion prompt template.
- olaf.commons.prompts.openai_prompt_hierarchisation(doc_context: str, concepts_description: str) List[Dict[str, str]] [source]¶
Prompt template for hierarchisation with ChatCompletion OpenAI model.
Parameters¶
- doc_context: str
Extract of document contents where concepts appear to use as context.
- concepts_description: str
Textual description of the concepts.
Returns¶
- List[Dict[str, str]]
ChatCompletion prompt template.
- olaf.commons.prompts.openai_prompt_owl_axiom_extraction(kr_description: str, namespace: str) List[Dict[str, str]] [source]¶
Prompt template for axiom extraction with ChatCompletion OpenAI model.
Parameters¶
- kr_description: str
Textual description of the knowledge representation.
- namespace: str
The name space used for axiom generation.
Returns¶
- List[Dict[str, str]]
ChatCompletion prompt template.
- olaf.commons.prompts.openai_prompt_relation_extraction(doc_context: str, ct_labels: str) List[Dict[str, str]] [source]¶
Prompt template for relation extraction with ChatCompletion OpenAI model.
Parameters¶
- doc_context: str
Extract of document contents to use as context.
- ct_labels: str
The candidate terms to group as relations.
Returns¶
- List[Dict[str, str]]
ChatCompletion prompt template.
- olaf.commons.prompts.openai_prompt_relation_term_extraction(context: str) List[Dict[str, str]] [source]¶
Prompt template for relation term extraction with ChatCompletion OpenAI model.
Parameters¶
- context: str
The context to add in the prompt template.
Returns¶
- List[Dict[str, str]]
ChatCompletion prompt template.
olaf.commons.relation_tools module¶
- olaf.commons.relation_tools.crs_to_relation(candidate_relations: Set[CandidateRelation]) Relation [source]¶
Convert a set of candidate relations to a new relation. Each candidate relation represents a different linguistic realisation in the relation created.
Parameters¶
- candidate_relationsSet[CandidateRelation]
Set of candidate relations to convert into a relation.
Returns¶
- Relation
The relation created from the candidate relations.
- olaf.commons.relation_tools.cts_to_crs(candidate_terms: Set[CandidateTerm], concepts_labels_map: Dict[str, Concept], spacy_model: Language, concept_max_distance: int, scope: str) Set[CandidateRelation] [source]¶
Convert candidate terms into candidate relations. Concepts are searched around the candidate term within a given distance. If source and destination concepts are found, candidate relation as triple is created. Otherwise, candidate relation has no source and destination concepts.
Parameters¶
- candidate_termsSet[CandidateTerm]
Set of candidate terms to convert into candidate relations.
- concepts_labels_mapDict[str,Concept]
Dictionary with concept labels as keys and concepts corresponding as values.
- spacy_modelspacy.language.Language
SpaCy model to use.
- concept_max_distanceint
The maximum distance between the candidate term and the concept sought.
- scopestr
Scope used to search concepts. Can be “doc” for the entire document or “sent” for the candidate term sentence.
Returns¶
- Set[CandidateRelation]
Set of candidate relations found from the candidate terms.
- olaf.commons.relation_tools.group_cr_by_concepts(candidate_relations: List[CandidateRelation]) List[Set[CandidateRelation]] [source]¶
Group relation candidates with same source and destination concepts Parameters ———- candidate_relations: List[CandidateRelation]
Candidate relations to group by their concepts
Returns¶
- List[Set[CandidateRelation]]
Groups of candidate relations with same source and destination concepts.
olaf.commons.spacy_processing_tools module¶
- olaf.commons.spacy_processing_tools.is_not_num(token: Token) bool [source]¶
Return True if the Spacy Token is NOT a numerical value.
Parameters¶
- tokenspacy.tokens.Token
The Spacy token to test.
Returns¶
- bool
Whether the Token Shape is NOT a numerical value or it is.
- olaf.commons.spacy_processing_tools.is_not_punct(token: Token) bool [source]¶
Return True if the Spacy Token is NOT a punctuation symbol.
Parameters¶
- tokenspacy.tokens.Token
The Spacy token to test.
Returns¶
- bool
Whether the Token Shape is NOT a punctuation symbol or it is.
- olaf.commons.spacy_processing_tools.is_not_stopword(token: Token) bool [source]¶
Return True if the Spacy Token is NOT a stopword.
Parameters¶
- tokenspacy.tokens.Token
The Spacy token to test.
Returns¶
- bool
Whether the Token Shape is NOT a stopword or it is.
- olaf.commons.spacy_processing_tools.is_not_url(token: Token) bool [source]¶
Return True if the Spacy Token is NOT a url.
Parameters¶
- tokenspacy.tokens.Token
The Spacy token to test.
Returns¶
- bool
Whether the Token Shape is NOT a url or it is.
- olaf.commons.spacy_processing_tools.select_on_pos(token: Token, pos_to_select: List[str]) bool [source]¶
Return true if the Spacy Token POS string is in the pos_to_select list.
Parameters¶
- tokenspacy.tokens.Token
The Spacy token to test
- pos_to_selectList[str]
The list of strings corresponding to the POS tags to keep.
Returns¶
- bool
Whether the Token POS tag is in pos_to_select or not
- olaf.commons.spacy_processing_tools.spacy_span_ngrams(span: Span, gram_size: int) List[Span] [source]¶
Adapt the NTLK ngrams function to work with spaCy Span objects.
Parameters¶
- spanspacy.tokens.span.Span
The spaCy Span object to extract the ngrams from.
- gram_sizeint
The gram size.
Returns¶
- List[spacy.tokens.span.Span]
The list of ngrams as spaCy Span objects.
- olaf.commons.spacy_processing_tools.spans_overlap(span1: Span, span2: Span) bool [source]¶
Return true is the spans are overlapping, else False.
Parameters¶
- span1spacy.tokens.Span
The first spaCy span.
- span2spacy.tokens.Span
The second spaCy span.
Returns¶
- bool
Whether or not the spans are overlapping.
olaf.commons.string_tools module¶
olaf.commons.wordnet_tools module¶
- olaf.commons.wordnet_tools.fetch_wordnet_lang(lang: str) str [source]¶
- Tool function to map a Spacy language tag to the corresponding WordNet one.
Return None if no mapping is found. Adapted from project <https://github.com/argilla-io/spacy-wordnet>.
Parameters¶
- langstr
The spaCy language tag.
Returns¶
- str
The WordNet language tag.
Raises¶
- Exception
An exception to spot a language not existing.
- olaf.commons.wordnet_tools.load_enrichment_wordnet_domains_from_file(enrichment_domains_path: str) Set[str] [source]¶
- Load a set of domains (strings) from a file.
The file is expected to contain one domain string per line.
Parameters¶
- enrichment_domains_pathstr
The full or relative path to the file containing wordnet domains to use for enrichment.
Returns¶
- Set[str]
The set of domains.
- olaf.commons.wordnet_tools.load_wordnet_domains(wordnet_domains_path: str) Dict[str, Set[str]] [source]¶
Load the mapping of WordNet Synsets to domains from a file. The file should have the structure: synset_code domain1 domain2. Function inspired from project <https://github.com/argilla-io/spacy-wordnet>
Parameters¶
- wordnet_domains_pathstr
The full or relative path to wordnet domains synsets mapping file.
Returns¶
- Dict[str, List[str]]
The mapping of WordNet Synsets to domains.
- olaf.commons.wordnet_tools.spacy2wordnet_pos(spacy_pos: str) str | None [source]¶
- Tool function to map a spaCy POS tag to the corresponding WordNet one.
Return None if no mapping is found. Adapted from project <https://github.com/argilla-io/spacy-wordnet>.
Parameters¶
- spacy_posstr
The spaCy POS tag.
Returns¶
- str, optional
The WordNet POS tag.