olaf.data_container package

Submodules

olaf.data_container.candidate_term_schema module

class olaf.data_container.candidate_term_schema.CandidateRelation(label: str, corpus_occurrences: Set[Tuple[Span]], source_concept: Concept | None = None, destination_concept: Concept | None = None, enrichment: Enrichment | None = None)

Bases: CandidateTerm

Candidate relations are created from candidate terms by the ct_to_cr function. They represent the words in the corpus that may express a relation. They are used by the relation extraction methods and transformed into linguistic realisations.

add_corpus_occurrence(new_corpus_occurrence: Tuple[Span]) → None

Add a new corpus occurrence to the candidate relation.

Parameters

new_corpus_occurrence: Tuple[spacy.tokens.Span]

New corpus occurrence to add for the candidate relation.

add_corpus_occurrences(new_corpus_occurrences: Set[Tuple[Span]]) → None

Add new corpus occurrences to the candidate relation.

Parameters

new_corpus_occurrences: Set[Tuple[spacy.tokens.Span]]

New corpus occurrences to add for the candidate relation.
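A rough stdlib-only sketch of how a candidate relation's corpus occurrences behave, assuming they are stored in a Python set as the signatures above suggest. `FakeSpan` and `CandidateRelationSketch` are hypothetical stand-ins for `spacy.tokens.Span` and `CandidateRelation`, not OLAF's implementation.

```python
from dataclasses import dataclass, field
from typing import NamedTuple, Set, Tuple


class FakeSpan(NamedTuple):
    """Hypothetical stand-in for spacy.tokens.Span (hashable, so it fits in a set)."""
    doc_id: int
    start: int
    end: int
    text: str


@dataclass
class CandidateRelationSketch:
    """Hypothetical mirror of CandidateRelation: a label plus a set of span tuples."""
    label: str
    corpus_occurrences: Set[Tuple[FakeSpan, ...]] = field(default_factory=set)

    def add_corpus_occurrence(self, new_corpus_occurrence: Tuple[FakeSpan, ...]) -> None:
        # A single occurrence is one tuple of spans.
        self.corpus_occurrences.add(new_corpus_occurrence)

    def add_corpus_occurrences(self, new_corpus_occurrences: Set[Tuple[FakeSpan, ...]]) -> None:
        # Set union: occurrences already present are not duplicated.
        self.corpus_occurrences.update(new_corpus_occurrences)


verb = FakeSpan(doc_id=0, start=1, end=2, text="produces")
other = FakeSpan(doc_id=1, start=3, end=4, text="produces")

cr = CandidateRelationSketch(label="produces")
cr.add_corpus_occurrence((verb,))
cr.add_corpus_occurrences({(verb,), (other,)})  # (verb,) is already present
print(len(cr.corpus_occurrences))  # 2
```

Using a set means the same occurrence found twice by an extraction method is only recorded once.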

class olaf.data_container.candidate_term_schema.CandidateTerm(label: str, corpus_occurrences: Set[Span], enrichment: Enrichment | None = None)

Bases: object

Candidate terms are created by the term extraction methods. They represent the words of interest in the corpus. They are used by the concept and relation extraction methods and transformed into linguistic realisations.

add_corpus_occurrences(new_corpus_occurrences: Set[Span]) → None

Add new corpus occurrences to the candidate term.

Parameters

new_corpus_occurrences: Set[spacy.tokens.Span]

New corpus occurrences to add for the candidate term.

olaf.data_container.concept_schema module

class olaf.data_container.concept_schema.Concept(label: str, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)

Bases: DataContainer

Concept is a particular DataContainer. It denotes a thing in the conceptual world and is created by concept extraction processes.

Parameters

uid: str

The concept unique identifier.

external_uids: Set[str], optional

External unique identifiers found for the concept, by default None.

label: str

The concept human readable label.

linguistic_realisations: Set[LinguisticRealisation], optional

The concept linguistic realisations, i.e. instances of the concept in the text corpus, by default None.

add_linguistic_realisation(linguistic_realisation: LinguisticRealisation) → None

Add a new linguistic realisation to the concept.

Parameters

linguistic_realisation: LinguisticRealisation

The linguistic realisation instance to add.

remove_linguistic_realisation(linguistic_realisation: LinguisticRealisation) → None

Remove a linguistic realisation from the concept.

Parameters

linguistic_realisation: LinguisticRealisation

The linguistic realisation instance to remove.
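To make the add/remove semantics concrete, here is a hedged stdlib sketch. It assumes the realisations are kept in a set and that removing a realisation that is not present is silently ignored (`set.discard`); OLAF may behave differently, for example by raising. `ConceptSketch` and the string realisations are hypothetical.

```python
from typing import Optional, Set


class ConceptSketch:
    """Hypothetical mirror of Concept; realisations are plain strings here."""

    def __init__(self, label: str, linguistic_realisations: Optional[Set[str]] = None) -> None:
        self.label = label
        # Copy into a fresh set so None maps to empty and the caller's set is not mutated.
        self.linguistic_realisations: Set[str] = set(linguistic_realisations or ())

    def add_linguistic_realisation(self, linguistic_realisation: str) -> None:
        self.linguistic_realisations.add(linguistic_realisation)

    def remove_linguistic_realisation(self, linguistic_realisation: str) -> None:
        # Assumption: removing an unknown realisation is a no-op.
        self.linguistic_realisations.discard(linguistic_realisation)


concept = ConceptSketch("vehicle")
concept.add_linguistic_realisation("car")
concept.add_linguistic_realisation("automobile")
concept.remove_linguistic_realisation("car")
concept.remove_linguistic_realisation("bicycle")  # not present: ignored
print(sorted(concept.linguistic_realisations))  # ['automobile']
```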

olaf.data_container.data_container_schema module

class olaf.data_container.data_container_schema.DataContainer(label: str | None = None, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)

Bases: ABC

Abstract base data structure for Concept, Relation and Metarelation.

Parameters

uid: str

The unique identifier for the container.

external_uids: Set[str], optional

External unique identifiers found for the data container, by default None.

label: str, optional

The human readable label for the data container, by default None.

linguistic_realisations: Set[LinguisticRealisation], optional

Instances or realisations of a concept or a relation in the text corpus, by default None.

abstract add_linguistic_realisation(linguistic_realisation: LinguisticRealisation) → None

Add a new linguistic realisation to the data container.

Parameters

linguistic_realisation: LinguisticRealisation

The linguistic realisation instance to add.

abstract remove_linguistic_realisation(linguistic_realisation: LinguisticRealisation) → None

Remove a linguistic realisation from the data container.

Parameters

linguistic_realisation: LinguisticRealisation

The linguistic realisation instance to remove.
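DataContainer derives from ABC, so the two abstract methods above must be implemented by every concrete subclass, and the base class itself cannot be instantiated. The pattern can be sketched with the standard `abc` module; `ContainerSketch` and `ConcreteContainer` are hypothetical names, not OLAF classes, and string realisations stand in for LinguisticRealisation objects.

```python
from abc import ABC, abstractmethod
from typing import Optional, Set


class ContainerSketch(ABC):
    """Hypothetical analogue of DataContainer: shared state, abstract behaviour."""

    def __init__(self, label: Optional[str] = None) -> None:
        self.label = label
        self.linguistic_realisations: Set[str] = set()

    @abstractmethod
    def add_linguistic_realisation(self, linguistic_realisation: str) -> None:
        """Each concrete container decides how to store a realisation."""

    @abstractmethod
    def remove_linguistic_realisation(self, linguistic_realisation: str) -> None:
        """Each concrete container decides how to remove a realisation."""


class ConcreteContainer(ContainerSketch):
    """Hypothetical concrete subclass, playing the role of Concept or Relation."""

    def add_linguistic_realisation(self, linguistic_realisation: str) -> None:
        self.linguistic_realisations.add(linguistic_realisation)

    def remove_linguistic_realisation(self, linguistic_realisation: str) -> None:
        self.linguistic_realisations.discard(linguistic_realisation)


# Instantiating the abstract base raises TypeError because of the abstract methods.
try:
    ContainerSketch("abstract")
    base_instantiable = True
except TypeError:
    base_instantiable = False

container = ConcreteContainer("engine")
container.add_linguistic_realisation("motor")
print(base_instantiable)                  # False
print(container.linguistic_realisations)  # {'motor'}
```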

olaf.data_container.enrichment_schema module

class olaf.data_container.enrichment_schema.Enrichment(synonyms: Set[str] = <factory>, hypernyms: Set[str] = <factory>, hyponyms: Set[str] = <factory>, antonyms: Set[str] = <factory>)

Bases: object

A dataclass to contain any information enriching a candidate term.

Instances are typically created by the candidate term enrichment processes. By default we define at minimum synonyms, but the class can be extended with any other useful information; hypernyms, hyponyms and antonyms are already included.

Note that the definition given to synonyms here largely depends on the knowledge representation application. It could be strict synonyms or closely related terms with regard to a specific domain context.

Parameters

synonyms: Set[str]

The set of synonyms. Empty set by default if it is initialised without terms.

hypernyms: Set[str]

The set of hypernyms. Empty set by default if it is initialised without terms.

hyponyms: Set[str]

The set of hyponyms. Empty set by default if it is initialised without terms.

antonyms: Set[str]

The set of antonyms. Empty set by default if it is initialised without terms.
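The `<factory>` defaults in the Enrichment signature are how Python renders `dataclasses.field(default_factory=...)`. A minimal sketch, assuming each field uses `default_factory=set` (consistent with the "empty set by default" wording, but not confirmed by this page), shows why this matters: every instance gets its own empty set rather than sharing one. `EnrichmentSketch` is hypothetical.

```python
import inspect
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EnrichmentSketch:
    """Hypothetical mirror of Enrichment using default_factory=set."""
    synonyms: Set[str] = field(default_factory=set)
    hypernyms: Set[str] = field(default_factory=set)
    hyponyms: Set[str] = field(default_factory=set)
    antonyms: Set[str] = field(default_factory=set)

    def add_synonyms(self, synonyms: Set[str]) -> None:
        # In-place union with the new terms.
        self.synonyms.update(synonyms)


# The generated __init__ shows "<factory>" for default_factory fields.
print(inspect.signature(EnrichmentSketch))

first = EnrichmentSketch()
second = EnrichmentSketch()
first.add_synonyms({"car", "automobile"})

print(sorted(first.synonyms))  # ['automobile', 'car']
print(second.synonyms)         # set(): not shared with `first`
```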

add_antonyms(antonyms: Set[str]) → None

Add new antonyms to the enrichment.

Parameters

antonyms: Set[str]

New antonyms to add to the enrichment.

add_hypernyms(hypernyms: Set[str]) → None

Add new hypernyms to the enrichment.

Parameters

hypernyms: Set[str]

New hypernyms to add to the enrichment.

add_hyponyms(hyponyms: Set[str]) → None

Add new hyponyms to the enrichment.

Parameters

hyponyms: Set[str]

New hyponyms to add to the enrichment.

add_synonyms(synonyms: Set[str]) → None

Add new synonyms to the enrichment.

Parameters

synonyms: Set[str]

New synonyms to add to the enrichment.

antonyms: Set[str]
hypernyms: Set[str]
hyponyms: Set[str]
merge_with_enrichment(enrichment_to_integrate) → None

Merge another enrichment into this one. The current enrichment is updated in place with the content of the enrichment provided.

Parameters

enrichment_to_integrate: Enrichment

The enrichment to merge into the current one.

synonyms: Set[str]
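merge_with_enrichment updates the current enrichment in place. A hedged sketch, assuming that merging means a per-field set union and that the argument is left untouched (the page does not spell out the exact rule); `EnrichmentSketch` is a hypothetical stand-in.

```python
from dataclasses import dataclass, field, fields
from typing import Set


@dataclass
class EnrichmentSketch:
    """Hypothetical mirror of Enrichment."""
    synonyms: Set[str] = field(default_factory=set)
    hypernyms: Set[str] = field(default_factory=set)
    hyponyms: Set[str] = field(default_factory=set)
    antonyms: Set[str] = field(default_factory=set)

    def merge_with_enrichment(self, enrichment_to_integrate: "EnrichmentSketch") -> None:
        # Assumption: every field is merged by set union, mutating `self` only.
        for f in fields(self):
            getattr(self, f.name).update(getattr(enrichment_to_integrate, f.name))


current = EnrichmentSketch(synonyms={"warm"})
incoming = EnrichmentSketch(synonyms={"heated"}, antonyms={"cold"})
current.merge_with_enrichment(incoming)

print(sorted(current.synonyms))  # ['heated', 'warm']
print(current.antonyms)          # {'cold'}
print(incoming.synonyms)         # {'heated'}: the argument is unchanged
```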

olaf.data_container.knowledge_representation_schema module

class olaf.data_container.knowledge_representation_schema.KnowledgeRepresentation(concepts: Set[Concept] = <factory>, relations: Set[Relation] = <factory>, metarelations: Set[Metarelation] = <factory>, rdf_graph: Graph = <Graph identifier=N5abb8b63a7ab40d6b14c607a849ac37b (<class 'rdflib.graph.Graph'>)>)

Bases: object

The knowledge representation is the data structure that contains the information learned from the text corpus. It is composed of concepts, relations and metarelations.

Attributes

concepts: Set[Concept]

The set of concepts of interest. Empty set by default if it is initialised without concepts.

relations: Set[Relation]

The set of relations of interest. Empty set by default if it is initialised without relations.

metarelations: Set[Metarelation]

The set of metarelations of interest. Empty set by default if it is initialised without metarelations.

rdf_graph: Graph

An RDF graph corresponding to the knowledge representation. Defaults to an empty graph.

concepts: Set[Concept]
metarelations: Set[Metarelation]
rdf_graph: Graph = <Graph identifier=N5abb8b63a7ab40d6b14c607a849ac37b (<class 'rdflib.graph.Graph'>)>
relations: Set[Relation]
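The knowledge representation is essentially three sets of data containers. A stdlib-only sketch with hypothetical `ConceptSketch`, `RelationSketch` and `KRSketch` stand-ins shows how relations can be read back as (source, label, destination) triples. The rdf_graph attribute is left out because it needs rdflib, and the dataclasses are frozen here only so that they are hashable; OLAF's own classes may achieve hashability differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class ConceptSketch:
    label: str


@dataclass(frozen=True)
class RelationSketch:
    label: str
    source_concept: Optional[ConceptSketch] = None
    destination_concept: Optional[ConceptSketch] = None


@dataclass
class KRSketch:
    """Hypothetical mirror of KnowledgeRepresentation without the RDF graph."""
    concepts: Set[ConceptSketch] = field(default_factory=set)
    relations: Set[RelationSketch] = field(default_factory=set)

    def triples(self) -> List[Tuple[str, str, str]]:
        # Only relations with both ends known form a complete triple.
        return sorted(
            (r.source_concept.label, r.label, r.destination_concept.label)
            for r in self.relations
            if r.source_concept is not None and r.destination_concept is not None
        )


engine, car = ConceptSketch("engine"), ConceptSketch("car")
kr = KRSketch(concepts={engine, car})
kr.relations.add(RelationSketch("powers", engine, car))
kr.relations.add(RelationSketch("drives"))  # no source/destination yet
print(kr.triples())  # [('engine', 'powers', 'car')]
```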

olaf.data_container.linguistic_realisation_schema module

class olaf.data_container.linguistic_realisation_schema.ConceptLR(label: str, corpus_occurrences: Set[Span] | None = None)

Bases: LinguisticRealisation

Linguistic realisation specific to a concept. Each corpus occurrence is represented as a single span.

Parameters

label: str

The linguistic realisation human readable label. The string should appear in one of the corpus occurrences.

corpus_occurrences: Set[spacy.tokens.Span], optional

The occurrences of the concept linguistic realisation in the corpus, each represented as a span, by default None.

add_corpus_occurrences(new_corpus_occurrences: Set[Span]) → None

Add new corpus occurrences for the linguistic realisation.

Parameters

new_corpus_occurrences: Set[spacy.tokens.Span]

New corpus occurrences to add for the linguistic realisation.

class olaf.data_container.linguistic_realisation_schema.LinguisticRealisation(label: str, corpus_occurrences: Set[Any] | None = None)

Bases: ABC

We distinguish between concepts, relations and metarelations and their representations in text. The text denoting a concept, relation or metarelation is referred to as a linguistic realisation. The LinguisticRealisation class defines a string (label), which is the text denoting the concept, relation or metarelation, and keeps an index of all its occurrences in the corpus, i.e. corpus_occurrences.

Parameters

label: str

The linguistic realisation human readable label. The string should appear in one of the corpus occurrences or be a metarelation type.

corpus_occurrences: Set[Any], optional

The LinguisticRealisation occurrences in the corpus, by default None.

add_corpus_occurrences(new_corpus_occurrences: Set[Any]) → None

Add new corpus occurrences for the linguistic realisation.

Parameters

new_corpus_occurrences: Set[Any]

New corpus occurrences to add for the linguistic realisation.

get_docs() → Set[Doc]

Fetch all the corpus documents contained in the corpus occurrences.

Returns

Set[spacy.tokens.doc.Doc]

The set of corpus documents contained in the corpus occurrences.
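get_docs has to handle occurrences that are either single spans (ConceptLR) or tuples of spans (RelationLR, MetaRelationLR). A minimal sketch of that flattening, assuming each span exposes its parent document through a `doc` attribute as `spacy.tokens.Span.doc` does; `FakeDoc`, `FakeSpan` and `get_docs_sketch` are hypothetical stand-ins, not OLAF code.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Set


@dataclass(frozen=True)
class FakeDoc:
    """Hypothetical stand-in for spacy.tokens.Doc."""
    text: str


@dataclass(frozen=True)
class FakeSpan:
    """Hypothetical stand-in for spacy.tokens.Span, keeping a .doc reference."""
    doc: FakeDoc
    start: int
    end: int


def get_docs_sketch(corpus_occurrences: Iterable[Any]) -> Set[FakeDoc]:
    """Collect the distinct documents referenced by the occurrences."""
    docs: Set[FakeDoc] = set()
    for occurrence in corpus_occurrences:
        # ConceptLR occurrences are single spans; RelationLR and MetaRelationLR
        # occurrences are tuples of spans, so normalise both to a tuple.
        spans = occurrence if isinstance(occurrence, tuple) else (occurrence,)
        docs.update(span.doc for span in spans)
    return docs


doc_a = FakeDoc("The engine powers the car.")
doc_b = FakeDoc("A car is a vehicle.")

concept_occurrences = {FakeSpan(doc_a, 4, 5), FakeSpan(doc_b, 1, 2)}
relation_occurrences = {(FakeSpan(doc_a, 1, 2), FakeSpan(doc_a, 2, 3), FakeSpan(doc_a, 4, 5))}

print(len(get_docs_sketch(concept_occurrences)))   # 2
print(len(get_docs_sketch(relation_occurrences)))  # 1
```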

class olaf.data_container.linguistic_realisation_schema.MetaRelationLR(label: str, corpus_occurrences: Set[Tuple[Span, Span]] | None = None)

Bases: LinguisticRealisation

Linguistic realisation specific to a metarelation. Each corpus occurrence is represented as a tuple of two spans: the source concept at position 0 and the destination concept at position 1.

Parameters

label: str

The linguistic realisation human readable label. The string should be a metarelation type.

corpus_occurrences: Set[Tuple[spacy.tokens.Span, spacy.tokens.Span]], optional

The occurrences of the metarelation linguistic realisation in the corpus, each represented as a tuple of two spans, by default None.

add_corpus_occurrences(new_corpus_occurrences: Set[Tuple[Span, Span]]) → None

Add new corpus occurrences for the linguistic realisation.

Parameters

new_corpus_occurrences: Set[Tuple[spacy.tokens.Span, spacy.tokens.Span]]

New corpus occurrences to add for the linguistic realisation.

class olaf.data_container.linguistic_realisation_schema.RelationLR(label: str, corpus_occurrences: Set[Tuple[Span, Span, Span]] | None = None)

Bases: LinguisticRealisation

Linguistic realisation specific to a relation. Each corpus occurrence is represented as a tuple of three spans: the source concept at position 0, the relation label at position 1 and the destination concept at position 2.

Parameters

label: str

The linguistic realisation human readable label. The string should appear in one of the corpus occurrences.

corpus_occurrences: Set[Tuple[spacy.tokens.Span, spacy.tokens.Span, spacy.tokens.Span]], optional

The occurrences of the relation linguistic realisation in the corpus, each represented as a tuple of three spans, by default None.

add_corpus_occurrences(new_corpus_occurrences: Set[Tuple[Span, Span, Span]]) → None

Add new corpus occurrences for the linguistic realisation.

Parameters

new_corpus_occurrences: Set[Tuple[spacy.tokens.Span, spacy.tokens.Span, spacy.tokens.Span]]

New corpus occurrences to add for the linguistic realisation.
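Because RelationLR occurrences are positional (source at 0, relation label at 1, destination at 2) and MetaRelationLR occurrences keep only the two concept spans, code consuming them can unpack by arity. A hedged sketch using plain strings in place of `spacy.tokens.Span` objects; `describe_occurrence` is a hypothetical helper, not part of OLAF, and "generalisation" is a hypothetical metarelation type label.

```python
from typing import Tuple, Union

# Plain strings stand in for spacy.tokens.Span objects in this sketch.
RelationOccurrence = Tuple[str, str, str]  # (source, relation label, destination)
MetaRelationOccurrence = Tuple[str, str]   # (source, destination)


def describe_occurrence(
    occurrence: Union[RelationOccurrence, MetaRelationOccurrence], label: str
) -> str:
    if len(occurrence) == 3:
        source, relation_span, destination = occurrence
        return f"{source} --[{relation_span}]--> {destination}"
    if len(occurrence) == 2:
        source, destination = occurrence
        # For metarelations the label is a type, not a span of the text.
        return f"{source} --<{label}>--> {destination}"
    raise ValueError(f"expected 2 or 3 spans, got {len(occurrence)}")


print(describe_occurrence(("engine", "powers", "car"), "powers"))
# engine --[powers]--> car
print(describe_occurrence(("car", "vehicle"), "generalisation"))  # hypothetical type
# car --<generalisation>--> vehicle
```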

olaf.data_container.metarelation_schema module

class olaf.data_container.metarelation_schema.Metarelation(source_concept: Concept, destination_concept: Concept, label: str, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)

Bases: Relation

We distinguish between Relations and Metarelations. A Metarelation is a link between concepts that is not an action or a state; it is expressed implicitly through syntax or particular formulations. Metarelations can correspond (but are not restricted) to the common taxonomic hierarchical relations in the broad sense, i.e. including generic-specific, instance, and whole-part relations. A Metarelation is a Relation and as such is oriented; its label is its type. The metarelation is defined by its triple (source, label, destination).

Parameters

uid: str

The metarelation unique identifier.

source_concept: Concept

The source concept in the metarelation triple.

destination_concept: Concept

The destination concept in the metarelation triple.

label: METARELATION_TYPE

The metarelation type.

external_uids: Set[str], optional

External unique identifiers found for the metarelation, by default None.

linguistic_realisations: Set[LinguisticRealisation], optional

The metarelation linguistic realisations, i.e. instances of the metarelation in the text corpus, by default None.
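Metarelation subclasses Relation but makes source_concept and destination_concept mandatory, whereas they default to None on Relation (see the two signatures). That relationship can be sketched as follows; `RelationSketch` and `MetarelationSketch` are hypothetical stand-ins using string concepts, and "generalisation" is a hypothetical type label.

```python
from typing import Optional, Tuple


class RelationSketch:
    """Hypothetical mirror of Relation: both ends of the triple are optional."""

    def __init__(
        self,
        label: str,
        source_concept: Optional[str] = None,
        destination_concept: Optional[str] = None,
    ) -> None:
        self.label = label
        self.source_concept = source_concept
        self.destination_concept = destination_concept

    def triple(self) -> Tuple[Optional[str], str, Optional[str]]:
        # The relation is defined by (source, label, destination).
        return (self.source_concept, self.label, self.destination_concept)


class MetarelationSketch(RelationSketch):
    """Hypothetical mirror of Metarelation: both ends required, label is a type."""

    def __init__(self, source_concept: str, destination_concept: str, label: str) -> None:
        super().__init__(label, source_concept, destination_concept)


partial = RelationSketch("powers")
meta = MetarelationSketch("car", "vehicle", "generalisation")  # hypothetical type label

print(partial.triple())                  # (None, 'powers', None)
print(meta.triple())                     # ('car', 'generalisation', 'vehicle')
print(isinstance(meta, RelationSketch))  # True
```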

olaf.data_container.relation_schema module

class olaf.data_container.relation_schema.Relation(label: str, source_concept: Concept | None = None, destination_concept: Concept | None = None, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)

Bases: DataContainer

A Relation is an explicit link between two concepts that results from an action or a state; the relation could itself be a concept. It is oriented and has a label. The relation is defined by its triple (source, label, destination).

Parameters

label: str

The relation human readable label.

source_concept: Concept, optional

The source concept in the relation triple, by default None.

destination_concept: Concept, optional

The destination concept in the relation triple, by default None.

external_uids: Set[str], optional

External unique identifiers found for the relation, by default None.

linguistic_realisations: Set[LinguisticRealisation], optional

The relation linguistic realisations, i.e. instances of the relation in the text corpus, by default None.

add_linguistic_realisation(linguistic_realisation: LinguisticRealisation) → None

Add a new linguistic realisation to the relation.

Parameters

linguistic_realisation: LinguisticRealisation

The linguistic realisation instance to add.

add_linguistic_realisations(linguistic_realisations: Set[LinguisticRealisation]) → None

Add new linguistic realisations to the relation.

Parameters

linguistic_realisations: Set[LinguisticRealisation]

The set of linguistic realisation instances to add.

remove_linguistic_realisation(linguistic_realisation: LinguisticRealisation) → None

Remove a linguistic realisation from the relation.

Parameters

linguistic_realisation: LinguisticRealisation

The LinguisticRealisation instance to remove.

Module contents