olaf.data_container package¶
Submodules¶
olaf.data_container.candidate_term_schema module¶
- class olaf.data_container.candidate_term_schema.CandidateRelation(label: str, corpus_occurrences: Set[Tuple[Span]], source_concept: Concept | None = None, destination_concept: Concept | None = None, enrichment: Enrichment | None = None)[source]¶
Bases:
CandidateTerm
Candidate relations are created from candidate terms by the ct_to_cr function. They represent the words of interest for relations in the corpus. They are used by the relation extraction methods and transformed into linguistic realisations.
- class olaf.data_container.candidate_term_schema.CandidateTerm(label: str, corpus_occurrences: Set[Span], enrichment: Enrichment | None = None)[source]¶
Bases:
object
Candidate terms are created by the term extraction methods. They represent the words of interest in the corpus. They are used by the concept and relation extraction methods and transformed into linguistic realisations.
olaf.data_container.concept_schema module¶
- class olaf.data_container.concept_schema.Concept(label: str, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)[source]¶
Bases:
DataContainer
Concept is a particular DataContainer. It denotes a thing in the conceptual world and is created by concept extraction processes.
Parameters¶
- uidstr
The concept unique identifier.
- external_uidsSet[str], optional
External unique identifiers found for the concept, by default None.
- labelstr
The concept human readable label.
- linguistic_realisationsSet[LinguisticRealisation], optional
The concept linguistic realisations, i.e. instances of the concept in the text corpus, by default None.
- add_linguistic_realisation(linguistic_realisation: LinguisticRealisation) None [source]¶
Add a new linguistic realisation to the concept.
Parameters¶
- linguistic_realisationLinguisticRealisation
The linguistic realisation instance to add.
- remove_linguistic_realisation(linguistic_realisation: LinguisticRealisation) None [source]¶
Delete a linguistic realisation of the concept.
Parameters¶
- linguistic_realisationLinguisticRealisation
The linguistic realisation instance to remove.
olaf.data_container.data_container_schema module¶
- class olaf.data_container.data_container_schema.DataContainer(label: str | None = None, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)[source]¶
Bases:
ABC
Data structure for Concept, Relation and Metarelation.
Parameters¶
- uidstr
The unique identifier for the container.
- external_uidsSet[str], optional
External unique identifiers found for the data container.
- labelstr
The human readable label for the data container.
- linguistic_realisationsSet[LinguisticRealisation]
Instances or realisations of a concept or a relation in the text corpus.
- abstract add_linguistic_realisation(linguistic_realisation: LinguisticRealisation) None [source]¶
Add a new linguistic realisation to the data container.
Parameters¶
- linguistic_realisationLinguisticRealisation
The linguistic realisation instance to add.
- abstract remove_linguistic_realisation(linguistic_realisation: LinguisticRealisation) None [source]¶
Delete a linguistic realisation of the data container.
Parameters¶
- linguistic_realisationLinguisticRealisation
The linguistic realisation instance to remove.
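The relationship between the abstract container and a concrete subclass such as Concept can be pictured with a small stand-in. This is a minimal sketch, not OLAF's implementation: `ContainerSketch` and `ConceptSketch` are hypothetical names, plain strings stand in for LinguisticRealisation objects, and the choice to ignore removal of an absent realisation is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Optional, Set


class ContainerSketch(ABC):
    """Hypothetical stand-in for the abstract container described above."""

    def __init__(
        self,
        label: Optional[str] = None,
        linguistic_realisations: Optional[Set[str]] = None,
    ) -> None:
        self.label = label
        # None is replaced by a fresh set so instances never share state.
        self.linguistic_realisations: Set[str] = (
            set() if linguistic_realisations is None else set(linguistic_realisations)
        )

    @abstractmethod
    def add_linguistic_realisation(self, linguistic_realisation: str) -> None:
        """Subclasses decide how a realisation is added."""

    @abstractmethod
    def remove_linguistic_realisation(self, linguistic_realisation: str) -> None:
        """Subclasses decide how a realisation is removed."""


class ConceptSketch(ContainerSketch):
    """Hypothetical concrete container that stores realisations in a set."""

    def add_linguistic_realisation(self, linguistic_realisation: str) -> None:
        self.linguistic_realisations.add(linguistic_realisation)

    def remove_linguistic_realisation(self, linguistic_realisation: str) -> None:
        # Assumption: removing an absent realisation is a silent no-op.
        self.linguistic_realisations.discard(linguistic_realisation)


concept = ConceptSketch(label="bicycle")
concept.add_linguistic_realisation("bike")
concept.add_linguistic_realisation("bicycle")
concept.remove_linguistic_realisation("bike")
```

Because both methods are abstract, the base class itself cannot be instantiated; only subclasses that implement them can.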
olaf.data_container.enrichment_schema module¶
- class olaf.data_container.enrichment_schema.Enrichment(synonyms: ~typing.Set[str] = <factory>, hypernyms: ~typing.Set[str] = <factory>, hyponyms: ~typing.Set[str] = <factory>, antonyms: ~typing.Set[str] = <factory>)[source]¶
Bases:
object
- A dataclass to contain any information enriching a candidate term.
Instances are typically created by the candidate term enrichment processes. By default we define at minimum synonyms but we can extend the class with any possible useful information, e.g., antonyms, or hypernyms.
Note that the definition given to synonyms here largely depends on the knowledge representation application. It could be strict synonyms or closely related terms with regard to a specific context domain.
Parameters¶
- synonyms: Set[str]
The set of synonyms. Empty set by default if it is initialised without terms.
- hypernyms: Set[str]
The set of hypernyms. Empty set by default if it is initialised without terms.
- hyponyms: Set[str]
The set of hyponyms. Empty set by default if it is initialised without terms.
- antonyms: Set[str]
The set of antonyms. Empty set by default if it is initialised without terms.
- add_antonyms(antonyms: Set[str]) None [source]¶
Add new antonyms for the enrichment.
Parameters¶
- antonymsSet[str]
New antonyms to add on the enrichment.
- add_hypernyms(hypernyms: Set[str]) None [source]¶
Add new hypernyms for the enrichment.
Parameters¶
- hypernymsSet[str]
New hypernyms to add on the enrichment.
- add_hyponyms(hyponyms: Set[str]) None [source]¶
Add new hyponyms for the enrichment.
Parameters¶
- hyponymsSet[str]
New hyponyms to add on the enrichment.
- add_synonyms(synonyms: Set[str]) None [source]¶
Add new synonyms for the enrichment.
Parameters¶
- synonymsSet[str]
New synonyms to add on the enrichment.
- antonyms: Set[str]¶
- hypernyms: Set[str]¶
- hyponyms: Set[str]¶
- merge_with_enrichment(enrichment_to_integrate) None [source]¶
Merge another enrichment into this one. This enrichment is updated in place with the contents of the enrichment provided.
Parameters¶
- enrichment_to_integrateEnrichment
The enrichment to merge the current one with.
- synonyms: Set[str]¶
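The `<factory>` defaults and the in-place merge described above can be illustrated with a small stand-in dataclass. This is a minimal sketch, not OLAF's implementation: `EnrichmentSketch` is a hypothetical name, only `add_synonyms` is shown among the `add_*` methods, and treating the merge as a field-by-field set union is an assumption.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EnrichmentSketch:
    """Hypothetical stand-in mirroring the documented Enrichment fields."""

    # default_factory gives every instance its own empty set.
    synonyms: Set[str] = field(default_factory=set)
    hypernyms: Set[str] = field(default_factory=set)
    hyponyms: Set[str] = field(default_factory=set)
    antonyms: Set[str] = field(default_factory=set)

    def add_synonyms(self, synonyms: Set[str]) -> None:
        self.synonyms.update(synonyms)

    def merge_with_enrichment(self, enrichment_to_integrate: "EnrichmentSketch") -> None:
        # Assumption: merging is an in-place union, field by field.
        self.synonyms.update(enrichment_to_integrate.synonyms)
        self.hypernyms.update(enrichment_to_integrate.hypernyms)
        self.hyponyms.update(enrichment_to_integrate.hyponyms)
        self.antonyms.update(enrichment_to_integrate.antonyms)


current = EnrichmentSketch(synonyms={"car"})
incoming = EnrichmentSketch(synonyms={"automobile"}, hypernyms={"vehicle"})

current.add_synonyms({"motorcar"})
current.merge_with_enrichment(incoming)  # current changes, incoming does not
```

Under these assumptions `current` ends up holding the union of both enrichments, while `incoming` is left untouched.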
olaf.data_container.knowledge_representation_schema module¶
- class olaf.data_container.knowledge_representation_schema.KnowledgeRepresentation(concepts: ~typing.Set[~olaf.data_container.concept_schema.Concept] = <factory>, relations: ~typing.Set[~olaf.data_container.relation_schema.Relation] = <factory>, metarelations: ~typing.Set[~olaf.data_container.metarelation_schema.Metarelation] = <factory>, rdf_graph: ~rdflib.graph.Graph = <Graph identifier=N5abb8b63a7ab40d6b14c607a849ac37b (<class 'rdflib.graph.Graph'>)>)[source]¶
Bases:
object
The knowledge representation is the data structure that contains the information learned from the text corpus. It is composed of concepts, relations and metarelations.
Attributes¶
- conceptsSet[Concept]
The set of concepts of interest. Empty set by default if it is initialised without concepts.
- relationsSet[Relation]
The set of relations of interest. Empty set by default if it is initialised without relations.
- metarelationsSet[Metarelation]
The set of metarelations of interest. Empty set by default if it is initialised without metarelations.
- rdf_graph: Graph
An RDF graph corresponding to the knowledge representation. Defaults to an empty graph.
- metarelations: Set[Metarelation]¶
- rdf_graph: Graph = <Graph identifier=N5abb8b63a7ab40d6b14c607a849ac37b (<class 'rdflib.graph.Graph'>)>¶
olaf.data_container.linguistic_realisation_schema module¶
- class olaf.data_container.linguistic_realisation_schema.ConceptLR(label: str, corpus_occurrences: Set[Span] | None = None)[source]¶
Bases:
LinguisticRealisation
Linguistic realisation specific to a concept. Each corpus occurrence is represented as a single span.
Parameters¶
- labelstr
The linguistic realisation human readable label. The string should appear in one of the corpus occurrences.
- corpus_occurrencesSet[spacy.tokens.Span]
The occurrences of the concept linguistic realisation in the corpus, each of which is a span.
- class olaf.data_container.linguistic_realisation_schema.LinguisticRealisation(label: str, corpus_occurrences: Set[Any] | None = None)[source]¶
Bases:
ABC
We distinguish between concepts, relations and metarelations and their representations in text. The text denoting a concept, relation or metarelation is referred to as a linguistic realisation. The LinguisticRealisation class defines a string (label), which is the text denoting the concept, relation or metarelation, and keeps an index of all its occurrences in the corpus, i.e. corpus_occurrences.
Parameters¶
- labelstr
The linguistic realisation human readable label. The string should appear in one of the corpus occurrences or be a metarelation type.
- corpus_occurrencesAny
The LinguisticRealisation occurrences in the corpus.
- class olaf.data_container.linguistic_realisation_schema.MetaRelationLR(label: str, corpus_occurrences: Set[Tuple[Span, Span]] | None = None)[source]¶
Bases:
LinguisticRealisation
Linguistic realisation specific to a metarelation. Each corpus occurrence is represented as a tuple of two spans: the source concept at position 0 and the destination concept at position 1.
Parameters¶
- labelstr
The linguistic realisation human readable label. The string should be a metarelation type.
- corpus_occurrencesSet[Tuple[spacy.tokens.Span, spacy.tokens.Span]]
The occurrences of the metarelation linguistic realisation in the corpus, each of which is a tuple of two spans.
- class olaf.data_container.linguistic_realisation_schema.RelationLR(label: str, corpus_occurrences: Set[Tuple[Span, Span, Span]] | None = None)[source]¶
Bases:
LinguisticRealisation
Linguistic realisation specific to a relation. Each corpus occurrence is represented as a tuple of three spans: the source concept at position 0, the relation label at position 1 and the destination concept at position 2.
Parameters¶
- labelstr
The linguistic realisation human readable label. The string should appear in one of the corpus occurrences.
- corpus_occurrencesSet[Tuple[spacy.tokens.Span, spacy.tokens.Span, spacy.tokens.Span]]
The occurrences of the relation linguistic realisation in the corpus, each of which is a tuple of three spans.
- add_corpus_occurrences(new_corpus_occurrences: Set[Tuple[Span, Span, Span]]) None [source]¶
Add new corpus occurrences for the linguistic realisation.
Parameters¶
- new_corpus_occurrencesSet[Tuple[spacy.tokens.Span, spacy.tokens.Span, spacy.tokens.Span]]
New corpus occurrences to add for the linguistic realisation.
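The position convention for relation occurrences (source, relation label, destination) can be sketched as follows. This is a minimal illustration under stated assumptions, not OLAF's implementation: `RelationLRSketch` is a hypothetical name, plain strings stand in for spacy.tokens.Span objects so the example has no dependencies, and merging new occurrences by set union is an assumption.

```python
from typing import Optional, Set, Tuple

# Plain strings stand in for spacy.tokens.Span objects in this sketch.
SpanLike = str
RelationOccurrence = Tuple[SpanLike, SpanLike, SpanLike]


class RelationLRSketch:
    """Hypothetical stand-in for a relation linguistic realisation."""

    def __init__(
        self,
        label: str,
        corpus_occurrences: Optional[Set[RelationOccurrence]] = None,
    ) -> None:
        self.label = label
        # None becomes a fresh empty set; a given set is copied.
        self.corpus_occurrences: Set[RelationOccurrence] = set(corpus_occurrences or ())

    def add_corpus_occurrences(self, new_corpus_occurrences: Set[RelationOccurrence]) -> None:
        # Assumption: occurrences are merged by set union, so duplicates collapse.
        self.corpus_occurrences.update(new_corpus_occurrences)


relation_lr = RelationLRSketch("drives", {("engineer", "drives", "car")})
relation_lr.add_corpus_occurrences(
    {("pilot", "drives", "bus"), ("engineer", "drives", "car")}
)

# Positions follow the documented order: source, relation label, destination.
sources = sorted(occurrence[0] for occurrence in relation_lr.corpus_occurrences)
destinations = sorted(occurrence[2] for occurrence in relation_lr.corpus_occurrences)
```

A metarelation occurrence would follow the same pattern with two-element tuples, keeping only the source and destination positions.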
olaf.data_container.metarelation_schema module¶
- class olaf.data_container.metarelation_schema.Metarelation(source_concept: Concept, destination_concept: Concept, label: str, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)[source]¶
Bases:
Relation
We distinguish between Relations and Metarelations. A Metarelation is a link between concepts that is not an action or a state; it is implicitly expressed through syntax or particular formulations. Metarelations can correspond to (but are not restricted to) the common taxonomic hierarchical relations in a broad sense, i.e., including generic-specific, instance, and whole-part. A Metarelation is a Relation and as such is oriented, and its label is its type. The metarelation is defined by its triple (source, label, destination).
Parameters¶
- uidstr
The metarelation unique identifier.
- source_conceptConcept
The source concept in the metarelation triple.
- destination_conceptConcept
The destination concept in the metarelation triple.
- labelMETARELATION_TYPE
The metarelation type.
- external_uidsSet[str], optional
External unique identifiers found for the metarelation, by default None.
- linguistic_realisationsSet[LinguisticRealisation], optional
The metarelation linguistic realisations, i.e. instances of the metarelation in the text corpus, by default None.
olaf.data_container.relation_schema module¶
- class olaf.data_container.relation_schema.Relation(label: str, source_concept: Concept | None = None, destination_concept: Concept | None = None, external_uids: Set[str] | None = None, linguistic_realisations: Set[LinguisticRealisation] | None = None)[source]¶
Bases:
DataContainer
A Relation is the explicit link between two concepts which is the result of an action or a state and which could itself be a concept. It is oriented and has a label. The relation is defined by its triple (source, label, destination).
Parameters¶
- labelstr
The relation human readable label.
- source_conceptConcept, optional
The source concept in the relation triple, by default None.
- destination_conceptConcept, optional
The destination concept in the relation triple, by default None.
- external_uidsSet[str], optional
External unique identifiers found for the relation, by default None.
- linguistic_realisationsSet[LinguisticRealisation], optional
The relation linguistic realisations, i.e. instances of the relation in the text corpus, by default None.
- add_linguistic_realisation(linguistic_realisation: LinguisticRealisation) None [source]¶
Add a new linguistic realisation to the relation.
Parameters¶
- linguistic_realisationLinguisticRealisation
The linguistic realisation instance to add.
- add_linguistic_realisations(linguistic_realisations: Set[LinguisticRealisation]) None [source]¶
Add new linguistic realisations to the relation.
Parameters¶
- linguistic_realisationsSet[LinguisticRealisation]
The set of linguistic realisation instances to add.
- remove_linguistic_realisation(linguistic_realisation: LinguisticRealisation) None [source]¶
Delete a linguistic realisation of the relation.
Parameters¶
- linguistic_realisationLinguisticRealisation
The linguistic realisation instance to remove.
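The triple structure shared by Relation and Metarelation can be sketched with stand-in classes. This is a minimal illustration, not OLAF's implementation: `ConceptStub`, `RelationSketch`, `MetarelationSketch` and the `triple` helper are hypothetical names, and the `"part_of"` label is an illustrative value rather than a documented metarelation type. The sketch only reflects the documented signatures, in which a relation's concepts are optional while a metarelation requires both.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConceptStub:
    """Hypothetical stand-in for a concept, reduced to its label."""

    label: str


@dataclass
class RelationSketch:
    """Hypothetical oriented relation; both concepts may be missing."""

    label: str
    source_concept: Optional[ConceptStub] = None
    destination_concept: Optional[ConceptStub] = None

    def triple(self) -> Tuple[Optional[ConceptStub], str, Optional[ConceptStub]]:
        # Hypothetical helper exposing the (source, label, destination) triple.
        return (self.source_concept, self.label, self.destination_concept)


class MetarelationSketch(RelationSketch):
    """Hypothetical metarelation: a relation whose two concepts are required."""

    def __init__(
        self,
        source_concept: ConceptStub,
        destination_concept: ConceptStub,
        label: str,
    ) -> None:
        super().__init__(
            label=label,
            source_concept=source_concept,
            destination_concept=destination_concept,
        )


rider = ConceptStub("cyclist")
bicycle = ConceptStub("bicycle")
wheel = ConceptStub("wheel")

rides = RelationSketch("rides", source_concept=rider, destination_concept=bicycle)
part_of = MetarelationSketch(wheel, bicycle, "part_of")  # illustrative label only
```

Keeping the metarelation as a subclass means any code that consumes relation triples can handle metarelations without special cases.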