olaf.pipeline package

Subpackages

Submodules

olaf.pipeline.pipeline_schema module

class olaf.pipeline.pipeline_schema.Pipeline(spacy_model: Language, pipeline_components: List[PipelineComponent] | None = None, preprocessing_components: List[DataPreprocessing] | None = None, corpus_loader: CorpusLoader | None = None, corpus: List[Doc] | None = None, seed_kr: KnowledgeRepresentation | None = None)

Bases: object

Pipeline is the library's main class. It orchestrates the processing that starts from raw texts and builds the final knowledge representation.

The corpus loader is responsible for converting raw text into spaCy documents. Data preprocessing is kept separate so that pipelines can explicitly run without any preprocessing.

Parameters

spacy_model: spacy.language.Language

The spaCy model used to represent the text corpus.

pipeline_components: List[PipelineComponent]

The ontology learning pipeline components that build the knowledge representation from the corpus.

preprocessing_components: List[DataPreprocessing]

The pipeline components specific to preprocessing.

corpus_loader: CorpusLoader

The component that loads the text corpus in the format used by the framework, i.e., a List[spacy.tokens.doc.Doc].

corpus: List[spacy.tokens.doc.Doc]

The preprocessed corpus the knowledge representation is built from.

kr: KnowledgeRepresentation

The knowledge extracted from the corpus.

candidate_terms: Set[CandidateTerms]

The candidate terms extracted and processed to create concepts and relations.

add_pipeline_component(pipeline_component: PipelineComponent) -> None

Add a component to the pipeline.

Parameters

pipeline_component: PipelineComponent

The pipeline component to add.

add_preprocessing_component(preprocessing_component: DataPreprocessing) -> None

Add a preprocessing component to the pipeline.

Parameters

preprocessing_component: DataPreprocessing

The preprocessing pipeline component to add.

build() -> None

Build the pipeline, making the instance runnable. This method checks each component and the constraints on their order.
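
The kind of order check described above can be illustrated with a small sketch. It does not use OLAF itself: ToyComponent, its requires attribute, and check_order are hypothetical names introduced only for this example, and the checks that build() really performs are not documented on this page.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyComponent:
    """Hypothetical stand-in for a pipeline component (not an OLAF class)."""

    name: str
    # Names of the components that must already have been registered.
    requires: Set[str] = field(default_factory=set)


def check_order(components: List[ToyComponent]) -> List[str]:
    """Return the ordering problems found; an empty list means the order is valid."""
    seen: Set[str] = set()
    problems: List[str] = []
    for component in components:
        # Any requirement not yet seen is an ordering violation.
        for name in sorted(component.requires - seen):
            problems.append(f"{component.name} needs {name} to run first")
        seen.add(component.name)
    return problems


term_extraction = ToyComponent("term_extraction")
concept_extraction = ToyComponent("concept_extraction", requires={"term_extraction"})

valid = check_order([term_extraction, concept_extraction])
invalid = check_order([concept_extraction, term_extraction])
print(valid)
print(invalid)
```

Collecting every problem before reporting, rather than stopping at the first one, is a design choice of this sketch; a real build step might raise an exception instead.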

remove_pipeline_component(pipeline_component: PipelineComponent) -> None

Remove a component from the pipeline.

Parameters

pipeline_component: PipelineComponent

The pipeline component to remove.

remove_preprocessing_component(preprocessing_component: DataPreprocessing) -> None

Remove a preprocessing component from the pipeline.

Parameters

preprocessing_component: DataPreprocessing

The preprocessing pipeline component to remove.

run() -> None

Run the pipeline. The method runs each pipeline component in the determined order, filling the knowledge representation.
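
As a rough illustration of the flow from corpus loading through optional preprocessing to the components that fill the knowledge representation, here is a toy sketch. It is not OLAF's implementation: plain strings stand in for spaCy Doc objects, a dict of sets stands in for KnowledgeRepresentation, and every function name below is hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins: strings replace spaCy Doc objects and a dict of
# sets replaces the KnowledgeRepresentation.
Corpus = List[str]
ToyKR = Dict[str, Set[str]]


def load_corpus() -> Corpus:
    """Toy corpus loader returning two short texts."""
    return ["Wine is a drink.", "Beer is a drink."]


def lowercase(corpus: Corpus) -> Corpus:
    """Toy preprocessing step: lowercase every text."""
    return [text.lower() for text in corpus]


def extract_first_words(corpus: Corpus, kr: ToyKR) -> None:
    """Toy component: record the first word of each text as a concept."""
    for text in corpus:
        kr["concepts"].add(text.split()[0])


def run_toy_pipeline(
    loader: Callable[[], Corpus],
    preprocessing: List[Callable[[Corpus], Corpus]],
    components: List[Callable[[Corpus, ToyKR], None]],
) -> ToyKR:
    corpus = loader()
    # Preprocessing is optional: an empty list leaves the corpus untouched.
    for step in preprocessing:
        corpus = step(corpus)
    kr: ToyKR = {"concepts": set()}
    # Components run in list order and fill the representation in place.
    for component in components:
        component(corpus, kr)
    return kr


kr = run_toy_pipeline(load_corpus, [lowercase], [extract_first_words])
no_preprocessing_kr = run_toy_pipeline(load_corpus, [], [extract_first_words])
print(sorted(kr["concepts"]))
print(sorted(no_preprocessing_kr["concepts"]))
```

The second call passes an empty preprocessing list, mirroring the point that a pipeline may run without any preprocessing.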

Module contents