olaf.repository.corpus_loader package

Submodules

olaf.repository.corpus_loader.corpus_loader_schema module

class olaf.repository.corpus_loader.corpus_loader_schema.CorpusLoader(corpus_path: str)

Bases: ABC

Component to load a text corpus and encode it into a spaCy representation.

Parameters

corpus_path : str

Path of the text corpus to use.
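Because CorpusLoader derives from ABC, concrete loaders are expected to subclass it and supply the reading logic. The sketch below only illustrates that pattern under stated assumptions: the class names SketchCorpusLoader and InMemoryLoader and the abstract method load_texts are hypothetical, not OLAF's API, and the sketch returns raw strings instead of encoding them into spaCy documents.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchCorpusLoader(ABC):
    """Hypothetical stand-in for CorpusLoader: stores the path, defers reading to subclasses."""

    def __init__(self, corpus_path: str) -> None:
        self.corpus_path = corpus_path

    @abstractmethod
    def load_texts(self) -> List[str]:
        """Hypothetical hook: return one raw string per document."""


class InMemoryLoader(SketchCorpusLoader):
    """Toy subclass that ignores the path and returns fixed documents."""

    def load_texts(self) -> List[str]:
        return ["first document", "second document"]


loader = InMemoryLoader("unused/path")
docs = loader.load_texts()
print(docs)  # ['first document', 'second document']
```

Instantiating the abstract base directly raises TypeError, which is what forces every concrete loader to implement the reading step.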

olaf.repository.corpus_loader.csv_corpus_loader module

class olaf.repository.corpus_loader.csv_corpus_loader.CsvCorpusLoader(corpus_path: str, column_name: str)

Bases: CorpusLoader

Corpus loader for a CSV file.

Parameters

corpus_path : str

Path of the text corpus to use.

column_name : str

Name of the column to use in the CSV file.
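A minimal sketch of how the column_name parameter could select text from a CSV file, assuming one document per row and a header line. The helper name load_csv_column is hypothetical, skipping empty cells is an assumption, and the real loader additionally encodes the texts with spaCy, which is omitted here.

```python
import csv
import os
import tempfile
from typing import List


def load_csv_column(corpus_path: str, column_name: str) -> List[str]:
    """Hypothetical helper: one document per CSV row, taken from column_name."""
    with open(corpus_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column_name not in reader.fieldnames:
            raise KeyError(f"column {column_name!r} not found in {corpus_path}")
        # Assumption: empty or missing cells are skipped rather than kept as empty documents.
        return [row[column_name] for row in reader if row[column_name]]


# Usage: write a small CSV file to a temporary folder and read its "text" column.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("id,text\n1,Ontology learning\n2,Knowledge graphs\n")
    docs = load_csv_column(path, "text")

print(docs)  # ['Ontology learning', 'Knowledge graphs']
```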

olaf.repository.corpus_loader.json_corpus_loader module

class olaf.repository.corpus_loader.json_corpus_loader.JsonCorpusLoader(corpus_path: str, json_field: str)

Bases: CorpusLoader

Corpus loader for JSON files in the same folder.

Parameters

corpus_path : str

Path of the text corpus to use.

json_field : str

Name of the field to use in the JSON files.
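The following sketch shows one way a folder of JSON files could be turned into documents via json_field. It assumes each *.json file holds a single JSON object and yields one document; the helper name load_json_field, the sorted file order, and skipping files that lack the field are all assumptions, not OLAF's documented behaviour, and spaCy encoding is left out.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List


def load_json_field(corpus_path: str, json_field: str) -> List[str]:
    """Hypothetical helper: one document per *.json file, read from json_field."""
    texts = []
    # Sorting gives a deterministic document order across platforms.
    for json_file in sorted(Path(corpus_path).glob("*.json")):
        with json_file.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: each file is a single JSON object; files without the field are skipped.
        if isinstance(data, dict) and json_field in data:
            texts.append(str(data[json_field]))
    return texts


# Usage: three files, one of which lacks the "content" field.
with tempfile.TemporaryDirectory() as tmp:
    samples = {"a.json": {"content": "alpha text"},
               "b.json": {"content": "beta text"},
               "c.json": {"title": "no content here"}}
    for name, obj in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            json.dump(obj, f)
    docs = load_json_field(tmp, "content")

print(docs)  # ['alpha text', 'beta text']
```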

olaf.repository.corpus_loader.text_corpus_loader module

class olaf.repository.corpus_loader.text_corpus_loader.TextCorpusLoader(corpus_path: str)

Bases: CorpusLoader

Corpus loader for text files in the same folder.

If the corpus path is a folder, each text file in the folder is considered one document. If the corpus path is a text file, each line in the text file is considered one document.

Parameters

corpus_path : str

Path of the text corpus to use. It can be a folder or a file.
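The folder-versus-file rule described above can be sketched as follows. The helper name load_text_corpus is hypothetical; restricting folder input to *.txt files and skipping blank lines in single-file input are assumptions, and the real loader's spaCy encoding step is omitted.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def load_text_corpus(corpus_path: str) -> List[str]:
    """Hypothetical helper mirroring the documented folder/file behaviour."""
    path = Path(corpus_path)
    if path.is_dir():
        # Folder: each text file is one document (the *.txt filter is an assumption).
        return [p.read_text(encoding="utf-8") for p in sorted(path.glob("*.txt"))]
    # Single file: each line is one document (skipping blank lines is an assumption).
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# Usage: a folder with two files, and a separate file with two lines.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "docs")
    os.mkdir(folder)
    Path(folder, "a.txt").write_text("first doc", encoding="utf-8")
    Path(folder, "b.txt").write_text("second doc", encoding="utf-8")
    lines_file = Path(tmp, "lines.txt")
    lines_file.write_text("line one\nline two\n", encoding="utf-8")
    folder_docs = load_text_corpus(folder)
    line_docs = load_text_corpus(str(lines_file))

print(folder_docs)  # ['first doc', 'second doc']
print(line_docs)    # ['line one', 'line two']
```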

Module contents