olaf.pipeline.data_preprocessing package
Submodules
olaf.pipeline.data_preprocessing.data_preprocessing_schema module
olaf.pipeline.data_preprocessing.token_selector_data_preprocessing module
- class olaf.pipeline.data_preprocessing.token_selector_data_preprocessing.TokenSelectorDataPreprocessing(selector: Callable[[Token], bool], token_sequence_doc_attribute: str | None = None)
Bases: DataPreprocessing
Preprocess data by selecting tokens with a user-supplied selector function.
Attributes
- corpus: spacy.tokens.Doc
spaCy corpus to process.
- token_selector: Callable[[spacy.tokens.Token], bool]
Callable that implements the token selection criterion.
- token_sequence_doc_attribute: str, optional
Name of the spaCy Doc attribute that holds the selected tokens. Defaults to “selected_tokens”.
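
The selector contract described above (a callable that takes a token and returns a bool) can be illustrated with a small sketch. To stay runnable without spaCy or olaf installed, it uses a hypothetical stand-in class, FakeToken, in place of spacy.tokens.Token, and a hypothetical helper, select_tokens, which is not part of olaf. It also assumes that a return value of True means "keep the token"; the source does not state this explicitly.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FakeToken:
    """Hypothetical stand-in for spacy.tokens.Token (illustration only).

    The real spaCy Token also exposes `text`, `is_stop` and `is_punct`.
    """
    text: str
    is_stop: bool = False
    is_punct: bool = False


def keep_content_tokens(token: FakeToken) -> bool:
    """Selection criterion: keep tokens that are neither stop words nor punctuation."""
    return not (token.is_stop or token.is_punct)


def select_tokens(
    tokens: List[FakeToken], selector: Callable[[FakeToken], bool]
) -> List[FakeToken]:
    """Hypothetical helper (not olaf's API): keep tokens the selector accepts.

    Assumes True means "keep", which the documentation does not confirm.
    """
    return [token for token in tokens if selector(token)]


# A toy "document" made of stand-in tokens.
doc = [
    FakeToken("The", is_stop=True),
    FakeToken("pump"),
    FakeToken("leaks"),
    FakeToken(".", is_punct=True),
]

selected = select_tokens(doc, keep_content_tokens)
selected_texts = [token.text for token in selected]
print(selected_texts)
```

With real spaCy tokens, a function shaped like keep_content_tokens could presumably be passed as the selector argument of TokenSelectorDataPreprocessing, since it matches the Callable[[Token], bool] signature shown in the class definition.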