olaf.algorithm package
Submodules
olaf.algorithm.agglomerative_clustering module
- class olaf.algorithm.agglomerative_clustering.AgglomerativeClustering(training_instances: List[Any], nb_clusters: int | None = 2, metric: str | None = 'cosine', linkage: str | None = 'average', distance_threshold: float | None = None)
Bases: object
Implementation of the agglomerative clustering algorithm.
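To illustrate what the default parameters in the signature above (metric 'cosine', linkage 'average', nb_clusters 2) mean, here is a minimal pure-Python sketch of bottom-up clustering. It is not OLAF's implementation: the function names cosine_distance and average_linkage_clusters are hypothetical, the input is assumed to be a list of non-zero numeric vectors, and distance_threshold is not handled.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage_clusters(
    vectors: List[Sequence[float]], nb_clusters: int = 2
) -> List[List[int]]:
    """Merge the two closest clusters until nb_clusters remain.

    Clusters are returned as lists of indices into `vectors`.
    """
    # Start with one singleton cluster per instance.
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average linkage: mean pairwise distance between members.
        total = sum(
            cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > nb_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
result = average_linkage_clusters(vectors, nb_clusters=2)
```

In this toy input the first two vectors point in nearly the same direction, as do the last two, so they end up grouped pairwise. The sketch recomputes all pairwise distances at every step, which is fine for illustration but too slow for large inputs.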
olaf.algorithm.c_value module
- class olaf.algorithm.c_value.Cvalue(corpus_terms: List[str], max_term_token_length: int | None = None, stop_list: Set[str] | None = {}, c_value_threshold: float | None = 0.0)
Bases: object
A class to extract terms from a list of strings and compute their C-values.
The C-values are computed based on <https://doi.org/10.1007/s007999900023>.
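As a rough illustration of the measure defined in the referenced paper, the sketch below scores a term a as log2|a| * f(a) when it is not nested in any longer candidate, and as log2|a| * (f(a) - mean frequency of the longer candidates containing a) otherwise. This is not the Cvalue class itself: the function names are hypothetical, stop-list and length filtering are omitted, and the input is assumed to map each non-empty, space-tokenised candidate term to its total corpus frequency, nested occurrences included.

```python
import math
from typing import Dict, Tuple


def is_nested_in(shorter: Tuple[str, ...], longer: Tuple[str, ...]) -> bool:
    """True if `shorter` is a contiguous token sub-sequence of a strictly longer term."""
    m, n = len(shorter), len(longer)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value_scores(term_freqs: Dict[str, int]) -> Dict[str, float]:
    """Compute a C-value for every term in `term_freqs` (hypothetical helper)."""
    tokens = {term: tuple(term.split()) for term in term_freqs}
    scores = {}
    for term, freq in term_freqs.items():
        # Longer candidate terms that contain the current term.
        containers = [
            other for other in term_freqs
            if is_nested_in(tokens[term], tokens[other])
        ]
        if containers:
            # Subtract the mean frequency of the containing terms.
            nested_mean = sum(term_freqs[o] for o in containers) / len(containers)
            adjusted = freq - nested_mean
        else:
            adjusted = float(freq)
        # log2 of the token length; single-token terms therefore score 0.
        scores[term] = math.log2(len(tokens[term])) * adjusted
    return scores


example_scores = c_value_scores({"soft contact lens": 3, "contact lens": 5})
```

Here "contact lens" occurs 5 times in total, 3 of them inside "soft contact lens", so its adjusted frequency is 2. Because the length factor is log2|a|, the measure as written only ranks multi-word terms.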
Attributes
- corpus_terms: List[str]
The list of strings extracted from the corpus to extract the terms from. The strings should be space-tokenised.
- max_term_token_length: int
The maximum number of tokens a term can have.
- stop_list: Set[str]
A set of stop words that should not appear in a term.
- _terms_string_tokens: Tuple[Tuple[str]]
Tuple of tokenised term strings used to compute the C-values.
- c_value_threshold: float
A threshold to decide whether or not a term should be added to the candidate terms.
- candidate_terms: Tuple[str]
The tuple of selected candidate terms.
- _terms_counter: Counter
A mapping of terms to their occurrences in the corpus.
- _term_stat_triples: Dict[str, Tuple[int, int, int]]
A mapping of terms to the triple of occurrence counts used for computing C-values.
- c_values: Tuple[Tuple[float, str]]
An ordered tuple of candidate terms with their C-values.
- property c_values: Tuple[Tuple[float, str]]
Getter for the c_values attribute.
Should log a warning if the C-value computation has not been run yet.
Returns
- Tuple[Tuple[float, str]]
The tuple of terms and their C-values.
- property candidate_terms: Tuple[str]
Getter for the candidate_terms attribute.
Should log a warning if the C-value computation has not been run yet.
Returns
- Tuple[str]
The tuple of candidate terms.
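The getter behaviour described for both properties (warn when results are requested before the computation has run) can be sketched with the standard logging module. The class LazyScores, its compute() method and the placeholder result are hypothetical and only illustrate the pattern; returning an empty tuple before computation is an assumption, not documented behaviour of Cvalue.

```python
import logging
from typing import Optional, Tuple

logger = logging.getLogger(__name__)


class LazyScores:
    """Hypothetical class whose results exist only after compute() has run."""

    def __init__(self) -> None:
        # None marks "not computed yet".
        self._scores: Optional[Tuple[Tuple[float, str], ...]] = None

    def compute(self) -> None:
        # Placeholder result standing in for a real computation.
        self._scores = ((2.0, "contact lens"),)

    @property
    def scores(self) -> Tuple[Tuple[float, str], ...]:
        if self._scores is None:
            # Warn instead of raising, then return an empty result.
            logger.warning("Scores requested before compute() was called.")
            return ()
        return self._scores


lazy = LazyScores()
before = lazy.scores  # triggers the warning
lazy.compute()
after = lazy.scores
```

Warning rather than raising keeps exploratory use forgiving, at the cost of letting an empty result pass silently if logging is not configured.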