olaf.algorithm package¶

Submodules¶

olaf.algorithm.agglomerative_clustering module¶

class olaf.algorithm.agglomerative_clustering.AgglomerativeClustering(training_instances: List[Any], nb_clusters: int | None = 2, metric: str | None = 'cosine', linkage: str | None = 'average', distance_threshold: float | None = None)[source]¶

Bases: object

Implementation of the agglomerative clustering algorithm.

property clustering_labels: List[int]¶

Getter to return the labels found for each training instance.

Returns¶

List[int]: List of cluster labels found for each training instance.

compute_agglomerative_clustering() → None[source]¶

Compute the agglomerative clustering on the training instances.
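This page does not show how the class performs the clustering internally. As an illustration of what the default parameters (metric 'cosine', linkage 'average') mean, here is a minimal pure-Python sketch of average-linkage agglomerative clustering. The function names cosine_distance and average_linkage_labels are hypothetical and are not part of olaf.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage_labels(vectors: List[Sequence[float]], nb_clusters: int = 2) -> List[int]:
    """Merge the two closest clusters until nb_clusters remain (illustrative only)."""
    # Start with one singleton cluster per instance, stored as lists of indices.
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average linkage: mean pairwise distance between the members of two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > nb_clusters:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # a < b always holds here, so deleting clusters[b] leaves index a valid.
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Turn the cluster membership into one label per instance.
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = average_linkage_labels(vectors, nb_clusters=2)
```

In this toy input the first two vectors point in nearly the same direction, as do the last two, so each pair ends up sharing a label. The returned list plays the same role as the clustering_labels property: one cluster label per training instance.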

olaf.algorithm.c_value module¶

class olaf.algorithm.c_value.Cvalue(corpus_terms: List[str], max_term_token_length: int | None = None, stop_list: Set[str] | None = {}, c_value_threshold: float | None = 0.0)[source]¶

Bases: object

A class to extract terms from a list of strings and compute their C-values. The C-values are computed based on <https://doi.org/10.1007/s007999900023>.

Attributes¶

corpus_terms: List[str]: The list of strings extracted from the corpus to extract the terms from. The strings should be space-tokenised.
max_term_token_length: int: The maximum number of tokens a term can have.
stop_list: Set[str]: A set of stop words that should not appear in a term.
_terms_string_tokens: Tuple[Tuple[str]]: Tuple of the token tuples of the terms, used to compute the C-values.
c_value_threshold: float: A threshold to decide whether or not a term should be added to the candidate terms.
candidate_terms: Tuple[str]: The tuple of selected candidate terms.
_terms_counter: Counter: A mapping of terms to their occurrences in the corpus.
_term_stat_triples: Dict[str, Tuple[int, int, int]]: A mapping of terms to triples of occurrence statistics used for computing C-values.
c_values: Tuple[Tuple[float, str]]: An ordered tuple of candidate terms with their C-values.
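To make the roles of corpus_terms, max_term_token_length and stop_list concrete, the sketch below counts space-tokenised strings while skipping those that are too long or that contain a stop word. It is an assumption about how such filtering could work, not the library's code: the helper count_candidate_terms is hypothetical, and whether olaf discards over-long strings or handles them differently is not stated here.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def count_candidate_terms(
    corpus_terms: Iterable[str],
    max_term_token_length: Optional[int] = None,
    stop_list: Optional[Set[str]] = None,
) -> Counter:
    """Count the strings that satisfy the length and stop-word constraints."""
    stop_words = stop_list or set()
    counter: Counter = Counter()
    for term in corpus_terms:
        tokens = term.split()  # the strings are expected to be space-tokenised
        if not tokens:
            continue
        # Assumption: strings longer than the maximum are skipped entirely.
        if max_term_token_length is not None and len(tokens) > max_term_token_length:
            continue
        # Assumption: a single stop word disqualifies the whole string.
        if any(token in stop_words for token in tokens):
            continue
        counter[" ".join(tokens)] += 1
    return counter


counts = count_candidate_terms(
    ["soft contact lens", "contact lens", "lens of the eye", "contact lens"],
    max_term_token_length=3,
    stop_list={"of", "the"},
)
```

Here "lens of the eye" is dropped because it has four tokens, and the remaining counts are comparable in spirit to the _terms_counter attribute.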

property c_values: Tuple[Tuple[float, str]]¶

Getter for the c_values attribute. A warning should be logged if the user forgot to run the C-value computation.

Returns¶

Tuple[Tuple[float, str]]: The tuple of terms and their C-values.

property candidate_terms: Tuple[str]¶

Getter for the candidate_terms attribute. A warning should be logged if the user forgot to run the C-value computation.

Returns¶

Tuple[str]: The tuple of candidate terms.

compute_c_values() → None[source]¶

Compute the C-value scores.

The method sets the following attributes:

- self._c_values
- self._candidate_terms
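For readers unfamiliar with the measure, the C-value of a term a with |a| tokens and corpus frequency f(a) is log2|a| · f(a) when a is not nested in any longer candidate term, and log2|a| · (f(a) − (1/|T_a|) · Σ f(b)) otherwise, where T_a is the set of longer candidate terms b that contain a. The sketch below is a simplified illustration of that formula, not olaf's implementation: the names is_nested and c_value_scores are hypothetical, the input frequencies are assumed to already include nested occurrences, and the threshold handling is an assumption modelled on the c_value_threshold parameter.

```python
import math
from typing import Dict, List, Tuple


def is_nested(shorter: Tuple[str, ...], longer: Tuple[str, ...]) -> bool:
    """True if `shorter` is a contiguous token sub-sequence of a strictly longer term."""
    n = len(shorter)
    return n < len(longer) and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value_scores(term_counts: Dict[str, int], threshold: float = 0.0) -> List[Tuple[float, str]]:
    """Return (C-value, term) pairs sorted from highest to lowest score."""
    tokenised = {term: tuple(term.split()) for term in term_counts}
    scores = []
    for term, tokens in tokenised.items():
        freq = term_counts[term]
        # Longer candidate terms in which this term appears.
        containers = [
            other for other, other_tokens in tokenised.items()
            if is_nested(tokens, other_tokens)
        ]
        if containers:
            # Subtract the average frequency of the containing terms.
            nested_freq = sum(term_counts[other] for other in containers)
            freq = freq - nested_freq / len(containers)
        score = math.log2(len(tokens)) * freq
        # Assumption: terms scoring below the threshold are discarded.
        if score >= threshold:
            scores.append((score, term))
    return sorted(scores, reverse=True)


scores = c_value_scores(
    {"soft contact lens": 3, "hard contact lens": 2, "contact lens": 8}
)
```

With these toy counts, "contact lens" is nested in two longer terms whose frequencies sum to 5, so its score is log2(2) · (8 − 5/2) = 5.5. Note that with the plain log2|a| factor every single-token term scores 0.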

property max_term_token_length: int¶

Getter for the max token length of terms.

Returns¶

int: The maximum token length of terms.

Module contents¶