olaf.algorithm package

Submodules

olaf.algorithm.agglomerative_clustering module

class olaf.algorithm.agglomerative_clustering.AgglomerativeClustering(training_instances: List[Any], nb_clusters: int | None = 2, metric: str | None = 'cosine', linkage: str | None = 'average', distance_threshold: float | None = None)[source]

Bases: object

Implementation of the agglomerative clustering algorithm.

property clustering_labels: List[int]

Getter to return the labels found for each training instance.

Returns

List[int]

List of cluster labels found for each training instance.

compute_agglomerative_clustering() → None[source]

Method used to compute the agglomerative clustering on the training instances.
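The source does not show how the clustering is computed internally, so the following is only a minimal, standard-library sketch of bottom-up clustering with the defaults listed in the signature above (cosine metric, average linkage, two clusters). The helper names cosine_distance and average_linkage_labels are hypothetical and are not part of the olaf package.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage_labels(
    vectors: List[Sequence[float]], nb_clusters: int = 2
) -> List[int]:
    """Hypothetical helper: merge the two closest clusters until nb_clusters remain."""
    # Start with one singleton cluster per training instance.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > nb_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                dists = [
                    cosine_distance(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    # Turn the cluster membership into one label per instance.
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = average_linkage_labels(vectors, nb_clusters=2)
print(labels)  # [0, 0, 1, 1]
```

The returned list plays the same role as clustering_labels: one cluster label per training instance, in input order. The sketch ignores distance_threshold and recomputes every distance on each pass, which is only reasonable for small inputs.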

olaf.algorithm.c_value module

class olaf.algorithm.c_value.Cvalue(corpus_terms: List[str], max_term_token_length: int | None = None, stop_list: Set[str] | None = {}, c_value_threshold: float | None = 0.0)[source]

Bases: object

A class to extract terms from a list of strings and compute their C-values.

The C-values are computed based on <https://doi.org/10.1007/s007999900023>.

Attributes

corpus_terms: List[str]

The list of strings extracted from the corpus, from which the terms are extracted. The strings should be space-tokenised.

max_term_token_length: int

The maximum number of tokens a term can have.

stop_list: Set[str]

A set of stop words that should not appear in a term.

_terms_string_tokens: Tuple[Tuple[str]]

Tuple of tokenised term strings used to compute the C-values.

c_value_threshold: float

A threshold used to decide whether a term should be added to the candidate terms.

candidate_terms: Tuple[str]

The tuple of selected candidate terms.

_terms_counter: Counter

A mapping of terms to their occurrences in the corpus.

_term_stat_triples: Dict[str, Tuple[int, int, int]]

A mapping of terms to the triple of occurrence statistics used for computing their C-values.

c_values: Tuple[Tuple[float, str]]

An ordered tuple of candidate terms with their C-values.

property c_values: Tuple[Tuple[float, str]]

Getter for the c_values attribute.

Logs a warning if the C-value computation has not been run yet.

Returns

Tuple[Tuple[float, str]]

The tuple of terms and their C-values.

property candidate_terms: Tuple[str]

Getter for the candidate_terms attribute.

Logs a warning if the C-value computation has not been run yet.

Returns

Tuple[str]

The tuple of candidate terms.

compute_c_values() → None[source]

Compute the C-value scores.

The method sets the following attributes:

- self._c_values
- self._candidate_terms
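To illustrate the kind of score this method produces, here is a minimal sketch of the C-value formula from the referenced paper: a term that is not nested in a longer candidate scores log2(|a|) * f(a), and a nested term scores log2(|a|) * (f(a) - mean frequency of the longer candidates containing it). This is an assumption-laden illustration, not the package's code: the function names are hypothetical, every contiguous token sequence is treated as a candidate, and the stop list, maximum token length and threshold are left out.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Tokens = Tuple[str, ...]


def count_ngrams(corpus_terms: List[str]) -> Counter:
    """Count every contiguous token sequence in the space-tokenised strings."""
    counts: Counter = Counter()
    for term in corpus_terms:
        tokens = tuple(term.split())
        for size in range(1, len(tokens) + 1):
            for start in range(len(tokens) - size + 1):
                counts[tokens[start:start + size]] += 1
    return counts


def contains(longer: Tokens, shorter: Tokens) -> bool:
    """Return True if `shorter` is a contiguous sub-sequence of `longer`."""
    size = len(shorter)
    return any(
        longer[i:i + size] == shorter for i in range(len(longer) - size + 1)
    )


def c_value_sketch(corpus_terms: List[str]) -> Dict[str, float]:
    """Hypothetical helper: map each candidate term to its C-value."""
    counts = count_ngrams(corpus_terms)
    scores: Dict[str, float] = {}
    for term, freq in counts.items():
        # Frequencies of the longer candidates in which `term` is nested.
        longer = [
            counts[other]
            for other in counts
            if len(other) > len(term) and contains(other, term)
        ]
        if longer:
            adjusted = freq - sum(longer) / len(longer)
        else:
            adjusted = float(freq)
        # log2(1) == 0, so single-token terms always score 0 with this formula.
        scores[" ".join(term)] = math.log2(len(term)) * adjusted
    return scores


corpus_terms = ["soft contact lens"] * 2 + ["contact lens"] * 3
scores = c_value_sketch(corpus_terms)
print(scores["contact lens"])  # 3.0
print(scores["soft contact"])  # 0.0
```

In this example "contact lens" occurs five times, two of them inside "soft contact lens", so its score is 1 * (5 - 2) = 3.0, while "soft contact" never occurs outside the longer term and scores 0. A threshold such as c_value_threshold would then be applied to scores like these to select the candidate terms.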

property max_term_token_length: int

Getter for the max token length of terms.

Returns

int

The maximum token length of terms.

Module contents