`feature_grouper` API reference

A set of functions and a scikit-learn transformer class for finding clusters of correlated features and combining each cluster into a feature group.

class FeatureGrouper(threshold=0.5, copy=True)

Hierarchical clustering-based dimensionality reduction.

Calculates the correlation matrix of all features in X, applies hierarchical clustering to create flat clusters of highly correlated features, then generates and applies a loading matrix that evenly weights the input features within each cluster.

Input features should be normalized (i.e. z-scores).

Parameters:
    threshold – float
        The minimum correlation similarity threshold to group descendants of a cluster node into the same flat cluster.
    copy – bool
        If False, data passed to transform are overwritten.

Variables:
    components_ – array, shape (n_components, n_features)
        The loading matrix obtained from clustering and weighting correlated features.
    n_components_ – int
        The number of components that were estimated from the data.
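
The class description asks for z-scored inputs. The following is a minimal sketch of that preprocessing step using NumPy only; `zscore_columns` is a hypothetical helper written for illustration and is not part of `feature_grouper`.

```python
import numpy as np

def zscore_columns(X):
    """Standardize each column to zero mean and unit variance (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns have zero spread; divide by 1 so they stay at zero.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std

# Synthetic data, used here only to exercise the helper.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
X_z = zscore_columns(X_raw)
```

In a scikit-learn pipeline, placing `StandardScaler` ahead of the grouper would serve the same purpose.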

fit(X, y=None)

Fit the model with X.

Parameters:
    X – array-like, shape (n_samples, n_features)
        Training data, where n_samples is the number of samples and n_features is the number of features.

inverse_transform(X)

Transform data back to its original space. In other words, return an input X_original whose transform would be X.

Parameters:
    X – array-like, shape (n_samples, n_components)
        New data, where n_samples is the number of samples and n_components is the number of components.
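
To illustrate the round-trip property described above, here is a small NumPy sketch. It assumes that "evenly weights" means each feature in a cluster of size k receives weight 1/k, and it builds its own toy matrices; it is not the library's implementation, and the `labels`, `loadings`, and `back` names are hypothetical.

```python
import numpy as np

# Hypothetical cluster labels: features 0 and 1 share a cluster, feature 2 is alone.
labels = np.array([0, 0, 1])
n_features = labels.size
n_clusters = labels.max() + 1

# One-hot membership matrix, shape (n_features, n_clusters).
membership = np.zeros((n_features, n_clusters))
membership[np.arange(n_features), labels] = 1.0

# Forward loadings: each feature contributes 1/k to its cluster (assumed weighting).
loadings = membership / membership.sum(axis=0)

# Back-projection: copy each component value to every feature in its cluster.
back = membership.T  # shape (n_clusters, n_features)

X_reduced = np.array([[2.0, -1.0],
                      [0.5, 3.0]])
X_original = X_reduced @ back       # shape (n_samples, n_features)
round_trip = X_original @ loadings  # transforming again recovers X_reduced
```

Under this weighting, `round_trip` equals `X_reduced`, which is the property the method description states. Many other `X_original` matrices would satisfy it as well; this one is simply the easiest to construct.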

transform(X)

Apply dimensionality reduction on X.

Parameters:
    X – array-like, shape (n_samples, n_features)
        New data, where n_samples is the number of samples and n_features is the number of features.

cluster(X, threshold=0.5)

Find clusters of correlated features by applying hierarchical clustering to their correlation matrix.

Parameters:
    X – array-like, shape (n_samples, n_features)
        New data, where n_samples is the number of samples and n_features is the number of features.
    threshold – float
        The minimum correlation similarity threshold to group descendants of a cluster node into the same flat cluster.
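
One way to picture this kind of clustering is the SciPy sketch below. It is an illustration under stated assumptions, not the library's code: it assumes a distance of 1 - correlation and complete linkage, and `cluster_sketch` is a hypothetical name. With complete linkage, cutting the tree at distance 1 - threshold means every pair of features in a flat cluster has correlation of at least `threshold`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_sketch(X, threshold=0.5):
    """Return one flat cluster label per column of X (illustrative only)."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)   # (n_features, n_features)
    dist = np.clip(1.0 - corr, 0.0, 2.0)  # assumed distance: 1 - correlation
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="complete")
    # Cut where the largest within-cluster distance is at most 1 - threshold.
    return fcluster(tree, t=1.0 - threshold, criterion="distance")

# Synthetic data: two pairs of near-duplicate features.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X_demo = np.column_stack([
    a,
    a + 0.1 * rng.normal(size=500),
    b,
    b + 0.1 * rng.normal(size=500),
])
labels_demo = cluster_sketch(X_demo, threshold=0.5)
```

Other linkage methods (average, single) would give different guarantees about within-cluster correlation, so the choice here is only one reasonable reading of "minimum correlation similarity threshold".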

make_loadings(labels, threshold=0.5)

Generate a loading matrix from the feature cluster labels, given a minimum correlation similarity threshold.

Apply the loading matrix to the original data with np.matmul or the @ operator.

Example:

>>> import numpy as np
>>> import feature_grouper
>>> threshold = 0.5
>>> clusters = feature_grouper.cluster(X, threshold)
>>> loading_matrix = feature_grouper.make_loadings(clusters, threshold)
>>> X_transformed = X @ loading_matrix

Parameters:
    labels – array-like, shape (n,)
        A 1-D NumPy array containing the cluster number label for each column in the original dataset.
    threshold – float
        The minimum correlation similarity threshold that was used to cluster the features.
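
As a rough picture of what an evenly weighted loading matrix looks like, the NumPy sketch below builds one from cluster labels. It assumes each feature in a cluster of size k gets weight 1/k, so that each output column is the cluster mean; the library may normalize differently. It also ignores `threshold`, and `make_loadings_sketch` is a hypothetical name. The output shape (n_features, n_clusters) is chosen so that `X @ W` works as in the example above.

```python
import numpy as np

def make_loadings_sketch(labels):
    """Build an (n_features, n_clusters) matrix of even within-cluster weights."""
    labels = np.asarray(labels)
    # Map arbitrary label values to column indices 0..n_clusters-1.
    _, cols = np.unique(labels, return_inverse=True)
    n_features = labels.size
    n_clusters = cols.max() + 1
    loadings = np.zeros((n_features, n_clusters))
    loadings[np.arange(n_features), cols] = 1.0
    # Divide each column by its cluster size so the weights sum to 1.
    return loadings / loadings.sum(axis=0)

labels = np.array([1, 1, 2, 1])         # features 0, 1, 3 grouped; feature 2 alone
W = make_loadings_sketch(labels)
X = np.array([[1.0, 2.0, 10.0, 3.0]])   # one sample, four features
X_grouped = X @ W                       # one sample, two grouped features
```

Here the first output column averages features 0, 1, and 3, and the second passes feature 2 through unchanged.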
