Clustering

aitlas.clustering.kmeans module

class Kmeans(k)

Bases: object

cluster(data, verbose=False)

Performs k-means clustering.

Parameters:

data (np.array (N * dim)) – data to cluster
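As a rough picture of what k-means clustering computes, here is a plain NumPy sketch of Lloyd's algorithm. It is not the aitlas implementation (which may rely on a GPU backend); the helper names `lloyd_kmeans` and `_sq_dists` and the `n_iter`/`seed` parameters are hypothetical.

```python
import numpy as np


def _sq_dists(data, centroids):
    # Squared Euclidean distance from every point to every centroid: shape (N, k).
    return ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def lloyd_kmeans(data, k, n_iter=20, seed=0):
    """Hypothetical helper: plain Lloyd's k-means (not the aitlas code).

    Returns (assignments, centroids): one cluster id per row of `data`
    and a (k, dim) array of cluster centres.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        assignments = _sq_dists(data, centroids).argmin(axis=1)
        # Move each centroid to the mean of its points; leave empty clusters in place.
        for c in range(k):
            members = data[assignments == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    return _sq_dists(data, centroids).argmin(axis=1), centroids


# Two well-separated blobs in 2-D.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centres = lloyd_kmeans(points, k=2)
print(labels)
```

Cluster ids are arbitrary, so only the grouping (first three points together, last three together) is meaningful.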

aitlas.clustering.pic module

class PIC(args=None, sigma=0.2, nnn=5, alpha=0.001, distribute_singletons=True)

Bases: object

Class to perform Power Iteration Clustering on a graph of nearest neighbors.

Parameters:

args – for consistency with the k-means initializer (default None)
sigma (float) – bandwidth of the Gaussian kernel (default 0.2)
nnn (int) – number of nearest neighbors (default 5)
alpha (float) – parameter in PIC (default 0.001)
distribute_singletons (bool) – if True, reassign each singleton to the cluster of its closest non-singleton nearest neighbor (searching up to nnn nearest neighbors)

Attributes:

images_lists (list of lists of ints) – for each cluster, the list of image indexes belonging to this cluster
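The `distribute_singletons` behaviour can be pictured with the sketch below. It assumes the cluster labels are held in a flat array and that each row of the neighbor-id array is sorted from nearest to farthest; `distribute_singletons_sketch` is a hypothetical name, not the aitlas code. All reassignments are decided from the original labels, so the result does not depend on the order in which points are visited.

```python
import numpy as np


def distribute_singletons_sketch(labels, neighbor_ids):
    """Hypothetical helper: move every point that is alone in its cluster
    to the cluster of its closest non-singleton nearest neighbor.

    labels:       (N,) cluster id per point
    neighbor_ids: (N, nnn) neighbor ids per point, nearest first
    """
    original = np.asarray(labels)
    new_labels = original.copy()
    # Cluster sizes are computed once, from the original labels.
    ids, counts = np.unique(original, return_counts=True)
    size = dict(zip(ids.tolist(), counts.tolist()))
    for i in range(len(original)):
        if size[int(original[i])] != 1:
            continue  # not a singleton, keep its cluster
        # Scan up to nnn neighbors, nearest first.
        for j in neighbor_ids[i]:
            if j != i and size[int(original[j])] > 1:
                new_labels[i] = original[j]
                break
    # Emptied cluster ids are not renumbered.
    return new_labels


labels = np.array([0, 0, 0, 1, 2, 2])
neighbor_ids = np.array([[1, 2], [0, 2], [0, 1], [4, 0], [5, 3], [4, 3]])
new_labels = distribute_singletons_sketch(labels, neighbor_ids)
print(new_labels)  # [0 0 0 2 2 2]
```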

cluster(data, verbose=False)

aitlas.clustering.utils module

preprocess_features(npdata, pca=256)

Preprocess an array of features.

Parameters:

npdata (np.array (N * dim)) – features to preprocess
pca (int) – dimension of the output (default 256)

Returns:

the features, PCA-reduced, whitened and L2-normalized

Return type:

np.array (N * pca)
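The three steps named in the return description can be sketched in NumPy as follows. This illustration uses an SVD-based PCA and is not necessarily how aitlas computes it; `preprocess_features_sketch` and the `eps` guard against division by zero are hypothetical.

```python
import numpy as np


def preprocess_features_sketch(npdata, pca=256, eps=1e-8):
    """Hypothetical helper: PCA-reduce to `pca` dims, whiten, L2-normalize rows.

    If `pca` exceeds min(N, dim), fewer than `pca` columns are returned.
    """
    x = npdata.astype(np.float32)
    # Center the data so the principal directions pass through the mean.
    x = x - x.mean(axis=0, keepdims=True)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    k = min(pca, vt.shape[0])
    # Project onto the top-k directions.
    projected = x @ vt[:k].T
    # Whiten: divide each component by its standard deviation s / sqrt(N - 1).
    std = s[:k] / np.sqrt(max(len(x) - 1, 1))
    whitened = projected / (std + eps)
    # L2-normalize each row.
    norms = np.linalg.norm(whitened, axis=1, keepdims=True)
    return whitened / (norms + eps)


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 32))
reduced = preprocess_features_sketch(features, pca=8)
print(reduced.shape)  # (100, 8)
```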

make_graph(xb, nnn)

Builds a graph of nearest neighbors.

Parameters:

xb (np.array (N * dim)) – data
nnn (int) – number of nearest neighbors

Returns:

two arrays, in this order:
the ids of the nnn nearest neighbors of each data point
the distances from each data point to its nnn nearest neighbors

Return type:

np.array (N * nnn)
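A brute-force version of the nearest-neighbor search helps pin down the two returned arrays. This NumPy sketch assumes squared Euclidean distances and excludes each point from its own neighbor list; the library may use a dedicated search index instead and may lay out its columns differently, and `make_graph_sketch` is a hypothetical name.

```python
import numpy as np


def make_graph_sketch(xb, nnn):
    """Hypothetical helper: exact k-nearest-neighbor search by brute force.

    Returns (ids, dists), both of shape (N, nnn), sorted nearest first.
    Distances are squared Euclidean; a point is excluded from its own list.
    """
    sq_norms = (xb ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for all pairs at once.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * xb @ xb.T
    np.maximum(d2, 0.0, out=d2)   # clip tiny negatives caused by rounding
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    ids = np.argsort(d2, axis=1)[:, :nnn]
    dists = np.take_along_axis(d2, ids, axis=1)
    return ids, dists


xb = np.array([[0.0], [1.0], [3.0], [7.0]])
ids, dists = make_graph_sketch(xb, nnn=2)
print(ids)
print(dists)
```

The all-pairs distance matrix costs O(N²) memory, which is why large-scale code usually delegates this search to an indexing library.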

class ReassignedDataset(image_indexes, pseudolabels, dataset)

Bases: Dataset

A dataset whose image labels are replaced by the pseudo-labels given as arguments.

Parameters:

image_indexes (list of ints) – list of data indexes
pseudolabels (list of ints) – list of labels for each data
dataset (list of tuples with paths to images) – initial dataset
transform (callable, optional) – a function/transform that takes in a PIL image and returns a transformed version

make_dataset(image_indexes, pseudolabels)
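How the constructor arguments fit together can be sketched without PyTorch: each selected index is looked up in the initial dataset and paired with its pseudo-label. The class name `ReassignedDatasetSketch`, the assumption that `dataset[idx][0]` is an image path, and the choice to return paths instead of loaded images are all illustrative assumptions, not the aitlas code.

```python
class ReassignedDatasetSketch:
    """Hypothetical stand-in for ReassignedDataset. It implements the
    __len__/__getitem__ protocol used by torch.utils.data.Dataset, but
    returns image paths instead of loading images."""

    def __init__(self, image_indexes, pseudolabels, dataset):
        if len(image_indexes) != len(pseudolabels):
            raise ValueError("need exactly one pseudo-label per image index")
        self.dataset = dataset
        self.samples = self.make_dataset(image_indexes, pseudolabels)

    def make_dataset(self, image_indexes, pseudolabels):
        # Assumption: dataset[idx][0] is the image path of sample idx.
        return [(self.dataset[idx][0], label)
                for idx, label in zip(image_indexes, pseudolabels)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        # A real dataset would open and transform the image at this path.
        return self.samples[i]


initial = [("img_a.png", 3), ("img_b.png", 1), ("img_c.png", 3)]
ds = ReassignedDatasetSketch([2, 0], [1, 0], initial)
print(len(ds), ds[0])  # 2 ('img_c.png', 1)
```

The original labels stored in the initial dataset are ignored; only the path is reused.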

cluster_assign(images_lists, dataset)

Creates a dataset from clustering, with clusters as labels.

Parameters:

images_lists (list of lists of ints) – for each cluster, the list of image indexes belonging to this cluster
dataset (list of tuples with paths to images) – initial dataset

Returns:

dataset with clusters as labels

Return type:

ReassignedDataset (a torch.utils.data.Dataset)
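Building a ReassignedDataset from clusters requires flattening the per-cluster lists into the parallel `image_indexes` and `pseudolabels` lists that its constructor takes, with each cluster's position used as its label. A minimal sketch of that step, assuming this is how the conversion is done; `flatten_clusters` is a hypothetical helper.

```python
def flatten_clusters(images_lists):
    """Hypothetical helper: turn per-cluster index lists into two parallel
    lists, using each cluster's position in images_lists as its label."""
    image_indexes, pseudolabels = [], []
    for cluster_id, members in enumerate(images_lists):
        image_indexes.extend(members)
        pseudolabels.extend([cluster_id] * len(members))
    return image_indexes, pseudolabels


images_lists = [[0, 3], [1], [2, 4, 5]]
idx, labels = flatten_clusters(images_lists)
print(idx)     # [0, 3, 1, 2, 4, 5]
print(labels)  # [0, 0, 1, 2, 2, 2]
```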

run_kmeans(x, nmb_clusters, verbose=False)

Runs k-means on one GPU.

Parameters:

x (np.array (N * dim)) – data
nmb_clusters (int) – number of clusters

Returns:

list of ids for each data point to its nearest cluster

Return type:

list of ints
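The flat list of cluster ids returned here and the per-cluster `images_lists` format taken by cluster_assign describe the same partition. A sketch of converting one into the other, with `group_by_cluster` as a hypothetical helper:

```python
def group_by_cluster(assignments, nmb_clusters):
    """Hypothetical helper: invert a per-point cluster-id list into one list
    of point indexes per cluster (empty clusters give empty lists)."""
    # A list comprehension gives each cluster its own list object.
    images_lists = [[] for _ in range(nmb_clusters)]
    for point_index, cluster_id in enumerate(assignments):
        images_lists[int(cluster_id)].append(point_index)
    return images_lists


assignments = [2, 0, 2, 1, 0]
grouped = group_by_cluster(assignments, nmb_clusters=3)
print(grouped)  # [[1, 4], [3], [0, 2]]
```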

arrange_clustering(images_lists)

make_adjacencyW(I, D, sigma)

Create adjacency matrix with a Gaussian kernel.

Parameters:

I (numpy array) – for each vertex, the ids of its nnn linked vertices, plus a first column holding the vertex's own id.
D (numpy array) – for each data point, the L2 distances to its nnn linked vertices, plus a first column of zeros.
sigma (float) – bandwidth of the Gaussian kernel.

Returns:

affinity matrix of the graph.

Return type:

scipy.sparse.csr_matrix
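A minimal sketch of assembling that sparse matrix with SciPy. It assumes the Gaussian weight is exp(-d / sigma²) applied to the stored distances (a true Gaussian of the distance if D holds squared L2 distances) and that the first column of I and D, the vertex itself at distance zero, is dropped so the graph has no self-loops. The exact kernel form used by aitlas is not stated on this page, and `make_adjacency_sketch` is a hypothetical name.

```python
import numpy as np
from scipy.sparse import csr_matrix


def make_adjacency_sketch(I, D, sigma):
    """Hypothetical helper: Gaussian-weighted k-NN adjacency in CSR format.

    I, D: (V, nnn + 1) arrays; column 0 holds the vertex itself (distance 0)
    and is dropped, so row v keeps its nnn neighbor ids and distances.
    """
    V, k = I.shape
    k -= 1                                  # neighbors per row, excluding self
    indices = I[:, 1:].reshape(-1)          # column index of every stored weight
    weights = np.exp(-D[:, 1:].reshape(-1) / sigma ** 2)
    indptr = np.arange(V + 1) * k           # row v occupies [v*k, (v+1)*k)
    return csr_matrix((weights, indices, indptr), shape=(V, V))


# Three points with one neighbor each; D is assumed to hold squared distances.
I = np.array([[0, 1], [1, 0], [2, 1]])
D = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 4.0]])
W = make_adjacency_sketch(I, D, sigma=1.0)
print(W.toarray())
```

Because every row stores exactly nnn entries, `indptr` is a simple arithmetic sequence; the result is directed (row 2 links to 1, not the reverse) until it is symmetrized.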

run_pic(I, D, sigma, alpha)

Runs the PIC algorithm.
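This page does not spell out the iteration itself. One plausible reading, sketched below purely as an assumption, is to symmetrize the affinity matrix and run a fixed number of damped power iterations in which `alpha` scales the step along the weighted edges and the remaining mass is spread uniformly over all vertices. The update rule, the iteration count and the name `power_iteration_sketch` are hypothetical, not the aitlas implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix


def power_iteration_sketch(W, alpha, n_iter=200):
    """Hypothetical helper: damped power iteration on a symmetrized graph.

    Each step follows the weighted edges (scaled by alpha), adds a uniform
    term (1 - alpha) / n, and L1-normalizes, giving one score per vertex.
    """
    graph = W + W.T                          # make the k-NN graph undirected
    n = graph.shape[0]
    v = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        v_next = alpha * graph.T.dot(v) + (1.0 - alpha) / n
        v = v_next / v_next.sum()            # keep v summing to one
    return v


# Path graph 0 - 1 - 2 after symmetrization: vertex 1 has the most links.
W = csr_matrix(np.array([[0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0],
                         [0.0, 0.0, 0.0]]))
v = power_iteration_sketch(W, alpha=0.5)
print(v)
```

Well-connected vertices accumulate higher scores, which is the signal a clustering step can exploit.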

find_maxima_cluster(W, v)
