Clustering#
aitlas.clustering.kmeans module#
aitlas.clustering.pic module#
- class PIC(args=None, sigma=0.2, nnn=5, alpha=0.001, distribute_singletons=True)[source]#
Bases:
object
Class to perform Power Iteration Clustering on a graph of nearest neighbors. The args argument is accepted only for consistency with the k-means init.
- Parameters:
sigma (float) – bandwidth of the Gaussian kernel (default 0.2)
nnn (int) – number of nearest neighbors (default 5)
alpha (float) – parameter in PIC (default 0.001)
distribute_singletons (bool) – If True, reassign each singleton to the cluster of its closest non-singleton nearest neighbor (searching up to nnn nearest neighbors).
images_lists (list of lists of ints) – not part of the constructor signature; for each cluster, the list of image indexes belonging to this cluster
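The distribute_singletons behaviour described above can be sketched in plain Python. This is only an illustration, not the aitlas implementation: the function name `reassign_singletons` and its inputs (`labels`, one cluster id per point, and `neighbors`, neighbor ids per point ordered from nearest to farthest) are assumptions made for this example.

```python
from collections import Counter


def reassign_singletons(labels, neighbors):
    """Move each singleton into the cluster of its closest non-singleton neighbor.

    labels: list of cluster ids, one per point (hypothetical input format).
    neighbors: for each point, neighbor ids ordered from nearest to farthest.
    """
    sizes = Counter(labels)  # cluster id -> number of members
    new_labels = list(labels)
    for point, label in enumerate(labels):
        if sizes[label] != 1:
            continue  # not a singleton, keep its cluster
        for nb in neighbors[point]:
            # Skip the point itself and neighbors that are singletons too.
            if nb != point and sizes[labels[nb]] > 1:
                new_labels[point] = labels[nb]
                break
    return new_labels


labels = [0, 0, 1, 2, 2]
neighbors = [[1, 2], [0, 2], [3, 1], [4, 2], [3, 2]]
result = reassign_singletons(labels, neighbors)
```

Here point 2 is alone in cluster 1, so it joins the cluster of its nearest non-singleton neighbor. A singleton whose neighbors are all singletons keeps its original label in this sketch.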
aitlas.clustering.utils module#
- preprocess_features(npdata, pca=256)[source]#
Preprocess an array of features.
- Parameters:
npdata (np.array (N * dim)) – features to preprocess
pca (int) – dimension of the output
- Returns:
data PCA-reduced, whitened and L2-normalized
- Return type:
np.array (N * pca)
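As a rough illustration of the three steps named in the return value (PCA reduction, whitening, L2 normalization), here is a NumPy-only sketch. It is not the aitlas code, which may rely on a different library or different numerical details; the function name `pca_whiten_l2` is hypothetical.

```python
import numpy as np


def pca_whiten_l2(npdata, pca=2):
    """PCA-reduce to `pca` dims, whiten, then L2-normalize each row (sketch)."""
    x = np.asarray(npdata, dtype=np.float64)
    x = x - x.mean(axis=0)  # center the features
    # SVD of the centered data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    projected = x @ vt[:pca].T  # (N, pca) PCA projection
    # Whitening: divide each component by its standard deviation.
    std = s[:pca] / np.sqrt(x.shape[0] - 1)
    whitened = projected / np.maximum(std, 1e-12)
    # L2-normalize every row so each sample has unit norm.
    norms = np.linalg.norm(whitened, axis=1, keepdims=True)
    return whitened / np.maximum(norms, 1e-12)


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 8))
out = pca_whiten_l2(features, pca=3)
```

The output has one row per input sample and `pca` columns, and every row has unit L2 norm.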
- make_graph(xb, nnn)[source]#
Builds a graph of nearest neighbors.
- Parameters:
xb (np.array (N * dim)) – data
nnn (int) – number of nearest neighbors
- Returns:
two arrays: for each data point, the ids of its nnn nearest neighbors; and for each data point, the distances to those nnn nearest neighbors
- Return type:
np.array (N * nnn)
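A nearest-neighbor graph of this shape can be sketched with a brute-force search in NumPy. This is an assumption-laden illustration rather than the aitlas implementation: the name `knn_graph` is hypothetical, the search is exhaustive, and the distances returned are squared L2 distances.

```python
import numpy as np


def knn_graph(xb, nnn):
    """Brute-force nearest-neighbor search of xb against itself (sketch).

    Returns (ids, dists), both of shape (N, nnn). Because the data is
    searched against itself, column 0 is normally the point itself at
    distance 0.
    """
    xb = np.asarray(xb, dtype=np.float64)
    diff = xb[:, None, :] - xb[None, :, :]  # (N, N, dim) pairwise differences
    d2 = (diff ** 2).sum(axis=-1)           # squared L2 distances
    ids = np.argsort(d2, axis=1, kind="stable")[:, :nnn]
    dists = np.take_along_axis(d2, ids, axis=1)
    return ids, dists


points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
ids, dists = knn_graph(points, nnn=2)
```

The pairwise-difference array uses O(N² · dim) memory, so this form is only suitable for small inputs.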
- class ReassignedDataset(image_indexes, pseudolabels, dataset)[source]#
Bases:
Dataset
A dataset whose new image labels are passed in as arguments.
- Parameters:
- cluster_assign(images_lists, dataset)[source]#
Creates a dataset from clustering, with clusters as labels.
- Parameters:
images_lists – for each cluster, the list of image indexes belonging to this cluster
dataset – initial dataset
- Returns:
dataset with clusters as labels
- Return type:
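One plausible way to turn images_lists into per-image labels is to flatten it into two parallel lists, which match the image_indexes and pseudolabels argument names of ReassignedDataset. Whether aitlas builds them exactly this way is an assumption; the helper name `flatten_clusters` is hypothetical.

```python
def flatten_clusters(images_lists):
    """Turn per-cluster index lists into parallel index and label lists."""
    image_indexes = []
    pseudolabels = []
    for cluster_id, members in enumerate(images_lists):
        image_indexes.extend(members)
        # Every image in this cluster receives the cluster id as its label.
        pseudolabels.extend([cluster_id] * len(members))
    return image_indexes, pseudolabels


image_indexes, pseudolabels = flatten_clusters([[0, 3], [1], [2, 4]])
```

In this example, images 0 and 3 get label 0, image 1 gets label 1, and images 2 and 4 get label 2.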
- run_kmeans(x, nmb_clusters, verbose=False)[source]#
Runs kmeans on 1 GPU.
- Parameters:
x (np.array (N * dim)) – data
nmb_clusters (int) – number of clusters
- Returns:
list of ids for each data to its nearest cluster
- Return type:
list of ints
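To show the input and output contract (an N * dim array in, one cluster id per row out), here is a CPU-only sketch of Lloyd's k-means in NumPy. It stands in for the GPU routine purely as an illustration; the name `kmeans_assign` and the `n_iter` and `seed` parameters are assumptions, not part of aitlas.

```python
import numpy as np


def kmeans_assign(x, nmb_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means on CPU; returns one cluster id per row of x."""
    x = np.asarray(x, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialize centroids from distinct data points (fancy indexing copies).
    centroids = x[rng.choice(len(x), size=nmb_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: (N, k).
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for k in range(nmb_clusters):
            members = x[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return [int(i) for i in d2.argmin(axis=1)]


rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(10, 2))
blob_b = rng.normal(loc=100.0, scale=0.1, size=(10, 2))
labels = kmeans_assign(np.vstack([blob_a, blob_b]), nmb_clusters=2)
```

With two well-separated blobs, the first ten points share one id and the last ten share the other; which blob receives id 0 depends on the initialization.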
- make_adjacencyW(I, D, sigma)[source]#
Create adjacency matrix with a Gaussian kernel.
- Parameters:
I (numpy array) – for each vertex, the ids of its nnn linked vertices, preceded by a first column holding the vertex's own id.
D (numpy array) – for each data point, the L2 distances to its nnn linked vertices, preceded by a first column of zeros.
sigma (float) – bandwidth of the Gaussian kernel.
- Returns:
affinity matrix of the graph.
- Return type:
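The construction can be illustrated with a dense NumPy sketch. It assumes, as the parameter descriptions state, that column 0 of I and D refers to the vertex itself and is skipped. The kernel form exp(-d / sigma**2), the dense output, and the name `gaussian_adjacency` are assumptions of this example; the actual function may use a sparse matrix or a different normalization.

```python
import numpy as np


def gaussian_adjacency(I, D, sigma):
    """Dense (N, N) affinity matrix from neighbor ids and distances (sketch).

    Column 0 of I and D is assumed to describe the vertex itself and is skipped.
    """
    I = np.asarray(I)
    D = np.asarray(D, dtype=np.float64)
    n = I.shape[0]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), I.shape[1] - 1)  # source vertex of each edge
    cols = I[:, 1:].ravel()                         # linked vertices
    # Gaussian kernel on the distances; this exact form is an assumption.
    W[rows, cols] = np.exp(-D[:, 1:].ravel() / sigma ** 2)
    return W


I = np.array([[0, 1], [1, 0], [2, 1]])
D = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 4.0]])
W = gaussian_adjacency(I, D, sigma=1.0)
```

The result is not symmetric in general: vertex 2 links to vertex 1, but vertex 1 does not link back, so only W[2, 1] is nonzero for that pair.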