← Tutorials

Intrinsic Dimension Estimation

Tensorized (and parallelizable) pytorch implementation of the algorithm for intrinsic dimension estimation

Code : https://github.com/Tikquuss/intrinsics_dimension

1. Maximum Likelihood Estimation appoach

Calculates intrinsic dimension of the provided data points with the Maximum Likelihood Estimation appoach.

References:

One of the main approaches to intrinsic dimension estimation is to examine a neighborhood around each point in the dataset, and compute the Euclidean distance to the kthk^{th} nearest neighbor. Assuming that density is constant within small neighborhoods, the Maximum Likelihood Estimation (MLE) of [1] uses a Poisson process to model the number of points found by random sampling within a given radius around each sample point. By relating the rate of this process to the surface area of the sphere, the likelihood equations yield an estimate of the ID at a given point xx as: m^k(x)=[1k1j=1k1logTk(x)Tj(x)]1\hat{m}_k(x) = \bigg[ \frac{1}{k-1} \sum_{j=1}^{k-1} log \frac{T_k(x)}{T_j(x)} \bigg]^{-1} where Tj(x)T_j(x) is the Euclidean (l2l_2) distance from xx to its jthj^{th} nearest neighbor. [1] propose to average the local estimates at each point to obtain a global estimate (nn is the number of sample) m^k=1ni=1nm^k(xi)\hat{m}_k = \frac{1}{n} \sum_{i=1}^{n} \hat{m}_k(x_i) [2] suggestion a correction based on averaging of inverses m^k=[1ni=1nm^k(xi)1]1=[1n(k1)i=1nj=1k1logTk(xi)Tj(xi)]1\hat{m}_k = \bigg[ \frac{1}{n} \sum_{i=1}^{n} \hat{m}_k(x_i)^{-1} \bigg]^{-1} = \bigg[ \frac{1}{n(k-1)} \sum_{i=1}^{n} \sum_{j=1}^{k-1} log \frac{T_k(x_i)}{T_j(x_i)} \bigg]^{-1}

2. TWO-NN

Calculates intrinsic dimension of the provided data points with the TWO-NN algorithm.

References:

2-NN algorithm :

  1. Compute the pairwise distances for each point in the dataset i=1,,Ni = 1, …, N.
  2. For each point ii find the two shortest distances r1r_1 and r2r_2.
  3. For each point ii compute μi=r2r1\mu_i = \frac{r_2}{r_1}
  4. Compute the empirical cumulate Femp(μ)F^{emp}(\mu) by sorting the values of μ\mu in an ascending order through a permutation σ\sigma, then define Femp(μσ(i))=1NF^{emp}(\mu_{\sigma(i)}) = \frac{1}{N}
  5. Fit the points of the plane given by coordinates {(log(μi),log(1Femp(μi)))  i=1,...,N}\{(log(\mu_i), −log(1−F^{emp}(\mu_i))) \ \| \ i=1,..., N\} with a straight line passing through the origin.

3. Installation

pip install intrinsics_dimension

4. Get started


from intrinsics_dimension import mle_id, twonn_numpy, twonn_pytorch

n, dim = 512, 1024
data = torch.randn(n, dim)
d1 = mle_id(data, k=2, averaging_of_inverses = False)
d2 = mle_id(data, k=2, averaging_of_inverses = True)
d3 = twonn_numpy(data.numpy(), return_xy=False)
d4 = twonn_pytorch(data, return_xy=False)
print(d1, d2, d3, d4)

data[1] = data[0] # make distance(data[1], data[0]) = 0
d1 = mle_id(data, k=2, averaging_of_inverses = False)
d2 = mle_id(data, k=2, averaging_of_inverses = True)
d3 = twonn_numpy(data.numpy(), return_xy=False)
d4 = twonn_pytorch(data, return_xy=False)
print(d1, d2, d3, d4)