The Mahalanobis distance, introduced by P. C. Mahalanobis in 1936, measures the distance between a point P and a distribution D. It is an effective multivariate distance metric because it accounts for the correlation between variables. To compare two groups, the Mahalanobis distance is computed as a quadratic form. First multiply the mean difference by the inverse of the pooled covariance matrix. Then multiply the result by the mean difference again and take the square root. When measuring the distance of a point from a distribution, the origin is placed at the centroid of the points (the point of their averages). Without the inverse of the covariance matrix (equivalently, with the identity matrix in its place), the formula reduces to the Euclidean distance. R's mahalanobis function provides a simple means of detecting outliers in multidimensional data.

The following MATLAB function returns the Mahalanobis distance between two data matrices:

function d=MahalanobisDistance(A, B)
% Return the Mahalanobis distance between two data matrices
% A and B (row = object, column = feature)
% @author: Kardi Teknomo
% http://people.revoledu.com/kardi/index.html
[n1, k1]=size(A);
[n2, k2]=size(B);
n=n1+n2;
if(k1~=k2)
    error('number of columns of A and B must be the same'); % stop: d would be unassigned
else
    xDiff=mean(A)-mean(B);       % mean difference (row vector)
    cA=Covariance(A);            % covariance matrix of A
    cB=Covariance(B);            % covariance matrix of B
    pC=n1/n*cA+n2/n*cB;          % pooled covariance matrix
    d=sqrt(xDiff/pC*xDiff');     % same as sqrt(xDiff*inv(pC)*xDiff')
end

The Mahalanobis distance allows computing the distance between two points in a p-dimensional space while taking into account the covariance structure across the p dimensions. In multivariate hypothesis testing, it is used to construct test statistics. The squared Mahalanobis distance between two vectors $x_1$ and $x_2$ is

$$d_M^2 = (x_1 - x_2)^\top \Sigma^{-1} (x_1 - x_2),$$

where $\Sigma$ is the covariance matrix. Computing the distance therefore requires the inverse of the covariance matrix. The Mahalanobis distance is the multivariate generalization of finding how many standard deviations away a point is from the mean of a distribution. For a vector $x$ from a distribution with mean $\mu$ and covariance $\Sigma$, it is defined by

$$D^2 = (x - \mu)^\top \Sigma^{-1} (x - \mu).$$

The following helper function returns the covariance matrix of a data matrix:

function C=Covariance(X)
% Return covariance given data matrix X (row = object, column = feature)
% @author: Kardi Teknomo
% http://people.revoledu.com/kardi/index.html
[n,k]=size(X);
Xc=X-repmat(mean(X),n,1); % centered data
C=Xc'*Xc/n; % covariance (divides by n: the biased, maximum-likelihood estimate)

One example application computes the Mahalanobis distance of five new beers that you haven't tried yet from a set of twenty benchmark beers that you love, based on five factors. The function MOUTLIERS(R1, alpha), when alpha = 0 or is omitted, returns an n × 2 array whose first column contains the squared Mahalanobis distance of each vector in R1. The Mahalanobis distance (MD) can be examined in both the original space and the principal component (PC) space and interpreted in relation to the Euclidean distance (ED). It has useful applications in multivariate anomaly detection, classification on highly imbalanced datasets, one-class classification, and other less explored use cases. The Mahalanobis distance between two groups can be calculated just as simply in R.

The standard estimator of the pooled covariance matrix is the unbiased one, which weights each group's covariance by its degrees of freedom. It is described on the Wikipedia page https://en.wikipedia.org/wiki/Pooled_variance. Note that the Covariance function above divides by $n$ rather than $n - 1$, so it returns the biased estimate. In all of these data matrices, the columns are the features and the rows are the observations.
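The point-to-distribution formula $D^2 = (x - \mu)^\top \Sigma^{-1} (x - \mu)$ can be sketched in MATLAB as follows. The data matrix `X` and the query point `x` are hypothetical. The sketch estimates $\Sigma$ with MATLAB's built-in `cov` (the unbiased estimate, dividing by $n - 1$) rather than the Covariance helper above. It also uses the `/` operator instead of forming `inv(S)` explicitly.

```matlab
% Sketch: Mahalanobis distance of a point x from a sample X.
% X and x are hypothetical data (rows = observations, columns = features).
X = [2 2; 2 5; 6 5; 7 3; 4 7; 6 4; 5 3; 4 6; 2 5; 1 3];
x = [6 4];

mu = mean(X);              % centroid: row vector of column means
S  = cov(X);               % unbiased sample covariance (divides by n-1)
delta = x - mu;            % deviation of x from the centroid

% (x - mu) * inv(S) * (x - mu)', evaluated with mrdivide instead of inv
D2 = delta / S * delta';
D  = sqrt(D2);             % Mahalanobis distance
fprintf('Mahalanobis distance: %.4f\n', D);
```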
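Two special cases described in the text can be checked numerically:

- With the identity matrix in place of the covariance matrix, the distance is Euclidean.
- In one dimension, the distance is the number of standard deviations between $x$ and the mean.

The anonymous function `mahal2` and all the values below are hypothetical illustrations, not part of any library.

```matlab
% Sketch: two special cases of D^2 = (x - mu) * inv(S) * (x - mu)'.
% mahal2 is a hypothetical helper returning the squared distance.
mahal2 = @(x, mu, S) (x - mu) / S * (x - mu)';

% 1) With S equal to the identity matrix, the distance is Euclidean.
x  = [3 -1 2];  mu = [1 1 1];                  % hypothetical vectors
dIdentity  = sqrt(mahal2(x, mu, eye(3)));
dEuclidean = norm(x - mu);                     % sqrt(4 + 4 + 1) = 3

% 2) In one dimension, the distance is |x - mu| / sigma,
%    i.e. the number of standard deviations between x and the mean.
sigma = 2;  x1 = 7;  mu1 = 3;                  % hypothetical values
d1 = sqrt(mahal2(x1, mu1, sigma^2));           % |7 - 3| / 2 = 2
```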
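One way to relate the MD to the ED in principal component space is sketched below. Write the eigendecomposition as $\Sigma = V \Lambda V^\top$. Project $x - \mu$ onto the principal components and divide each score by the square root of its eigenvalue. The plain Euclidean norm of the result then equals the Mahalanobis distance. The data are hypothetical, and this "whitening" view is one standard interpretation, not necessarily the one used in the source discussed above.

```matlab
% Sketch: Mahalanobis distance = Euclidean distance in whitened PC space.
% X and x are hypothetical data (rows = observations, columns = features).
X = [2 2; 2 5; 6 5; 7 3; 4 7; 6 4; 5 3; 4 6; 2 5; 1 3];
x = [6 4];

mu = mean(X);
S  = cov(X);
[V, L] = eig(S);                      % S = V * L * V', columns of V are PCs

z = (x - mu) * V;                     % scores of x - mu on the PCs
zWhite = z ./ sqrt(diag(L))';         % divide each score by its PC's std. dev.

dPC = norm(zWhite);                   % Euclidean distance in whitened PC space
dMD = sqrt((x - mu) / S * (x - mu)'); % Mahalanobis distance in original space
```

Because $\Sigma^{-1} = V \Lambda^{-1} V^\top$, both quantities equal $\sqrt{\sum_k z_k^2 / \lambda_k}$. Directions with small variance therefore contribute more to the MD than they would to the ED.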
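For outlier screening, the quantity of interest is the squared Mahalanobis distance of every row from the sample centroid. R's `mahalanobis` returns this quantity, and MOUTLIERS reports it in its first column. The sketch below computes it in MATLAB for hypothetical data in which the last row was added as a deliberate outlier. It does not reproduce either of those functions. For roughly multivariate normal data, such values are often compared with a chi-square quantile with $p$ degrees of freedom; that cutoff step is left out here.

```matlab
% Sketch: squared Mahalanobis distance of each row of X from the centroid.
% X is hypothetical; the last row is a deliberate outlier.
X = [2 2; 2 5; 6 5; 7 3; 4 7; 6 4; 5 3; 4 6; 2 5; 1 3; 12 1];
[n, p] = size(X);

mu = mean(X);
S  = cov(X);                          % unbiased sample covariance
Xc = X - repmat(mu, n, 1);            % centered rows
D2 = sum((Xc / S) .* Xc, 2);          % row i: Xc(i,:) * inv(S) * Xc(i,:)'

[~, worst] = max(D2);                 % index of the most outlying row
fprintf('Row %d has the largest squared distance (%.3f)\n', worst, D2(worst));
```

A convenient sanity check: with the unbiased covariance, the squared distances of the $n$ sample rows always sum to $(n - 1)p$.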
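The unbiased pooled covariance mentioned above is $S_p = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$. A two-group distance based on it can be sketched as below. This is an assumed variant for comparison, not the author's function: the matrices `A` and `B` are hypothetical, and the group covariances come from MATLAB's `cov` rather than the Covariance helper.

```matlab
% Sketch: Mahalanobis distance between two group means using the
% unbiased pooled covariance ((n1-1)*S1 + (n2-1)*S2) / (n1 + n2 - 2).
% A and B are hypothetical data (rows = observations, columns = features).
A = [2 2; 2 5; 6 5; 7 3; 4 7; 6 4; 5 3; 4 6; 2 5; 1 3];
B = [6 5; 7 4; 8 7; 5 6; 5 4];

n1 = size(A, 1);
n2 = size(B, 1);
xDiff = mean(A) - mean(B);                                     % mean difference (row)
Sp = ((n1 - 1) * cov(A) + (n2 - 1) * cov(B)) / (n1 + n2 - 2);  % pooled covariance
d = sqrt(xDiff / Sp * xDiff');                                 % Mahalanobis distance
```

Weighting by $n_i - 1$ instead of $n_i / n$ makes $S_p$ an unbiased estimate of a common covariance matrix. The difference between the two weightings shrinks as the group sizes grow.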