ltpda: m-toolbox/classes/+utils/@math/kstest.m comparison

comparison m-toolbox/classes/+utils/@math/kstest.m @ 0:f0afece42f48

Import.

author	Daniele Nicolodi <nicolodi@science.unitn.it>
date	Wed, 23 Nov 2011 19:22:13 +0100
parents
children

comparison

equal deleted inserted replaced

--1:000000000000
+:f0afece42f48
+% KSTEST perform the Kolmogorov - Smirnov statistical hypothesis test
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%
+% DESCRIPTION:
+%
+% Kolmogorov - Smirnov test is typically used to assess if a sample comes
+% from a specific distribution or if two data samples came from the same
+% distribution. The test statistics is d_K = max|S(x) - K(x)| where S(x)
+% and K(x) are cumulative distribution functions of the two inputs
+% respectively.
+% In the case of the test on a single data series:
+% - null hypothesis is that the data are a realizations of a random variable
+%   which is distributed according to the given probability distribution
+% In the case of the test on two data series:
+% - null hypothesis is that the two data series are realizations of the same random variable
+%
+% CALL:
+%
+% H = utils.math.kstest(y1, y2, alpha, distparams)
+% [H] = utils.math.kstest(y1, y2, alpha, distparams)
+% [H, KSstatistic] = utils.math.kstest(y1, y2, alpha, distparams)
+% [H, KSstatistic, criticalValue] = utils.math.kstest(y1, y2, alpha, distparams)
+% [H, KSstatistic, criticalValue] = utils.math.kstest(y1, y2, alpha, distparams, shapeparam)
+% [H, KSstatistic, criticalValue, pValue] = utils.math.kstest(y1, y2, alpha, distparams, shapeparam, criticalValue)
+%
+% INPUT:
+%
+% - Y1 are the data we want to test against Y2.
+%
+% - Y2 can be a theoretical distribution or a second set of data. In case
+% of theoretical distribution, Y2 should be a string with the corresponding
+% distribution name. Permitted values are:
+%   - 'NormDist' Nomal distribution
+%   - 'Chi2Dist' Chi square distribution
+%   - 'FDist' F distribution
+%   - 'GammaDist' Gamma distribution
+% If Y2 is left empty a normal distribution is assumed.
+%
+% - ALPHA is the desired significance level (default = 0.05). It represents
+% the probability of rejecting the null hypothesis when it is true.
+% Rejecting the null hypothesis, H0, when it is true is called a Type I
+% Error. Therefore, if the null hypothesis is true , the level of the test,
+% is the probability of a type I error.
+%
+% - DISTPARAMS are the parameters of the chosen theoretical distribution.
+% You should not assign this input if Y2 are experimental data. In general
+% DISTPARAMS is a vector containing the following distribution parameters:
+%   - In case of 'NormDist', DISTPARAMS is a vector containing mean and
+%   standard deviation of the normal distribution [mean sigma]. Default [0 1]
+%   - In case of 'Chi2Dist' , DISTPARAMS is a number containing containing
+%   the degrees of freedom of the chi square distribution [dof]. Default [2]
+%   - In case of 'FDist', DISTPARAMS is a vector containing the two degrees
+%   of freedom of the F distribution [dof1 dof2]. Default [2 2]
+%   - In case of 'GammaDist', DISTPARAMS is a vector containing the shape
+%   and scale parameters [k, theta]. Default [2 2]
+%
+% - SHAPEPARAM In the case of comparison of a data series with a
+% theoretical distribution and the data series is composed of correlated
+% elements. K can be adjusted with a shape parameter in order to recover
+% test fairness [3]. In such a case the test is performed for K' = Phi * K.
+% Phi is the corresponding Shape parameter. The shape parameter depends on
+% the correlations and on the significance value. It does not depend on
+% data length. Default [1]
+%
+% - CRITICALVALUE In case the critical value for the test is available from
+% external calculations, e.g. Monte Carlo simulation, the vale can be input
+% to the method
+%
+% OUTPUT:
+%
+% - H indicates the result of the hypothesis test:
+%  H = false => Do not reject the null hypothesis at significance level ALPHA.
+%  H = true => Reject the null hypothesis at significance level ALPHA.
+%
+% - TEST STATISTIC is the value of d_K = max|S(x) - K(x)|.
+%
+% - CRITICAL VALUE is the value of the test statistics corresponding to the
+% significance level. CRITICAL VALUE is depending on K, where K is the data length of Y1 if
+% Y2 is a theoretical distribution, otherwise if Y1 and Y2 are two data
+% samples K = n1*n2/(n1 + n2) where n1 and n2 are data length of Y1 and Y2
+% respectively. In the case of comparison of a data series with a
+% theoretical distribution and the data series is composed of correlated
+% elements. K can be adjusted with a shape parameter in order to recover
+% test fairness [3]. In such a case the test is performed for K' = Phi * K.
+% If TEST STATISTIC > CRITICAL VALUE the null hypothesis is rejected.
+%
+% - P VALUE is the probability value associated to the test statistic.
+%
+% Luigi Ferraioli 17-02-2011
+%
+% REFERENCES:
+%
+%   [1] Massey, F.J., (1951) "The Kolmogorov-Smirnov Test for Goodness of
+%   Fit", Journal of the American Statistical Association, 46(253):68-78.
+%   [2] Miller, L.H., (1956) "Table of Percentage Points of Kolmogorov
+%   Statistics", Journal of the American Statistical Association,
+%   51(273):111-121.
+%   [3] Ferraioli L. et al, to be published.
+%
+% % $Id: kstest.m,v 1.8 2011/07/14 07:09:29 mauro Exp $
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+function [H, KSstatistic, criticalValue, pValue] = kstest(y1, y2, alpha, varargin)
+% check inputs
+if isempty(y2)
+y2 = 'normdist'; % set normal distribution as default
+end
+if isempty(alpha)
+alpha = 0.05;
+end
+if nargin > 3
+dof = varargin{1};
+elseif nargin <= 3 && ischar(y2)
+switch lower(y2) % assign dof
+case 'fdist'
+dof = [2 2];
+case 'normdist'
+dof = [0 1];
+case 'chi2dist'
+dof = [2];
+case 'gammadist'
+dof = [2 2];
+end
+end
+shp = 1;
+if nargin > 4
+shp = varargin{2};
+if isempty(shp)
+shp = 1;
+end
+end
+if nargin > 5
+criticalValue = varargin{3};
+else
+criticalValue = [];
+end
+n1     =  length(y1);
+% get empirical distribution for input data
+[CD1,x1] = utils.math.ecdf(y1);
+% check if we have a second dataset or a theoretical distribution as second
+% input
+if ischar(y2)
+% switch between theoretical distributions
+switch lower(y2)
+case 'fdist'
+CD2 = utils.math.Fcdf(x1, dof(1), dof(2));
+case 'normdist'
+CD2 = utils.math.Normcdf(x1, dof(1), dof(2));
+case 'chi2dist'
+CD2 = utils.math.Chi2cdf(x1, dof(1));
+case 'gammadist'
+CD2 = gammainc(x./dof(2), dof(1));
+otherwise
+error('??? Unrecognized distribution type')
+end
+n2 =  [];
+n1 = shp*n1;
+% calculate empirical distribution for second input dataset
+else
+[eCD2, ex2] = utils.math.ecdf(y2);
+CD2  =  interp1(ex2, eCD2, x1, 'linear');
+n2  =  length(y2);
+end
+KSstatistic = max(abs(CD1 - CD2));
+if isempty(criticalValue)
+criticalValue = utils.math.SKcriticalvalues(n1, n2, alpha/2);
+end
+% "H = 0" implies that we "Do not reject the null hypothesis at the
+% significance level of alpha," and "H = 1" implies that we "Reject null
+% hypothesis at significance level of alpha."
+H  =  (KSstatistic > criticalValue);
+if nargout > 3
+pValue = utils.math.KSpValue(KSstatistic, n1, n2);
+end
+end

Mercurial > hg > ltpda

comparison m-toolbox/classes/+utils/@math/kstest.m @ 0:f0afece42f48