mgcpy.independence_tests package
Submodules
mgcpy.independence_tests.abstract_class module
Main Independence Test Abstract Class
class mgcpy.independence_tests.abstract_class.IndependenceTest(compute_distance_matrix=None)
Bases: abc.ABC
IndependenceTest abstract class
Specifies the generic interface that must be implemented by all the independence tests in the mgcpy package.
Parameters: compute_distance_matrix (FunctionType or callable()) – a function to compute the pairwise distance matrix, given a data matrix
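As an illustration of what a compute_distance_matrix callable might look like, the sketch below builds an [n*n] Euclidean distance matrix from an [n*p] data matrix using NumPy only. The function name euclidean_distance_matrix is hypothetical, and the exact call signature that mgcpy expects of this callable is an assumption.

```python
import numpy as np


def euclidean_distance_matrix(data_matrix):
    """Return the [n*n] matrix of pairwise Euclidean distances between rows.

    Hypothetical helper: assumes the callable receives a single [n*p] data matrix.
    """
    data_matrix = np.asarray(data_matrix, dtype=float)
    # Broadcasting yields an [n, n, p] array of coordinate differences.
    differences = data_matrix[:, np.newaxis, :] - data_matrix[np.newaxis, :, :]
    return np.sqrt((differences ** 2).sum(axis=-1))


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = euclidean_distance_matrix(X)
```

The result is square and symmetric with zeros on the diagonal, which matches the distance-matrix shape described for the matrix_X and matrix_Y arguments elsewhere on this page.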
test_statistic(self, matrix_X, matrix_Y)
Abstract method to compute the test statistic given two data matrices.
Parameters: - matrix_X (2D numpy.array) – a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – a [n*q] data matrix, a matrix with n samples in q dimensions
Returns: a list of two items, containing:
- test_statistic_: the test statistic computed using the respective independence test
- test_statistic_metadata_: (optional) metadata other than the test_statistic, that the independence test computes in the process
Return type: list
p_value(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using the independence test and permutation test.
Parameters: - matrix_X (2D numpy.array) – a [n*p] matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – a [n*q] matrix, a matrix with n samples in q dimensions
- replication_factor (integer) – specifies the number of replications to use for the permutation test. Defaults to 1000.
Returns: a list of two items, containing:
- p_value_: P-value
- p_value_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process
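The general idea of a permutation test can be sketched as follows. This is a minimal illustration, not mgcpy's implementation: the names permutation_p_value and abs_pearson are hypothetical, the seed argument is added for reproducibility, and the "+1" smoothing of the p-value is a common convention assumed here.

```python
import numpy as np


def permutation_p_value(statistic, matrix_X, matrix_Y, replication_factor=1000, seed=0):
    """Estimate a p-value by recomputing the statistic on row-permuted copies of Y."""
    rng = np.random.default_rng(seed)
    observed = statistic(matrix_X, matrix_Y)
    null_distribution = np.empty(replication_factor)
    for r in range(replication_factor):
        # Shuffling the rows of Y breaks any dependence on X.
        permuted_Y = matrix_Y[rng.permutation(matrix_Y.shape[0])]
        null_distribution[r] = statistic(matrix_X, permuted_Y)
    # Assumed convention: count the observed statistic as one of the replications.
    p_value = (1 + np.sum(null_distribution >= observed)) / (1 + replication_factor)
    return p_value, {"null_distribution": null_distribution}


def abs_pearson(matrix_X, matrix_Y):
    """Toy statistic for [n*1] inputs: absolute Pearson correlation."""
    return abs(np.corrcoef(matrix_X[:, 0], matrix_Y[:, 0])[0, 1])


X = np.linspace(-1, 1, 30).reshape(-1, 1)
Y = X ** 3
p_value, metadata = permutation_p_value(abs_pearson, X, Y, replication_factor=200)
```

A larger replication_factor gives a finer-grained p-value at proportionally higher cost, since the statistic is recomputed once per replication.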
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using block permutation test.
Parameters: - matrix_X (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (integer) – specifies the number of replications to use for the permutation test. Defaults to 1000.
Returns: a list of two items, containing:
- p_value: P-value of MGC
- metadata: a dict of metadata with the following keys:
  - :null_distribution: numpy array representing distribution of test statistic under null.
Return type: list
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc_ts import MGC_TS
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc_ts = MGC_TS()
>>> p_value, metadata = mgc_ts.p_value(X, Y, replication_factor = 100)
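One way to picture a block permutation is to shuffle contiguous blocks of sample indices while keeping the order inside each block, so that short-range structure within a block survives the shuffle. The sketch below shows only that index manipulation; the function name block_permutation_indices, the block_size argument, and the non-overlapping block scheme are all assumptions and may differ from what p_value_block actually does.

```python
import numpy as np


def block_permutation_indices(n, block_size, rng):
    """Shuffle contiguous blocks of the indices 0..n-1, preserving order within a block."""
    blocks = [np.arange(start, min(start + block_size, n))
              for start in range(0, n, block_size)]
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])


rng = np.random.default_rng(0)
indices = block_permutation_indices(n=10, block_size=3, rng=rng)
```

Indexing the rows of matrix_Y with such an index array would yield one permuted replicate for the null distribution.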
mgcpy.independence_tests.dcorr module
class mgcpy.independence_tests.dcorr.DCorr(compute_distance_matrix=None, which_test='unbiased', is_paired=False)
Bases: mgcpy.independence_tests.abstract_class.IndependenceTest
Parameters: - compute_distance_matrix (FunctionType or callable()) – a function to compute the pairwise distance matrix, given a data matrix
- which_test (string) – the type of global correlation to use; can be ‘unbiased’, ‘biased’, or ‘mantel’
test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_dcorr_data={})
Computes the distance correlation between two datasets.
Parameters: - matrix_X (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
- is_fast (boolean) – a boolean flag which specifies if the test_statistic should be computed (approximated) using the fast version of dcorr. This defaults to False.
- fast_dcorr_data (dictionary) – a dict of fast dcorr params, refer: self._fast_dcorr_test_statistic
  - :sub_samples: specifies the number of subsamples.
Returns: a list of two items, containing:
- test_statistic: the sample dcorr statistic within [-1, 1]
- independence_test_metadata: a dict of metadata with the following keys:
  - :variance_X: the variance of the data matrix X
  - :variance_Y: the variance of the data matrix Y
Return type: list
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr(which_test = 'unbiased')
>>> dcorr_statistic, test_statistic_metadata = dcorr.test_statistic(X, Y)
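For intuition, a distance-correlation style statistic can be written in a few lines of NumPy. The sketch below uses the textbook biased estimator built from double-centered distance matrices and returns the squared distance correlation. It is an assumption that this resembles the 'biased' option; the helper names double_center and biased_squared_dcorr are hypothetical and the normalization may not match DCorr exactly.

```python
import numpy as np


def double_center(distance_matrix):
    """Subtract row and column means and add back the grand mean."""
    row_means = distance_matrix.mean(axis=1, keepdims=True)
    col_means = distance_matrix.mean(axis=0, keepdims=True)
    return distance_matrix - row_means - col_means + distance_matrix.mean()


def biased_squared_dcorr(matrix_X, matrix_Y):
    """Biased sample estimate of the squared distance correlation of two data matrices."""
    dist_X = np.linalg.norm(matrix_X[:, None, :] - matrix_X[None, :, :], axis=-1)
    dist_Y = np.linalg.norm(matrix_Y[:, None, :] - matrix_Y[None, :, :], axis=-1)
    A = double_center(dist_X)
    B = double_center(dist_Y)
    covariance = (A * B).mean()
    variance_X = (A * A).mean()
    variance_Y = (B * B).mean()
    return covariance / np.sqrt(variance_X * variance_Y)


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
# An exact linear function of X should give a statistic of 1.
statistic = biased_squared_dcorr(X, 2 * X + 1)
```

The two variance terms computed here play the same role as the variance_X and variance_Y entries described in the metadata above.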
compute_global_covariance(self, dist_mtx_X, dist_mtx_Y)
Helper function: Compute the global covariance using distance matrix A and B
Parameters: - dist_mtx_X (2D numpy.array) – a [n*n] distance matrix
- dist_mtx_Y (2D numpy.array) – a [n*n] distance matrix
Returns: the data covariance or variance based on the distance matrices
Return type: numpy.float
unbiased_T(self, matrix_X, matrix_Y)
Helper function: Compute the t-test statistic for unbiased dcorr
Parameters: - matrix_X (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] matrix, a matrix with n samples in q dimensions
Returns: test statistic of t-test for unbiased dcorr
Return type: numpy.float
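A commonly cited form of the t-statistic for the bias-corrected distance correlation R on n samples is T = sqrt(v - 1) * R / sqrt(1 - R^2) with v = n(n - 3)/2. The sketch below evaluates that formula; it is an assumption that unbiased_T follows the same form, and the function name dcorr_t_statistic is hypothetical.

```python
import numpy as np


def dcorr_t_statistic(unbiased_dcorr, n):
    """t-statistic for a bias-corrected distance correlation on n samples (assumed form)."""
    v = n * (n - 3) / 2
    return np.sqrt(v - 1) * unbiased_dcorr / np.sqrt(1 - unbiased_dcorr ** 2)


t_value = dcorr_t_statistic(0.5, n=10)
```

Under this form the statistic is compared against a Student t distribution, which is what lets the p-value be computed without running permutations.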
p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_dcorr_data={})
Computes the p-value. If the correlation test is unbiased, the p-value can be computed using a t-test; otherwise it is computed using a permutation test.
Parameters: - matrix_X (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (integer) – specifies the number of replications to use for the permutation test. Defaults to 1000.
- is_fast (boolean) – a boolean flag which specifies if the test_statistic should be computed (approximated) using the fast version of dcorr. This defaults to False.
- fast_dcorr_data (dictionary) – a dict of fast dcorr params, refer: self._fast_dcorr_test_statistic
  - :sub_samples: specifies the number of subsamples.
Returns: p-value of distance correlation
Return type: numpy.float
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.dcorr import DCorr
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> dcorr = DCorr()
>>> p_value, metadata = dcorr.p_value(X, Y, replication_factor = 100)
get_name(self)
Returns: the name of the independence test
Return type: string
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using block permutation test. Inherited from mgcpy.independence_tests.abstract_class.IndependenceTest; the parameters, return values, and example are the same as documented for IndependenceTest.p_value_block above.
mgcpy.independence_tests.hhg module
class mgcpy.independence_tests.hhg.HHG(compute_distance_matrix=None)
Bases: mgcpy.independence_tests.abstract_class.IndependenceTest
Parameters: compute_distance_matrix (FunctionType or callable()) – a function to compute the pairwise distance matrix, given a data matrix
test_statistic(self, matrix_X, matrix_Y)
Computes the HHG correlation measure between two datasets.
Parameters: - matrix_X (2D numpy.array) – a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – a [n*q] data matrix, a matrix with n samples in q dimensions
Returns: a list of two items, containing:
- test_statistic_: test statistic
- test_statistic_metadata_: (optional) a dict of metadata other than the test statistic, that the independence test computes in the process
Return type: float, dict
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_test_stat = hhg.test_statistic(X, Y)
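To make the HHG measure more concrete, here is a naive O(n^3) sketch of the statistic in its usual textbook form: for every ordered pair (i, j), the remaining points are cross-classified by whether they are at least as close to i as j is, in X and in Y, and the Pearson chi-square statistics of the resulting 2x2 tables are summed. The function name hhg_statistic is hypothetical, and it is an assumption that mgcpy uses the same normalization and tie handling.

```python
import numpy as np


def hhg_statistic(dist_X, dist_Y):
    """Sum of 2x2 Pearson chi-square statistics over all ordered pairs (i, j), i != j."""
    n = dist_X.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            others = np.array([k for k in range(n) if k != i and k != j])
            close_X = dist_X[i, others] <= dist_X[i, j]
            close_Y = dist_Y[i, others] <= dist_Y[i, j]
            a11 = np.sum(close_X & close_Y)
            a12 = np.sum(close_X & ~close_Y)
            a21 = np.sum(~close_X & close_Y)
            a22 = np.sum(~close_X & ~close_Y)
            denominator = (a11 + a12) * (a21 + a22) * (a11 + a21) * (a12 + a22)
            # Tables with an empty row or column carry no information and are skipped.
            if denominator > 0:
                total += (n - 2) * (a12 * a21 - a11 * a22) ** 2 / denominator
    return total


points = np.arange(5.0)
dist = np.abs(points[:, None] - points[None, :])
statistic = hhg_statistic(dist, dist)
```

Larger values indicate stronger dependence; each pair contributes at most n - 2, so the sum is bounded by n(n - 1)(n - 2).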
p_value(self, matrix_X=None, matrix_Y=None, replication_factor=1000)
Tests independence between two datasets using HHG and permutation test.
Parameters: - matrix_X (2D numpy.array) – a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (int) – specifies the number of replications to use for the permutation test. Defaults to 1000.
Returns: a list of two items, containing:
- p_value_: P-value
- p_value_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process
Return type: float, dict
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.hhg import HHG
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> hhg = HHG()
>>> hhg_p_value = hhg.p_value(X, Y)
get_name(self)
Returns: the name of the independence test
Return type: string
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using block permutation test. Inherited from mgcpy.independence_tests.abstract_class.IndependenceTest; the parameters, return values, and example are the same as documented for IndependenceTest.p_value_block above.
mgcpy.independence_tests.kendall_spearman module
class mgcpy.independence_tests.kendall_spearman.KendallSpearman(compute_distance_matrix=None, which_test='kendall')
Bases: mgcpy.independence_tests.abstract_class.IndependenceTest
Parameters: - compute_distance_matrix (FunctionType or callable()) – a function to compute the pairwise distance matrix, given a data matrix
- which_test (str) – specifies which test to use, either ‘kendall’ or ‘spearman’
test_statistic(self, matrix_X, matrix_Y)
Computes the Spearman’s rho or Kendall’s tau measure between two datasets.
- Uses the scipy.stats implementation for both
Parameters: - matrix_X (1D numpy.array) – a [n*1] data matrix, a matrix with n samples in 1 dimension
- matrix_Y (1D numpy.array) – a [n*1] data matrix, a matrix with n samples in 1 dimension
Returns: a list of two items, containing:
- test_stat_: test statistic
- test_statistic_metadata_: (optional) a dict of metadata other than the test statistic, that the independence test computes in the process
Return type: float, dict
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_stat = kendall_spearman.test_statistic(X, Y)
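Since the description above says both measures come from scipy.stats, the underlying calls can be sketched directly. This shows plain SciPy usage on the same sample data, flattened to 1D; how KendallSpearman wraps these calls and what it places in its metadata is not shown here and is not assumed.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974])
Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
              1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113])

# Each call returns the correlation and a p-value for the null of no association.
tau, tau_p_value = kendalltau(X, Y)
rho, rho_p_value = spearmanr(X, Y)
```

Both statistics depend only on the ranks of the samples, so they detect monotone relationships and are unaffected by monotone rescaling of either variable.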
p_value(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using the independence test.
Parameters: - matrix_X (2D numpy.array) – a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (int) – specifies the number of replications to use for the permutation test. Defaults to 1000.
Returns: a list of two items, containing:
- p_value_: P-value
- p_value_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process
Return type: float, dict
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.kendall_spearman import KendallSpearman
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> kendall_spearman = KendallSpearman()
>>> kendall_spearman_p_value = kendall_spearman.p_value(X, Y)
get_name(self)
Returns: the name of the independence test
Return type: string
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using block permutation test. Inherited from mgcpy.independence_tests.abstract_class.IndependenceTest; the parameters, return values, and example are the same as documented for IndependenceTest.p_value_block above.
mgcpy.independence_tests.manova module
mgcpy.independence_tests.mdmr module
Main MDMR Independence Test Module
class mgcpy.independence_tests.mdmr.MDMR(compute_distance_matrix=None)
Bases: mgcpy.independence_tests.abstract_class.IndependenceTest
Parameters: compute_distance_matrix (FunctionType or callable()) – a function to compute the pairwise distance matrix, given a data matrix
test_statistic(self, matrix_X, matrix_Y, permutations=0, individual=0, disttype='cityblock')
Computes MDMR Pseudo-F statistic between two datasets.
- It first takes the distance matrix of Y (by )
- Next it regresses X into a portion due to Y and a portion due to residual
- The p-value is for the null hypothesis that the variable of X is not correlated with Y’s distance matrix
Parameters: - data_matrix_X (2D numpy.array) – (optional, default picked from class attr) is interpreted as:
  - a [n*d] data matrix, a matrix with n samples in d dimensions
- data_matrix_Y (2D numpy.array) – (optional, default picked from class attr) is interpreted as:
  - a [n*d] data matrix, a matrix with n samples in d dimensions
- individual (integer) – 0 or 1. With value 0, tests the entire X matrix (default); with value 1, tests the entire X matrix and then each predictor variable individually
Returns: with individual = 0, returns 1 value; with individual = 1, returns 2 values, containing:
- the test statistic of the entire X matrix
- for individual = 1, an array with the variable of X in the first column, the test statistic in the second, and the permutation p-value in the third (which here will always be 1)
Return type: list
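A pseudo-F statistic of this kind can be sketched with NumPy in its common textbook form: the distance matrix of Y is turned into a Gower-centered inner-product matrix G, X is turned into a hat (projection) matrix H, and the statistic is the ratio of the variation in G explained by H to the residual variation. The function name pseudo_f_statistic, the added intercept column, and the degrees-of-freedom scaling are assumptions and may differ from what MDMR.test_statistic does.

```python
import numpy as np


def pseudo_f_statistic(matrix_X, dist_Y):
    """Pseudo-F from an [n*m] predictor matrix and an [n*n] distance matrix of Y."""
    n, m = matrix_X.shape
    design = np.column_stack([np.ones(n), matrix_X])  # assumed: intercept is included
    hat = design @ np.linalg.pinv(design)              # projection onto the column space
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist_Y ** 2) @ centering
    explained = np.trace(hat @ gower)
    residual = np.trace((np.eye(n) - hat) @ gower)
    return (explained / m) / (residual / (n - m - 1))


x = np.arange(8.0).reshape(-1, 1)
y = np.array([0.1, 1.3, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8])
dist_y = np.abs(y[:, None] - y[None, :])  # cityblock distance for 1D data
pseudo_f = pseudo_f_statistic(x, dist_y)
```

With a single response variable and these distances, the value coincides with the ordinary regression F statistic, which is a convenient sanity check for the construction.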
p_value(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using MGC and permutation test.
Parameters: - matrix_X (2D numpy.array) – is interpreted as:
  - a [n*d] data matrix, a matrix with n samples in d dimensions
- matrix_Y (2D numpy.array) – is interpreted as:
  - a [n*d] data matrix, a matrix with n samples in d dimensions
- replication_factor (integer) – specifies the number of replications to use for the permutation test. Defaults to 1000.
Returns: a list of two items, containing:
- p_value: P-value of MGC
- p_value_metadata:
Return type: list
ind_p_value(self, matrix_X, matrix_Y, permutations=1000, individual=1, disttype='cityblock')
Individual predictor variable p-values calculation
Parameters: - matrix_X (2D numpy.array) – is interpreted as:
  - a [n*d] data matrix, a matrix with n samples in d dimensions
- matrix_Y (2D numpy.array) – is interpreted as:
  - a [n*d] data matrix, a matrix with n samples in d dimensions
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using block permutation test. Inherited from mgcpy.independence_tests.abstract_class.IndependenceTest; the parameters, return values, and example are the same as documented for IndependenceTest.p_value_block above.
mgcpy.independence_tests.mgc module
Main MGC Independence Test Module
class mgcpy.independence_tests.mgc.MGC(compute_distance_matrix=None, base_global_correlation='mgc')
Bases: mgcpy.independence_tests.abstract_class.IndependenceTest
Parameters: - compute_distance_matrix (FunctionType or callable()) – a function to compute the pairwise distance matrix, given a data matrix
- base_global_correlation (string) – specifies which global correlation to build upon, one of ‘mgc’, ‘dcor’, ‘mantel’, and ‘rank’. Defaults to ‘mgc’.
test_statistic(self, matrix_X, matrix_Y, is_fast=False, fast_mgc_data={})
Computes the MGC measure between two datasets.
- It first computes all the local correlations
- Then, it returns the maximal statistic among all local correlations based on thresholding.
Parameters: - matrix_X (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
- is_fast (boolean) – a boolean flag which specifies if the test_statistic should be computed (approximated) using the fast version of mgc. This defaults to False.
- fast_mgc_data (dictionary) – a dict of fast mgc params, refer: self._fast_mgc_test_statistic
  - :sub_samples: specifies the number of subsamples.
Returns: a list of two items, containing:
- test_statistic: the sample MGC statistic within [-1, 1]
- independence_test_metadata: a dict of metadata with the following keys:
  - :local_correlation_matrix: a 2D matrix of all local correlations within [-1,1]
  - :optimal_scale: the estimated optimal scale as an [x, y] pair.
Return type: list
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> mgc_statistic, test_statistic_metadata = mgc.test_statistic(X, Y)
p_value(self, matrix_X, matrix_Y, replication_factor=1000, is_fast=False, fast_mgc_data={})
Tests independence between two datasets using MGC and permutation test.
Parameters: - matrix_X (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – is interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
  - a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (integer) – specifies the number of replications to use for the permutation test. Defaults to 1000.
- is_fast (boolean) – a boolean flag which specifies if the p_value should be computed (approximated) using the fast version of mgc. This defaults to False.
- fast_mgc_data (dictionary) – a dict of fast mgc params, refer: self._fast_mgc_p_value
  - :sub_samples: specifies the number of subsamples.
Returns: a list of two items, containing:
- p_value: P-value of MGC
- metadata: a dict of metadata with the following keys:
  - test_statistic: the sample MGC statistic within [-1, 1]
  - p_local_correlation_matrix: a 2D matrix of the P-values of the local correlations
  - local_correlation_matrix: a 2D matrix of all local correlations within [-1,1]
  - optimal_scale: the estimated optimal scale as an [x, y] pair.
Return type: list
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>>
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
...               0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
...               1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> mgc = MGC()
>>> p_value, metadata = mgc.p_value(X, Y, replication_factor = 100)
get_name(self)
Returns: the name of the independence test
Return type: string
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using block permutation test. Inherited from mgcpy.independence_tests.abstract_class.IndependenceTest; the parameters, return values, and example are the same as documented for IndependenceTest.p_value_block above.
mgcpy.independence_tests.rv_corr module
class mgcpy.independence_tests.rv_corr.RVCorr(compute_distance_matrix=None, which_test='rv')
Bases: mgcpy.independence_tests.abstract_class.IndependenceTest
Parameters: - compute_distance_matrix (FunctionType or callable()) – a function to compute the pairwise distance matrix, given a data matrix
- which_test (str) – specifies which test to use, one of ‘rv’, ‘pearson’, and ‘cca’.
test_statistic(self, matrix_X=None, matrix_Y=None)
Computes the Pearson/RV/CCA correlation measure between two datasets.
- Default computes linear correlation for RV
- Computes Pearson’s correlation
- Calculates local linear correlations for CCA
Parameters: - matrix_X (2D numpy.array) – a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – a [n*q] data matrix, a matrix with n samples in q dimensions
Returns: a list of two items, containing:
- test_statistic_: test statistic
- test_statistic_metadata_: (optional) a dict of metadata other than the test statistic, that the independence test computes in the process
Return type: float, dict
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_test_stat = rvcorr.test_statistic(X, Y)
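As a rough guide to what an RV-style statistic measures, the sketch below computes the standard RV coefficient from column-centered data matrices. The function name rv_coefficient is hypothetical, and it is an assumption that the ‘rv’ option uses this same normalization.

```python
import numpy as np


def rv_coefficient(matrix_X, matrix_Y):
    """RV coefficient of an [n*p] and an [n*q] data matrix (standard definition)."""
    X_centered = matrix_X - matrix_X.mean(axis=0)
    Y_centered = matrix_Y - matrix_Y.mean(axis=0)
    cov_XY = X_centered.T @ Y_centered
    cov_XX = X_centered.T @ X_centered
    cov_YY = Y_centered.T @ Y_centered
    numerator = np.trace(cov_XY @ cov_XY.T)
    return numerator / np.sqrt(np.trace(cov_XX @ cov_XX) * np.trace(cov_YY @ cov_YY))


X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045,
              0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312,
              1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
rv = rv_coefficient(X, Y)
```

When both inputs have a single column, this coefficient reduces to the squared Pearson correlation, which ties the ‘rv’ and ‘pearson’ options together conceptually.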
p_value(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using the independence test.
Parameters: - matrix_X (2D numpy.array) – a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – a [n*q] data matrix, a matrix with n samples in q dimensions
- replication_factor (int) – specifies the number of replications to use for the permutation test. Defaults to 1000.
Returns: a list of two items, containing:
- p_value_: P-value
- p_value_metadata_: (optional) a dict of metadata other than the p_value, that the independence test computes in the process
Return type: float, dict
Example:
>>> import numpy as np
>>> from mgcpy.independence_tests.rv_corr import RVCorr
>>> X = np.array([0.07487683, -0.18073412, 0.37266440, 0.06074847, 0.76899045, 0.51862516, -0.13480764, -0.54368083, -0.73812644, 0.54910974]).reshape(-1, 1)
>>> Y = np.array([-1.31741173, -0.41634224, 2.24021815, 0.88317196, 2.00149312, 1.35857623, -0.06729464, 0.16168344, -0.61048226, 0.41711113]).reshape(-1, 1)
>>> rvcorr = RVCorr()
>>> rvcorr_p_value = rvcorr.p_value(X, Y)
get_name(self)
Returns: the name of the independence test
Return type: string
p_value_block(self, matrix_X, matrix_Y, replication_factor=1000)
Tests independence between two datasets using block permutation test. Inherited from mgcpy.independence_tests.abstract_class.IndependenceTest; the parameters, return values, and example are the same as documented for IndependenceTest.p_value_block above.