mgcpy.independence_tests.utils package
Submodules
mgcpy.independence_tests.utils.compute_distance_matrix module
Common Distance Calculation Matrix
mgcpy.independence_tests.utils.compute_distance_matrix.compute_distance(matrix_X, matrix_Y, _compute_distance)
Computes the distance matrices of both data matrices for the independence tests.
Parameters:
- matrix_X (2D numpy.array) – is interpreted as a [n*p] data matrix, a matrix with n samples in p dimensions
- matrix_Y (2D numpy.array) – is interpreted as a [n*q] data matrix, a matrix with n samples in q dimensions
- _compute_distance (FunctionType or callable()) – is interpreted as the distance matrix calculation with the specified metric
Returns: a list of two items:
- matrix_X: the calculated distance matrix for matrix_X
- matrix_Y: the calculated distance matrix for matrix_Y
Return type: list
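The calling pattern described above can be sketched as follows. This is a minimal illustration, not the mgcpy implementation: `compute_distance_sketch` and `euclidean_distance_matrix` are hypothetical names, and the sketch assumes the callable takes one data matrix and returns its pairwise distance matrix.

```python
import numpy as np

def euclidean_distance_matrix(data):
    """Pairwise Euclidean distances between the rows of a [n*d] data matrix."""
    diffs = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

def compute_distance_sketch(matrix_X, matrix_Y, distance_fn):
    """Hypothetical stand-in: apply the same distance callable to each input."""
    return [distance_fn(matrix_X), distance_fn(matrix_Y)]

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # n=3 samples, p=2 dimensions
Y = np.array([[1.0], [2.0], [4.0]])                 # n=3 samples, q=1 dimension
dist_X, dist_Y = compute_distance_sketch(X, Y, euclidean_distance_matrix)
```

Both results are [n*n] matrices, regardless of p and q, which is what lets the two inputs be compared sample by sample.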
mgcpy.independence_tests.utils.distance_transform module
MGC’s Distance Transform Module
mgcpy.independence_tests.utils.distance_transform.center_distance_matrix(ndarray distance_matrix, str base_global_correlation='mgc', is_ranked=True)
Appropriately transforms a distance matrix by centering it, based on the specified global correlation to build on.
Parameters:
- distance_matrix (2D numpy.array) – a symmetric distance matrix
- base_global_correlation (string) – specifies which global correlation to build upon, including ‘mgc’, ‘unbiased’, ‘biased’, ‘mantel’, and ‘rank’. Defaults to mgc.
- is_ranked (boolean) – specifies whether ranking within a column is computed or not. Defaults to True.
Returns: A dict with the following keys:
- centered_distance_matrix: a [n*n] centered distance matrix
- ranked_distance_matrix: a [n*n] column-ranked distance matrix
Return type: dictionary
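To give a sense of what centering a distance matrix means, here is a sketch of double centering, the form commonly associated with the biased distance correlation. The function name `double_center` is hypothetical, and whether any of the `base_global_correlation` options in mgcpy matches this formula exactly is an assumption.

```python
import numpy as np

def double_center(distance_matrix):
    """Subtract row and column means, then add back the grand mean."""
    row_means = distance_matrix.mean(axis=1, keepdims=True)
    col_means = distance_matrix.mean(axis=0, keepdims=True)
    grand_mean = distance_matrix.mean()
    return distance_matrix - row_means - col_means + grand_mean

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
centered = double_center(D)  # every row and column of the result sums to zero
```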
mgcpy.independence_tests.utils.distance_transform.dense_rank_data(ndarray data_matrix)
Equivalent to scipy.stats.rankdata(x, “dense”), but faster!
Parameters: data_matrix – any data matrix.
Returns: dense ranked data_matrix
Return type: 2D numpy.array
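Dense ranking assigns tied values the same rank and leaves no gaps between ranks. A minimal NumPy sketch of the idea follows; `dense_rank_sketch` is a hypothetical name, and this is not claimed to be how mgcpy implements it.

```python
import numpy as np

def dense_rank_sketch(data):
    """Rank values from 1 upward; ties share a rank and ranks have no gaps."""
    data = np.asarray(data)
    # The inverse index of each value into the sorted unique values is its
    # zero-based dense rank.
    _, inverse = np.unique(data.ravel(), return_inverse=True)
    return (inverse + 1).reshape(data.shape)

ranks = dense_rank_sketch([10, 20, 20, 30, 30, 40])
```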
mgcpy.independence_tests.utils.distance_transform.rank_distance_matrix(ndarray distance_matrix)
Ranks the entries within each column in ascending order.
For ties, the “minimum” ranking is used, e.g. if there are repeating distance entries, the order is like 1,2,2,3,3,4,…
Parameters: distance_matrix (2D numpy.array) – a symmetric distance matrix.
Returns: column-wise ranked matrix of distance_matrix
Return type: 2D numpy.array
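The column-wise ranking can be sketched as below. The sequence 1,2,2,3,3,4 given in the description is what scipy.stats.rankdata calls "dense" ranking, so the sketch uses that tie rule; `rank_columns_sketch` is a hypothetical name rather than the library routine.

```python
import numpy as np

def rank_columns_sketch(distance_matrix):
    """Replace each entry by its rank within its own column (ties share a rank)."""
    distance_matrix = np.asarray(distance_matrix)
    ranked = np.empty(distance_matrix.shape, dtype=int)
    for j in range(distance_matrix.shape[1]):
        _, inverse = np.unique(distance_matrix[:, j], return_inverse=True)
        ranked[:, j] = inverse + 1
    return ranked

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])
ranked = rank_columns_sketch(D)
```

Note that the ranked matrix is generally not symmetric even when the input is, because each column is ranked independently.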
mgcpy.independence_tests.utils.distance_transform.transform_distance_matrix(ndarray distance_matrix_A, ndarray distance_matrix_B, str base_global_correlation='mgc', is_ranked=True)
Transforms the distance matrices appropriately, with column-wise ranking if needed.
Parameters:
- distance_matrix_A – first symmetric distance matrix, [n*n]
- distance_matrix_B – second symmetric distance matrix, [n*n]
- base_global_correlation (string) – specifies which global correlation to build upon, including ‘mgc’, ‘unbiased’, ‘biased’, ‘mantel’, and ‘rank’. Defaults to mgc.
- is_ranked (boolean) – specifies whether ranking within a column is computed or not. If base_global_correlation = “rank”, then ranking is performed regardless of the value of is_ranked. Defaults to True.
Returns: A dict with the following keys:
- centered_distance_matrix_A: a [n*n] centered distance matrix of A
- centered_distance_matrix_B: a [n*n] centered distance matrix of B
- ranked_distance_matrix_A: a [n*n] column-ranked distance matrix of A
- ranked_distance_matrix_B: a [n*n] column-ranked distance matrix of B
Return type: dictionary
Example:
>>> import numpy as np
>>> from scipy.spatial import distance_matrix
>>> from mgcpy.independence_tests.utils.distance_transform import transform_distance_matrix
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> X_distance_matrix = distance_matrix(X, X)
>>> Y_distance_matrix = distance_matrix(Y, Y)
>>> transformed_distance_matrix_X_Y = transform_distance_matrix(X_distance_matrix, Y_distance_matrix)
mgcpy.independence_tests.utils.fast_functions module
Common Functions used in Fast Dcorr and Fast MGC
mgcpy.independence_tests.utils.mdmr_functions module
mgcpy.independence_tests.utils.mdmr_functions.check_rank(X)
This function checks if X is rank deficient.
Parameters: X (2D numpy.array) – is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
Return type: None
Raises: Exception if the X matrix is rank deficient.
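A rank check of this kind can be sketched with `numpy.linalg.matrix_rank`. The sketch below returns a boolean instead of raising, uses the hypothetical name `is_rank_deficient`, and assumes "rank deficient" means the rank is smaller than min(n, d).

```python
import numpy as np

def is_rank_deficient(X):
    """True when the numerical rank of X is below min(n, d)."""
    X = np.asarray(X, dtype=float)
    return bool(np.linalg.matrix_rank(X) < min(X.shape))

full_rank = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
deficient = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # second column = 2 * first
```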
mgcpy.independence_tests.utils.mdmr_functions.hatify(X)
Calculates the “hat” matrix.
Parameters: X (2D numpy.array) – is interpreted as a [n*d] data matrix, a matrix with n samples in d dimensions
Returns: the hat matrix of the input data matrix.
Return type: 2D numpy.array
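For reference, the hat matrix of a full-column-rank data matrix X is H = X (XᵀX)⁻¹ Xᵀ, the projection onto the column space of X. A minimal sketch using the pseudoinverse follows; `hat_matrix` is a hypothetical name, and the use of `pinv` is a choice of this sketch, not necessarily what `hatify` does.

```python
import numpy as np

def hat_matrix(X):
    """Projection onto the column space of X: X @ pinv(X)."""
    X = np.asarray(X, dtype=float)
    return X @ np.linalg.pinv(X)

# Intercept column plus one predictor, n=4 samples.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
H = hat_matrix(X)  # [n*n], symmetric and idempotent, trace equals rank(X)
```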
mgcpy.independence_tests.utils.mdmr_functions.gower_center(Y)
Computes Gower’s centered similarity matrix.
Parameters: Y (2D numpy.array) – is interpreted as a [n*n] distance matrix
Returns: the Gower-centered similarity matrix of the input matrix.
Return type: 2D numpy.array
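Gower centering is usually written as G = -½ C D² C, where D² holds the squared distances and C = I - 11ᵀ/n is the centering matrix. The sketch below follows that textbook form; `gower_center_sketch` is a hypothetical name, and that `gower_center` squares the distances internally in the same way is an assumption.

```python
import numpy as np

def gower_center_sketch(distance_matrix):
    """Textbook Gower centering of an [n*n] distance matrix."""
    D = np.asarray(distance_matrix, dtype=float)
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return -0.5 * (C @ (D ** 2) @ C)

# Euclidean distances between three points on a line.
x = np.array([0.0, 1.0, 3.0])
D = np.abs(x[:, None] - x[None, :])
G = gower_center_sketch(D)
```

For Euclidean distances, G equals the Gram matrix of the mean-centered points, which is why it is described as a similarity matrix.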
mgcpy.independence_tests.utils.mdmr_functions.gower_center_many(Ys)
Gower centers each matrix in the input, a special centering process discussed in detail in Gower (1966).
Parameters: Ys (2D numpy.array) – is interpreted as an array of [n^2*1] distance matrices. Note: in practice this function is currently run on only one matrix, so Ys will just be a 1D numpy.array.
Returns: the Gower-centered similarity matrices of all input matrices.
Return type: 2D numpy.array
mgcpy.independence_tests.utils.mdmr_functions.gen_H2_perms(X, predictors, permutation_indexes)
Returns H2 for each permutation of X indices, where H2 is the hat matrix minus the hat matrix of the untested columns.
Parameters:
- X (2D numpy.array) – is interpreted as a [n*(d+1)] data matrix, a matrix with n samples in d dimensions and a column of ones placed before the matrix
- predictors (1D numpy.array) – is interpreted as a [1*d] array with the number of each variable in X used as a predictor
- permutation_indexes (2D numpy.array) – is interpreted as a [(p+1)*n] matrix, where p is the number of permutations given in the main code. This matrix has p permutations of indexes of the X data.
Returns: a [(p+1)*n^2] array of the flattened H2 matrices for each permutation
Return type: 2D numpy.array
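The construction of H2 over a set of permutations can be sketched as follows. The names `hat_matrix` and `h2_for_permutations` are hypothetical, and the sketch assumes that only the tested predictor columns are permuted while the untested columns stay fixed; the library may differ in that detail.

```python
import numpy as np

def hat_matrix(X):
    return X @ np.linalg.pinv(X)

def h2_for_permutations(X, tested_columns, permutation_indexes):
    """One flattened H2 = H - H_untested per row of permutation_indexes."""
    X = np.asarray(X, dtype=float)
    n, n_cols = X.shape
    untested = [c for c in range(n_cols) if c not in tested_columns]
    H_untested = hat_matrix(X[:, untested])
    H2_flat = np.empty((len(permutation_indexes), n * n))
    for i, perm in enumerate(permutation_indexes):
        X_perm = X.copy()
        # Assumption: shuffle only the rows of the tested columns.
        X_perm[:, tested_columns] = X[perm][:, tested_columns]
        H2_flat[i] = (hat_matrix(X_perm) - H_untested).ravel()
    return H2_flat

# Column of ones, then two predictors; the last one is under test.
X = np.array([[1, 0, 1], [1, 1, 0], [1, 2, 2], [1, 3, 1], [1, 4, 5]], dtype=float)
perms = np.array([[0, 1, 2, 3, 4],    # identity row: the unpermuted data
                  [4, 3, 2, 1, 0],
                  [1, 0, 3, 2, 4]])
H2_flat = h2_for_permutations(X, [2], perms)
```

Each row of the result is an n*n matrix flattened to length n^2, matching the flattened layout described in the Returns entry above.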
mgcpy.independence_tests.utils.mdmr_functions.gen_IH_perms(X, predictors, permutation_indexes)
Returns I-H, where H is the hat matrix and I is the identity matrix.
The function calculates this correctly for multiple predictor tests.
Parameters:
- X (2D numpy.array) – is interpreted as a [n*(d+1)] data matrix, a matrix with n samples in d dimensions and a column of ones placed before the matrix
- predictors (1D numpy.array) – is interpreted as a [1*d] array with the number of each variable in X used as a predictor
- permutation_indexes (2D numpy.array) – is interpreted as a [(p+1)*n] matrix, where p is the number of permutations given in the main code. This matrix has p permutations of indexes of the X data.
Returns: a [(p+1)*n^2] array of the flattened I-H matrices for each permutation
Return type: 2D numpy.array
mgcpy.independence_tests.utils.mdmr_functions.calc_ftest(Hs, IHs, Gs, m2, nm)
This function calculates the pseudo-F statistic.
Parameters:
- Hs (2D numpy.array) – is interpreted as a [(p+1)*n^2] array with the flattened H2 matrix for each permutation
- IHs (2D numpy.array) – is interpreted as a [(p+1)*n^2] array with the flattened IH matrix for each permutation
- Gs (2D numpy.array) – is interpreted as a [n^2*a] array with the Gower-centered distance matrix, where a is in practice 1
- m2 (float) – is interpreted as a float equal to the number of predictors minus the number of tests (which will be 1)
- nm (float) – is interpreted as a float equal to the number of subjects minus the number of predictors
Returns: a [(p+1)*1] array of F statistics for each permutation
Return type: 1D numpy.array
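In the MDMR setting the pseudo-F statistic is commonly written as F = (tr(H2 G) / m2) / (tr((I-H) G) / nm). With symmetric matrices flattened to vectors, each trace becomes a dot product. The sketch below uses hypothetical names and treats m2 and nm as numerator and denominator degrees of freedom, which is an assumption about how `calc_ftest` uses them.

```python
import numpy as np

def hat_matrix(X):
    return X @ np.linalg.pinv(X)

def pseudo_f(H2_flat, IH_flat, G_flat, m2, nm):
    """Rows of H2_flat / IH_flat are flattened matrices, one per permutation."""
    numerator = (H2_flat @ G_flat) / m2      # tr(H2 G) / m2
    denominator = (IH_flat @ G_flat) / nm    # tr((I - H) G) / nm
    return numerator / denominator

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 1.5, 3.5])
n = len(x)
X = np.column_stack([np.ones(n), x])         # intercept plus one predictor

# Gower-centered matrix built from the pairwise distances of y.
D = np.abs(y[:, None] - y[None, :])
C = np.eye(n) - np.ones((n, n)) / n
G = -0.5 * (C @ (D ** 2) @ C)

H = hat_matrix(X)
H2 = H - hat_matrix(X[:, [0]])               # remove the intercept-only fit
IH = np.eye(n) - H

F = pseudo_f(H2.reshape(1, -1), IH.reshape(1, -1), G.ravel(), m2=1.0, nm=n - 2.0)
```

With a one-dimensional response and Euclidean distances, this reduces to the ordinary regression F statistic, which makes it a convenient sanity check.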
mgcpy.independence_tests.utils.mdmr_functions.fperms_to_pvals(F_perms)
This function calculates the permutation p-value from the test statistics of all permutations.
Parameters: F_perms (1D numpy.array) – is interpreted as a [(p+1)*1] array of F statistics for each permutation
Returns: a float which is the permutation p-value of the F-statistic
Return type: float
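A permutation p-value of this kind can be sketched as the fraction of permuted statistics at least as large as the observed one. The sketch assumes the first entry of the array is the observed statistic and the remaining p entries come from permutations, and it divides by p; both the layout and the exact denominator used by `fperms_to_pvals` are assumptions, and `permutation_p_value` is a hypothetical name.

```python
import numpy as np

def permutation_p_value(F_perms):
    """Share of permuted statistics that are >= the observed statistic."""
    F_perms = np.asarray(F_perms, dtype=float)
    observed, permuted = F_perms[0], F_perms[1:]
    return float((permuted >= observed).sum() / len(permuted))

# Observed F = 4.0, followed by four permuted statistics.
p_value = permutation_p_value([4.0, 1.0, 5.0, 2.0, 0.5])
```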