mgcpy.independence_tests.utils package¶

Submodules¶

mgcpy.independence_tests.utils.compute_distance_matrix module¶

Common Distance Calculation Matrix

mgcpy.independence_tests.utils.compute_distance_matrix.compute_distance(matrix_X, matrix_Y, _compute_distance)[source]¶

Computes the distance matrices of both data matrices, for use by the independence tests.

Parameters:

matrix_X (2D numpy.array) – is interpreted as a [n*p] data matrix, a matrix with n samples in p dimensions
matrix_Y (2D numpy.array) – is interpreted as a [n*q] data matrix, a matrix with n samples in q dimensions
_compute_distance (FunctionType or callable()) – is interpreted as the function that calculates a distance matrix with the specified metric

Returns:

A list of two items:

matrix_X: the calculated distance matrix for matrix_X
matrix_Y: the calculated distance matrix for matrix_Y

Return type:

list
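The call pattern described above can be sketched with NumPy alone. The names `euclidean_distances` and `compute_distance_sketch` are hypothetical stand-ins introduced for illustration; they mirror the documented inputs and outputs and are not mgcpy's implementation.

```python
import numpy as np


def euclidean_distances(data):
    """Hypothetical metric callable: pairwise Euclidean distances between rows."""
    diff = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def compute_distance_sketch(matrix_X, matrix_Y, metric):
    """Illustrative stand-in: apply the same metric callable to each data matrix."""
    return [metric(matrix_X), metric(matrix_Y)]


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # n=3 samples, p=2 dimensions
Y = np.array([[1.0], [2.0], [4.0]])                  # n=3 samples, q=1 dimension
dist_X, dist_Y = compute_distance_sketch(X, Y, euclidean_distances)
```

Both outputs are [n*n] matrices even though X and Y have different numbers of columns, which is what lets the two be compared downstream.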

mgcpy.independence_tests.utils.distance_transform module¶

MGC’s Distance Transform Module

mgcpy.independence_tests.utils.distance_transform.center_distance_matrix(ndarray distance_matrix, str base_global_correlation='mgc', is_ranked=True)¶

Appropriately transform distance matrices by centering them, based on the specified global correlation to build on

Parameters:

distance_matrix (2D numpy.array) – a symmetric distance matrix
base_global_correlation (string) – specifies which global correlation to build upon: 'mgc', 'unbiased', 'biased', 'mantel', or 'rank'. Defaults to 'mgc'.
is_ranked (boolean) – specifies whether ranking within a column is computed. Defaults to True.

Returns:

A dict with the following keys:

centered_distance_matrix: a [n*n] centered distance matrix
ranked_distance_matrix: a [n*n] column-ranked distance matrix

Return type:

dictionary
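As a rough illustration of what centering a distance matrix means, the sketch below applies classical double centering, the textbook transform behind the biased distance covariance. This is an assumption chosen for illustration: the exact transform that `center_distance_matrix` applies for each `base_global_correlation` option is not spelled out here, and `double_center` is a hypothetical helper.

```python
import numpy as np


def double_center(distance_matrix):
    """Subtract row and column means, then add back the grand mean."""
    row_means = distance_matrix.mean(axis=1, keepdims=True)
    col_means = distance_matrix.mean(axis=0, keepdims=True)
    grand_mean = distance_matrix.mean()
    return distance_matrix - row_means - col_means + grand_mean


D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
centered = double_center(D)
```

After double centering, every row and every column of the result sums to zero, and symmetry of the input is preserved.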

mgcpy.independence_tests.utils.distance_transform.dense_rank_data(ndarray data_matrix)¶

Equivalent to scipy.stats.rankdata(x, “dense”), but faster!

Parameters:	data_matrix – any data matrix.
Returns:	dense ranked `data_matrix`
Return type:	2D numpy.array
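Dense ranking can be sketched in a few lines of NumPy. The helper `dense_rank_sketch` is hypothetical and handles a 1D array only; it shows the ranking rule, not the library's optimized routine.

```python
import numpy as np


def dense_rank_sketch(values):
    """Dense ranks for a 1D array: ties share a rank and no ranks are skipped."""
    values = np.asarray(values)
    # np.unique sorts the distinct values; the inverse index gives each
    # entry's position among them, which is its zero-based dense rank.
    _, inverse = np.unique(values, return_inverse=True)
    return inverse + 1


ranks = dense_rank_sketch([10, 20, 20, 30, 30, 40])
# ranks -> array([1, 2, 2, 3, 3, 4])
```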

mgcpy.independence_tests.utils.distance_transform.rank_distance_matrix(ndarray distance_matrix)¶

Ranks the entries within each column in ascending order.

For ties, the “minimum” ranking is used, e.g. if there are repeating distance entries, the ranks run like 1,2,2,3,3,4,…

Parameters:	distance_matrix (2D numpy.array) – a symmetric distance matrix.
Returns:	column-wise ranked matrix of `distance_matrix`
Return type:	2D numpy.array
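The example sequence 1,2,2,3,3,4 leaves no gaps after ties, which is the behaviour SciPy calls dense ranking. A minimal column-wise sketch under that reading follows; `rank_columns_sketch` is a hypothetical helper, not the library routine.

```python
import numpy as np


def rank_columns_sketch(distance_matrix):
    """Dense-rank the entries of each column independently (illustrative only)."""
    distance_matrix = np.asarray(distance_matrix)
    ranked = np.empty(distance_matrix.shape, dtype=int)
    for j in range(distance_matrix.shape[1]):
        # Position of each entry among the column's sorted distinct values.
        _, inverse = np.unique(distance_matrix[:, j], return_inverse=True)
        ranked[:, j] = inverse + 1
    return ranked


D = np.array([[0.0, 5.0, 5.0],
              [5.0, 0.0, 5.0],
              [5.0, 5.0, 0.0]])
ranked = rank_columns_sketch(D)
# ranked -> [[1, 2, 2], [2, 1, 2], [2, 2, 1]]
```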

mgcpy.independence_tests.utils.distance_transform.transform_distance_matrix(ndarray distance_matrix_A, ndarray distance_matrix_B, str base_global_correlation='mgc', is_ranked=True)¶

Transforms the distance matrices appropriately, with column-wise ranking if needed.

Parameters:

distance_matrix_A – first symmetric distance matrix, [n*n]
distance_matrix_B – second symmetric distance matrix, [n*n]
base_global_correlation (string) – specifies which global correlation to build upon: 'mgc', 'unbiased', 'biased', 'mantel', or 'rank'. Defaults to 'mgc'.
is_ranked (boolean) – specifies whether ranking within a column is computed. If base_global_correlation = 'rank', ranking is performed regardless of the value of is_ranked. Defaults to True.

Returns:

A dict with the following keys:

centered_distance_matrix_A: a [n*n] centered distance matrix of A
centered_distance_matrix_B: a [n*n] centered distance matrix of B
ranked_distance_matrix_A: a [n*n] column-ranked distance matrix of A
ranked_distance_matrix_B: a [n*n] column-ranked distance matrix of B

Return type:

dictionary

Example:

>>> import numpy as np
>>> from scipy.spatial import distance_matrix
>>> from mgcpy.independence_tests.utils.distance_transform import transform_distance_matrix
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> X_distance_matrix = distance_matrix(X, X)
>>> Y_distance_matrix = distance_matrix(Y, Y)
>>> transformed_distance_matrix_X_Y = transform_distance_matrix(X_distance_matrix, Y_distance_matrix)

mgcpy.independence_tests.utils.fast_functions module¶

Common Functions used in Fast Dcorr and Fast MGC

mgcpy.independence_tests.utils.mdmr_functions module¶

mgcpy.independence_tests.utils.mdmr_functions.check_rank(X)[source]¶

This function checks if X is rank deficient.

Parameters:

X (2D numpy.array) –

is interpreted as:

a [n*d] data matrix, a matrix with n samples in d dimensions

Return type: None

Raises: Exception if the X matrix is rank deficient.
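A rank check of this kind can be sketched with `numpy.linalg.matrix_rank`. The helper `check_rank_sketch` is hypothetical, and it assumes that "rank deficient" means the rank is smaller than the number of columns (with n >= d).

```python
import numpy as np


def check_rank_sketch(X):
    """Raise if the columns of X are linearly dependent (illustrative stand-in)."""
    X = np.asarray(X, dtype=float)
    rank = np.linalg.matrix_rank(X)
    if rank < X.shape[1]:
        raise Exception(
            "matrix is rank deficient (rank %d < %d columns)" % (rank, X.shape[1])
        )


full_rank = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
deficient = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # second column = 2 * first

check_rank_sketch(full_rank)  # passes silently
try:
    check_rank_sketch(deficient)
    raised = False
except Exception:
    raised = True
```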

mgcpy.independence_tests.utils.mdmr_functions.hatify(X)[source]¶

Calculates the “hat” matrix.

Parameters:

X (2D numpy.array) –

is interpreted as:

a [n*d] data matrix, a matrix with n samples in d dimensions

Returns: the hat matrix of the input data matrix.

Return type: 2D numpy.array
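For reference, the hat matrix of a full-column-rank X is H = X (XᵀX)⁻¹ Xᵀ, the orthogonal projection onto the column space of X. The sketch below computes it through the pseudo-inverse; `hat_matrix` is a hypothetical helper and may differ from how `hatify` is implemented.

```python
import numpy as np


def hat_matrix(X):
    """Projection onto the column space of X.

    X @ pinv(X) equals X (X^T X)^{-1} X^T when X has full column rank.
    """
    X = np.asarray(X, dtype=float)
    return X @ np.linalg.pinv(X)


# Intercept column plus one predictor, n=4 samples.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
H = hat_matrix(X)
```

The result is symmetric and idempotent (H @ H equals H), and its trace equals the number of independent columns of X.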

mgcpy.independence_tests.utils.mdmr_functions.gower_center(Y)[source]¶

Computes Gower’s centered similarity matrix.

Parameters:

Y (2D numpy.array) –

is interpreted as:

a [n*n] distance matrix

Returns: the Gower centered similarity matrix of the input matrix.

Return type: 2D numpy.array
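Gower's centering is usually written as G = J A J with A = -½ D² (elementwise square) and J = I - 11ᵀ/n. The sketch below follows that standard formula; it is an assumption that `gower_center` uses exactly this form, and `gower_center_sketch` is a hypothetical name.

```python
import numpy as np


def gower_center_sketch(D):
    """Gower-centered matrix G = J (-0.5 * D**2) J for an [n*n] distance matrix D."""
    n = D.shape[0]
    A = -0.5 * D ** 2                    # elementwise squared distances
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return J @ A @ J


# Distances between three points on a line.
x = np.array([0.0, 1.0, 3.0])
D = np.abs(x[:, None] - x[None, :])
G = gower_center_sketch(D)
```

For Euclidean distances, G equals the inner-product matrix of the mean-centered points, which is why it can stand in for the outcome in a regression-style F-test.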

mgcpy.independence_tests.utils.mdmr_functions.gower_center_many(Ys)[source]¶

Gower centers each matrix in the input, which is a special centering process discussed in detail in Gower (1966).

Parameters:

Ys (2D numpy.array) –

is interpreted as:

an array of [n^2*1] distance matrices. Note: in practice this function is currently run on only one matrix, so Ys will just be a 1D numpy.array.

Returns: the Gower centered similarity matrices of all input matrices.

Return type: 2D numpy.array

mgcpy.independence_tests.utils.mdmr_functions.gen_H2_perms(X, predictors, permutation_indexes)[source]¶

Return H2 for each permutation of X indices, where H2 is the hat matrix minus the hat matrix of the untested columns.

Parameters:

X (2D numpy.array) – is interpreted as: a `[n*d+1]` data matrix, a matrix with `n` samples in `d` dimensions and a column of ones placed before the matrix
predictors (1D numpy.array) – is interpreted as: a `[1*d]` array with the number of each variable in X used as a predictor
permutation_indexes (2D numpy.array) – is interpreted as: a `[p+1*n]` matrix where p is the number of permutations given in the main code. This matrix has p permutations of indexes of the X data.

Returns:	a `[p+1*n^2]` array of the flattened H2 matrices for each permutation
Return type:	2D numpy.array

mgcpy.independence_tests.utils.mdmr_functions.gen_IH_perms(X, predictors, permutation_indexes)[source]¶

Return I-H where H is the hat matrix and I is the identity matrix.

The function calculates this correctly for multiple predictor tests.

Parameters:

X (2D numpy.array) – is interpreted as: a `[n*d+1]` data matrix, a matrix with `n` samples in `d` dimensions and a column of ones placed before the matrix
predictors (1D numpy.array) – is interpreted as: a `[1*d]` array with the number of each variable in X used as a predictor
permutation_indexes (2D numpy.array) – is interpreted as: a `[p+1*n]` matrix where p is the number of permutations given in the main code. This matrix has p permutations of indexes of the X data.

Returns:	a `[p+1*n^2]` array of the flattened arrays of the IH matrix for each permutation
Return type:	2D numpy.array
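The two permutation helpers above can be illustrated for a single permutation. In this sketch the rows of X are reordered by one row of `permutation_indexes`; that reading, the choice of tested column, and the helper `hat_matrix` are assumptions made for illustration.

```python
import numpy as np


def hat_matrix(X):
    """Projection onto the column space of X."""
    return X @ np.linalg.pinv(X)


rng = np.random.default_rng(0)
n = 6
# Column of ones followed by d=2 predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
tested = [1]  # hypothetical: test the first predictor only
untested = [j for j in range(X.shape[1]) if j not in tested]

perm = rng.permutation(n)  # stands in for one row of permutation_indexes
X_perm = X[perm, :]        # assumption: rows of X are reordered

H = hat_matrix(X_perm)
H2 = H - hat_matrix(X_perm[:, untested])  # hat matrix minus hat matrix of untested columns
IH = np.eye(n) - H                        # residual-forming matrix I - H

# Each permutation contributes one flattened vector of length n^2.
H2_flat = H2.flatten()
IH_flat = IH.flatten()
```

H2 is itself a projection and is orthogonal to I - H, so the two matrices split the variation into the part explained by the tested predictor and the residual part.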

mgcpy.independence_tests.utils.mdmr_functions.calc_ftest(Hs, IHs, Gs, m2, nm)[source]¶

This function calculates the pseudo-F statistic.

Parameters:

Hs (2D numpy.array) – is interpreted as: a `[p+1*n^2]` array with the flattened H2 matrix for each permutation
IHs (2D numpy.array) – is interpreted as: a `[p+1*n^2]` array with the flattened IH matrix for each permutation
Gs (2D numpy.array) – is interpreted as: a `[n^2*a]` array with the gower centered distance matrix where a is in practice 1
m2 (float) – is interpreted as: a float equal to the number of predictors minus the number of tests (which will be 1)
nm (float) – is interpreted as: a float equal to the number of subjects minus the number of predictors

Returns:	a `[p+1*1]` array of F statistics for each permutation
Return type:	1D numpy.array
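A sketch of the pseudo-F computation for a single (unpermuted) design, assuming the usual MDMR form F = (tr(H2 G) / m2) / (tr((I - H) G) / nm). The helper names, the toy data, and the choices m2 = 1 and nm = n - 2 are assumptions for illustration. Working on flattened matrices is valid because tr(A G) equals the dot product of the flattened A and G when G is symmetric.

```python
import numpy as np


def hat_matrix(X):
    """Projection onto the column space of X."""
    return X @ np.linalg.pinv(X)


def gower_center(D):
    """Standard Gower centering of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ (-0.5 * D ** 2) @ J


def pseudo_f(H2_flat, IH_flat, G_flat, m2, nm):
    """Pseudo-F from flattened matrices."""
    explained = H2_flat @ G_flat  # equals trace(H2 @ G) for symmetric G
    residual = IH_flat @ G_flat   # equals trace((I - H) @ G)
    return (explained / m2) / (residual / nm)


n = 5
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])
X = np.column_stack([np.ones(n), x])  # column of ones before the predictor
H = hat_matrix(X)
H2 = H - hat_matrix(X[:, [0]])        # remove the intercept-only fit
IH = np.eye(n) - H
G = gower_center(np.abs(y[:, None] - y[None, :]))

F = pseudo_f(H2.flatten(), IH.flatten(), G.flatten(), m2=1.0, nm=float(n - 2))
F_trace = (np.trace(H2 @ G) / 1.0) / (np.trace(IH @ G) / (n - 2))
```

With a one-dimensional outcome and Euclidean distances, as in this toy example, the pseudo-F reduces to the ordinary regression F statistic.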

mgcpy.independence_tests.utils.mdmr_functions.fperms_to_pvals(F_perms)[source]¶

This function calculates the permutation p-value from the test statistics of all permutations.

Parameters:

F_perms (1D numpy.array) –

is interpreted as:

a [p+1*1] array of F statistics for each permutation

Returns: a float which is the permutation p-value of the F-statistic

Return type: float
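A minimal sketch of a permutation p-value, assuming the first entry of `F_perms` holds the statistic for the unpermuted data (the `p+1` length suggests this, but it is an assumption) and that the p-value is the fraction of entries at least as large as it. `permutation_pvalue` is a hypothetical name.

```python
import numpy as np


def permutation_pvalue(F_perms):
    """Fraction of statistics at least as large as the observed one."""
    F_perms = np.asarray(F_perms, dtype=float)
    observed = F_perms[0]  # assumption: index 0 is the unpermuted statistic
    return float((F_perms >= observed).sum()) / F_perms.size


p_value = permutation_pvalue([5.0, 1.0, 2.0, 6.0, 3.0])
# Two of the five entries (5.0 itself and 6.0) are >= 5.0, so p_value is 0.4.
```

Because the observed statistic is counted among the comparisons, the p-value can never be exactly zero under this convention.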
