mgcpy.independence_tests.utils package

Submodules

mgcpy.independence_tests.utils.compute_distance_matrix module

Common Distance Calculation Matrix

mgcpy.independence_tests.utils.compute_distance_matrix.compute_distance(matrix_X, matrix_Y, _compute_distance)

Computes the distance matrices of both data matrices for the independence tests

Parameters:
  • matrix_X (2D numpy.array) – is interpreted as a [n*p] data matrix, a matrix with n samples in p dimensions
  • matrix_Y (2D numpy.array) – is interpreted as a [n*q] data matrix, a matrix with n samples in q dimensions
  • _compute_distance (FunctionType or callable()) – is interpreted as the distance matrix calculation with the specified metric
Returns:

a list of two items:

  • matrix_X: the calculated distance matrix for matrix_X
  • matrix_Y: the calculated distance matrix for matrix_Y

Return type:

list
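
As a rough illustration of the calling convention described above, the sketch below applies one callable metric to both data matrices and returns the two distance matrices as a list. The names compute_distance_sketch and euclidean_distance_matrix are hypothetical stand-ins, not the library code, and the sketch assumes the callable takes one data matrix and returns its [n*n] distance matrix.

```python
import numpy as np

def euclidean_distance_matrix(data):
    """Hypothetical metric: pairwise Euclidean distances between rows, shape [n*n]."""
    diff = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def compute_distance_sketch(matrix_X, matrix_Y, metric):
    """Illustrative stand-in (assumption): apply `metric` to each input matrix."""
    return [metric(matrix_X), metric(matrix_Y)]

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # n=3 samples, p=2 dimensions
Y = np.array([[1.0], [2.0], [4.0]])                 # n=3 samples, q=1 dimension

dist_X, dist_Y = compute_distance_sketch(X, Y, euclidean_distance_matrix)
print(dist_X)
print(dist_Y)
```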

mgcpy.independence_tests.utils.distance_transform module

MGC’s Distance Transform Module

mgcpy.independence_tests.utils.distance_transform.center_distance_matrix(ndarray distance_matrix, str base_global_correlation='mgc', is_ranked=True)

Appropriately transform distance matrices by centering them, based on the specified global correlation to build on

Parameters:
  • distance_matrix (2D numpy.array) – a symmetric distance matrix
  • base_global_correlation (string) – specifies which global correlation to build upon: ‘mgc’, ‘unbiased’, ‘biased’, ‘mantel’, or ‘rank’. Defaults to ‘mgc’.
  • is_ranked (boolean) – specifies whether ranking within a column is computed or not. Defaults to True.
Returns:

A dict with the following keys:

  • centered_distance_matrix:
     a [n*n] centered distance matrix
  • ranked_distance_matrix:
     a [n*n] column-ranked distance matrix

Return type:

dictionary
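
To give a feel for what centering a distance matrix means, here is a minimal sketch of classical double centering, the transform used by the biased sample distance correlation. It is an assumption that the ‘biased’ option corresponds to this transform; the other options use different centering rules that are not reproduced here, and double_center is a hypothetical name.

```python
import numpy as np

def double_center(distance_matrix):
    """Subtract row and column means and add back the grand mean (classical double centering)."""
    distance_matrix = np.asarray(distance_matrix, dtype=float)
    row_mean = distance_matrix.mean(axis=1, keepdims=True)
    col_mean = distance_matrix.mean(axis=0, keepdims=True)
    grand_mean = distance_matrix.mean()
    return distance_matrix - row_mean - col_mean + grand_mean

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
centered = double_center(D)
print(centered)
```

After double centering, every row and every column of the result sums to zero.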

mgcpy.independence_tests.utils.distance_transform.dense_rank_data(ndarray data_matrix)

Equivalent to scipy.stats.rankdata(x, “dense”), but faster!

Parameters:data_matrix – any data matrix.
Returns:dense ranked data_matrix
Return type:2D numpy.array
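
Dense ranking assigns consecutive integer ranks to the distinct values, so tied entries share a rank and no rank is skipped. The sketch below shows one way to get that with plain NumPy; dense_rank_sketch is a hypothetical name, and ranking over all entries of the matrix at once (rather than per column) is an assumption of this sketch, not a statement about the library routine.

```python
import numpy as np

def dense_rank_sketch(data_matrix):
    """Dense ranks (1-based) over all entries, returned in the input's shape."""
    data = np.asarray(data_matrix)
    # np.unique sorts the distinct values; `inverse` maps each entry to the
    # position of its value in that sorted list, which is its dense rank - 1.
    _, inverse = np.unique(data.ravel(), return_inverse=True)
    return (inverse + 1).reshape(data.shape)

values = np.array([[10, 30],
                   [20, 10],
                   [20, 40]])
ranks = dense_rank_sketch(values)
print(ranks)
```
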
mgcpy.independence_tests.utils.distance_transform.rank_distance_matrix(ndarray distance_matrix)

Ranks the entries within each column in ascending order

For ties, the “minimum” ranking is used, e.g. if there are repeating distance entries, the order is like 1,2,2,3,3,4,…

Parameters:distance_matrix (2D numpy.array) – a symmetric distance matrix.
Returns:column-wise ranked matrix of distance_matrix
Return type:2D numpy.array
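
The column-wise ranking described above can be sketched as follows. This is an illustrative stand-in (rank_columns_sketch is a hypothetical name) that ranks each column independently and lets tied entries share a rank, matching the 1,2,2,3,3,4 pattern in the description.

```python
import numpy as np

def rank_columns_sketch(distance_matrix):
    """Rank the entries of each column in ascending order; ties share a rank."""
    distance_matrix = np.asarray(distance_matrix)
    ranked = np.empty(distance_matrix.shape, dtype=int)
    for j in range(distance_matrix.shape[1]):
        # Position of each entry's value among the column's sorted distinct values
        _, inverse = np.unique(distance_matrix[:, j], return_inverse=True)
        ranked[:, j] = inverse + 1
    return ranked

D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 3.0],
              [3.0, 3.0, 0.0]])
ranked = rank_columns_sketch(D)
print(ranked)
```
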
mgcpy.independence_tests.utils.distance_transform.transform_distance_matrix(ndarray distance_matrix_A, ndarray distance_matrix_B, str base_global_correlation='mgc', is_ranked=True)

Transforms the distance matrices appropriately, with column-wise ranking if needed.

Parameters:
  • distance_matrix_A – first symmetric distance matrix, [n*n]
  • distance_matrix_B – second symmetric distance matrix, [n*n]
  • base_global_correlation (string) – specifies which global correlation to build upon: ‘mgc’, ‘unbiased’, ‘biased’, ‘mantel’, or ‘rank’. Defaults to ‘mgc’.
  • is_ranked (boolean) – specifies whether ranking within a column is computed or not. If base_global_correlation = “rank”, then ranking is performed regardless of the value of is_ranked. Defaults to True.
Returns:

A dict with the following keys:

  • centered_distance_matrix_A:
     a [n*n] centered distance matrix of A
  • centered_distance_matrix_B:
     a [n*n] centered distance matrix of B
  • ranked_distance_matrix_A:
     a [n*n] column-ranked distance matrix of A
  • ranked_distance_matrix_B:
     a [n*n] column-ranked distance matrix of B

Return type:

dictionary

Example:

>>> import numpy as np
>>> from scipy.spatial import distance_matrix
>>> from mgcpy.independence_tests.utils.distance_transform import transform_distance_matrix
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> X_distance_matrix = distance_matrix(X, X)
>>> Y_distance_matrix = distance_matrix(Y, Y)
>>> transformed_distance_matrix_X_Y = transform_distance_matrix(X_distance_matrix, Y_distance_matrix)

mgcpy.independence_tests.utils.fast_functions module

Common Functions used in Fast Dcorr and Fast MGC

mgcpy.independence_tests.utils.mdmr_functions module

mgcpy.independence_tests.utils.mdmr_functions.check_rank(X)

This function checks if X is rank deficient.

Parameters:X (2D numpy.array) –

is interpreted as:

  • a [n*d] data matrix, a matrix with n samples in d dimensions
Return type:None
Raise:Raises Exception if X matrix is rank deficient.
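
A rank check of this kind can be sketched with numpy.linalg.matrix_rank. The sketch assumes that "rank deficient" means the column rank is smaller than the number of columns, and it raises ValueError (a subclass of Exception); check_rank_sketch is a hypothetical name, not the library function.

```python
import numpy as np

def check_rank_sketch(X):
    """Raise if the columns of X are linearly dependent (assumed meaning of rank deficient)."""
    X = np.asarray(X, dtype=float)
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("X is rank deficient")

full_rank = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
deficient = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # second column = 2 * first

check_rank_sketch(full_rank)  # passes silently and returns None

try:
    check_rank_sketch(deficient)
    raised = False
except ValueError:
    raised = True
print(raised)
```
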
mgcpy.independence_tests.utils.mdmr_functions.hatify(X)

Calculates the “hat” matrix.

Parameters:X (2D numpy.array) –

is interpreted as:

  • a [n*d] data matrix, a matrix with n samples in d dimensions
Returns:returns the hat matrix of the data matrix input.
Return type:2D numpy.array
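
For reference, the hat matrix of a full-column-rank data matrix X is H = X (XᵀX)⁻¹ Xᵀ, the orthogonal projection onto the column space of X. The following is a minimal sketch of that textbook formula (hat_matrix_sketch is a hypothetical name, and the library may compute it differently).

```python
import numpy as np

def hat_matrix_sketch(X):
    """Textbook hat matrix X (X^T X)^-1 X^T; assumes X has full column rank."""
    X = np.asarray(X, dtype=float)
    return X @ np.linalg.inv(X.T @ X) @ X.T

# n=4 samples: an intercept column plus one predictor
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
H = hat_matrix_sketch(X)
print(H.shape)
```

A projection matrix is symmetric and idempotent, and its trace equals the number of linearly independent columns of X, which gives a quick sanity check on the result.
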
mgcpy.independence_tests.utils.mdmr_functions.gower_center(Y)

Computes Gower’s centered similarity matrix.

Parameters:Y (2D numpy.array) –

is interpreted as:

  • a [n*n] distance matrix
Returns:returns the gower centered similarity matrix of the input matrix.
Return type:2D numpy.array
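
Gower centering is usually written as G = C A C, with A = -½ D² (squared element-wise) and C = I - 11ᵀ/n. The sketch below follows that standard formula; gower_center_sketch is a hypothetical name, and the example distance matrix is built from made-up points.

```python
import numpy as np

def gower_center_sketch(Y):
    """Gower-centered matrix C (-0.5 * Y**2) C, with C = I - (1/n) * ones."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return centering @ (-0.5 * Y ** 2) @ centering

# Euclidean distances between three illustrative points
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]])
diff = points[:, None, :] - points[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

G = gower_center_sketch(D)
print(G)
```

For Euclidean distances, G coincides with the Gram matrix of the mean-centered points, and its rows and columns sum to zero.
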
mgcpy.independence_tests.utils.mdmr_functions.gower_center_many(Ys)

Gower centers each matrix in the input, which is a special centering process discussed in detail in Gower (1966).

Parameters:Ys (2D numpy.array; note that in practice this function is currently run on only one matrix, so Ys will just be a 1D numpy.array) –

is interpreted as:

  • an array of [n^2*1] distance matrices
Returns:returns the gower centered similarity matrices of all the input matrices.
Return type:2D numpy.array
mgcpy.independence_tests.utils.mdmr_functions.gen_H2_perms(X, predictors, permutation_indexes)

Return H2 for each permutation of X indices, where H2 is the hat matrix minus the hat matrix of the untested columns.

Parameters:
  • X (2D numpy.array) –

    is interpreted as:

    • a [n*(d+1)] data matrix, a matrix with n samples in d dimensions

    and a column of ones placed before the matrix

  • predictors (1D numpy.array) –

    is interpreted as:

    • a [1*d] array with the number of each variable in X used as a predictor
  • permutation_indexes (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n] matrix where p is the number of permutations given in the main code.

    This matrix has p permutations of indexes of the X data.

Returns:

a [(p+1)*n^2] array of the flattened H2 matrices for each permutation

Return type:

2D numpy.array
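
The construction described above can be sketched as follows. This is a hedged illustration rather than the library routine: the helper names hat and h2_for_permutation are hypothetical, the sketch assumes predictors holds column indices of X (counting the column of ones as column 0), that only the tested columns are permuted, and that the first row of permutation_indexes is the identity ordering.

```python
import numpy as np

def hat(X):
    """Hat matrix X (X^T X)^-1 X^T of a full-column-rank matrix."""
    return X @ np.linalg.inv(X.T @ X) @ X.T

def h2_for_permutation(X, predictors, permutation):
    """Hat matrix of X with the tested columns permuted, minus the hat matrix of the untested columns."""
    predictors = list(predictors)
    X_perm = X.copy()
    X_perm[:, predictors] = X[np.ix_(permutation, predictors)]
    untested = [j for j in range(X.shape[1]) if j not in predictors]
    return hat(X_perm) - hat(X_perm[:, untested])

rng = np.random.default_rng(0)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # column of ones first
predictors = [1]                                            # test the first variable
# Row 0: identity ordering (assumption); row 1: one random permutation
permutation_indexes = np.vstack([np.arange(n), rng.permutation(n)])

H2s = np.array([h2_for_permutation(X, predictors, perm).ravel()
                for perm in permutation_indexes])
print(H2s.shape)  # (2, 36): one flattened n*n matrix per row
```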

mgcpy.independence_tests.utils.mdmr_functions.gen_IH_perms(X, predictors, permutation_indexes)

Return I-H where H is the hat matrix and I is the identity matrix.

The function calculates this correctly for multiple predictor tests.

Parameters:
  • X (2D numpy.array) –

    is interpreted as:

    • a [n*(d+1)] data matrix, a matrix with n samples in d dimensions

    and a column of ones placed before the matrix

  • predictors (1D numpy.array) –

    is interpreted as:

    • a [1*d] array with the number of each variable in X used as a predictor
  • permutation_indexes (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n] matrix where p is the number of permutations given in the main code.

    This matrix has p permutations of indexes of the X data.

Returns:

a [(p+1)*n^2] array of the flattened arrays of the IH matrix for each permutation

Return type:

2D numpy.array
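
I - H is the residual-maker matrix: it projects onto the space orthogonal to the columns of X. A minimal sketch for a single permutation is shown below; ih_for_permutation is a hypothetical name, and as an assumption only the tested predictor columns are permuted before the hat matrix is formed.

```python
import numpy as np

def ih_for_permutation(X, predictors, permutation):
    """I - H, where H is the hat matrix of X with the tested columns permuted."""
    predictors = list(predictors)
    X_perm = X.copy()
    X_perm[:, predictors] = X[np.ix_(permutation, predictors)]
    H = X_perm @ np.linalg.inv(X_perm.T @ X_perm) @ X_perm.T
    return np.eye(X.shape[0]) - H, X_perm

rng = np.random.default_rng(0)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # column of ones first
permutation = rng.permutation(n)

IH, X_perm = ih_for_permutation(X, [1], permutation)
flattened = IH.ravel()  # one row of the [(p+1)*n^2] output
print(flattened.shape)  # (36,)
```

Because I - H annihilates the columns of the permuted design matrix, multiplying it by that matrix gives zeros up to rounding error.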

mgcpy.independence_tests.utils.mdmr_functions.calc_ftest(Hs, IHs, Gs, m2, nm)

This function calculates the pseudo-F statistic.

Parameters:
  • Hs (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n^2] array with the flattened H2 matrix for each permutation
  • IHs (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n^2] array with the flattened IH matrix for each permutation
  • Gs (2D numpy.array) –

    is interpreted as:

    • a [n^2*a] array with the gower centered distance matrix where a is in practice 1
  • m2 (float) –

    is interpreted as:

    • a float equal to the number of predictors minus the number of tests (which will be 1)
  • nm (float) –

    is interpreted as:

    • a float equal to the number of subjects minus the number of predictors
Returns:

a [(p+1)*1] array of F statistics for each permutation

Return type:

1D numpy.array
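
With the shapes listed above, the pseudo-F statistic can be sketched as F = (tr(H2 G) / m2) / (tr((I - H) G) / nm). Because G is symmetric, each trace equals the dot product of the flattened matrices, which is why flattened inputs suffice. The function name pseudo_f_sketch and the values chosen for m2 and nm are illustrative assumptions, not taken from the library.

```python
import numpy as np

def pseudo_f_sketch(Hs, IHs, Gs, m2, nm):
    """Pseudo-F per permutation from flattened H2, I-H and Gower-centered matrices."""
    numerator = (Hs @ Gs) / m2      # tr(H2 G) / m2 for each row of Hs
    denominator = (IHs @ Gs) / nm   # tr((I - H) G) / nm for each row of IHs
    return numerator / denominator

rng = np.random.default_rng(1)
n = 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
H = X @ np.linalg.inv(X.T @ X) @ X.T
H2 = H - np.full((n, n), 1.0 / n)   # subtract the hat matrix of the intercept column
IH = np.eye(n) - H

# Gower-centered matrix of Euclidean distances between made-up points
points = rng.normal(size=(n, 2))
D = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
C = np.eye(n) - np.ones((n, n)) / n
G = C @ (-0.5 * D ** 2) @ C

m2, nm = 1.0, float(n - 2)          # illustrative degrees of freedom
F = pseudo_f_sketch(H2.reshape(1, -1), IH.reshape(1, -1), G.reshape(-1, 1), m2, nm)
print(F.shape)  # (1, 1): one permutation row, one distance matrix
```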

mgcpy.independence_tests.utils.mdmr_functions.fperms_to_pvals(F_perms)

This function calculates the permutation p-value from the test statistics of all permutations.

Parameters:F_perms (1D numpy.array) –

is interpreted as:

  • a [(p+1)*1] array of F statistics for each permutation
Returns:a float which is the permutation p-value of the F-statistic
Return type:float
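
A permutation p-value is the fraction of permuted statistics that are at least as large as the observed one. The sketch below assumes that the first entry of F_perms is the observed (unpermuted) statistic and that the remaining p entries come from permutations; permutation_pvalue_sketch is a hypothetical name and the values are made up.

```python
import numpy as np

def permutation_pvalue_sketch(F_perms):
    """Fraction of permuted F statistics >= the observed one (assumed to be entry 0)."""
    F_perms = np.asarray(F_perms, dtype=float).ravel()
    observed, permuted = F_perms[0], F_perms[1:]
    return float((permuted >= observed).sum() / permuted.size)

F_perms = np.array([3.0, 1.0, 4.0, 2.0, 5.0])  # observed = 3.0, then p = 4 permutations
p_value = permutation_pvalue_sketch(F_perms)
print(p_value)  # 0.5, since two of the four permuted values (4.0 and 5.0) are >= 3.0
```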

Module contents