mgcpy.independence_tests.mgc_utils package

Submodules

mgcpy.independence_tests.mgc_utils.local_correlation module

MGC’s Local Correlation Module

mgcpy.independence_tests.mgc_utils.local_correlation.local_correlations(ndarray matrix_A, ndarray matrix_B, distance_metric='euclidean', base_global_correlation='mgc')

Computes all the local correlation coefficients in O(n^2 log n) time.

Parameters:

matrix_A (2D numpy.array) –
is interpreted as either:
- an [n*n] distance matrix, a square matrix with zeros on the diagonal for n samples, OR
- an [n*d] data matrix, a matrix with n samples in d dimensions
matrix_B (2D numpy.array) –
is interpreted as either:
- an [n*n] distance matrix, a square matrix with zeros on the diagonal for n samples, OR
- an [n*d] data matrix, a matrix with n samples in d dimensions
distance_metric (string) – the metric used to compute the distance matrix; defaults to ‘euclidean’
base_global_correlation (string) – the global correlation to build upon: one of ‘mgc’, ‘dcor’, ‘mantel’, or ‘rank’. Defaults to ‘mgc’.

Returns:

A dict with the following keys:

local_correlation_matrix:

a 2D matrix of all local correlations within [-1,1]
local_variance_A:

all local variances of A
local_variance_B:

all local variances of B

Return type:

dictionary

Example:

>>> import numpy as np
>>> from scipy.spatial import distance_matrix
>>> from mgcpy.independence_tests.mgc_utils.local_correlation import local_correlations
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> result = local_correlations(X, Y)
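
Since matrix_A and matrix_B may each be read as either a distance matrix or a data matrix, it can help to see how the two forms relate. The following is a minimal sketch in plain NumPy, not the library's own code: `to_distance_matrix` is a hypothetical helper, and its rule for telling the two forms apart (square with a zero diagonal) is an assumption based on the parameter description above. It only covers the Euclidean metric.

```python
import numpy as np

def to_distance_matrix(matrix):
    """Hypothetical helper: return an [n*n] Euclidean distance matrix.

    A square matrix with a zero diagonal is taken to be a distance matrix
    already; anything else is treated as an [n*d] data matrix.
    """
    matrix = np.asarray(matrix, dtype=float)
    n_rows, n_cols = matrix.shape
    if n_rows == n_cols and np.allclose(np.diag(matrix), 0.0):
        return matrix
    # Pairwise differences via broadcasting: shape (n, n, d).
    diffs = matrix[:, None, :] - matrix[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

# X is square, but its diagonal is not zero, so it is read as data.
X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
D = to_distance_matrix(X)
```

Here `D` is symmetric with a zero diagonal, so passing it through the helper a second time leaves it unchanged.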

mgcpy.independence_tests.mgc_utils.local_correlation.local_covariance(ndarray distance_matrix_A, ndarray distance_matrix_B, ndarray ranked_distance_matrix_A, ndarray ranked_distance_matrix_B)

Computes all local covariances simultaneously in O(n^2).

Parameters:

distance_matrix_A (2D numpy.array) – first distance matrix (centered or appropriately transformed), `[n*n]`
distance_matrix_B (2D numpy.array) – second distance matrix (centered or appropriately transformed), `[n*n]`
ranked_distance_matrix_A (2D numpy.array) – column-wise ranked matrix of `A`, `[n*n]`
ranked_distance_matrix_B (2D numpy.array) – column-wise ranked matrix of `B`, `[n*n]`
Returns:	matrix of all local covariances, `[n*n]`
Return type:	2D numpy.array
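
One way to obtain sums for every pair of scales in O(n^2) is to bucket each product by its rank pair and then take cumulative sums along both axes. The sketch below illustrates that idea in plain NumPy. It is not the library's implementation: `rank_columns` and `local_sums` are hypothetical names, ranks start at 0 here, and any centering or normalization the library applies is left out.

```python
import numpy as np

def rank_columns(D):
    """Column-wise dense ranks, starting at 0 (an assumption of this sketch)."""
    ranks = np.empty(D.shape, dtype=int)
    for j in range(D.shape[1]):
        # return_inverse gives, for each entry, the index of its value
        # among the sorted unique values of the column.
        _, inverse = np.unique(D[:, j], return_inverse=True)
        ranks[:, j] = inverse.reshape(-1)
    return ranks

def local_sums(A, B, rank_A, rank_B):
    """Entry [k, l] sums A[i, j] * B[i, j] over all pairs with
    rank_A[i, j] <= k and rank_B[i, j] <= l."""
    sums = np.zeros((rank_A.max() + 1, rank_B.max() + 1))
    # Bucket every product by its exact rank pair ...
    np.add.at(sums, (rank_A, rank_B), A * B)
    # ... then two cumulative sums turn "== k, == l" into "<= k, <= l".
    return sums.cumsum(axis=0).cumsum(axis=1)

# Illustrative data: two random point clouds and their distance matrices.
rng = np.random.default_rng(0)
points_x = rng.normal(size=(6, 2))
points_y = rng.normal(size=(6, 2))
A = np.linalg.norm(points_x[:, None] - points_x[None, :], axis=-1)
B = np.linalg.norm(points_y[:, None] - points_y[None, :], axis=-1)
rank_A, rank_B = rank_columns(A), rank_columns(B)
sums = local_sums(A, B, rank_A, rank_B)
```

Both the bucketing pass and the cumulative sums touch each of the n^2 entries a constant number of times, which is where the quadratic cost comes from once the ranks are available.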

mgcpy.independence_tests.mgc_utils.threshold_smooth module

MGC’s Sample Statistic Module

mgcpy.independence_tests.mgc_utils.threshold_smooth.threshold_local_correlations(local_correlation_matrix, sample_size)

Finds a connected region of significance in the local correlation map by thresholding.

Parameters:

local_correlation_matrix – all local correlations within `[-1,1]`
sample_size (integer) – the sample size of the original data (which may not equal `m` or `n` in case of repeating data)

Returns:	a binary matrix of size `[m*n]`, matching the shape of local_correlation_matrix, with 1’s indicating the significant region.
Return type:	2D numpy.array
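
The general shape of "threshold, then keep one connected region" can be sketched as follows. This is an illustration under stated assumptions, not the library's method: `largest_region_above` is a hypothetical function, it takes the `threshold` directly instead of deriving it from sample_size, and it assumes that the region kept is the largest one under 4-connectivity.

```python
from collections import deque
import numpy as np

def largest_region_above(local_correlation_matrix, threshold):
    """Binary mask of the largest 4-connected region with values > threshold.

    `threshold` is a hypothetical input; how it would be derived from
    sample_size is not covered by this sketch.
    """
    above = np.asarray(local_correlation_matrix) > threshold
    m, n = above.shape
    seen = np.zeros((m, n), dtype=bool)
    best = []
    for start in zip(*np.nonzero(above)):
        if seen[start]:
            continue
        # Breadth-first flood fill from an unvisited cell above the threshold.
        seen[start] = True
        queue, region = deque([start]), []
        while queue:
            i, j = queue.popleft()
            region.append((i, j))
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= a < m and 0 <= b < n and above[a, b] and not seen[a, b]:
                    seen[a, b] = True
                    queue.append((a, b))
        if len(region) > len(best):
            best = region
    mask = np.zeros((m, n), dtype=int)
    for i, j in best:
        mask[i, j] = 1
    return mask

corr = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.7],
    [0.0, 0.5, 0.8, 0.9],
])
mask = largest_region_above(corr, threshold=0.4)
```

In this example the isolated 0.9 in the top-left corner is above the threshold but is dropped, because the five-cell region in the lower right is larger.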

mgcpy.independence_tests.mgc_utils.threshold_smooth.smooth_significant_local_correlations(significant_connected_region, local_correlation_matrix)

Finds the smoothed maximum within the significant region R:

If the area of R is too small, returns the last local correlation.

Otherwise, returns the maximum local correlation within significant_connected_region.
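
The two cases above can be sketched as a small function. This is a hedged illustration, not the library's code: `smoothed_maximum` and its `min_area` cutoff are hypothetical, since the text does not say how small is "too small", and "the last local correlation" is assumed to mean the bottom-right entry of the matrix.

```python
import numpy as np

def smoothed_maximum(significant_connected_region, local_correlation_matrix, min_area):
    """Return the maximum inside the region, or the last entry if the region is small.

    `min_area` is a hypothetical cutoff for "too small".
    """
    corr = np.asarray(local_correlation_matrix, dtype=float)
    region = np.asarray(significant_connected_region, dtype=bool)
    if region.sum() < min_area:
        # Fall back to the last entry (assumed: bottom-right corner of the map).
        return corr[-1, -1]
    # Boolean indexing keeps only the entries inside the region.
    return corr[region].max()

corr = np.array([[0.2, 0.8],
                 [0.3, 0.1]])
region = np.array([[0, 1],
                   [0, 0]])

inside = smoothed_maximum(region, corr, min_area=1)    # region is large enough
fallback = smoothed_maximum(region, corr, min_area=2)  # region is too small
```

With a one-cell region, `inside` is the value at that cell (0.8), while raising the cutoff to 2 makes the function fall back to the last entry (0.1).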