mgcpy.independence_tests.mgc_utils package
Submodules
mgcpy.independence_tests.mgc_utils.local_correlation module
MGC’s Local Correlation Module
mgcpy.independence_tests.mgc_utils.local_correlation.local_correlations(ndarray matrix_A, ndarray matrix_B, distance_metric='euclidean', base_global_correlation='mgc')
Computes all the local correlation coefficients in O(n^2 log n).
Parameters:
- matrix_A (2D numpy.array) – interpreted as either:
  - an [n*n] distance matrix, a square matrix with zeros on the diagonal for n samples, OR
  - an [n*d] data matrix, a matrix with n samples in d dimensions
- matrix_B (2D numpy.array) – interpreted as either:
  - an [n*n] distance matrix, a square matrix with zeros on the diagonal for n samples, OR
  - an [n*d] data matrix, a matrix with n samples in d dimensions
- distance_metric (string) – specifies the distance metric used to compute the distance matrix; defaults to 'euclidean'
- base_global_correlation (string) – specifies which global correlation to build upon: 'mgc', 'dcor', 'mantel', or 'rank'. Defaults to 'mgc'.
Returns: A dict with the following keys:
- local_correlation_matrix: a 2D matrix of all local correlations within [-1,1]
- local_variance_A: all local variances of A
- local_variance_B: all local variances of B
Return type: dictionary
Example:
>>> import numpy as np
>>> from scipy.spatial import distance_matrix
>>> from mgcpy.independence_tests.mgc_utils.local_correlation import local_correlations
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> result = local_correlations(X, Y)
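The two accepted input forms can be illustrated with NumPy alone. The sketch below builds an [n*n] Euclidean distance matrix from the [n*d] data matrix X of the example and checks the properties a distance matrix is expected to have (square, zeros on the diagonal). The helper name `euclidean_distance_matrix` is hypothetical and not part of mgcpy, and whether `local_correlations` computes its distances in exactly this way is an assumption.

```python
import numpy as np


def euclidean_distance_matrix(data):
    """Pairwise Euclidean distances between the rows of an [n*d] data matrix.

    Hypothetical helper for illustration; not part of mgcpy.
    """
    data = np.asarray(data, dtype=float)
    # Broadcast to shape (n, n, d), then take the norm over the last axis.
    differences = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.linalg.norm(differences, axis=-1)


X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
D = euclidean_distance_matrix(X)

# A distance matrix is square, symmetric, and zero on the diagonal.
is_square = D.shape == (X.shape[0], X.shape[0])
has_zero_diagonal = np.allclose(np.diag(D), 0.0)
is_symmetric = np.allclose(D, D.T)
```

Under the parameter description, either X (the data matrix) or a matrix such as D (the distance matrix) would be an acceptable value for matrix_A.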
mgcpy.independence_tests.mgc_utils.local_correlation.local_covariance(ndarray distance_matrix_A, ndarray distance_matrix_B, ndarray ranked_distance_matrix_A, ndarray ranked_distance_matrix_B)
Computes all local covariances simultaneously in O(n^2).
Parameters:
- distance_matrix_A (2D numpy.array) – first distance matrix (centered or appropriately transformed), [n*n]
- distance_matrix_B (2D numpy.array) – second distance matrix (centered or appropriately transformed), [n*n]
- ranked_distance_matrix_A (2D numpy.array) – column-wise ranked matrix of A, [n*n]
- ranked_distance_matrix_B (2D numpy.array) – column-wise ranked matrix of B, [n*n]
Returns: matrix of all local covariances, [n*n]
Return type: 2D numpy.array
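To make "column-wise ranked matrix" concrete, here is a minimal NumPy sketch that ranks the entries of each column of a distance matrix from 1 (smallest) to n (largest). The helper `rank_columns` is hypothetical and not part of mgcpy; in particular, breaking ties by row order is an assumption and may differ from how the library ranks tied distances.

```python
import numpy as np


def rank_columns(distance_matrix):
    """Rank the entries of each column from 1 (smallest) to n (largest).

    Hypothetical helper; ties are broken by row order, which is an
    assumption and may differ from mgcpy's tie handling.
    """
    distance_matrix = np.asarray(distance_matrix, dtype=float)
    # argsort gives the sort order; argsort of that order gives 0-based ranks.
    order = np.argsort(distance_matrix, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    return ranks + 1


D = np.array([[0.0, 2.0, 5.0],
              [2.0, 0.0, 4.0],
              [5.0, 4.0, 0.0]])
R = rank_columns(D)
# R is [[1, 2, 3], [2, 1, 2], [3, 3, 1]]
```

Because the diagonal of a distance matrix is zero, each sample receives rank 1 in its own column whenever all other distances in that column are positive.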
mgcpy.independence_tests.mgc_utils.threshold_smooth module
MGC’s Sample Statistic Module
mgcpy.independence_tests.mgc_utils.threshold_smooth.threshold_local_correlations(local_correlation_matrix, sample_size)
Finds a connected region of significance in the local correlation map by thresholding.
Parameters:
- local_correlation_matrix – all local correlations within [-1,1]
- sample_size (integer) – the sample size of the original data (which may not equal m or n in case of repeating data).
Returns: a binary matrix of size m by n, with 1’s indicating the significant region.
Return type: 2D numpy.array
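The idea of thresholding a local correlation map and keeping a connected region can be sketched as follows. This is not the library's implementation: how `threshold_local_correlations` derives its threshold from `sample_size` is not stated here, so the hypothetical helper `largest_connected_region` takes the threshold as an argument, and both the use of 4-connectivity and the choice of the largest region are assumptions.

```python
from collections import deque

import numpy as np


def largest_connected_region(local_correlation_matrix, threshold):
    """Binary mask of the largest 4-connected region with values > threshold.

    Hypothetical sketch: the threshold is supplied by the caller, and
    4-connectivity is an assumption.
    """
    above = np.asarray(local_correlation_matrix) > threshold
    m, n = above.shape
    visited = np.zeros((m, n), dtype=bool)
    best = []
    for start in zip(*np.nonzero(above)):
        if visited[start]:
            continue
        # Breadth-first flood fill from an unvisited cell above the threshold.
        visited[start] = True
        queue, region = deque([start]), []
        while queue:
            i, j = queue.popleft()
            region.append((i, j))
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= a < m and 0 <= b < n and above[a, b] and not visited[a, b]:
                    visited[a, b] = True
                    queue.append((a, b))
        if len(region) > len(best):
            best = region
    # Mark the cells of the largest region with 1's.
    mask = np.zeros((m, n), dtype=int)
    for i, j in best:
        mask[i, j] = 1
    return mask


corr = np.array([[0.9, 0.8, 0.1],
                 [0.1, 0.7, 0.1],
                 [0.6, 0.1, 0.1]])
region = largest_connected_region(corr, threshold=0.5)
# region is [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
```

The isolated entry 0.6 exceeds the threshold but is dropped because it is not adjacent to the larger region.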
mgcpy.independence_tests.mgc_utils.threshold_smooth.smooth_significant_local_correlations(significant_connected_region, local_correlation_matrix)
Finds the smoothed maximum within the significant region R:
- If the area of R is too small, returns the last local correlation.
- Otherwise, returns the maximum within significant_connected_region.
Parameters:
- significant_connected_region (2D numpy.array) – a binary matrix of size m by n, with 1’s indicating the significant region.
- local_correlation_matrix – all local correlations within [-1,1]
Returns: A dict with the following keys:
- mgc_statistic: the sample MGC statistic within [-1, 1]
- optimal_scale: the estimated optimal scale as an [x, y] pair.
Return type: dictionary
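The two selection rules of this function can be sketched in a few lines of NumPy. The helper `smoothed_maximum` is hypothetical: the cutoff for "too small" is not given here, so it appears as the assumed parameter `min_area`, and reporting the scale as a 1-based [row, column] pair is likewise an assumption.

```python
import numpy as np


def smoothed_maximum(significant_connected_region, local_correlation_matrix, min_area):
    """Pick the statistic and scale following the two rules described above.

    Hypothetical sketch: `min_area` stands in for the unspecified
    "too small" cutoff, and scales are reported 1-based by assumption.
    """
    corr = np.asarray(local_correlation_matrix, dtype=float)
    region = np.asarray(significant_connected_region, dtype=bool)
    m, n = corr.shape
    if region.sum() < max(min_area, 1):
        # Region too small: fall back to the last local correlation.
        return {"mgc_statistic": float(corr[m - 1, n - 1]), "optimal_scale": [m, n]}
    # Otherwise take the largest correlation inside the region.
    masked = np.where(region, corr, -np.inf)
    k, l = np.unravel_index(np.argmax(masked), masked.shape)
    return {"mgc_statistic": float(masked[k, l]), "optimal_scale": [int(k) + 1, int(l) + 1]}


corr = np.array([[0.9, 0.8, 0.1],
                 [0.1, 0.7, 0.1],
                 [0.6, 0.1, 0.1]])
region = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [0, 0, 0]])

large_enough = smoothed_maximum(region, corr, min_area=2)
too_small = smoothed_maximum(region, corr, min_area=5)
```

With `min_area=2` the region of three cells qualifies and the sketch returns 0.9 at scale [1, 1]; with `min_area=5` it falls back to the last entry, 0.1 at scale [3, 3].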