mgcpy.independence_tests.mgc_utils package
Submodules
mgcpy.independence_tests.mgc_utils.local_correlation module
MGC’s Local Correlation Module
mgcpy.independence_tests.mgc_utils.local_correlation.local_correlations(ndarray matrix_A, ndarray matrix_B, distance_metric='euclidean', base_global_correlation='mgc')
Computes all the local correlation coefficients in O(n^2 log n).
Parameters:
- matrix_A (2D numpy.array) – interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on the diagonal for n samples, OR
  - a [n*d] data matrix, a matrix with n samples in d dimensions
- matrix_B (2D numpy.array) – interpreted as either:
  - a [n*n] distance matrix, a square matrix with zeros on the diagonal for n samples, OR
  - a [n*d] data matrix, a matrix with n samples in d dimensions
- distance_metric (string) – specifies the distance metric used to compute the distance matrix; defaults to ‘euclidean’
- base_global_correlation (string) – specifies which global correlation to build upon: one of ‘mgc’, ‘dcor’, ‘mantel’, or ‘rank’. Defaults to ‘mgc’.
Returns: A dict with the following keys:
- local_correlation_matrix: a 2D matrix of all local correlations within [-1,1]
- local_variance_A: all local variances of A
- local_variance_B: all local variances of B
Return type: dictionary
Example:
>>> import numpy as np
>>> from scipy.spatial import distance_matrix
>>> from mgcpy.independence_tests.mgc_utils.local_correlation import local_correlations
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> result = local_correlations(X, Y)
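The two ways an input matrix can be interpreted (distance matrix or data matrix) can be illustrated with a minimal sketch. It uses plain Python lists so it runs without dependencies, and the helper names `looks_like_distance_matrix` and `as_distance_matrix` are hypothetical, not part of mgcpy. The check is only the rule stated above (square, zeros on the diagonal), and Euclidean distance is assumed as the metric.

```python
import math


def looks_like_distance_matrix(matrix):
    """Return True if `matrix` is square with zeros on the diagonal."""
    n = len(matrix)
    is_square = all(len(row) == n for row in matrix)
    return is_square and all(matrix[i][i] == 0 for i in range(n))


def as_distance_matrix(matrix):
    """Return an [n*n] distance matrix for either kind of input.

    A matrix that already looks like a distance matrix is copied as-is;
    otherwise each row is treated as one sample and pairwise Euclidean
    distances are computed (hypothetical helper, for illustration only).
    """
    if looks_like_distance_matrix(matrix):
        return [list(row) for row in matrix]
    return [[math.dist(p, q) for q in matrix] for p in matrix]


# Same X as in the example above: 3 samples in 3 dimensions.
X = [[2, 1, 100], [4, 2, 10], [8, 3, 10]]
D = as_distance_matrix(X)
```

Note that a square data matrix with n = d is only told apart from a distance matrix by its diagonal, so a data matrix that happens to have a zero diagonal would be misread by this sketch.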
mgcpy.independence_tests.mgc_utils.local_correlation.local_covariance(ndarray distance_matrix_A, ndarray distance_matrix_B, ndarray ranked_distance_matrix_A, ndarray ranked_distance_matrix_B)
Computes all local covariances simultaneously in O(n^2).
Parameters:
- distance_matrix_A (2D numpy.array) – first distance matrix (centered or appropriately transformed), [n*n]
- distance_matrix_B (2D numpy.array) – second distance matrix (centered or appropriately transformed), [n*n]
- ranked_distance_matrix_A (2D numpy.array) – column-wise ranked matrix of A, [n*n]
- ranked_distance_matrix_B (2D numpy.array) – column-wise ranked matrix of B, [n*n]
Returns: matrix of all local covariances, [n*n]
Return type: 2D numpy.array
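One way to obtain a value for every scale pair (k, l) in O(n^2) is to bucket each product A[i][j] * B[i][j] by its rank pair and then take a two-dimensional prefix sum. The sketch below shows only that idea and is not mgcpy's implementation: `rank_columns` and `local_sums` are hypothetical names, dense 1-based ranks are an assumption, and any mean-correction term that turns these sums into covariances is left out. The example matrices are raw distances rather than centered ones, purely to keep the arithmetic easy to follow.

```python
def rank_columns(matrix):
    """Dense 1-based rank of each entry within its column (ties share a rank)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    ranks = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        distinct = sorted({matrix[i][j] for i in range(n_rows)})
        rank_of = {value: r for r, value in enumerate(distinct, start=1)}
        for i in range(n_rows):
            ranks[i][j] = rank_of[matrix[i][j]]
    return ranks


def local_sums(A, B, RA, RB):
    """S[k-1][l-1] = sum of A[i][j] * B[i][j] over all (i, j) with
    RA[i][j] <= k and RB[i][j] <= l (uncentered; assumption-laden sketch)."""
    n = len(A)
    max_k = max(max(row) for row in RA)
    max_l = max(max(row) for row in RB)
    S = [[0.0] * max_l for _ in range(max_k)]
    # Step 1: drop every product into the bucket of its exact rank pair.
    for i in range(n):
        for j in range(n):
            S[RA[i][j] - 1][RB[i][j] - 1] += A[i][j] * B[i][j]
    # Step 2: in-place 2D prefix sum, so each cell covers all smaller ranks.
    for k in range(max_k):
        for l in range(max_l):
            if k > 0:
                S[k][l] += S[k - 1][l]
            if l > 0:
                S[k][l] += S[k][l - 1]
            if k > 0 and l > 0:
                S[k][l] -= S[k - 1][l - 1]
    return S


A = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
B = [[0, 2, 1], [2, 0, 4], [1, 4, 0]]
RA, RB = rank_columns(A), rank_columns(B)
S = local_sums(A, B, RA, RB)
```

Both steps touch each matrix cell a constant number of times, which is where the O(n^2) bound comes from; computing the ranks themselves requires sorting and is not included in that bound.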
mgcpy.independence_tests.mgc_utils.threshold_smooth module
MGC’s Sample Statistic Module
mgcpy.independence_tests.mgc_utils.threshold_smooth.threshold_local_correlations(local_correlation_matrix, sample_size)
Finds a connected region of significance in the local correlation map by thresholding.
Parameters:
- local_correlation_matrix – all local correlations within [-1,1]
- sample_size (integer) – the sample size of the original data (which may not equal m or n in case of repeating data).
Returns: a binary matrix of size m by n, with 1’s indicating the significant region.
Return type: 2D numpy.array
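The thresholding step can be pictured as: mark every entry above a cutoff, then keep one connected group of marked entries. The sketch below takes the cutoff as a plain argument, since how the library derives it from sample_size is not described here. Keeping the largest group and using 4-connectivity (up, down, left, right) are both assumptions, and `largest_region_above` is a hypothetical name.

```python
from collections import deque


def largest_region_above(matrix, threshold):
    """Binary mask of the largest 4-connected region with entries > threshold."""
    m, n = len(matrix), len(matrix[0])
    seen = [[False] * n for _ in range(m)]
    best = []
    for start_i in range(m):
        for start_j in range(n):
            if seen[start_i][start_j] or matrix[start_i][start_j] <= threshold:
                continue
            # Breadth-first search collects one connected region.
            region = []
            queue = deque([(start_i, start_j)])
            seen[start_i][start_j] = True
            while queue:
                i, j = queue.popleft()
                region.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if (0 <= a < m and 0 <= b < n
                            and not seen[a][b] and matrix[a][b] > threshold):
                        seen[a][b] = True
                        queue.append((a, b))
            if len(region) > len(best):
                best = region
    mask = [[0] * n for _ in range(m)]
    for i, j in best:
        mask[i][j] = 1
    return mask


corr = [[0.9, 0.8, 0.1],
        [0.1, 0.7, 0.1],
        [0.6, 0.1, 0.1]]
mask = largest_region_above(corr, 0.5)
```

In this example the isolated entry 0.6 exceeds the cutoff but is dropped, because it is not connected to the larger group in the top rows.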
mgcpy.independence_tests.mgc_utils.threshold_smooth.smooth_significant_local_correlations(significant_connected_region, local_correlation_matrix)
Finds the smoothed maximum within the significant region R:
- If the area of R is too small, it returns the last local correlation.
- Otherwise, it returns the maximum within significant_connected_region.
Parameters:
- significant_connected_region (2D numpy.array) – a binary matrix of size m by n, with 1’s indicating the significant region.
- local_correlation_matrix – all local correlations within [-1,1]
Returns: A dict with the following keys:
- mgc_statistic: the sample MGC statistic within [-1, 1]
- optimal_scale: the estimated optimal scale as an [x, y] pair.
Return type: dictionary
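The two-branch rule above can be sketched as follows. Several details are assumptions made for illustration: `min_area` is a hypothetical parameter standing in for the unspecified "too small" cutoff, "the last local correlation" is read as the bottom-right entry of the matrix, and the scale is reported with 1-based indices. The function name `smooth_max` is likewise hypothetical.

```python
def smooth_max(region, corr, min_area):
    """Pick the statistic from `corr` using the binary mask `region`.

    Falls back to the bottom-right entry when the region covers fewer
    than `min_area` cells (hypothetical cutoff, for illustration only).
    """
    m, n = len(corr), len(corr[0])
    cells = [(i, j) for i in range(m) for j in range(n) if region[i][j] == 1]
    if len(cells) < min_area:
        # Region too small: use the last entry of the correlation map.
        return {"mgc_statistic": corr[m - 1][n - 1], "optimal_scale": [m, n]}
    # Otherwise take the largest correlation inside the region.
    i, j = max(cells, key=lambda c: corr[c[0]][c[1]])
    return {"mgc_statistic": corr[i][j], "optimal_scale": [i + 1, j + 1]}


corr = [[0.9, 0.8, 0.1],
        [0.1, 0.7, 0.1],
        [0.6, 0.1, 0.1]]
region = [[1, 1, 0],
          [0, 1, 0],
          [0, 0, 0]]
large_enough = smooth_max(region, corr, min_area=2)
too_small = smooth_max(region, corr, min_area=5)
```

With a three-cell region, a cutoff of 2 selects the maximum inside the region, while a cutoff of 5 triggers the fallback branch.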