mgcpy.independence_tests.mgc_utils package

Submodules

mgcpy.independence_tests.mgc_utils.local_correlation module

MGC’s Local Correlation Module

mgcpy.independence_tests.mgc_utils.local_correlation.local_correlations(ndarray matrix_A, ndarray matrix_B, distance_metric='euclidean', base_global_correlation='mgc')

Computes all local correlation coefficients in O(n^2 log n).

Parameters:
  • matrix_A (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*d] data matrix, a matrix with n samples in d dimensions
  • matrix_B (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*d] data matrix, a matrix with n samples in d dimensions
  • distance_metric (string) – the distance metric used to compute a distance matrix from a data matrix; defaults to ‘euclidean’
  • base_global_correlation (string) – the global correlation to build upon, including ‘mgc’, ‘dcor’, ‘mantel’, and ‘rank’; defaults to ‘mgc’
Returns:

A dict with the following keys:

  • local_correlation_matrix:
     a 2D matrix of all local correlations within [-1,1]
  • local_variance_A:
     all local variances of A
  • local_variance_B:
     all local variances of B

Return type:

dictionary
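The returned dictionary pairs a local correlation map with per-scale local variances. The sketch below shows one way such a map can be normalized from local covariances. It assumes, without confirmation from this page, that entry (k, l) is divided by sqrt(local_variance_A[k] * local_variance_B[l]); the helper name normalize_local_covariances and the zero-variance convention are hypothetical.

```python
import numpy as np


def normalize_local_covariances(local_cov, local_var_A, local_var_B):
    """Hypothetical helper: rescale an [n*n] local covariance map into
    local correlations, assuming entry (k, l) is divided by
    sqrt(local_var_A[k] * local_var_B[l])."""
    local_cov = np.asarray(local_cov, dtype=float)
    # Outer product pairs every scale k of A with every scale l of B.
    denom = np.sqrt(np.outer(local_var_A, local_var_B))
    corr = np.zeros_like(local_cov)
    # Assumption: scale pairs with zero variance get correlation 0.
    positive = denom > 0
    corr[positive] = local_cov[positive] / denom[positive]
    # Guard against tiny floating-point overshoot outside [-1, 1].
    return np.clip(corr, -1.0, 1.0)


cov = np.array([[1.0, 2.0], [0.5, -3.0]])
print(normalize_local_covariances(cov, [1.0, 4.0], [1.0, 9.0]))
# approximately [[1.0, 0.667], [0.25, -0.5]]
```

The outer product computes every denominator in one vectorized step rather than looping over scale pairs.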

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc_utils.local_correlation import local_correlations
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> result = local_correlations(X, Y)

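Each input is interpreted either as an [n*n] distance matrix or as an [n*d] data matrix. A minimal sketch of that rule follows. It assumes that "square with a zero diagonal" is the whole test and uses plain Euclidean distance, matching the ‘euclidean’ default. The helper as_distance_matrix is hypothetical and does not reflect how the library dispatches on distance_metric.

```python
import numpy as np


def as_distance_matrix(matrix):
    """Hypothetical sketch of the 'distance matrix OR data matrix' rule.

    Assumption: a square matrix with an all-zero diagonal is treated as an
    [n*n] distance matrix; anything else is treated as an [n*d] data matrix
    and converted with pairwise Euclidean distances.
    """
    matrix = np.asarray(matrix, dtype=float)
    n_rows, n_cols = matrix.shape
    if n_rows == n_cols and np.allclose(np.diag(matrix), 0.0):
        return matrix
    # Pairwise differences have shape [n, n, d]; take the norm over d.
    diffs = matrix[:, None, :] - matrix[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
D_X = as_distance_matrix(X)  # nonzero diagonal, so X is treated as data
print(np.allclose(as_distance_matrix(D_X), D_X))  # True: already distances
```

The X from the example above is square, but its diagonal is nonzero, so this rule treats it as 3 samples in 3 dimensions.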
mgcpy.independence_tests.mgc_utils.local_correlation.local_covariance(ndarray distance_matrix_A, ndarray distance_matrix_B, ndarray ranked_distance_matrix_A, ndarray ranked_distance_matrix_B)

Computes all local covariances simultaneously in O(n^2).

Parameters:
  • distance_matrix_A (2D numpy.array) – first distance matrix (centered or appropriately transformed), [n*n]
  • distance_matrix_B (2D numpy.array) – second distance matrix (centered or appropriately transformed), [n*n]
  • ranked_distance_matrix_A (2D numpy.array) – column-wise ranked matrix of A, [n*n]
  • ranked_distance_matrix_B (2D numpy.array) – column-wise ranked matrix of B, [n*n]
Returns:

matrix of all local covariances, [n*n]

Return type:

2D numpy.array
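The O(n^2) bound comes from computing every scale at once instead of one scale at a time. The sketch below illustrates the general idea under a simplified definition: out[k-1, l-1] sums A[i, j] * B[i, j] over entries with column rank in A at most k and column rank in B at most l. It drops MGC's centering and expectation terms and is not necessarily the library's exact formula. All function names are hypothetical. Each product is bucketed by its (rank_A, rank_B) pair in O(n^2), and a 2D prefix sum then yields every scale's total in another O(n^2). A brute-force O(n^4) version confirms the result.

```python
import numpy as np


def column_ranks(D):
    """1-based ranks of each entry within its column (ties broken by order)."""
    return np.argsort(np.argsort(D, axis=0, kind="stable"), axis=0) + 1


def local_covariance_sketch(A, B, rank_A, rank_B):
    """Hypothetical sketch: out[k-1, l-1] is the sum of A[i, j] * B[i, j]
    over entries with rank_A[i, j] <= k and rank_B[i, j] <= l.
    MGC's centering and expectation terms are deliberately left out."""
    n = A.shape[0]
    buckets = np.zeros((n, n))
    # O(n^2): drop each product into the cell of its (rank_A, rank_B) pair.
    np.add.at(buckets, (rank_A - 1, rank_B - 1), A * B)
    # O(n^2): 2D prefix sums turn cells into "ranks <= (k, l)" totals.
    return buckets.cumsum(axis=0).cumsum(axis=1)


def local_covariance_brute(A, B, rank_A, rank_B):
    """O(n^4) reference that applies the same definition scale by scale."""
    n = A.shape[0]
    out = np.zeros((n, n))
    for k in range(1, n + 1):
        for l in range(1, n + 1):
            mask = (rank_A <= k) & (rank_B <= l)
            out[k - 1, l - 1] = (A * B)[mask].sum()
    return out


rng = np.random.default_rng(0)
X, Y = rng.normal(size=(6, 2)), rng.normal(size=(6, 2))
A = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
B = np.sqrt(((Y[:, None] - Y[None]) ** 2).sum(-1))
rA, rB = column_ranks(A), column_ranks(B)
print(np.allclose(local_covariance_sketch(A, B, rA, rB),
                  local_covariance_brute(A, B, rA, rB)))  # True
```

np.add.at is used instead of buckets[idx] += values because it accumulates correctly when several entries share the same rank pair.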

mgcpy.independence_tests.mgc_utils.threshold_smooth module

MGC’s Sample Statistic Module

mgcpy.independence_tests.mgc_utils.threshold_smooth.threshold_local_correlations(local_correlation_matrix, sample_size)

Finds a connected region of significance in the local correlation map by thresholding.

Parameters:
  • local_correlation_matrix (2D numpy.array) – all local correlations within [-1,1], [m*n]
  • sample_size (integer) – the sample size of the original data (which may differ from m or n, the dimensions of local_correlation_matrix, when the data contain repeated values).
Returns:

a binary [m*n] matrix, with 1’s indicating the significant region.

Return type:

2D numpy.array
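The thresholding step can be sketched as: mark entries above a cutoff, then keep only the largest connected group of marked entries. The documented function derives its cutoff from sample_size, but that derivation is not given here. This sketch therefore takes the threshold as an explicit argument. Both the helper largest_significant_region and the choice of 4-connectivity (up/down/left/right neighbours) are assumptions, not the library's confirmed behaviour.

```python
from collections import deque

import numpy as np


def largest_significant_region(local_corr, threshold):
    """Hypothetical sketch: binary [m*n] mask of the largest 4-connected
    region of entries strictly above `threshold` (all zeros if none)."""
    above = np.asarray(local_corr) > threshold
    m, n = above.shape
    labels = np.zeros((m, n), dtype=int)
    best_label, best_size, current = 0, 0, 0
    for i in range(m):
        for j in range(n):
            if not above[i, j] or labels[i, j]:
                continue
            # Breadth-first search labels one new connected component.
            current += 1
            labels[i, j] = current
            size = 0
            queue = deque([(i, j)])
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < m and 0 <= cc < n
                            and above[rr, cc] and not labels[rr, cc]):
                        labels[rr, cc] = current
                        queue.append((rr, cc))
            if size > best_size:
                best_label, best_size = current, size
    if best_size == 0:
        return np.zeros((m, n), dtype=int)
    return (labels == best_label).astype(int)


C = np.array([[0.9, 0.8, 0.0], [0.7, 0.0, 0.0], [0.0, 0.0, 0.6]])
print(largest_significant_region(C, 0.5).tolist())
# [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The isolated 0.6 entry also clears the cutoff but is dropped, because it is not connected to the larger region.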

mgcpy.independence_tests.mgc_utils.threshold_smooth.smooth_significant_local_correlations(significant_connected_region, local_correlation_matrix)

Finds the smoothed maximum within the significant region R:

  • If the area of R is too small, returns the last local correlation, i.e. the bottom-right entry of local_correlation_matrix.
  • Otherwise, returns the maximum within significant_connected_region.
Parameters:
  • significant_connected_region (2D numpy.array) – a binary [m*n] matrix, with 1’s indicating the significant region.
  • local_correlation_matrix (2D numpy.array) – all local correlations within [-1,1], [m*n]
Returns:

A dict with the following keys:

  • mgc_statistic: the sample MGC statistic within [-1, 1]
  • optimal_scale: the estimated optimal scale as an [x, y] pair.

Return type:

dictionary
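The two documented cases can be sketched as follows. The text says only "too small" without giving a cutoff, so the sketch takes a hypothetical min_area argument. It also reports optimal_scale as 0-based [row, column] indices, because the indexing convention of the real [x, y] pair is not stated here. The function name smoothed_max_sketch is likewise hypothetical.

```python
import numpy as np


def smoothed_max_sketch(significant_region, local_corr, min_area):
    """Hypothetical sketch of the two documented cases.

    `min_area` is an assumed stand-in for "too small"; the real cutoff is
    not specified. optimal_scale is returned as 0-based [row, col].
    """
    m, n = local_corr.shape
    region = np.asarray(significant_region, dtype=bool)
    if region.sum() < min_area:
        # Region too small: fall back to the last local correlation.
        return {"mgc_statistic": float(local_corr[m - 1, n - 1]),
                "optimal_scale": [m - 1, n - 1]}
    # Hide everything outside R, then take the largest remaining entry.
    masked = np.where(region, local_corr, -np.inf)
    k, l = np.unravel_index(np.argmax(masked), masked.shape)
    return {"mgc_statistic": float(local_corr[k, l]),
            "optimal_scale": [int(k), int(l)]}


C = np.array([[0.9, 0.8, 0.0], [0.7, 0.0, 0.0], [0.0, 0.0, 0.6]])
R = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 0]])
print(smoothed_max_sketch(R, C, min_area=2))  # maximum 0.9 at [0, 0]
print(smoothed_max_sketch(R, C, min_area=5))  # fallback 0.6 at [2, 2]
```

Masking with -inf keeps the search vectorized. Any entry outside R can never win the argmax.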

Module contents