mgcpy.hypothesis_tests package

Submodules

mgcpy.hypothesis_tests.transforms module

mgcpy.hypothesis_tests.transforms.k_sample_transform(x, y, is_y_categorical=False)[source]

Transform to represent a k-sample test as an independence test

Parameters:
  • x (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*p] data matrix, a matrix with n samples in p dimensions
  • y (2D numpy.array) –

    is interpreted as one of:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*p] data matrix, a matrix with n samples in p dimensions OR
    • a [n*1] label matrix, categorical data for x, if is_y_categorical is set to True
  • is_y_categorical (boolean) – if True, y holds categorical data and is treated as a label array for x; otherwise, y is a plain data matrix
Returns:

  • u: a concatenated data matrix of dimensions [2*n, p]
  • v: a label matrix for u, indicating the category to which each data entry in u belongs

Return type:

list
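
The transform described above can be sketched in plain NumPy. This is a minimal illustration, not mgcpy's implementation: the function name k_sample_transform_sketch is hypothetical, it handles only the data-matrix and label-matrix cases (not distance matrices), and the choice of 0/1 as label values is an assumption.

```python
import numpy as np


def k_sample_transform_sketch(x, y, is_y_categorical=False):
    """Hypothetical stand-in for k_sample_transform (data matrices only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    if is_y_categorical:
        # y already carries one label per row of x, so x is used as-is.
        u = x
        v = y.reshape(-1, 1)
    else:
        # Stack the two samples and label each row by the sample it came from.
        u = np.concatenate([x, y.astype(float)], axis=0)
        # Assumption: labels 0 for rows of x and 1 for rows of y.
        v = np.concatenate([np.zeros(len(x)), np.ones(len(y))]).reshape(-1, 1)
    return [u, v]


x = np.random.default_rng(0).normal(size=(5, 3))
y = np.random.default_rng(1).normal(size=(5, 3))
u, v = k_sample_transform_sketch(x, y)
print(u.shape, v.shape)  # shapes of the stacked data and its labels
```

With two [n*p] inputs, u has 2*n rows and v holds one label per row of u, so any independence test between u and v acts as a two-sample test between x and y.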

mgcpy.hypothesis_tests.transforms.paired_two_sample_transform(x, y)[source]

Transform to represent a paired two-sample test as an independence test.

Steps:

  • combine x and y to get the joint_distribution
  • sample n pairs from the joint_distribution
  • compute the Euclidean distance between the sampled n pairs, which is randomly_sampled_pairs_distance
  • compute the Euclidean distance between the actual x and y, which is actual_pairs_distance
  • compute the two sample transformed matrices of randomly_sampled_pairs_distance and actual_pairs_distance
Parameters:
  • x (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*p] data matrix, a matrix with n samples in p dimensions
  • y (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*p] data matrix, a matrix with n samples in p dimensions
Returns:

  • u: a data matrix of dimensions [2*n, p]
  • v: a label matrix for u, indicating the category to which each data entry in u belongs

Return type:

list
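
The steps listed for this transform can be sketched as follows. This is an illustration under stated assumptions, not mgcpy's implementation: the name paired_two_sample_transform_sketch and the seed argument are hypothetical, pairs are sampled with replacement (the documentation does not say how they are sampled), only [n*p] data matrices are handled, and the two distance vectors are stacked into a single column, so u here has one column.

```python
import numpy as np


def paired_two_sample_transform_sketch(x, y, seed=0):
    """Hypothetical sketch of the documented steps (data matrices only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    rng = np.random.default_rng(seed)

    # Step 1: combine x and y to get the joint distribution.
    joint_distribution = np.concatenate([x, y], axis=0)

    # Step 2: sample n pairs of rows (assumption: with replacement).
    first = rng.integers(0, 2 * n, size=n)
    second = rng.integers(0, 2 * n, size=n)

    # Step 3: Euclidean distance between the members of each sampled pair.
    randomly_sampled_pairs_distance = np.linalg.norm(
        joint_distribution[first] - joint_distribution[second], axis=1
    )

    # Step 4: Euclidean distance between each actual (x_i, y_i) pair.
    actual_pairs_distance = np.linalg.norm(x - y, axis=1)

    # Step 5: two-sample transform of the two distance vectors.
    u = np.concatenate(
        [randomly_sampled_pairs_distance, actual_pairs_distance]
    ).reshape(-1, 1)
    v = np.concatenate([np.zeros(n), np.ones(n)]).reshape(-1, 1)
    return [u, v]


x = np.zeros((4, 3))
y = np.ones((4, 3))
u, v = paired_two_sample_transform_sketch(x, y)
print(u.shape, v.shape)
```

The idea is that if the pairing carries no information, distances between actual pairs should look like distances between randomly drawn pairs, so an independence test between u and v doubles as a paired two-sample test.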

mgcpy.hypothesis_tests.transforms.paired_two_sample_test_dcorr(x, y, which_test='biased', compute_distance_matrix=None, is_fast=False)[source]

Compute the DCorr test statistic for a paired two-sample test

Parameters:
  • x (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*p] data matrix, a matrix with n samples in p dimensions
  • y (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR
    • a [n*p] data matrix, a matrix with n samples in p dimensions
Returns:

the paired two-sample DCorr test statistic

Return type:

float

Module contents