mgcpy.benchmarks package

Subpackages

mgcpy.benchmarks.hypothesis_tests package
- Subpackages
  - mgcpy.benchmarks.hypothesis_tests.three_sample_test package
  - mgcpy.benchmarks.hypothesis_tests.two_sample_test package
- Module contents

Submodules

mgcpy.benchmarks.power module

mgcpy.benchmarks.power.power(independence_test, sample_generator, num_samples=100, num_dimensions=1, noise=0.0, repeats=1000, alpha=0.05, simulation_type='')

Estimate the power of an independence test given a simulator to sample from.

Parameters:

independence_test (`Object(Independence_Test)`) – an object whose class inherits from the `Independence_Test` abstract class
sample_generator (`FunctionType` or `callable()`) – a function from `simulations` used to generate the simulated data; it is called with the `num_samples`, `num_dimensions`, and `noise` arguments described below
num_samples (int) – the number of samples generated by the simulation (defaults to 100)
num_dimensions (int) – the number of dimensions of the samples generated by the simulation (defaults to 1)
noise (float) – the noise used in the simulation (defaults to 0)
repeats (int) – the number of times new samples are generated to estimate the null and alternative distributions (defaults to 1000)
alpha (float) – the type I error level (defaults to 0.05)
simulation_type (string) – specifies the simulation when necessary (defaults to the empty string)

Returns:

empirical_power – the estimated power

Return type:

numpy.float

Example

>>> from mgcpy.benchmarks.power import power
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> from mgcpy.benchmarks.simulations import circle_sim
>>> mgc = MGC()
>>> mgc_power = power(mgc, circle_sim, num_samples=100, num_dimensions=2, simulation_type='ellipse')
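
The estimate described above can be pictured as a Monte Carlo loop. The following is a minimal sketch under stated assumptions, not mgcpy's implementation: `pearson_abs`, `linear_sample`, and `empirical_power` are hypothetical names, absolute Pearson correlation stands in for an `Independence_Test` object, and the null distribution is approximated by permuting `y`.

```python
import random


def pearson_abs(x, y):
    """Absolute Pearson correlation, a stand-in test statistic (assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def linear_sample(num_samples, noise, rng):
    """Hypothetical one-dimensional generator: y = x + noise * gaussian."""
    x = [rng.uniform(-1, 1) for _ in range(num_samples)]
    y = [a + noise * rng.gauss(0, 1) for a in x]
    return x, y


def empirical_power(statistic, sample_generator, num_samples=100, noise=0.0,
                    repeats=1000, alpha=0.05, seed=0):
    """Estimate power as the share of alternative statistics above a null cutoff."""
    rng = random.Random(seed)
    null_stats, alt_stats = [], []
    for _ in range(repeats):
        x, y = sample_generator(num_samples, noise, rng)
        alt_stats.append(statistic(x, y))
        # Permuting y breaks any dependence on x, giving one draw under the null.
        y_perm = y[:]
        rng.shuffle(y_perm)
        null_stats.append(statistic(x, y_perm))
    # Critical value: the (1 - alpha) empirical quantile of the null statistics.
    null_stats.sort()
    cutoff = null_stats[min(repeats - 1, int((1 - alpha) * repeats))]
    # Power: fraction of alternative statistics that exceed the critical value.
    return sum(s > cutoff for s in alt_stats) / repeats
```

Calling `empirical_power(pearson_abs, linear_sample, num_samples=50, noise=0.1, repeats=200)` returns a value between 0 and 1; for a strong linear signal such as this one it should be close to 1.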

mgcpy.benchmarks.power.power_given_data(independence_test, simulation_type, data_type='dimension', num_samples=100, num_dimensions=1, repeats=1000, alpha=0.05, additional_params={})

Estimate the power of an independence test given pre-generated data from the MGC-paper repository. Mostly for internal testing purposes.

Parameters:

independence_test (`Object(Independence_Test)`) – an object whose class inherits from the `Independence_Test` abstract class
simulation_type (int within `[1, 20]`) – specifies which simulation is used
data_type (string, either 'dimension' or 'sample_size') – whether the pre-generated data is increasing in dimensions or increasing in sample sizes
num_samples (int) – the number of samples generated by the simulation (defaults to 100)
num_dimensions (int) – the number of dimensions of the samples generated by the simulation (defaults to 1)
repeats (int) – the number of times new samples are generated to estimate the null and alternative distributions (defaults to 1000)
alpha (float) – the type I error level (defaults to 0.05)

Returns:

empirical_power – the estimated power

Return type:

numpy.float

Example

>>> from mgcpy.benchmarks.power import power_given_data
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> mgc = MGC()
>>> mgc_power = power_given_data(mgc, simulation_type=4, num_samples=100, num_dimensions=2)

mgcpy.benchmarks.simulations module

mgcpy.benchmarks.simulations.gen_coeffs(num_dim)

Helper function for generating a linear simulation.

Parameters:	num_dim – number of dimensions for the simulation
Returns:	a vector of coefficients

mgcpy.benchmarks.simulations.gen_x_unif(num_samp, num_dim, low=-1, high=1)

Helper function for generating n samples of a d-dimensional uniformly distributed vector.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1

Returns:

uniformly distributed simulated data matrix
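
As an illustration of the shape and range of such a data matrix, here is a small standard-library sketch. It is not the mgcpy code: `uniform_matrix` and its `seed` argument are hypothetical, and it returns a list of lists rather than a NumPy array.

```python
import random


def uniform_matrix(num_samp, num_dim, low=-1, high=1, seed=None):
    """Return num_samp rows, each holding num_dim draws from Uniform(low, high)."""
    rng = random.Random(seed)  # hypothetical seeding, for reproducibility only
    return [[rng.uniform(low, high) for _ in range(num_dim)]
            for _ in range(num_samp)]
```

For example, `uniform_matrix(5, 3)` yields 5 rows of 3 values, each within [-1, 1].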

mgcpy.benchmarks.simulations.linear_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)

Function for generating a linear simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 1
indep – whether to sample x and y independently, defaults to False
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1

Returns:

the data matrix and a response array
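
To make the roles of `noise` and `indep` concrete, the following sketch builds a linear relationship by hand. It rests on assumptions rather than the mgcpy source: `linear_sketch` is a hypothetical name, the weights `1 / (j + 1)` are an assumed choice of coefficients, and the Gaussian noise term is scaled by `noise` directly.

```python
import random


def linear_sketch(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, seed=None):
    """Return (x, y) with y a weighted sum of the columns of x plus Gaussian noise."""
    rng = random.Random(seed)

    def draw_x():
        return [[rng.uniform(low, high) for _ in range(num_dim)]
                for _ in range(num_samp)]

    coeffs = [1 / (j + 1) for j in range(num_dim)]  # assumed weights
    x = draw_x()
    # With indep=True the response is computed from a fresh draw,
    # so the returned x and y are independent of each other.
    x_for_y = draw_x() if indep else x
    y = [sum(c * v for c, v in zip(coeffs, row)) + noise * rng.gauss(0, 1)
         for row in x_for_y]
    return x, y
```

Setting `noise=0` and `indep=False` makes `y` an exact linear function of `x`, which is a convenient sanity check.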

mgcpy.benchmarks.simulations.exp_sim(num_samp, num_dim, noise=10, indep=False, low=0, high=3)

Function for generating an exponential simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 10
indep – whether to sample x and y independently, defaults to False
low – the lower limit of the data matrix, defaults to 0
high – the upper limit of the data matrix, defaults to 3

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.cub_sim(num_samp, num_dim, noise=80, indep=False, low=-1, high=1, cub_coeff=array([-12, 48, 128]), scale=0.3333333333333333)

Function for generating a cubic simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 80
indep – whether to sample x and y independently, defaults to False
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1
cub_coeff – coefficients of the cubic function, where each value corresponds to the respective order coefficient, defaults to [-12, 48, 128]
scale – scaling center of the cubic, defaults to 1/3

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.joint_sim(num_samp, num_dim, noise=0.5)

Function for generating a joint-normal simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 0.5

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.step_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)

Function for generating a step function simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 1
indep – whether to sample x and y independently, defaults to False
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.quad_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)

Function for generating a quadratic simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 1
indep – whether to sample x and y independently, defaults to False
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.w_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)

Function for generating a w-shaped simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 1
indep – whether to sample x and y independently, defaults to False
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.spiral_sim(num_samp, num_dim, noise=0.4, low=0, high=5)

Function for generating a spiral simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 0.4
low – the lower limit of the data matrix, defaults to 0
high – the upper limit of the data matrix, defaults to 5

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.ubern_sim(num_samp, num_dim, noise=0.5, bern_prob=0.5)

Function for generating an uncorrelated Bernoulli simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 0.5
bern_prob – the Bernoulli probability, defaults to 0.5

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.log_sim(num_samp, num_dim, noise=3, indep=False, base=2)

Function for generating a logarithmic simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 3
indep – whether to sample x and y independently, defaults to False
base – the base of the log, defaults to 2

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.root_sim(num_samp, num_dim, noise=0.25, indep=False, low=-1, high=1, n_root=4)

Function for generating an nth root simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 0.25
indep – whether to sample x and y independently, defaults to false
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1
n_root – the root of the simulation, defaults to 4

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.sin_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=12.566370614359172)

Function for generating a sinusoid simulation.

Note: For producing 4*pi and 16*pi simulations, change the period to the respective value.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 1
indep – whether to sample x and y independently, defaults to false
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1
period – the period of the sine wave, defaults to 4*pi

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.square_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=-0.39269908169872414)

Function for generating a square or diamond simulation.

Note: For producing square or diamond simulations, change the period to -pi/8 or -pi/4, respectively.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 1
indep – whether to sample x and y independently, defaults to False
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1
period – the period of the sine and cosine square equation, defaults to -pi/8

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.two_parab_sim(num_samp, num_dim, noise=2, low=-1, high=1, prob=0.5)

Function for generating a two parabolas simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 2
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1
prob – the binomial probability, defaults to 0.5

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.circle_sim(num_samp, num_dim, noise=0.4, low=-1, high=1, radius=1)

Function for generating a circle or ellipse simulation.

Note: For producing circle or ellipse simulations, change the radius to 1 or 5, respectively.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
noise – noise level of the simulation, defaults to 0.4
low – the lower limit of the data matrix, defaults to -1
high – the upper limit of the data matrix, defaults to 1
radius – the radius of the circle or ellipse, defaults to 1

Returns:

the data matrix and a response array
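
The effect of `radius` can be illustrated with a one-dimensional sketch. This is an assumed parameterization, not the mgcpy source: `circle_sketch` is a hypothetical name, the angle is taken as pi times a uniform draw, `radius` stretches only the x coordinate, and the noise is added to x.

```python
import math
import random


def circle_sketch(num_samp, noise=0.4, low=-1, high=1, radius=1, seed=None):
    """Return points near the curve (radius * cos(t), sin(t))."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(num_samp):
        angle = math.pi * rng.uniform(low, high)
        # radius=1 traces a unit circle; a larger radius stretches it into an ellipse.
        x.append(radius * math.cos(angle) + noise * rng.gauss(0, 1))
        y.append(math.sin(angle))
    return x, y
```

With `noise=0`, every point satisfies `(x / radius) ** 2 + y ** 2 == 1` up to floating-point error, so `radius=1` gives a circle and `radius=5` an ellipse.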

mgcpy.benchmarks.simulations.multi_noise_sim(num_samp, num_dim)

Function for generating a multiplicative noise simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation

Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.multi_indep_sim(num_samp, num_dim, prob=0.5, sep1=3, sep2=2)

Function for generating a multimodal independence simulation.

Parameters:

num_samp – number of samples for the simulation
num_dim – number of dimensions for the simulation
prob – the binomial probability, defaults to 0.5
sep1 – determines the size and separation of clusters, defaults to 3
sep2 – determines the size and separation of clusters, defaults to 2

Returns:

the data matrix and a response array
