mgcpy.benchmarks package

Submodules

mgcpy.benchmarks.power module

mgcpy.benchmarks.power.power(independence_test, sample_generator, num_samples=100, num_dimensions=1, noise=0.0, repeats=1000, alpha=0.05, simulation_type='')[source]

Estimate the power of an independence test, given a simulation function to sample from.

Parameters:
  • independence_test (Object(Independence_Test)) – an object whose class inherits from the Independence_Test abstract class
  • sample_generator (FunctionType or callable()) – a function from simulations used to generate the simulated data, called with the parameters given by the following arguments: num_samples (default to 100), num_dimensions (default to 1), noise (default to 0)
  • num_samples (int) – the number of samples generated by the simulation (default to 100)
  • num_dimensions (int) – the number of dimensions of the samples generated by the simulation (default to 1)
  • noise (float) – the noise used in simulation (default to 0)
  • repeats (int) – the number of times we generate new samples to estimate the null/alternative distribution (default to 1000)
  • alpha (float) – the type I error level (default to 0.05)
  • simulation_type (string) – specify simulation when necessary (default to empty string)
Returns:

empirical_power, the estimated power

Return type:

numpy.float
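
The role of repeats and alpha can be illustrated with a small Monte Carlo sketch. This is not mgcpy's implementation: it uses only the standard library, swaps in the absolute Pearson correlation as the test statistic, and assumes that null samples are obtained by permuting y. The names abs_pearson, noisy_line and empirical_power are hypothetical.

```python
import random


def abs_pearson(x, y):
    """Absolute Pearson correlation, a stand-in for a real test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def noisy_line(num_samples, noise, rng):
    """Hypothetical sample generator: y = x + noise * standard normal."""
    x = [rng.uniform(-1, 1) for _ in range(num_samples)]
    y = [a + noise * rng.gauss(0, 1) for a in x]
    return x, y


def empirical_power(statistic, generator, num_samples=100, noise=0.0,
                    repeats=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    alt_stats, null_stats = [], []
    for _ in range(repeats):
        # Draw a fresh sample on every repeat.
        x, y = generator(num_samples, noise, rng)
        alt_stats.append(statistic(x, y))
        # Assumption: permuting y breaks its link to x, giving a null draw.
        y_perm = y[:]
        rng.shuffle(y_perm)
        null_stats.append(statistic(x, y_perm))
    # Critical value: empirical (1 - alpha) quantile of the null statistics.
    null_stats.sort()
    cutoff = null_stats[min(repeats - 1, int((1 - alpha) * repeats))]
    # Power: fraction of alternative statistics above the critical value.
    return sum(s > cutoff for s in alt_stats) / repeats


estimated = empirical_power(abs_pearson, noisy_line,
                            num_samples=50, noise=0.5, repeats=200)
```

A larger repeats value tightens both the critical value and the power estimate, at a proportional cost in run time.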

Example

>>> from mgcpy.benchmarks.power import power
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> from mgcpy.benchmarks.simulations import circle_sim
>>> mgc = MGC()
>>> mgc_power = power(mgc, circle_sim, num_samples=100, num_dimensions=2, simulation_type='ellipse')
mgcpy.benchmarks.power.power_given_data(independence_test, simulation_type, data_type='dimension', num_samples=100, num_dimensions=1, repeats=1000, alpha=0.05, additional_params={})[source]

Estimate the power of an independence test given pre-generated data from the MGC-paper repository. Mostly for internal testing purposes.

Parameters:
  • independence_test (Object(Independence_Test)) – an object whose class inherits from the Independence_Test abstract class
  • simulation_type (int within [1, 20]) – specify which simulation is used
  • data_type (string, either 'dimension' or 'sample_size') – the pre-generated data is either increasing in dimensions or increasing in sample sizes
  • num_samples (int) – the number of samples generated by the simulation (default to 100)
  • num_dimensions (int) – the number of dimensions of the samples generated by the simulation (default to 1)
  • noise (float) – the noise used in simulation (default to 0)
  • repeats (int) – the number of times we generate new samples to estimate the null/alternative distribution (default to 1000)
  • alpha (float) – the type I error level (default to 0.05)
Returns:

empirical_power, the estimated power

Return type:

numpy.float

Example

>>> from mgcpy.benchmarks.power import power_given_data
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> from mgcpy.benchmarks.simulations import circle_sim
>>> mgc = MGC()
>>> mgc_power = power_given_data(mgc, simulation_type=4, num_samples=100, num_dimensions=2)

mgcpy.benchmarks.simulations module

mgcpy.benchmarks.simulations.gen_coeffs(num_dim)[source]

Helper function for generating a linear simulation.

Parameters:num_dim – number of dimensions for the simulation
Returns:a vector of coefficients
mgcpy.benchmarks.simulations.gen_x_unif(num_samp, num_dim, low=-1, high=1)[source]

Helper function for generating n samples of a d-dimensional vector, with entries drawn uniformly between low and high.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
Returns:

uniformly distributed simulated data matrix

mgcpy.benchmarks.simulations.linear_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a linear simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 1
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
Returns:

the data matrix and a response array
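
As a rough illustration of how the two helpers and the indep flag could fit together, here is a standard-library sketch of a linear simulation. It is not the package's code: the decaying coefficients 1, 1/2, ..., 1/num_dim, the Gaussian noise term, and all sketch_* names are assumptions made for the example.

```python
import random


def sketch_coeffs(num_dim):
    """Assumed decaying weights 1, 1/2, ..., 1/num_dim (illustrative only)."""
    return [1.0 / (i + 1) for i in range(num_dim)]


def sketch_x_unif(num_samp, num_dim, low=-1.0, high=1.0, rng=random):
    """num_samp rows, each a num_dim-vector with entries uniform on [low, high]."""
    return [[rng.uniform(low, high) for _ in range(num_dim)]
            for _ in range(num_samp)]


def sketch_linear_sim(num_samp, num_dim, noise=1.0, indep=False,
                      low=-1.0, high=1.0, seed=None):
    rng = random.Random(seed)
    x = sketch_x_unif(num_samp, num_dim, low, high, rng)
    coeffs = sketch_coeffs(num_dim)
    # With indep=True the response is built from a fresh, unrelated draw,
    # so y has the same marginal form but carries no dependence on x.
    source = sketch_x_unif(num_samp, num_dim, low, high, rng) if indep else x
    y = [sum(c * v for c, v in zip(coeffs, row)) + noise * rng.gauss(0, 1)
         for row in source]
    return x, y


# Noise-free draw: every response is exactly the weighted sum of its row.
x, y = sketch_linear_sim(num_samp=4, num_dim=2, noise=0.0, seed=1)
```

The return shape mirrors the description above: a list of rows standing in for the data matrix, and a flat list standing in for the response array.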

mgcpy.benchmarks.simulations.exp_sim(num_samp, num_dim, noise=10, indep=False, low=0, high=3)[source]

Function for generating an exponential simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 10
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to 0
  • high – the upper limit of the data matrix, defaults to 3
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.cub_sim(num_samp, num_dim, noise=80, indep=False, low=-1, high=1, cub_coeff=array([-12, 48, 128]), scale=0.3333333333333333)[source]

Function for generating a cubic simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 80
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
  • cub_coeff – coefficients of the cubic function, where each value corresponds to the respective order coefficient, defaults to [-12, 48, 128]
  • scale – scaling center of the cubic, defaults to 1/3
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.joint_sim(num_samp, num_dim, noise=0.5)[source]

Function for generating a joint-normal simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 0.5
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.step_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a step function simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 1
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.quad_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a quadratic simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 1
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.w_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a w-shaped simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 1
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.spiral_sim(num_samp, num_dim, noise=0.4, low=0, high=5)[source]

Function for generating a spiral simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 0.4
  • low – the lower limit of the data matrix, defaults to 0
  • high – the upper limit of the data matrix, defaults to 5
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.ubern_sim(num_samp, num_dim, noise=0.5, bern_prob=0.5)[source]

Function for generating an uncorrelated Bernoulli simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 0.5
  • bern_prob – the Bernoulli probability, defaults to 0.5
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.log_sim(num_samp, num_dim, noise=3, indep=False, base=2)[source]

Function for generating a logarithmic simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 3
  • indep – whether to sample x and y independently, defaults to False
  • base – the base of the log, defaults to 2
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.root_sim(num_samp, num_dim, noise=0.25, indep=False, low=-1, high=1, n_root=4)[source]

Function for generating an nth root simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 0.25
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
  • n_root – the root of the simulation, defaults to 4
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.sin_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=12.566370614359172)[source]

Function for generating a sinusoid simulation.

Note: For producing 4*pi and 16*pi simulations, change the period to the respective value.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 1
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
  • period – the period of the sine wave, defaults to 4*pi
Returns:

the data matrix and a response array
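
The effect of switching the period between 4*pi and 16*pi can be seen in a one-dimensional sketch. It assumes that period multiplies x inside the sine and that the noise is additive Gaussian; both are illustrative assumptions rather than the package's exact formula, and sketch_sin_sim is a hypothetical name.

```python
import math
import random


def sketch_sin_sim(num_samp, noise=1.0, low=-1.0, high=1.0,
                   period=4 * math.pi, seed=None):
    """1-D sketch: y = sin(period * x) + noise * standard normal (assumed form)."""
    rng = random.Random(seed)
    x = [rng.uniform(low, high) for _ in range(num_samp)]
    y = [math.sin(period * v) + noise * rng.gauss(0, 1) for v in x]
    return x, y


# Same seed, so both calls share x; only the oscillation rate of y differs.
x4, y4 = sketch_sin_sim(100, noise=0.0, period=4 * math.pi, seed=0)
x16, y16 = sketch_sin_sim(100, noise=0.0, period=16 * math.pi, seed=0)
```

With the larger value, y completes four times as many oscillations over the same range of x, which makes the dependence harder to detect at a fixed sample size.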

mgcpy.benchmarks.simulations.square_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=-0.39269908169872414)[source]

Function for generating a square or diamond simulation.

Note: For producing square or diamond simulations, change the period to -pi/8 or -pi/4, respectively.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 1
  • indep – whether to sample x and y independently, defaults to False
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
  • period – the period of the sine and cosine square equation, defaults to -pi/8
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.two_parab_sim(num_samp, num_dim, noise=2, low=-1, high=1, prob=0.5)[source]

Function for generating a two parabolas simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 2
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
  • prob – the binomial probability, defaults to 0.5
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.circle_sim(num_samp, num_dim, noise=0.4, low=-1, high=1, radius=1)[source]

Function for generating a circle or ellipse simulation.

Note: For producing circle or ellipse simulations, change the radius to 1 or 5, respectively.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • noise – noise level of the simulation, defaults to 0.4
  • low – the lower limit of the data matrix, defaults to -1
  • high – the upper limit of the data matrix, defaults to 1
  • radius – the radius of the circle or ellipse, defaults to 1
Returns:

the data matrix and a response array
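
How a single radius parameter can switch between a circle and an ellipse is sketched below for the one-dimensional case. The parametrisation (an angle of pi times a uniform draw, with radius stretching only the horizontal axis and noise added to x) is an assumption made for illustration, and sketch_circle_sim is a hypothetical name.

```python
import math
import random


def sketch_circle_sim(num_samp, noise=0.4, low=-1.0, high=1.0,
                      radius=1.0, seed=None):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(num_samp):
        angle = math.pi * rng.uniform(low, high)
        # radius scales only x: 1 traces a circle, 5 a wide ellipse.
        x.append(radius * math.cos(angle) + noise * rng.gauss(0, 1))
        y.append(math.sin(angle))
    return x, y


cx, cy = sketch_circle_sim(200, noise=0.0, radius=1.0, seed=0)
ex, ey = sketch_circle_sim(200, noise=0.0, radius=5.0, seed=0)
```

In the noise-free case every point satisfies (x / radius)^2 + y^2 = 1, so x and y are strongly dependent even though their linear correlation is close to zero.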

mgcpy.benchmarks.simulations.multi_noise_sim(num_samp, num_dim)[source]

Function for generating a multiplicative noise simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
Returns:

the data matrix and a response array

mgcpy.benchmarks.simulations.multi_indep_sim(num_samp, num_dim, prob=0.5, sep1=3, sep2=2)[source]

Function for generating a multimodal independence simulation.

Parameters:
  • num_samp – number of samples for the simulation
  • num_dim – number of dimensions for the simulation
  • prob – the binomial probability, defaults to 0.5
  • sep1 – determines the size and separation of clusters, defaults to 3
  • sep2 – determines the size and separation of clusters, defaults to 2
Returns:

the data matrix and a response array

Module contents