mgcpy.benchmarks package
Subpackages
Submodules
mgcpy.benchmarks.power module
mgcpy.benchmarks.power.power(independence_test, sample_generator, num_samples=100, num_dimensions=1, noise=0.0, repeats=1000, alpha=0.05, simulation_type='')
Estimate the power of an independence test given a simulator to sample from.
Parameters:
- independence_test (Object(Independence_Test)) – an object whose class inherits from the Independence_Test abstract class
- sample_generator (FunctionType or callable()) – a function from the simulations module used to generate the samples, with parameters given by the following arguments: num_samples (defaults to 100), num_dimensions (defaults to 1) and noise (defaults to 0)
- num_samples (int) – the number of samples generated by the simulation (defaults to 100)
- num_dimensions (int) – the number of dimensions of the samples generated by the simulation (defaults to 1)
- noise (float) – the noise used in the simulation (defaults to 0)
- repeats (int) – the number of times new samples are generated to estimate the null and alternative distributions (defaults to 1000)
- alpha (float) – the type I error level (defaults to 0.05)
- simulation_type (string) – specifies the simulation when necessary (defaults to the empty string)
Return empirical_power: the estimated power
Return type: numpy.float
Example
>>> from mgcpy.benchmarks.power import power
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> from mgcpy.benchmarks.simulations import circle_sim
>>> mgc = MGC()
>>> mgc_power = power(mgc, circle_sim, num_samples=100, num_dimensions=2, simulation_type='ellipse')
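The estimation procedure described above (repeatedly drawing new samples, building empirical null and alternative distributions, and thresholding at the type I error level alpha) can be sketched in plain Python. This is a minimal sketch, not mgcpy's implementation: it assumes the null distribution is obtained by shuffling y, and the statistic abs_pearson, the simulator toy_linear_sim and the function estimate_power are hypothetical stand-ins for an Independence_Test object and a simulation from mgcpy.benchmarks.simulations.

```python
import random


def abs_pearson(x, y):
    """Hypothetical test statistic: absolute Pearson correlation of two 1-D samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def toy_linear_sim(num_samples, noise, rng):
    """Hypothetical 1-D simulator: x ~ U(-1, 1), y = x + noise * N(0, 1)."""
    x = [rng.uniform(-1, 1) for _ in range(num_samples)]
    y = [xi + noise * rng.gauss(0, 1) for xi in x]
    return x, y


def estimate_power(statistic, sampler, num_samples=100, repeats=1000, alpha=0.05, seed=0):
    """Monte Carlo power estimate: share of alternative statistics above the null cutoff."""
    rng = random.Random(seed)
    null_stats, alt_stats = [], []
    for _ in range(repeats):
        x, y = sampler(num_samples, rng)
        # Alternative distribution: the statistic on the sample exactly as simulated.
        alt_stats.append(statistic(x, y))
        # Null distribution (assumed): shuffling y destroys any dependence on x.
        y_null = y[:]
        rng.shuffle(y_null)
        null_stats.append(statistic(x, y_null))
    # Reject when the statistic exceeds the empirical (1 - alpha) quantile of the null.
    null_stats.sort()
    cutoff = null_stats[min(repeats - 1, int((1 - alpha) * repeats))]
    return sum(s > cutoff for s in alt_stats) / repeats


power_estimate = estimate_power(
    abs_pearson, lambda n, rng: toy_linear_sim(n, 0.1, rng), num_samples=50, repeats=200
)
```

With a strongly dependent simulator the estimate should approach 1, while for independent x and y it should hover around alpha, which is a quick sanity check for any power routine.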
mgcpy.benchmarks.power.power_given_data(independence_test, simulation_type, data_type='dimension', num_samples=100, num_dimensions=1, repeats=1000, alpha=0.05, additional_params={})
Estimate the power of an independence test given pre-generated data from the MGC-paper repository. Mostly for internal testing purposes.
Parameters:
- independence_test (Object(Independence_Test)) – an object whose class inherits from the Independence_Test abstract class
- simulation_type (int within [1, 20]) – specifies which simulation is used
- data_type (string, either 'dimension' or 'sample_size') – whether the pre-generated data increases in dimension or in sample size (defaults to 'dimension')
- num_samples (int) – the number of samples generated by the simulation (defaults to 100)
- num_dimensions (int) – the number of dimensions of the samples generated by the simulation (defaults to 1)
- noise (float) – the noise used in the simulation (defaults to 0)
- repeats (int) – the number of times new samples are generated to estimate the null and alternative distributions (defaults to 1000)
- alpha (float) – the type I error level (defaults to 0.05)
Return empirical_power: the estimated power
Return type: numpy.float
Example
>>> from mgcpy.benchmarks.power import power_given_data
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> mgc = MGC()
>>> mgc_power = power_given_data(mgc, simulation_type=4, num_samples=100, num_dimensions=2)
mgcpy.benchmarks.simulations module
mgcpy.benchmarks.simulations.gen_coeffs(num_dim)
Helper function for generating the coefficient vector of a linear simulation.
Parameters: num_dim – number of dimensions for the simulation
Returns: a vector of coefficients
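The documentation only states that this helper returns a vector of coefficients. A minimal sketch, assuming decaying weights w_i = 1/i so that higher dimensions contribute less to the response; the weighting scheme and the names gen_coeffs_sketch and project are assumptions for illustration, not taken from mgcpy.

```python
def gen_coeffs_sketch(num_dim):
    """Hypothetical sketch: decaying weights w_i = 1 / i for i = 1, ..., num_dim."""
    return [1.0 / (i + 1) for i in range(num_dim)]


def project(row, coeffs):
    """Inner product w^T x for one sample, the scalar a linear-type simulation responds to."""
    return sum(w * v for w, v in zip(coeffs, row))


w = gen_coeffs_sketch(3)
t = project([3.0, 2.0, 3.0], w)
```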
mgcpy.benchmarks.simulations.gen_x_unif(num_samp, num_dim, low=-1, high=1)
Helper function for generating num_samp samples of a num_dim-dimensional vector, drawn uniformly between low and high.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
Returns: uniformly distributed simulated data matrix
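A minimal standard-library sketch of the sampling this helper describes: each of the num_samp rows is a num_dim-vector of independent U(low, high) draws. The list-of-rows layout, the seed argument and the name gen_x_unif_sketch are assumptions made so the example runs without NumPy.

```python
import random


def gen_x_unif_sketch(num_samp, num_dim, low=-1, high=1, seed=None):
    """Hypothetical sketch: num_samp rows, each a num_dim-vector of U(low, high) draws."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(num_dim)] for _ in range(num_samp)]


x = gen_x_unif_sketch(4, 2, seed=0)
```

Passing the same seed reproduces the same matrix, which is convenient when comparing several tests on identical data.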
mgcpy.benchmarks.simulations.linear_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)
Function for generating a linear simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 1
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
Returns: the data matrix and a response array
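A minimal sketch of what a linear simulation with these parameters might look like, assuming y = w^T x + noise * eps with eps ~ N(0, 1), decaying weights w_i = 1/i, and that indep=True is realized by redrawing x after y has been computed. The name linear_sim_sketch and these modeling choices are illustrative assumptions, not mgcpy's code; rows are plain Python lists rather than arrays.

```python
import random


def linear_sim_sketch(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, seed=None):
    """Hypothetical linear simulation: y = w^T x + noise * eps, with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    coeffs = [1.0 / (i + 1) for i in range(num_dim)]  # assumed decaying weights w_i = 1/i
    x = [[rng.uniform(low, high) for _ in range(num_dim)] for _ in range(num_samp)]
    y = [sum(w * v for w, v in zip(coeffs, row)) + noise * rng.gauss(0, 1) for row in x]
    if indep:
        # Redraw x from the same uniform distribution so it carries no information about y.
        x = [[rng.uniform(low, high) for _ in range(num_dim)] for _ in range(num_samp)]
    return x, y


x, y = linear_sim_sketch(100, 2, noise=0.5, seed=0)
```

Redrawing x keeps both marginals unchanged while removing the dependence, which is what makes the indep=True case useful for checking that a test holds its type I error level.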
mgcpy.benchmarks.simulations.exp_sim(num_samp, num_dim, noise=10, indep=False, low=0, high=3)
Function for generating an exponential simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 10
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to 0
- high – the upper limit of the data matrix, defaults to 3
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.cub_sim(num_samp, num_dim, noise=80, indep=False, low=-1, high=1, cub_coeff=array([-12, 48, 128]), scale=0.3333333333333333)
Function for generating a cubic simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 80
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
- cub_coeff – coefficients of the cubic function, where each value is the coefficient of the respective order (first, second, third), defaults to [-12, 48, 128]
- scale – scaling center of the cubic, defaults to 1/3
Returns: the data matrix and a response array
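The cub_coeff and scale parameters can be read as a cubic polynomial in the centered value t - scale, with the first coefficient multiplying the linear term. A minimal 1-D sketch under that reading; the polynomial form, the omission of indep and num_dim, and the names cubic_response and cub_sim_sketch are assumptions for illustration.

```python
import random


def cubic_response(t, cub_coeff=(-12, 48, 128), scale=1 / 3):
    """Assumed form: sum over k = 1..3 of cub_coeff[k - 1] * (t - scale) ** k."""
    centered = t - scale
    return sum(c * centered ** (k + 1) for k, c in enumerate(cub_coeff))


def cub_sim_sketch(num_samp, noise=80, low=-1, high=1, seed=None):
    """Hypothetical 1-D cubic simulation: y = cubic_response(x) + noise * N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.uniform(low, high) for _ in range(num_samp)]
    y = [cubic_response(xi) + noise * rng.gauss(0, 1) for xi in x]
    return x, y


x, y = cub_sim_sketch(100, seed=0)
```

Centering at scale = 1/3 places the inflection region inside [-1, 1], and the large default noise (80) is on the same order as the polynomial's range there.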
mgcpy.benchmarks.simulations.joint_sim(num_samp, num_dim, noise=0.5)
Function for generating a joint-normal simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 0.5
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.step_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)
Function for generating a step simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 1
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
Returns: the data matrix and a response array
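A minimal 1-D sketch of a step relationship, assuming the response is the indicator 1{x > 0} plus Gaussian noise scaled by noise; the threshold at 0, the omission of indep and num_dim, and the name step_sim_sketch are assumptions, not mgcpy's code.

```python
import random


def step_sim_sketch(num_samp, noise=1, low=-1, high=1, seed=None):
    """Hypothetical 1-D step simulation: y = 1{x > 0} + noise * N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.uniform(low, high) for _ in range(num_samp)]
    # The indicator jumps from 0 to 1 at x = 0, so y depends on x only through its sign.
    y = [(1.0 if xi > 0 else 0.0) + noise * rng.gauss(0, 1) for xi in x]
    return x, y


x, y = step_sim_sketch(100, seed=0)
```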
mgcpy.benchmarks.simulations.quad_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)
Function for generating a quadratic simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 1
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.w_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)
Function for generating a W-shaped simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 1
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.spiral_sim(num_samp, num_dim, noise=0.4, low=0, high=5)
Function for generating a spiral simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 0.4
- low – the lower limit of the data matrix, defaults to 0
- high – the upper limit of the data matrix, defaults to 5
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.ubern_sim(num_samp, num_dim, noise=0.5, bern_prob=0.5)
Function for generating an uncorrelated Bernoulli simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 0.5
- bern_prob – the Bernoulli probability, defaults to 0.5
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.log_sim(num_samp, num_dim, noise=3, indep=False, base=2)
Function for generating a logarithmic simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 3
- indep – whether to sample x and y independently, defaults to False
- base – the base of the log, defaults to 2
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.root_sim(num_samp, num_dim, noise=0.25, indep=False, low=-1, high=1, n_root=4)
Function for generating an nth root simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 0.25
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
- n_root – the root of the simulation, defaults to 4
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.sin_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=12.566370614359172)
Function for generating a sinusoid simulation.
Note: For producing 4*pi and 16*pi simulations, change the period to the respective value.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 1
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
- period – the period of the sine wave, defaults to 4*pi
Returns: the data matrix and a response array
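A minimal 1-D sketch of the sinusoid, assuming the period value multiplies x inside the sine, i.e. y = sin(period * x) + noise * eps. Under this reading the default 4*pi (12.566370614359172) gives four full oscillations over [-1, 1], and 16*pi gives sixteen. The form, the omission of indep and num_dim, and the name sin_sim_sketch are assumptions.

```python
import math
import random


def sin_sim_sketch(num_samp, noise=1, low=-1, high=1, period=4 * math.pi, seed=None):
    """Hypothetical 1-D sinusoid simulation: y = sin(period * x) + noise * N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.uniform(low, high) for _ in range(num_samp)]
    y = [math.sin(period * xi) + noise * rng.gauss(0, 1) for xi in x]
    return x, y


x4, y4 = sin_sim_sketch(200, seed=0)                       # the 4*pi setting
x16, y16 = sin_sim_sketch(200, period=16 * math.pi, seed=0)  # the 16*pi setting
```

The higher-frequency setting is the harder case: with the same sample size, fewer points land on each oscillation.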
mgcpy.benchmarks.simulations.square_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=-0.39269908169872414)
Function for generating a square or diamond simulation.
Note: For producing square or diamond simulations, change the period to -pi/8 or -pi/4, respectively.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 1
- indep – whether to sample x and y independently, defaults to False
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
- period – the period of the sine and cosine square equation, defaults to -pi/8
Returns: the data matrix and a response array
mgcpy.benchmarks.simulations.two_parab_sim(num_samp, num_dim, noise=2, low=-1, high=1, prob=0.5)
Function for generating a two parabolas simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 2
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
- prob – the binomial probability, defaults to 0.5
Returns: the data matrix and a response array
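Because prob is described as a binomial probability, each sample can be thought of as choosing one of two mirrored parabolas. A minimal 1-D sketch, assuming y = (x^2 + noise * eps) * (b - 1/2) with b ~ Bernoulli(prob); this form and the name two_parab_sim_sketch are assumptions for illustration.

```python
import random


def two_parab_sim_sketch(num_samp, noise=2, low=-1, high=1, prob=0.5, seed=None):
    """Hypothetical 1-D two-parabolas simulation: y = (x**2 + noise * eps) * (b - 1/2)."""
    rng = random.Random(seed)
    x = [rng.uniform(low, high) for _ in range(num_samp)]
    y = []
    for xi in x:
        # Bernoulli(prob) draw: b = 1 selects the upward parabola, b = 0 the downward one.
        b = 1 if rng.random() < prob else 0
        y.append((xi ** 2 + noise * rng.gauss(0, 1)) * (b - 0.5))
    return x, y


x, y = two_parab_sim_sketch(100, seed=0)
```

With prob = 0.5 the two branches are symmetric about y = 0, so the marginal relationship between x and y is nonlinear and non-monotone.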
mgcpy.benchmarks.simulations.circle_sim(num_samp, num_dim, noise=0.4, low=-1, high=1, radius=1)
Function for generating a circle or ellipse simulation.
Note: For producing circle or ellipse simulations, change the radius to 1 or 5, respectively.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- noise – noise level of the simulation, defaults to 0.4
- low – the lower limit of the data matrix, defaults to -1
- high – the upper limit of the data matrix, defaults to 1
- radius – the radius of the circle or ellipse, defaults to 1
Returns: the data matrix and a response array
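A minimal 1-D sketch of why changing the radius switches between a circle and an ellipse, assuming both coordinates are driven by a latent angle pi * u with u ~ U(low, high), that radius stretches only x, and that noise perturbs x before scaling. These modeling choices and the name circle_sim_sketch are assumptions, not mgcpy's code.

```python
import math
import random


def circle_sim_sketch(num_samp, noise=0.4, low=-1, high=1, radius=1, seed=None):
    """Hypothetical 1-D circle/ellipse simulation driven by a latent angle pi * u."""
    rng = random.Random(seed)
    u = [rng.uniform(low, high) for _ in range(num_samp)]
    # radius stretches only the x-coordinate: radius=1 gives a circle, radius=5 an ellipse.
    x = [radius * (math.cos(math.pi * ui) + noise * rng.gauss(0, 1)) for ui in u]
    y = [math.sin(math.pi * ui) for ui in u]
    return x, y


x_circle, y_circle = circle_sim_sketch(100, radius=1, seed=0)
x_ellipse, y_ellipse = circle_sim_sketch(100, radius=5, seed=0)
```

Without noise, every point satisfies (x / radius)^2 + y^2 = 1, which is the circle for radius 1 and an ellipse elongated along x for radius 5.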
mgcpy.benchmarks.simulations.multi_noise_sim(num_samp, num_dim)
Function for generating a multiplicative noise simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
Returns: the data matrix and a response array
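A minimal 1-D sketch of multiplicative noise, assuming x and u are independent standard normals and y = u * x; this form and the names multi_noise_sim_sketch and pearson are assumptions. It illustrates a relationship that is dependent yet uncorrelated: Cov(x, y) = E[u] E[x^2] = 0, while |y| = |u| |x| grows with |x|.

```python
import random


def multi_noise_sim_sketch(num_samp, seed=None):
    """Hypothetical 1-D multiplicative-noise simulation: x, u ~ N(0, 1) independent, y = u * x."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(num_samp)]
    y = [rng.gauss(0, 1) * xi for xi in x]
    return x, y


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / (saa * sbb) ** 0.5


x, y = multi_noise_sim_sketch(5000, seed=0)
r_raw = pearson(x, y)                                      # correlation of x and y
r_abs = pearson([abs(v) for v in x], [abs(v) for v in y])  # correlation of |x| and |y|
```

A correlation-based test would therefore see little signal in the raw pair, which is why simulations of this kind are useful for comparing dependence measures.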
mgcpy.benchmarks.simulations.multi_indep_sim(num_samp, num_dim, prob=0.5, sep1=3, sep2=2)
Function for generating a multimodal independence simulation.
Parameters:
- num_samp – number of samples for the simulation
- num_dim – number of dimensions for the simulation
- prob – the binomial probability, defaults to 0.5
- sep1 – determines the size and separation of clusters, defaults to 3
- sep2 – determines the size and separation of clusters, defaults to 2
Returns: the data matrix and a response array