qtl_power package

Module contents

Initialization of qtl-power module.

Submodules

qtl_power.extreme_pheno module

Power calculations for extreme phenotype sampling designs.

class qtl_power.extreme_pheno.ExtremePhenotype

Bases: object

Class defining extreme phenotype designs.

est_power_extreme_pheno(n=100, maf=0.01, beta=0.1, niter=100, alpha=0.05, q0=0.1, q1=0.1)

Estimate the power from an extreme-phenotype sampling design.

Parameters:

n (int) – total sample size.
maf (float) – minor allele frequency of tested variant.
beta (float) – effect-size in standard deviations.
niter (int) – number of simulation iterations.
alpha (float) – significance threshold for Fisher's exact test.
q0 (float) – bottom quantile to establish as controls (or low-extremes).
q1 (float) – upper quantile to establish as cases (or upper extremes).

Returns:

power of extreme sampling design

Return type:

power (float)
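The simulation-based estimate described above can be illustrated with a minimal, standard-library-only sketch. It assumes genotypes are drawn under HWE, the phenotype is `beta * genotype` plus standard normal noise, and minor-allele counts in the two tails are compared with a two-sided Fisher's exact test. The names `fisher_exact_two_sided` and `est_power_sketch` are hypothetical and are not part of qtl_power; the package's own implementation may differ.

```python
import random
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(total - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (total - row1)), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    tail = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, tail)


def est_power_sketch(n=100, maf=0.01, beta=0.1, niter=100, alpha=0.05,
                     q0=0.1, q1=0.1, seed=42):
    """Fraction of simulated data sets in which the extremes differ at level alpha."""
    rng = random.Random(seed)
    n_low, n_high = int(n * q0), int(n * q1)
    hits = 0
    for _ in range(niter):
        # Genotype = number of minor alleles (two independent draws under HWE).
        geno = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        pheno = [beta * g + rng.gauss(0.0, 1.0) for g in geno]
        order = sorted(range(n), key=pheno.__getitem__)
        low, high = order[:n_low], order[n - n_high:]
        minor_high = sum(geno[i] for i in high)
        minor_low = sum(geno[i] for i in low)
        # Rows: upper/lower extremes; columns: minor/major allele counts.
        p_value = fisher_exact_two_sided(
            minor_high, 2 * n_high - minor_high,
            minor_low, 2 * n_low - minor_low,
        )
        hits += p_value < alpha
    return hits / niter


print(est_power_sketch(n=500, maf=0.2, beta=0.5, niter=50))
```

Because the estimate is a proportion over `niter` replicates, its Monte Carlo error shrinks roughly with the square root of `niter`; the default of 100 iterations gives only a coarse estimate.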

sim_extreme_pheno(n=100, maf=0.01, beta=0.1, seed=42)

Simulate an extreme phenotype under an HWE assumption.

Parameters:

n (int) – total sample size.
maf (float) – minor allele frequency of tested variant.
beta (float) – effect-size in standard deviations.
seed (int) – random seed for simulations.

Returns:

allele_count (np.array) – vector of allele counts.
phenotypes (np.array) – quantitative phenotypes.

Return type:

allele_count (np.array), phenotypes (np.array)

qtl_power.gwas module

Functions to calculate power in GWAS designs.

class qtl_power.gwas.Gwas

Bases: object

Parent class for GWAS Power calculation.

llr_power(alpha=5e-08, df=1, ncp=1)

Power under a non-central chi-squared distribution.

Parameters:

alpha (float) – p-value threshold for GWAS
df (int) – degrees of freedom
ncp (float) – non-centrality parameter

Returns:

power for association

Return type:

power (float)
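As an illustration of what a power calculation under a non-central chi-squared distribution involves, here is a minimal sketch for the `df=1` case only. It relies on the fact that a 1-df non-central chi-squared variable is the square of a shifted standard normal, so only the standard library is needed. `chi2_power_df1` is a hypothetical helper, not the package's `llr_power`, which also accepts other degrees of freedom.

```python
from statistics import NormalDist


def chi2_power_df1(alpha: float = 5e-8, ncp: float = 1.0) -> float:
    """Power of a 1-df chi-squared test with non-centrality parameter `ncp`.

    For df = 1 the statistic is (Z + sqrt(ncp))**2 with Z standard normal,
    so the rejection probability is the sum of two normal tail areas.
    """
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)  # square root of the chi-squared critical value
    shift = ncp ** 0.5
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


print(chi2_power_df1(alpha=0.05, ncp=0.0))   # under the null this equals alpha
print(chi2_power_df1(alpha=5e-8, ncp=40.0))
```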

class qtl_power.gwas.GwasBinary

Bases: Gwas

GWAS Power calculator for Case/Control study design.

binary_trait_beta_power(n=100, power=0.9, p=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5)

Optimal detectable effect-size under a case-control GWAS study design.

Parameters:

n (int) – sample-size of unrelated individuals.
power (float) – threshold power level.
p (float) – minor allele frequency of variant.
r2 (float) – correlation r2 between causal variant and tag variant.
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.

Returns:

detectable effect-size at the specified power level.

Return type:

opt_beta (float)

binary_trait_opt_n(beta=0.1, power=0.9, p=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5)

Determine the sample-size required to detect this effect.

Parameters:

beta (float) – effect-size of the variant.
power (float) – threshold power level.
p (float) – minor allele frequency of variant.
r2 (float) – correlation r2 between causal variant and tagging variant.
alpha (float) – p-value threshold for GWAS
prop_cases (float) – proportion of cases in the dataset

Returns:

optimal sample size for detection at this power-level.

Return type:

opt_n (float)

binary_trait_power(n=100, p=0.1, beta=0.1, r2=1.0, alpha=5e-08, prop_cases=0.1)

Power under a case-control GWAS study design.

Parameters:

n (int) – sample-size of unrelated individuals.
p (float) – minor allele frequency of variant.
beta (float) – effect-size of variant.
r2 (float) – correlation r2 between causal variant and tagging variant.
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.

Returns:

power under the case-control design.

Return type:

power (float)

ncp_binary(n=100, p=0.1, beta=0.1, r2=1.0, prop_cases=0.1)

Compute the non-centrality parameter for a case-control GWAS.

Parameters:

n (int) – sample-size of unrelated individuals.
p (float) – minor allele frequency of variant.
beta (float) – effect-size of variant.
r2 (float) – correlation r2 between causal variant and tagging variant.
prop_cases (float) – proportion of samples that are cases.

Returns:

non-centrality parameter.

Return type:

ncp (float)
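To make the role of each parameter concrete, the following sketch uses a common small-effect approximation for the case-control non-centrality parameter. It assumes `beta` is a per-allele log odds ratio and that genotypes follow HWE. The formula and the name `ncp_binary_sketch` are assumptions for illustration; the package may use a different expression.

```python
def ncp_binary_sketch(n: int = 100, p: float = 0.1, beta: float = 0.1,
                      r2: float = 1.0, prop_cases: float = 0.1) -> float:
    """Approximate NCP: n * phi * (1 - phi) * 2p(1 - p) * beta**2 * r2."""
    genotype_var = 2.0 * p * (1.0 - p)              # additive genotype variance under HWE
    case_control_var = prop_cases * (1.0 - prop_cases)  # variance of the case indicator
    return n * case_control_var * genotype_var * r2 * beta ** 2


# Balanced designs maximise prop_cases * (1 - prop_cases), hence the NCP.
print(ncp_binary_sketch(n=10_000, prop_cases=0.1))
print(ncp_binary_sketch(n=10_000, prop_cases=0.5))
```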

class qtl_power.gwas.GwasBinaryModel

Bases: Gwas

GWAS Power calculations under different encodings of genotypic risk.

binary_trait_beta_power_model(n=100, p=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5, power=0.9)

Threshold effects under a specific power threshold and genetic model.

Parameters:

n (int) – sample-size of unrelated individuals.
p (float) – minor allele frequency of variant.
model (string) – genetic model for effects (additive, recessive, or dominant).
prev (float) – prevalence of the trait in question.
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.
power (float) – power under the model.

Returns:

detectable effect-size at the power threshold and model.

Return type:

opt_beta (float)

binary_trait_power_model(n=100, p=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)

Power under a case-control GWAS study design.

Parameters:

n (int) – sample-size of unrelated individuals.
p (float) – minor allele frequency of variant.
beta (float) – effect-size of variant (in terms of relative-risk).
model (string) – genetic model for effects (additive, recessive, or dominant).
prev (float) – prevalence of the trait in question.
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.

Returns:

power under the model.

Return type:

power (float)

ncp_binary_model(n=100, p=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)

Explore how multiple models affect power in case-control traits.
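The genetic models named above (additive, dominant, recessive) differ in how genotype maps to relative risk. The sketch below shows one widely used encoding and derives the expected genotype frequencies in cases and controls from the prevalence. The encodings, the parameter name `rr` (a relative risk, where 1.0 means no effect), and the function `genotype_freqs_case_control` are assumptions for illustration and may not match how qtl_power parameterizes `beta`.

```python
def genotype_freqs_case_control(p=0.1, rr=1.5, model="additive", prev=0.01):
    """Return (case_freqs, control_freqs) for genotypes with 0, 1, 2 risk alleles."""
    # Relative risk per genotype under each model (assumed convention).
    encodings = {
        "additive": (1.0, rr, 2.0 * rr - 1.0),
        "dominant": (1.0, rr, rr),
        "recessive": (1.0, 1.0, rr),
    }
    rel_risk = encodings[model]
    pop = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)  # HWE genotype frequencies
    # Baseline penetrance chosen so the population prevalence equals `prev`.
    f0 = prev / sum(g * r for g, r in zip(pop, rel_risk))
    penetrance = [f0 * r for r in rel_risk]
    if min(penetrance) < 0.0 or max(penetrance) > 1.0:
        raise ValueError("relative risk and prevalence imply an invalid penetrance")
    cases = tuple(g * f / prev for g, f in zip(pop, penetrance))
    controls = tuple(g * (1 - f) / (1 - prev) for g, f in zip(pop, penetrance))
    return cases, controls


for model in ("additive", "dominant", "recessive"):
    print(model, genotype_freqs_case_control(model=model))
```

The difference between the case and control genotype frequencies is what drives the test statistic, which is why a recessive effect at low allele frequency yields much less power than an additive effect of the same size.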

class qtl_power.gwas.GwasQuant

Bases: Gwas

Class for power calculations of a GWAS for a quantitative trait.

ncp_quant(n=100, p=0.1, beta=0.1, r2=1.0)

Compute the non-centrality parameter for a quantitative trait GWAS.

Parameters:

n (int) – sample-size of unrelated individuals.
p (float) – minor allele frequency of variant.
beta (float) – effect-size of variant.
r2 (float) – correlation r2 between causal variant and tagging variant.

Returns:

non-centrality parameter.

Return type:

ncp (float)
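For orientation, a frequently used approximation for the quantitative-trait non-centrality parameter is sketched below. It assumes a unit-variance phenotype, `beta` in standard-deviation units, HWE, and an effect small enough that residual variance is close to 1. `ncp_quant_sketch` is a hypothetical name and the package's formula may differ.

```python
def ncp_quant_sketch(n: int = 100, p: float = 0.1, beta: float = 0.1,
                     r2: float = 1.0) -> float:
    """Approximate NCP: n * r2 * Var(genotype) * beta**2 for a unit-variance trait."""
    genotype_var = 2.0 * p * (1.0 - p)  # additive genotype variance under HWE
    return n * r2 * genotype_var * beta ** 2


# Imperfect tagging (r2 < 1) scales the NCP down proportionally.
print(ncp_quant_sketch(n=50_000, p=0.1, beta=0.05, r2=1.0))
print(ncp_quant_sketch(n=50_000, p=0.1, beta=0.05, r2=0.8))
```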

quant_trait_beta_power(n=100, power=0.9, p=0.1, r2=1.0, alpha=5e-08)

Determine the effect-size required to detect an association at this MAF.

Parameters:

n (int) – sample-size of unrelated individuals.
power (float) – threshold power level.
p (float) – minor allele frequency of variant.
r2 (float) – correlation r2 between causal variant and tagging variant.
alpha (float) – p-value threshold for GWAS

Returns:

optimal beta for detection at a specific power level

Return type:

opt_beta (float)
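Finding the detectable effect size amounts to inverting the power function. A minimal sketch using bisection is shown here; it assumes a 1-df test and the approximation NCP = n * r2 * 2p(1-p) * beta^2. Both `power_quant_sketch` and `detectable_beta_sketch` are hypothetical helpers, and the package may use a different root-finding routine.

```python
from statistics import NormalDist


def power_quant_sketch(n, p, beta, r2=1.0, alpha=5e-8):
    """Power of a 1-df test under the assumed NCP approximation."""
    ncp = n * r2 * 2.0 * p * (1.0 - p) * beta ** 2
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def detectable_beta_sketch(n=100, power=0.9, p=0.1, r2=1.0, alpha=5e-8,
                           hi=10.0, tol=1e-8):
    """Smallest beta in [0, hi] reaching the target power, found by bisection."""
    if power_quant_sketch(n, p, hi, r2, alpha) < power:
        raise ValueError("target power is not reached within the search interval")
    lo = 0.0
    # Power increases monotonically with beta >= 0, so bisection is valid.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_quant_sketch(n, p, mid, r2, alpha) < power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(detectable_beta_sketch(n=10_000, power=0.9, p=0.2))
```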

quant_trait_opt_n(beta=0.1, power=0.9, p=0.1, r2=1.0, alpha=5e-08)

Determine the sample-size required to detect this effect.

Parameters:

beta (float) – effect-size of the variant.
power (float) – threshold power level.
p (float) – minor allele frequency of variant.
r2 (float) – correlation r2 between causal variant and tagging variant.
alpha (float) – p-value threshold for GWAS

Returns:

optimal sample size for detection at this power-level.

Return type:

opt_n (float)
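Under the same kind of approximation, the required sample size has a near closed form: the NCP needed for a given power is roughly the squared sum of two normal quantiles, and dividing by the per-sample NCP gives n. The sketch below assumes a 1-df test, a unit-variance trait, and ignores the negligible opposite-tail rejection probability. `required_n_sketch` is a hypothetical helper, not the package's method.

```python
from math import ceil
from statistics import NormalDist


def required_n_sketch(beta=0.1, power=0.9, p=0.1, r2=1.0, alpha=5e-8) -> int:
    """Approximate sample size for a 1-df quantitative-trait association test."""
    z = NormalDist()
    # NCP needed so that a shifted normal exceeds the critical value with
    # probability `power` (opposite tail ignored).
    ncp_needed = (z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)) ** 2
    ncp_per_sample = r2 * 2.0 * p * (1.0 - p) * beta ** 2
    return ceil(ncp_needed / ncp_per_sample)


print(required_n_sketch(beta=0.1, power=0.9, p=0.1))
```

Because the per-sample NCP scales with `beta ** 2`, halving the effect size roughly quadruples the required sample size.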

quant_trait_power(n=100, p=0.1, beta=0.1, r2=1.0, alpha=5e-08)

Power for a quantitative trait association study.

Parameters:

n (int) – sample-size of unrelated individuals.
p (float) – minor allele frequency of variant.
beta (float) – effect-size of variant.
r2 (float) – correlation r2 between causal variant and tagging variant.
alpha (float) – p-value threshold for GWAS

Returns:

power for association.

Return type:

power (float)

qtl_power.rare_variants module

Estimating power for rare-variant association methods from PAGEANT.

class qtl_power.rare_variants.RareVariantBurdenPower

Bases: RareVariantPower

Approximation of power for rare-variant burden tests based on results from Derkach et al. (2018).

ncp_burden_test_model1(n=100, j=30, jd=10, jp=0, tev=0.1)

Approximation of the non-centrality parameter under model S1 from Derkach et al.

The key assumption in this case is that an allele's effect-size and EV are independent.

Parameters:

n (int) – total sample size.
j (int) – total number of variants in the gene.
jd (int) – number of disease variants in the gene.
jp (int) – number of protective variants in the gene.
tev (float) – proportion of variance explained by gene.

Returns:

non-centrality parameter.

Return type:

ncp (float)

ncp_burden_test_model2(ws, ps, n=100, jd=10, tev=0.1)

Approximation of the non-centrality parameter under model S2 from Derkach et al.

The key assumption in this case is that an allele's effect-size and its MAF are independent.

Parameters:

ws (np.array) – numpy array of variant weights
ps (np.array) – numpy array of variant frequencies
n (int) – total sample size.
jd (int) – number of disease variants in the gene.
tev (float) – proportion of variance explained by gene.

Returns:

non-centrality parameter.

Return type:

ncp (float)

ncp_burden_test_model3(ws, ps, n=100, jd=10, tev=0.1, eta=0.1)

Approximation of the non-centrality parameter under model S3 from Derkach et al.

The key assumption in this case is that an allele's effect-size is strongly coupled to its MAF.

Parameters:

ws (np.array) – numpy array of variant weights
ps (np.array) – numpy array of variant frequencies
n (int) – total sample size.
jd (int) – number of disease variants in the gene.
tev (float) – proportion of variance explained by gene.

Returns:

non-centrality parameter.

Return type:

ncp (float)

opt_n_burden_model1(j=30, tev=0.01, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)

Estimate the sample-size required for detection of supplied TEV in a region.

Parameters:

j (int) – total number of variants in the gene.
tev (float) – proportion of variance explained by gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
alpha (float) – p-value threshold for power.
power (float) – power for detection under the burden model.

Returns:

sample size required for detection at this power level.

Return type:

opt_n (float)

power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.1, tev=0.1, alpha=1e-06)

Estimate the power under a burden model 1 from PAGEANT.

Parameters:

n (int) – total sample size.
j (int) – total number of variants in the gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
tev (float) – proportion of variance explained by gene.
alpha (float) – p-value threshold for power.

Returns:

power for detection under the burden model.

Return type:

power (float)

power_burden_model1_real(n=100, nreps=10, **kwargs)

Estimate power under model 1 from PAGEANT with realistic variants per gene.

Parameters:

n (int) – number of samples
nreps (int) – number of replicates

Returns:

array of power estimates based on realistic number of variants.

Return type:

est_power (np.array)

power_burden_model2(ws, ps, n=100, j=30, prop_causal=0.8, prop_risk=0.1, tev=0.1, alpha=1e-06)

Estimate the power under burden model 2 from PAGEANT.

Parameters:

ws (np.array) – numpy array of variant weights
ps (np.array) – numpy array of variant frequencies
n (int) – total sample size.
j (int) – total number of variants in the gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
tev (float) – proportion of variance explained by gene.
alpha (float) – p-value threshold for power.

Returns:

power for detection under the burden model.

Return type:

power (float)

tev_power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)

Estimate the total explained variance by a region for adequate detection at a power threshold.

Parameters:

n (int) – total sample size.
j (int) – total number of variants in the gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
alpha (float) – p-value threshold for power.
power (float) – power for detection under the burden model.

Returns:

TEV required for detection at this rate.

Return type:

opt_tev (float)

class qtl_power.rare_variants.RareVariantPower

Bases: object

Power calculator for rare-variant association tests.

Methods based on derivations from [PAGEANT](https://doi.org/10.1093/bioinformatics/btx770)

llr_power(alpha=1e-06, df=1, ncp=1, ncp0=0)

Power under a non-central chi-squared distribution.

Parameters:

alpha (float) – p-value threshold for GWAS
df (int) – degrees of freedom
ncp (float) – non-centrality parameter
ncp0 (float) – null non-centrality parameter

Returns:

power for association

Return type:

power (float)

sim_af_weights(j=100, a1=0.1846, b1=11.1248, n=100, clip=True, seed=42, test='SKAT')

Simulate allele frequencies from a beta distribution.

Ideally the beta distribution is derived from realized allele frequencies. The current parameters are based on 15k African ancestry individuals. To mimic a much larger set (112k) of Non-Finnish European individuals, use the parameters a1=0.14311324240262455, b1=26.97369198989023.

Parameters:

j (int) – number of variants
a1 (float) – first shape parameter of the beta distribution
b1 (float) – second shape parameter of the beta distribution
n (float) – number of samples
clip (boolean) – perform clipping based on the current sample-size.
seed (int) – random seed.
test (string) – type of test to be performed (SKAT, Calpha, Hotelling)

Returns:

ws (np.array) – array of weights per-variant.
ps (np.array) – array of allele frequencies.

Return type:

ws (np.array), ps (np.array)
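A minimal standard-library sketch of this kind of simulation follows. It draws allele frequencies from a beta distribution, optionally clips them to the smallest frequency observable in `n` diploid samples (assumed here to be 1/(2n)), and assigns the default SKAT weights, which are the Beta(1, 25) density evaluated at each frequency. The clipping rule, the restriction to SKAT-style weights, and the name `sim_af_weights_sketch` are assumptions; the package's behaviour for other `test` values is not reproduced.

```python
import random


def sim_af_weights_sketch(j=100, a1=0.1846, b1=11.1248, n=100, clip=True, seed=42):
    """Return (ws, ps): per-variant weights and simulated allele frequencies."""
    rng = random.Random(seed)
    ps = [rng.betavariate(a1, b1) for _ in range(j)]
    if clip:
        # Assumed rule: a variant must be seen at least once among 2n chromosomes.
        min_af = 1.0 / (2 * n)
        ps = [min(max(p, min_af), 1.0 - min_af) for p in ps]
    # Beta(1, 25) density at p is 25 * (1 - p)**24, which up-weights rare variants.
    ws = [25.0 * (1.0 - p) ** 24 for p in ps]
    return ws, ps


ws, ps = sim_af_weights_sketch(j=5, n=1000)
for w, p in zip(ws, ps):
    print(f"freq={p:.5f} weight={w:.3f}")
```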

sim_var_per_gene(a=1.47, b=0.0108, seed=42)

Simulate the number of variants per-gene.

Parameter values are derived from gnomAD exonic variants on chromosome 4 from ~15730 AFR ancestry subjects.

For a Non-Finnish European ancestry setting with larger sample size (~112350), use a=1.44306, b=0.00372.

Parameters:

a (float) – shape parameter for a gamma distribution
b (float) – scale parameter for a gamma distribution
seed (int) – random seed.

Returns:

number of variants per-gene.

Return type:

nvar (int)

class qtl_power.rare_variants.RareVariantVCPower

Bases: RareVariantPower

Approximation of power for rare-variant variance component tests based on results from Derkach et al. (2018).

match_cumulants_ncp(c1, c2, c3, c4)

Obtain the degrees of freedom and non-centrality parameter from cumulants.

Parameters:

c1 (float) – first cumulant of non-central chi-squared dist.
c2 (float) – second cumulant of non-central chi-squared dist.
c3 (float) – third cumulant of non-central chi-squared dist.
c4 (float) – fourth cumulant of non-central chi-squared dist.

Returns:

df (int) – degrees of freedom for test.
ncp (float) – non-centrality parameter.

Return type:

df (int), ncp (float)
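One established way to obtain degrees of freedom and a non-centrality parameter from four cumulant-like quantities is the moment-matching approach of Liu, Tang and Zhang (2009), sketched below. It assumes their normalized convention, in which a non-central chi-squared distribution with `df` degrees of freedom and non-centrality `ncp` has c_k = df + k * ncp. Whether qtl_power follows this convention is an assumption, and `match_cumulants_sketch` is a hypothetical name. Unlike the documented return type, the sketch returns `df` as a float.

```python
from math import sqrt


def match_cumulants_sketch(c1: float, c2: float, c3: float, c4: float):
    """Match skewness and kurtosis to a non-central chi-squared; return (df, ncp).

    c1 is accepted for signature parity only; it sets location, not df or ncp.
    """
    s1 = c3 / c2 ** 1.5   # proportional to skewness
    s2 = c4 / c2 ** 2     # proportional to excess kurtosis
    if s1 ** 2 > s2:
        # Both skewness and kurtosis can be matched exactly.
        a = 1.0 / (s1 - sqrt(s1 ** 2 - s2))
        ncp = s1 * a ** 3 - a ** 2
        df = a ** 2 - 2.0 * ncp
    else:
        # Fall back to a central chi-squared matching skewness only.
        ncp = 0.0
        df = 1.0 / s1 ** 2
    return df, ncp


# A distribution with df=2 and ncp=3 has c_k = 2 + 3k in this convention.
print(match_cumulants_sketch(5.0, 8.0, 11.0, 14.0))
```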

ncp_vc_first_order_model1(ws, ps, n=100, tev=0.1)

Approximation of the non-centrality parameter under model S1 from Derkach et al.

The key assumption is independence between an allele's effect-size and its MAF, from Table S1 in Derkach et al.

Parameters:

ws (np.array) – numpy array of weights per-variant
ps (np.array) – numpy array of allele frequencies
n (int) – sample size
tev (float) – total explained variance by a locus

Returns:

df (float) – degrees of freedom for variance component test.
ncp (float) – non-centrality parameter.

Return type:

df (float), ncp (float)

power_vc_first_order_model1(ws, ps, n=100, tev=0.1, alpha=1e-06, df=1)

Compute the power for detection under model 1 for a variance component test.

Parameters:

ws (np.array) – numpy array of weights per-variant
ps (np.array) – numpy array of allele frequencies
n (int) – sample size
tev (float) – total explained variance by a locus
alpha (float) – total significance level for estimation of power
df (float) – degree of freedom for test

Returns:

estimated power under this variance component model.

Return type:

power (float)