qtl_power package

Module contents

Initialization of qtl-power module.

Submodules

qtl_power.extreme_pheno module

Power calculations for extreme phenotype sampling designs.

class qtl_power.extreme_pheno.ExtremePhenotype

Bases: object

Class defining extreme phenotype designs.

est_power_extreme_pheno(n=100, maf=0.01, beta=0.1, niter=100, alpha=0.05, q0=0.1, q1=0.1)

Estimate the power from an extreme-phenotype sampling design.

Parameters:
  • n (int) – total sample size.

  • maf (float) – minor allele frequency of tested variant.

  • beta (float) – effect-size in standard deviations.

  • niter (int) – number of simulation iterations.

  • alpha (float) – significance threshold for Fisher's exact test.

  • q0 (float) – bottom quantile to establish as controls (or low-extremes).

  • q1 (float) – upper quantile to establish as cases (or upper extremes).

Returns:

power of extreme sampling design

Return type:

power (float)

sim_extreme_pheno(n=100, maf=0.01, beta=0.1, seed=42)

Simulate an extreme phenotype under an HWE assumption.

Parameters:
  • n (int) – total sample size.

  • maf (float) – minor allele frequency of tested variant.

  • beta (float) – effect-size in standard deviations.

  • seed (int) – random seed for simulations.

Returns:
  • allele_count (np.array) – vector of allele counts.

  • phenotypes (np.array) – quantitative phenotypes.
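The simulation and the extreme-tail selection it feeds can be sketched with the standard library alone. This is an illustrative sketch rather than the package's implementation: the names `sim_extreme_pheno_sketch` and `split_extremes` are hypothetical, and it assumes an additive effect on a phenotype with standard normal noise.

```python
import random


def sim_extreme_pheno_sketch(n=100, maf=0.01, beta=0.1, seed=42):
    """Simulate allele counts under HWE and an additive quantitative phenotype."""
    rng = random.Random(seed)
    # Two independent allele draws per individual give HWE genotype proportions.
    allele_count = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    # Phenotype = additive genetic effect (in SD units) + standard normal noise.
    phenotypes = [beta * g + rng.gauss(0.0, 1.0) for g in allele_count]
    return allele_count, phenotypes


def split_extremes(allele_count, phenotypes, q0=0.1, q1=0.1):
    """Return allele counts for the bottom-q0 and top-q1 phenotype tails."""
    order = sorted(range(len(phenotypes)), key=phenotypes.__getitem__)
    n_low = int(len(order) * q0)
    n_high = int(len(order) * q1)
    low = [allele_count[i] for i in order[:n_low]]
    high = [allele_count[i] for i in order[len(order) - n_high:]]
    return low, high


allele_count, phenotypes = sim_extreme_pheno_sketch(n=1000, maf=0.2, beta=0.5)
low, high = split_extremes(allele_count, phenotypes, q0=0.1, q1=0.1)
print(sum(low), sum(high))  # minor-allele totals in the two tails
```

The two tail groups would then be compared with a contingency-table test such as Fisher's exact test, repeated over many seeds to estimate power.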

qtl_power.gwas module

Functions to calculate power in GWAS designs.

class qtl_power.gwas.Gwas

Bases: object

Parent class for GWAS Power calculation.

llr_power(alpha=5e-08, df=1, ncp=1)

Power under a non-central chi-squared distribution.

Parameters:
  • alpha (float) – p-value threshold for GWAS

  • df (int) – degrees of freedom

  • ncp (float) – non-centrality parameter

Returns:

power for association

Return type:

power (float)
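For the single-variant case (df=1) the calculation can be sketched without any dependencies, because a non-central chi-squared variable with one degree of freedom is the square of a normal variable with mean sqrt(ncp). The helper name `power_df1` is hypothetical, and this sketch covers only df=1; it is not the package's implementation.

```python
from math import sqrt
from statistics import NormalDist


def power_df1(alpha=5e-8, ncp=1.0):
    """Power of a 1-df chi-squared test with non-centrality parameter ncp."""
    z = NormalDist()
    # Two-sided critical value on the z scale; the chi-squared cutoff is z_crit ** 2.
    z_crit = -z.inv_cdf(alpha / 2.0)
    shift = sqrt(ncp)
    # P((Z + shift)^2 > z_crit^2) = P(Z > z_crit - shift) + P(Z < -z_crit - shift)
    return z.cdf(shift - z_crit) + z.cdf(-z_crit - shift)


for ncp in (0.0, 10.0, 30.0, 60.0):
    print(ncp, power_df1(alpha=5e-8, ncp=ncp))
```

With ncp=0 the function returns alpha itself, which is a quick sanity check on any power routine.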

class qtl_power.gwas.GwasBinary

Bases: Gwas

GWAS Power calculator for Case/Control study design.

binary_trait_beta_power(n=100, power=0.9, p=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5)

Optimal detectable effect-size under a case-control GWAS study design.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • power (float) – threshold power level.

  • p (float) – minor allele frequency of variant.

  • r2 (float) – correlation r2 between causal variant and tag variant.

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

Returns:

detectable effect-size at the power threshold.

Return type:

opt_beta (float)

binary_trait_opt_n(beta=0.1, power=0.9, p=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5)

Determine the sample-size required to detect this effect.

Parameters:
  • beta (float) – effect-size of the variant.

  • power (float) – threshold power level.

  • p (float) – minor allele frequency of variant.

  • r2 (float) – correlation r2 between causal variant and tagging variant.

  • alpha (float) – p-value threshold for GWAS

  • prop_cases (float) – proportion of cases in the dataset

Returns:

optimal sample size for detection at this power-level.

Return type:

opt_n (float)

binary_trait_power(n=100, p=0.1, beta=0.1, r2=1.0, alpha=5e-08, prop_cases=0.1)

Power under a case-control GWAS study design.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • p (float) – minor allele frequency of variant.

  • beta (float) – effect-size of variant.

  • r2 (float) – correlation r2 between causal variant and tagging variant.

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

Returns:

power for detection under the case-control design.

Return type:

power (float)

ncp_binary(n=100, p=0.1, beta=0.1, r2=1.0, prop_cases=0.1)

Compute the non-centrality parameter for a case-control GWAS.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • p (float) – minor allele frequency of variant.

  • beta (float) – effect-size of variant.

  • r2 (float) – correlation r2 between causal variant and tagging variant.

  • prop_cases (float) – proportion of samples that are cases.

Returns:

non-centrality parameter.

Return type:

ncp (float)
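A common small-effect approximation for the case-control NCP is n · φ(1 − φ) · r2 · 2p(1 − p) · β², where φ is the case proportion and β is a log odds ratio. The sketch below uses that approximation; it is an assumption about the general form, not a statement of what the package computes, and `ncp_binary_sketch` is a hypothetical name.

```python
def ncp_binary_sketch(n=100, p=0.1, beta=0.1, r2=1.0, prop_cases=0.1):
    """Approximate 1-df NCP for a case-control test; beta is a log odds ratio."""
    if not 0.0 < prop_cases < 1.0:
        raise ValueError("prop_cases must lie strictly between 0 and 1")
    # Genotype variance under Hardy-Weinberg equilibrium.
    geno_var = 2.0 * p * (1.0 - p)
    # Case/control balance term; it peaks at 0.25 for a 1:1 design.
    balance = prop_cases * (1.0 - prop_cases)
    return n * balance * r2 * geno_var * beta ** 2


balanced = ncp_binary_sketch(n=20_000, prop_cases=0.5)
skewed = ncp_binary_sketch(n=20_000, prop_cases=0.1)
print(balanced, skewed)
```

At a fixed total sample size, the balanced design yields the larger NCP, which is why prop_cases matters for power.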

class qtl_power.gwas.GwasBinaryModel

Bases: Gwas

GWAS Power calculations under different encodings of genotypic risk.

binary_trait_beta_power_model(n=100, p=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5, power=0.9)

Threshold effects under a specific power threshold and genetic model.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • p (float) – minor allele frequency of variant.

  • beta (float) – effect-size of variant (in terms of relative-risk).

  • model (string) – genetic model for effects (additive, recessive, or dominant).

  • prev (float) – prevalence of the trait in question.

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

  • power (float) – power under the model.

Returns:

detectable effect-size at the power threshold and model.

Return type:

opt_beta (float)

binary_trait_power_model(n=100, p=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)

Power under a case-control GWAS study design.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • p (float) – minor allele frequency of variant.

  • beta (float) – effect-size of variant (in terms of relative-risk).

  • model (string) – genetic model for effects (additive, recessive, or dominant).

  • prev (float) – prevalence of the trait in question.

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

Returns:

power under the model.

Return type:

power (float)

ncp_binary_model(n=100, p=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)

Explore how multiple models affect power in case-control traits.

class qtl_power.gwas.GwasQuant

Bases: Gwas

Class for power calculations of a GWAS for a quantitative trait.

ncp_quant(n=100, p=0.1, beta=0.1, r2=1.0)

Compute the non-centrality parameter for a quantitative trait GWAS.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • p (float) – minor allele frequency of variant.

  • beta (float) – effect-size of variant.

  • r2 (float) – correlation r2 between causal variant and tagging variant.

Returns:

non-centrality parameter.

Return type:

ncp (float)
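For an additive variant on a trait standardized to unit variance, the NCP is commonly approximated as n · r2 · 2p(1 − p) · β². The sketch below assumes that form (the package may, for example, also account for the variance explained by the variant), and `ncp_quant_sketch` is a hypothetical name.

```python
def ncp_quant_sketch(n=100, p=0.1, beta=0.1, r2=1.0):
    """Approximate NCP for an additive variant on a unit-variance trait."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Genotype variance under Hardy-Weinberg equilibrium.
    geno_var = 2.0 * p * (1.0 - p)
    # Imperfect tagging (r2 < 1) scales the NCP down proportionally.
    return n * r2 * geno_var * beta ** 2


print(ncp_quant_sketch(n=10_000, p=0.1, beta=0.1))
print(ncp_quant_sketch(n=10_000, p=0.1, beta=0.1, r2=0.8))
```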

quant_trait_beta_power(n=100, power=0.9, p=0.1, r2=1.0, alpha=5e-08)

Determine the effect-size required to detect an association at this MAF.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • power (float) – threshold power level.

  • p (float) – minor allele frequency of variant.

  • r2 (float) – correlation r2 between causal variant and tagging variant.

  • alpha (float) – p-value threshold for GWAS

Returns:

optimal beta for detection at a specific power level

Return type:

opt_beta (float)

quant_trait_opt_n(beta=0.1, power=0.9, p=0.1, r2=1.0, alpha=5e-08)

Determine the sample-size required to detect this effect.

Parameters:
  • beta (float) – effect-size of the variant.

  • power (float) – threshold power level.

  • p (float) – minor allele frequency of variant.

  • r2 (float) – correlation r2 between causal variant and tagging variant.

  • alpha (float) – p-value threshold for GWAS

Returns:

optimal sample size for detection at this power-level.

Return type:

opt_n (float)
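One way to sketch the sample-size calculation is to find the NCP at which a 1-df test reaches the target power and divide by the per-sample NCP. The version below uses the normal approximation ncp ≈ (z_{1−α/2} + z_{power})², which ignores the negligible opposite tail, together with the assumed per-sample NCP r2 · 2p(1 − p) · β². The name `opt_n_sketch` is hypothetical and the package may solve this numerically instead.

```python
from math import ceil
from statistics import NormalDist


def opt_n_sketch(beta=0.1, power=0.9, p=0.1, r2=1.0, alpha=5e-8):
    """Approximate sample size needed to detect beta at the given power."""
    z = NormalDist()
    # NCP at which a two-sided 1-df test reaches the target power.
    ncp_target = (-z.inv_cdf(alpha / 2.0) + z.inv_cdf(power)) ** 2
    # NCP contributed by each additional sample (unit-variance trait, HWE).
    per_sample_ncp = r2 * 2.0 * p * (1.0 - p) * beta ** 2
    return ceil(ncp_target / per_sample_ncp)


print(opt_n_sketch(beta=0.1, power=0.9, p=0.1))
print(opt_n_sketch(beta=0.05, power=0.9, p=0.1))
```

Because the NCP scales with β², halving the effect size roughly quadruples the required sample size.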

quant_trait_power(n=100, p=0.1, beta=0.1, r2=1.0, alpha=5e-08)

Power for a quantitative trait association study.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • p (float) – minor allele frequency of variant.

  • beta (float) – effect-size of variant.

  • r2 (float) – correlation r2 between causal variant and tagging variant.

  • alpha (float) – p-value threshold for GWAS

Returns:

power for association.

Return type:

power (float)

qtl_power.rare_variants module

Estimating power for rare-variant association methods from PAGEANT.

class qtl_power.rare_variants.RareVariantBurdenPower

Bases: RareVariantPower

Approximation of power for rare-variant burden tests based on results from Derkach et al (2018).

ncp_burden_test_model1(n=100, j=30, jd=10, jp=0, tev=0.1)

Approximation of the non-centrality parameter under model S1 from Derkach et al.

The key assumption in this case is that an allele's effect size is independent of its MAF.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • jd (int) – number of disease variants in the gene.

  • jp (int) – number of protective variants in the gene.

  • tev (float) – proportion of variance explained by gene.

Returns:

non-centrality parameter.

Return type:

ncp (float)

opt_n_burden_model1(j=30, tev=0.01, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)

Estimate the sample-size required for detection of supplied TEV in a region.

Parameters:
  • j (int) – total number of variants in the gene.

  • tev (float) – proportion of variance explained by gene.

  • prop_causal (float) – proportion of causal variants.

  • prop_risk (float) – number of protective variants.

  • alpha (float) – p-value threshold for power.

  • power (float) – power for detection under the burden model.

Returns:

sample size required for detection at this power level.

Return type:

opt_n (float)

power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.1, tev=0.1, alpha=1e-06)

Estimate the power under a burden model 1 from PAGEANT.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • prop_causal (float) – proportion of causal variants.

  • prop_risk (float) – number of protective variants.

  • tev (float) – proportion of variance explained by gene.

  • alpha (float) – p-value threshold for power.

Returns:

power for detection under the burden model.

Return type:

power (float)

power_burden_model1_real(n=100, nreps=10, **kwargs)

Estimate power under model 1 from PAGEANT with realistic variants per gene.

Parameters:
  • n (int) – number of samples

  • nreps (int) – number of replicates

Returns:

array of power estimates based on realistic number of variants.

Return type:

est_power (np.array)

tev_power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)

Estimate the total explained variance by a region for adequate detection at a power threshold.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • prop_causal (float) – proportion of causal variants.

  • prop_risk (float) – number of protective variants.

  • alpha (float) – p-value threshold for power.

  • power (float) – power for detection under the burden model.

Returns:

TEV required for detection at this rate.

Return type:

opt_tev (float)

class qtl_power.rare_variants.RareVariantPower

Bases: object

Power calculator for rare-variant power.

Methods based on derivations from [PAGEANT](https://doi.org/10.1093/bioinformatics/btx770)

llr_power(alpha=1e-06, df=1, ncp=1, ncp0=0)

Power under a non-central chi-squared distribution.

Parameters:
  • alpha (float) – p-value threshold for GWAS

  • df (int) – degrees of freedom

  • ncp (float) – non-centrality parameter

  • ncp0 (float) – null non-centrality parameter

Returns:

power for association

Return type:

power (float)

sim_af_weights(j=100, a1=0.1846, b1=11.1248, n=100, clip=True, seed=42, test='SKAT')

Simulate allele frequencies from a beta distribution.

Ideally the beta distribution is derived from realized allele frequencies. The current parameters are based on 15k African ancestry individuals. To mimic a much larger set (112k) of Non-Finnish European individuals, use the parameters a1=0.14311324240262455, b1=26.97369198989023.

Parameters:
  • j (int) – number of variants

  • a1 (float) – first shape parameter of the beta distribution

  • b1 (float) – second shape parameter of the beta distribution

  • n (float) – number of samples

  • clip (boolean) – perform clipping based on the current sample-size.

  • seed (int) – random seed.

  • test (string) – type of test to be performed (SKAT, Calpha, Hotelling)

Returns:
  • ws (np.array) – array of weights per-variant.

  • ps (np.array) – array of allele frequencies.

sim_var_per_gene(a=1.47, b=0.0108, seed=42)

Simulate the number of variants per-gene.

Parameter values are derived from gnomAD exonic variants on chromosome 4 from ~15730 AFR ancestry subjects.

For a Non-Finnish European ancestry setting with larger sample size (~112350), use a=1.44306, b=0.00372.

Parameters:
  • a (float) – shape parameter for a gamma distribution

  • b (float) – scale parameter for a gamma distribution

  • seed (int) – random seed.

Returns:

number of variants per-gene.

Return type:

nvar (int)

class qtl_power.rare_variants.RareVariantVCPower

Bases: RareVariantPower

Approximation of power for rare-variant variance component tests based on results from Derkach et al (2018).

match_cumulants_ncp(c1, c2, c3, c4)

Obtain the degrees of freedom and non-centrality parameter from cumulants.

Parameters:
  • c1 (float) – first cumulant of non-central chi-squared dist.

  • c2 (float) – second cumulant of non-central chi-squared dist.

  • c3 (float) – third cumulant of non-central chi-squared dist.

  • c4 (float) – fourth cumulant of non-central chi-squared dist.

Returns:
  • df (int) – degrees of freedom for test.

  • ncp (float) – non-centrality parameter.
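A widely used way to do this kind of matching is the skewness/kurtosis approach of Liu, Tang and Zhang (2009), sketched below. It is an assumption that the package follows this scheme. The sketch also assumes the inputs use that paper's convention, where a non-central chi-squared with df degrees of freedom and non-centrality ncp has c_k = df + k · ncp. The name `match_cumulants_sketch` is hypothetical.

```python
from math import sqrt


def match_cumulants_sketch(c1, c2, c3, c4):
    """Return (df, ncp) of the chi-squared that matches the given c_k values."""
    # c1 only rescales the statistic afterwards; it does not affect df or ncp.
    s1 = c3 / c2 ** 1.5   # skewness-like ratio
    s2 = c4 / c2 ** 2     # kurtosis-like ratio
    if s1 ** 2 > s2:
        # Both skewness and kurtosis can be matched with a non-central chi-squared.
        a = 1.0 / (s1 - sqrt(s1 ** 2 - s2))
        ncp = s1 * a ** 3 - a ** 2
        df = a ** 2 - 2.0 * ncp
    else:
        # Fall back to a central chi-squared that matches skewness only.
        ncp = 0.0
        df = 1.0 / s1 ** 2
    return df, ncp


# Round trip: df=3, ncp=2 gives c_k = 3 + 2k, i.e. (5, 7, 9, 11).
print(match_cumulants_sketch(5.0, 7.0, 9.0, 11.0))
```

The round trip in the last line is a useful check: feeding in the c_k of a known non-central chi-squared should recover its df and ncp up to floating-point error.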

ncp_vc_first_order_model1(ws, ps, n=100, tev=0.1)

Approximation of the non-centrality parameter under model S1 from Derkach et al.

The key assumption is that an allele's effect size is independent of its MAF (see Table S1 in Derkach et al.).

Parameters:
  • ws (np.array) – numpy array of weights per-variant

  • ps (np.array) – numpy array of allele frequencies

  • n (int) – sample size

  • tev (float) – total explained variance by a locus

Returns:
  • df (float) – degrees of freedom for variance component test.

  • ncp (float) – non-centrality parameter.

power_vc_first_order_model1(ws, ps, n=100, tev=0.1, alpha=1e-06, df=1)

Compute the power for detection under model 1 for a variance component test.

Parameters:
  • ws (np.array) – numpy array of weights per-variant

  • ps (np.array) – numpy array of allele frequencies

  • n (int) – sample size

  • tev (float) – total explained variance by a locus

  • alpha (float) – total significance level for estimation of power

  • df (float) – degrees of freedom for test

Returns:

estimated power under this variance component model.

Return type:

power (float)