qtl_power package

Module contents

Initialization of qtl-power module.

Submodules

qtl_power.extreme_pheno module

Power calculations for extreme phenotype sampling designs.

class qtl_power.extreme_pheno.ExtremePhenotype

Bases: object

Class defining extreme phenotype designs.

est_power_extreme_pheno(n=100, af=0.01, beta=0.1, niter=100, alpha=0.05, q0=0.1, q1=0.1)

Estimate the power from an extreme-phenotype sampling design.

Parameters:
  • n (int) – total sample size.

  • af (float) – allele frequency of tested variant (must be <= 0.5).

  • beta (float) – effect-size in standard deviations.

  • niter (int) – number of simulation iterations.

  • alpha (float) – significance threshold for Fishers Exact Test.

  • q0 (float) – bottom quantile to establish as controls (or low-extremes).

  • q1 (float) – upper quantile to establish as cases (or upper extremes).

Returns:

power of extreme sampling design

Return type:

power (float)

sim_extreme_pheno(n=100, af=0.01, beta=0.1, seed=42)

Simulate an extreme phenotype under an HWE assumption.

Parameters:
  • n (int) – total sample size.

  • af (float) – allele frequency of tested variant (must be <= 0.5).

  • beta (float) – effect-size in standard deviations.

  • seed (int) – random seed for simulations.

Returns:

vector of allele-counts. phenotypes (np.array): quantitative phenotypes.

Return type:

allele_count (np.array)

qtl_power.gwas module

Functions to calculate power in GWAS designs.

class qtl_power.gwas.Gwas

Bases: object

Parent class for GWAS Power calculation.

static genotype_var(af, var_g)

Return genotype variance: var_g if provided, else 2*af*(1-af).

llr_power(alpha=5e-08, df=1, ncp=1)

Power under a non-central chi-squared distribution.

\[\text{power} = 1 - F_{\chi^2(df,\,\lambda)}\!\left(q_{1-\alpha}\right)\]

where \(q_{1-\alpha}\) is the \((1-\alpha)\) quantile of the central \(\chi^2(df)\) distribution and \(\lambda\) is the non-centrality parameter.

Parameters:
  • alpha (float) – p-value threshold for GWAS

  • df (int) – degrees of freedom

  • ncp (float) – non-centrality parameter \(\lambda\)

Returns:

power for association

Return type:

power (float)

class qtl_power.gwas.GwasBinary

Bases: Gwas

GWAS power calculator for a case/control study design.

Under an additive model the score-test NCP is:

\[\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1-\text{af}) \cdot K(1-K)\]

where \(K\) is the proportion of cases, \(\beta\) is the per-allele log-OR (small-effect linear approximation), and \(r^2\) is the LD / imputation-accuracy squared correlation.

For CNV predictors pass var_g = \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.

binary_trait_beta_power(n=100, power=0.9, af=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5, var_g=None)

Minimum detectable effect size under a case-control GWAS study design.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • power (float) – target power level in \((0, 1)\).

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

minimum detectable \(|\beta|\).

Return type:

opt_beta (float)

binary_trait_opt_n(beta=0.1, power=0.9, af=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5, var_g=None)

Minimum sample size to achieve the target power.

Parameters:
  • beta (float) – per-allele effect size.

  • power (float) – target power level in \((0, 1)\).

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).

  • alpha (float) – p-value threshold for GWAS.

  • prop_cases (float) – proportion of cases in the dataset.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

required \(N\) (fractional; take

\(\lceil \cdot \rceil\) in practice).

Return type:

opt_n (float)

binary_trait_power(n=100, af=0.1, beta=0.1, r2=1.0, alpha=5e-08, prop_cases=0.1, var_g=None)

Power under a case-control GWAS study design.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele effect size.

  • r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

power in \([0, 1]\).

Return type:

power (float)

ncp_binary(n=100, af=0.1, beta=0.1, r2=1.0, prop_cases=0.1, var_g=None)

Non-centrality parameter for a case/control GWAS.

\[\lambda = r^2 \, N \, \beta^2 \cdot V_g \cdot K(1-K)\]

where \(K\) = prop_cases and \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) when var_g is supplied.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • af (float) – allele frequency (\(0 \leq \text{af} \leq 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele effect size (linear approximation to log-OR).

  • r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).

  • prop_cases (float) – proportion of samples that are cases (\(0 < K < 1\)).

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

non-centrality parameter \(\lambda\).

Return type:

ncp (float)

class qtl_power.gwas.GwasBinaryModel

Bases: Gwas

GWAS Power calculations under different encodings of genotypic risk.

binary_trait_beta_power_model(n=100, af=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5, power=0.9)

Threshold effects under a specific power threshold and genetic model.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • af (float) – allele frequency of variant.

  • beta (float) – effect-size of variant (in terms of relative-risk).

  • model (string) – genetic model for effects (additive, recessive, or dominant).

  • prev (float) – prevalence of the trait in question.

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

  • power (float) – power under the model.

Returns:

detectable effect-size at the power threshold and model.

Return type:

opt_beta (float)

binary_trait_power_model(n=100, af=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)

Power under a case-control GWAS study design.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • af (float) – allele frequency of variant.

  • beta (float) – effect-size of variant (in terms of relative-risk).

  • model (string) – genetic model for effects (additive, recessive, or dominant).

  • prev (float) – prevalence of the trait in question.

  • alpha (float) – p-value threshold for detection.

  • prop_cases (float) – proportion of samples that are cases.

Returns:

power under the model.

Return type:

power (float)

ncp_binary_model(n=100, af=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)

Explore how multiple models affect power in case-control traits.

class qtl_power.gwas.GwasBinomialTrait(mu=0.5)

Bases: Gwas

GWAS power calculator for a binomial count trait.

Each individual contributes \(n_i\) Bernoulli trials:

\[Y_i \sim \text{Binomial}(n_i,\; p_i),\quad p_i = \mu + \beta\,(g_i - 2\,\text{af})\]

where \(g_i \in \{0, 1, 2\}\) is the additive genotype under HWE and the genotype is mean-centred so that \(\mu = E[p_i]\) is the population mean success probability, invariant when sweeping af.

The score-test non-centrality parameter is:

\[\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1-\text{af}) \cdot \frac{\bar{n}}{\mu(1-\mu)}\]

where \(\bar{n}\) is the mean number of trials per individual and \(r^2\) is the LD / imputation-accuracy squared correlation between the causal variant and the typed/imputed tag (\(r^2 = 1\) gives the perfectly-typed case).

For CNV predictors pass var_g = \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.

static beta_to_log_or(beta, mu)

Convert raw probability \(\beta\) to an approximate log-odds ratio.

First-order delta method on the logit transformation:

\[\log\text{OR} \approx \frac{\beta}{\mu(1-\mu)}\]

Valid when \(\beta\) is small relative to \(\mu(1-\mu)\). Accuracy degrades when \(\mu\) is near 0 or 1, or when the raw \(\beta\) is large.

Parameters:
  • beta (float) – per-allele change in success probability (raw units).

  • mu (float) – population mean success probability.

Returns:

approximate log-odds ratio per allele.

Return type:

log_or (float)

static beta_to_sd_units(beta, mu, n_mean)

Convert raw probability \(\beta\) to phenotypic-SD units.

The per-individual rate \(Y_i/n_i\) has variance \(\mu(1-\mu)/\bar{n}\), so:

\[\beta_\text{SD} = \frac{\beta}{\sqrt{\mu(1-\mu)/\bar{n}}}\]

This is exact under the identity-link model. Deeper sequencing (larger \(\bar{n}\)) makes the same raw \(\beta\) appear larger in SD units.

Parameters:
  • beta (float) – per-allele change in success probability (raw units).

  • mu (float) – population mean success probability.

  • n_mean (float) – mean number of trials per individual \(\bar{n}\).

Returns:

effect size in units of phenotypic SD.

Return type:

beta_sd (float)

binomial_trait_beta_power(n=100, af=0.2, n_mean=10.0, power=0.8, r2=1.0, alpha=5e-08, var_g=None)

Minimum detectable \(|\beta|\) at the target power level.

For SNPs (\(g \in \{0,1,2\}\)), \(\beta\) is bounded above so that \(p_i = \mu + \beta(g_i - 2\,\text{af})\) stays in \((0,1)\) for all genotypes:

\[\beta_\max = \min\!\left( \frac{1-\mu}{2(1-\text{af})},\; \frac{\mu}{2\,\text{af}} \right)\]

When var_g is supplied the copy-number range is unknown, so the solver uses \(\beta_\max = \min(\mu,\,1-\mu)\) as a conservative bound (valid at the population mean; may not hold for extreme copy numbers in the tails of the CNV distribution).

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Used for \(\beta_\max\) when var_g is None.

  • n_mean (float) – mean trials per individual \(\bar{n}\).

  • power (float) – target power level in \((0, 1)\).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

minimum detectable \(|\beta|\).

Return type:

opt_beta (float)

binomial_trait_opt_n(af=0.2, beta=0.05, n_mean=10.0, power=0.8, r2=1.0, alpha=5e-08, var_g=None)

Minimum sample size to achieve the target power.

Parameters:
  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele change in success probability.

  • n_mean (float) – mean trials per individual \(\bar{n}\).

  • power (float) – target power level in \((0, 1)\).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

required \(N\) (fractional; take

\(\lceil \cdot \rceil\) in practice).

Return type:

opt_n (float)

binomial_trait_power(n=100, af=0.2, beta=0.05, n_mean=10.0, r2=1.0, alpha=5e-08, var_g=None)

Power to detect the association under the binomial trait model.

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele change in success probability.

  • n_mean (float) – mean trials per individual \(\bar{n}\).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

power in \([0, 1]\).

Return type:

power (float)

binomial_trait_power_with_nvar(n=100, af=0.2, beta=0.05, n_mean=10.0, n_var=0.0, n_sigma=1.0, r2=1.0, alpha=5e-08, var_g=None)

Power with \(\pm\) n_sigma uncertainty bands from variable trial counts.

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele change in success probability.

  • n_mean (float) – mean trials per individual \(\bar{n}\).

  • n_var (float) – variance of trials per individual.

  • n_sigma (float) – number of NCP standard deviations for the bands.

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance. Supported only when n_var = 0; see ncp_binomial_sd().

Returns:

power at \(\lambda - \sigma_\lambda\), \(\lambda\), and \(\lambda + \sigma_\lambda\).

Return type:

(power_low, power_mid, power_high) (tuple[float, float, float])

static log_or_to_beta(log_or, mu)

Convert a log-odds ratio to an approximate raw probability \(\beta\).

Inverse of beta_to_log_or() (same small-effect approximation applies):

\[\beta \approx \log\text{OR} \cdot \mu(1-\mu)\]
Parameters:
  • log_or (float) – log-odds ratio per allele.

  • mu (float) – population mean success probability.

Returns:

approximate per-allele change in success probability.

Return type:

beta (float)

static mu_from_p0(p0, af, beta)

Convert baseline probability \(p_0 = \Pr(\text{success} \mid g=0)\) to population mean \(\mu\).

\[\mu = p_0 + 2\,\text{af}\cdot\beta\]
Parameters:
  • p0 (float) – success probability for the \(g=0\) genotype.

  • af (float) – allele frequency.

  • beta (float) – per-allele change in success probability.

Returns:

population mean success probability.

Return type:

mu (float)

ncp_binomial(n=100, af=0.2, beta=0.05, n_mean=10.0, r2=1.0, var_g=None)

Non-centrality parameter for the binomial-trait score test.

\[\lambda = r^2 \, N \, \beta^2 \cdot V_g \cdot \frac{\bar{n}}{\mu(1-\mu)}\]

where \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) when var_g is supplied.

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele change in success probability.

  • n_mean (float) – mean number of Binomial trials per individual \(\bar{n}\).

  • r2 (float) – LD / imputation-accuracy \(r^2\) between causal and typed variant (\(0 < r^2 \leq 1\)).

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

non-centrality parameter \(\lambda\).

Return type:

ncp (float)

ncp_binomial_sd(n=100, af=0.2, beta=0.05, n_mean=10.0, n_var=0.0, r2=1.0, var_g=None)

Compute the standard deviation of the NCP due to variable trials.

By the delta method:

\[\text{Var}(\lambda) \approx \lambda^2 \cdot \frac{\text{Var}(\tilde{g}^2 n_i)} {\bigl(E[\tilde{g}^2]\,\bar{n}\bigr)^2 \, N}\]

where \(\tilde{g} = g - 2\,\text{af}\). Returns 0.0 when n_var = 0 (fixed-trial design).

The higher-moment calculation requires the full \(\{0,1,2\}\) genotype distribution and is therefore not supported when var_g is supplied and n_var > 0.

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)).

  • beta (float) – per-allele change in success probability.

  • n_mean (float) – mean trials per individual \(\bar{n}\).

  • n_var (float) – variance of trials per individual (\(\geq 0\)).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • var_g (float, optional) – genotype / copy-number variance. Supported only when n_var = 0 (fixed-trial design); raises NotImplementedError otherwise.

Returns:

standard deviation of the realised NCP.

Return type:

sd (float)

power_curve(sample_sizes, af=0.2, beta=0.05, n_mean=10.0, r2=1.0, alpha=5e-08, var_g=None)

Power as a function of sample size (vectorised).

All NCPs are computed in one pass and a single ncx2.cdf() call is made — no Python loop over sample sizes.

Parameters:
  • sample_sizes (array-like) – array of \(N\) values.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele change in success probability.

  • n_mean (float) – mean trials per individual \(\bar{n}\).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

power at each sample size.

Return type:

powers (np.ndarray)

static sd_units_to_beta(beta_sd, mu, n_mean)

Convert SD-unit effect size back to raw probability units.

Inverse of beta_to_sd_units():

\[\beta = \beta_\text{SD} \cdot \sqrt{\frac{\mu(1-\mu)}{\bar{n}}}\]
Parameters:
  • beta_sd (float) – effect size in units of phenotypic SD.

  • mu (float) – population mean success probability.

  • n_mean (float) – mean number of trials per individual \(\bar{n}\).

Returns:

per-allele change in success probability.

Return type:

beta (float)

class qtl_power.gwas.GwasPoisson(mu=1.0, link='log')

Bases: Gwas

GWAS power calculator for a Poisson count trait.

The outcome \(Y_i \sim \text{Poisson}(\mu_i)\) is linked to the additive genotype \(g_i \in \{0, 1, 2\}\) (HWE) via:

Log link (default, \(\beta\) is a log-rate-ratio per allele):

\[\log(\mu_i) = \log(\mu) + \beta\,(g_i - 2\,\text{af})\]

Identity link (\(\beta\) is an absolute rate change per allele):

\[\mu_i = \mu + \beta\,(g_i - 2\,\text{af})\]

\(\mu\) is the population mean count at the null. The genotype is mean-centred (\(g_i - 2\,\text{af}\)) so the NCP is symmetric in af.

The score-test non-centrality parameter is:

\[\begin{split}\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1-\text{af}) \cdot \begin{cases} \mu & \text{log link} \\ 1/\mu & \text{identity link} \end{cases}\end{split}\]

For CNV predictors pass var_g = \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.

static beta_to_fold_change(beta)

Fold change in rate per allele copy (log link only).

\[\text{fold change} = e^{\beta}\]
Parameters:

beta (float) – log-rate-ratio per allele.

Returns:

multiplicative rate ratio per allele.

Return type:

fold_change (float)

static beta_to_log_rr(beta, mu)

Convert an identity-link \(\beta\) to an approximate log-rate-ratio.

First-order delta method on the log transformation:

\[\log\text{RR} \approx \frac{\beta}{\mu}\]

Accurate when \(\beta \ll \mu\).

Parameters:
  • beta (float) – per-allele rate change (identity-link units).

  • mu (float) – population mean count.

Returns:

approximate log-rate-ratio per allele.

Return type:

log_rr (float)

static log_rr_to_beta(log_rr, mu)

Convert a log-rate-ratio to an approximate identity-link \(\beta\).

Inverse of beta_to_log_rr() (same small-effect approximation applies):

\[\beta \approx \mu \cdot \log\text{RR}\]
Parameters:
  • log_rr (float) – log-rate-ratio per allele.

  • mu (float) – population mean count.

Returns:

approximate per-allele rate change.

Return type:

beta (float)

ncp_poisson(n=100, af=0.2, beta=0.1, r2=1.0, var_g=None)

Non-centrality parameter for the Poisson-trait score test.

\[\begin{split}\lambda = r^2 \, N \, \beta^2 \cdot V_g \cdot \begin{cases} \mu & \text{log link} \\ 1/\mu & \text{identity link} \end{cases}\end{split}\]

where \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) when var_g is supplied.

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

non-centrality parameter.

Return type:

ncp (float)

poisson_trait_beta_power(n=100, af=0.2, power=0.8, r2=1.0, alpha=5e-08, var_g=None)

Minimum detectable \(|\beta|\) at the target power level.

The solver bracket upper bound is:

  • Log link: \(\log(100)\) (no analytical bound; cap avoids blowup).

  • Identity link (SNP): \(\mu / (2\,\text{af})\), the tightest constraint keeping all Poisson means positive (\(\mu_i = \mu + \beta(g_i - 2\,\text{af}) > 0\) at \(g_i = 0\)).

  • Identity link (CNV): \(\mu\) as a conservative bound when var_g is supplied and the copy-number range is unknown.

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency. Used for \(\beta_\max\) (identity link) when var_g is None.

  • power (float) – target power level.

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

minimum detectable \(\beta\).

Return type:

opt_beta (float)

poisson_trait_opt_n(af=0.2, beta=0.1, power=0.8, r2=1.0, alpha=5e-08, var_g=None)

Minimum sample size to achieve target power.

Parameters:
  • af (float) – allele frequency. Ignored when var_g is provided.

  • beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).

  • power (float) – target power level.

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

required \(N\) (fractional; take \(\lceil \cdot \rceil\) in practice).

Return type:

opt_n (float)

poisson_trait_power(n=100, af=0.2, beta=0.1, r2=1.0, alpha=5e-08, var_g=None)

Power to detect association under the Poisson trait model.

Parameters:
  • n (int) – number of individuals.

  • af (float) – allele frequency. Ignored when var_g is provided.

  • beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

power in \([0, 1]\).

Return type:

power (float)

power_curve(sample_sizes, af=0.2, beta=0.1, r2=1.0, alpha=5e-08, var_g=None)

Power as a function of sample size (vectorised).

All NCPs are computed in one pass and a single ncx2.cdf() call is made — no Python loop over sample sizes.

Parameters:
  • sample_sizes (array-like) – array of \(N\) values.

  • af (float) – allele frequency. Ignored when var_g is provided.

  • beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).

  • r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

power at each sample size.

Return type:

powers (np.ndarray)

class qtl_power.gwas.GwasQuant

Bases: Gwas

Power calculations for a GWAS on a standardised quantitative trait.

The linear model is \(Y_i = \mu + \beta\,g_i + \varepsilon_i\) with \(\varepsilon_i \sim N(0, 1)\) and additive genotype \(g_i \in \{0, 1, 2\}\) (HWE). The score-test NCP is:

\[\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1 - \text{af})\]

where \(r^2\) is the LD / imputation-accuracy squared correlation between the causal variant and the typed tag.

For CNV predictors pass var_g = \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.

ncp_quant(n=100, af=0.1, beta=0.1, r2=1.0, var_g=None)

Non-centrality parameter for a quantitative trait GWAS.

\[\lambda = r^2 \, N \, \beta^2 \cdot V_g\]

where \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) when var_g is supplied.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele effect size in phenotypic-SD units.

  • r2 (float) – LD \(r^2\) between causal and tagged variant (\(0 < r^2 \leq 1\)).

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

non-centrality parameter \(\lambda\).

Return type:

ncp (float)

quant_trait_beta_power(n=100, power=0.9, af=0.1, r2=1.0, alpha=5e-08, var_g=None)

Minimum detectable effect size at the target power level.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • power (float) – target power level in \((0, 1)\).

  • af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • r2 (float) – LD \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold for GWAS.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

minimum detectable \(|\beta|\).

Return type:

opt_beta (float)

quant_trait_opt_n(beta=0.1, power=0.9, af=0.1, r2=1.0, alpha=5e-08, var_g=None)

Minimum sample size to achieve the target power.

Parameters:
  • beta (float) – per-allele effect size in phenotypic-SD units.

  • power (float) – target power level in \((0, 1)\).

  • af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • r2 (float) – LD \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold for GWAS.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

required \(N\) (fractional; take

\(\lceil \cdot \rceil\) in practice).

Return type:

opt_n (float)

quant_trait_power(n=100, af=0.1, beta=0.1, r2=1.0, alpha=5e-08, var_g=None)

Power for a quantitative trait association study.

Parameters:
  • n (int) – sample-size of unrelated individuals.

  • af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when var_g is provided.

  • beta (float) – per-allele effect size in phenotypic-SD units.

  • r2 (float) – LD \(r^2\) (\(0 < r^2 \leq 1\)).

  • alpha (float) – p-value threshold for GWAS.

  • var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).

Returns:

power in \([0, 1]\).

Return type:

power (float)

qtl_power.rare_variants module

Estimating power for rare-variant association methods from PAGEANT.

class qtl_power.rare_variants.RareVariantBurdenPower

Bases: RareVariantPower

Approximation of power for rare-variant burden tests based on results from Derkach et al (2018).

ncp_burden_test_model1(n=100, j=30, jd=10, jp=0, tev=0.1)

Approximation of the non-centrality parameter under model S1 from Derkach et al.

The key assumption in this case is that there is independence between an alleles effect-size and EV.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • jd (int) – number of disease variants in the gene.

  • jp (int) – number of protective variants in the gene.

  • tev (float) – proportion of variance explained by gene.

Returns:

non-centrality parameter.

Return type:

ncp (float)

ncp_burden_test_model2(ws, ps, n=100, jd=10, tev=0.1)

Approximation of the non-centrality parameter under model S2 from Derkach et al.

The key assumption in this case is that there is independence between alleles effect-size and its MAF.

Parameters:
  • ws (np.array) – numpy array of variant weights

  • ps (np.array) – numpy array of variant frequencies

  • n (int) – total sample size.

  • jd (int) – number of disease variants in the gene.

  • jp (int) – number of protective variants in the gene.

  • tev (float) – proportion of variance explained by gene.

Returns:

non-centrality parameter.

Return type:

ncp (float)

ncp_burden_test_model3(ws, ps, n=100, jd=10, tev=0.1, eta=0.1)

Approximation of the non-centrality parameter under model S3 from Derkach et al.

The key assumption in this case is that alleles effect-size is strongly coupled to its MAF.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • jd (int) – number of disease variants in the gene.

  • jp (int) – number of protective variants in the gene.

  • tev (float) – proportion of variance explained by gene.

Returns:

non-centrality parameter.

Return type:

ncp (float)

opt_n_burden_model1(j=30, tev=0.01, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)

Estimate the sample-size required for detection of supplied TEV in a region.

Parameters:
  • j (int) – total number of variants in the gene.

  • tev (float) – proportion of variance explained by gene.

  • prop_causal (float) – proportion of causal variants.

  • prop_risk (float) – number of protective variants.

  • alpha (float) – p-value threshold for power.

  • power (float) – power for detection under the burden model.

Returns:

TEV required for detection at this rate.

Return type:

opt_tev (float)

power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.1, tev=0.1, alpha=1e-06)

Estimate the power under a burden model 1 from PAGEANT.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • prop_causal (float) – proportion of causal variants.

  • prop_risk (float) – number of protective variants.

  • tev (float) – proportion of variance explained by gene.

  • alpha (float) – p-value threshold for power.

Returns:

power for detection under the burden model.

Return type:

power (float)

power_burden_model1_real(n=100, nreps=10, **kwargs)

Estimate power under model 1 from PAGEANT with realistic variants per gene.

Parameters:
  • n (int) – number of samples

  • nreps (int) – number of replicates

Returns:

array of power estimates based on realistic number of variants.

Return type:

est_power (np.array)

power_burden_model2(ws, ps, n=100, j=30, prop_causal=0.8, prop_risk=0.1, tev=0.1, alpha=1e-06)

Estimate the power under a burden model 1 from PAGEANT.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • prop_causal (float) – proportion of causal variants.

  • prop_risk (float) – number of protective variants.

  • tev (float) – proportion of variance explained by gene.

  • alpha (float) – p-value threshold for power.

Returns:

power for detection under the burden model.

Return type:

power (float)

tev_power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)

Estimate the total explained variance by a region for adequate detection at a power threshold.

Parameters:
  • n (int) – total sample size.

  • j (int) – total number of variants in the gene.

  • prop_causal (float) – proportion of causal variants.

  • prop_risk (float) – number of protective variants.

  • alpha (float) – p-value threshold for power.

  • power (float) – power for detection under the burden model.

Returns:

TEV required for detection at this rate.

Return type:

opt_tev (float)

class qtl_power.rare_variants.RareVariantPower

Bases: object

Power calculator for rare-variant power.

Methods based on derivations from [PAGEANT](https://doi.org/10.1093/bioinformatics/btx770)

llr_power(alpha=1e-06, df=1, ncp=1, ncp0=0)

Power under a non-central chi-squared distribution.

Parameters:
  • alpha (float) – p-value threshold for GWAS

  • df (int) – degrees of freedom

  • ncp (float) – non-centrality parameter

  • ncp0 (float) – null non-centrality parameter

Returns:

power for association

Return type:

power (float)

sim_af_weights(j=100, a1=0.1846, b1=11.1248, n=100, clip=True, seed=42, test='SKAT')

Simulate allele frequencies from a beta distribution.

Ideally the beta distribution is derived from realized allele frequencies. The current parameters are based on 15k African ancestry individuals. For mimicing a much larger set (112k) of Non-Finnish European individuals, use the parameters a1=0.14311324240262455, b1=26.97369198989023,

Parameters:
  • j (int) – number of variants

  • a1 (float) – shape parameter of the beta distribution

  • b1 (float) – scale parameter of the beta distribution

  • n (float) – number of samples

  • clip (boolean) – perform clipping based on the current sample-size.

  • seed (int) – random seed.

  • test (string) – type of test to be performed (SKAT, Calpha, Hotelling)

Returns:

array of weights per-variant. ps (np.array): array of allele frequencies.

Return type:

ws (np.array)

sim_var_per_gene(a=1.47, b=0.0108, seed=42)

Simulate the number of variants per-gene.

Parameter values are derived from GnomAD Exonic variants on Chromosome 4 from ~15730 AFR ancestry subjects.

For a Non-Finnish European ancestry setting with larger sample size (~112350), use a=1.44306, b=0.00372.

Parameters:
  • a (float) – shape parameter for a gamma distribution

  • b (float) – scale parameter for a gamma distribution

  • seed (int) – random seed.

Returns:

number of variants per-gene.

Return type:

nvar (int)

class qtl_power.rare_variants.RareVariantVCPower

Bases: RareVariantPower

Approximation of power for rare-variant variance component tests based on results from Derkach et al (2018).

match_cumulants_ncp(c1, c2, c3, c4)

Obtain the degrees of freedom and non-centrality parameter from cumulants.

Parameters:
  • c1 (float) – first cumulant of non-central chi-squared dist.

  • c2 (float) – second cumulant of non-central chi-squared dist.

  • c3 (float) – third cumulant of non-central chi-squared dist.

  • c4 (float) – fourth cumulant of non-central chi-squared dist.

Returns:

degrees of freedom for test. ncp (float): non-centrality parameter.

Return type:

df (int)

ncp_vc_first_order_model1(ws, ps, n=100, tev=0.1)

Approximation of the non-centrality parameter under model S1 from Derkach et al.

The key assumption is independence between an alleles effect-size and its MAF, from Table S1 in Derkach et al.

Parameters:
  • ws (np.array) – numpy array of weights per-variant

  • ps (np.array) – numpy array of allele frequencies

  • n (int) – sample size

  • tev (float) – total explained variance by a locus

Returns:

degrees of freedom for variance component test ncp (float): non-centrality parameter

Return type:

df (float)

power_vc_first_order_model1(ws, ps, n=100, tev=0.1, alpha=1e-06, df=1)

Compute the power for detection under model 1 for a variance component test.

Parameters:
  • ws (np.array) – numpy array of weights per-variant

  • ps (np.array) – numpy array of allele frequencies

  • n (int) – sample size

  • tev (float) – total explained variance by a locus

  • alpha (float) – total significance level for estimation of power

  • df (float) – degree of freedom for test

Returns:

estimated power under this variance component model.

Return type:

power (float)