qtl_power package¶
Module contents¶
Initialization of qtl-power module.
Submodules¶
qtl_power.extreme_pheno module¶
Power calculations for extreme phenotype sampling designs.
- class qtl_power.extreme_pheno.ExtremePhenotype¶
Bases:
objectClass defining extreme phenotype designs.
- est_power_extreme_pheno(n=100, af=0.01, beta=0.1, niter=100, alpha=0.05, q0=0.1, q1=0.1)¶
Estimate the power from an extreme-phenotype sampling design.
- Parameters:
n (int) – total sample size.
af (float) – allele frequency of tested variant (must be <= 0.5).
beta (float) – effect-size in standard deviations.
niter (int) – number of simulation iterations.
alpha (float) – significance threshold for Fishers Exact Test.
q0 (float) – bottom quantile to establish as controls (or low-extremes).
q1 (float) – upper quantile to establish as cases (or upper extremes).
- Returns:
power of extreme sampling design
- Return type:
power (float)
- sim_extreme_pheno(n=100, af=0.01, beta=0.1, seed=42)¶
Simulate an extreme phenotype under an HWE assumption.
- Parameters:
n (int) – total sample size.
af (float) – allele frequency of tested variant (must be <= 0.5).
beta (float) – effect-size in standard deviations.
seed (int) – random seed for simulations.
- Returns:
vector of allele-counts. phenotypes (np.array): quantitative phenotypes.
- Return type:
allele_count (np.array)
qtl_power.gwas module¶
Functions to calculate power in GWAS designs.
- class qtl_power.gwas.Gwas¶
Bases:
objectParent class for GWAS Power calculation.
- static genotype_var(af, var_g)¶
Return genotype variance: var_g if provided, else 2*af*(1-af).
- llr_power(alpha=5e-08, df=1, ncp=1)¶
Power under a non-central chi-squared distribution.
\[\text{power} = 1 - F_{\chi^2(df,\,\lambda)}\!\left(q_{1-\alpha}\right)\]where \(q_{1-\alpha}\) is the \((1-\alpha)\) quantile of the central \(\chi^2(df)\) distribution and \(\lambda\) is the non-centrality parameter.
- Parameters:
alpha (float) – p-value threshold for GWAS
df (int) – degrees of freedom
ncp (float) – non-centrality parameter \(\lambda\)
- Returns:
power for association
- Return type:
power (float)
- class qtl_power.gwas.GwasBinary¶
Bases:
GwasGWAS power calculator for a case/control study design.
Under an additive model the score-test NCP is:
\[\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1-\text{af}) \cdot K(1-K)\]where \(K\) is the proportion of cases, \(\beta\) is the per-allele log-OR (small-effect linear approximation), and \(r^2\) is the LD / imputation-accuracy squared correlation.
For CNV predictors pass
var_g= \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.- binary_trait_beta_power(n=100, power=0.9, af=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5, var_g=None)¶
Minimum detectable effect size under a case-control GWAS study design.
- Parameters:
n (int) – sample-size of unrelated individuals.
power (float) – target power level in \((0, 1)\).
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
minimum detectable \(|\beta|\).
- Return type:
opt_beta (float)
- binary_trait_opt_n(beta=0.1, power=0.9, af=0.1, r2=1.0, alpha=5e-08, prop_cases=0.5, var_g=None)¶
Minimum sample size to achieve the target power.
- Parameters:
beta (float) – per-allele effect size.
power (float) – target power level in \((0, 1)\).
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).
alpha (float) – p-value threshold for GWAS.
prop_cases (float) – proportion of cases in the dataset.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
- required \(N\) (fractional; take
\(\lceil \cdot \rceil\) in practice).
- Return type:
opt_n (float)
- binary_trait_power(n=100, af=0.1, beta=0.1, r2=1.0, alpha=5e-08, prop_cases=0.1, var_g=None)¶
Power under a case-control GWAS study design.
- Parameters:
n (int) – sample-size of unrelated individuals.
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele effect size.
r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
power in \([0, 1]\).
- Return type:
power (float)
- ncp_binary(n=100, af=0.1, beta=0.1, r2=1.0, prop_cases=0.1, var_g=None)¶
Non-centrality parameter for a case/control GWAS.
\[\lambda = r^2 \, N \, \beta^2 \cdot V_g \cdot K(1-K)\]where \(K\) =
prop_casesand \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) whenvar_gis supplied.- Parameters:
n (int) – sample-size of unrelated individuals.
af (float) – allele frequency (\(0 \leq \text{af} \leq 1\)). Ignored when
var_gis provided.beta (float) – per-allele effect size (linear approximation to log-OR).
r2 (float) – LD \(r^2\) (\(0 \leq r^2 \leq 1\)).
prop_cases (float) – proportion of samples that are cases (\(0 < K < 1\)).
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
non-centrality parameter \(\lambda\).
- Return type:
ncp (float)
- class qtl_power.gwas.GwasBinaryModel¶
Bases:
GwasGWAS Power calculations under different encodings of genotypic risk.
- binary_trait_beta_power_model(n=100, af=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5, power=0.9)¶
Threshold effects under a specific power threshold and genetic model.
- Parameters:
n (int) – sample-size of unrelated individuals.
af (float) – allele frequency of variant.
beta (float) – effect-size of variant (in terms of relative-risk).
model (string) – genetic model for effects (additive, recessive, or dominant).
prev (float) – prevalence of the trait in question.
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.
power (float) – power under the model.
- Returns:
detectable effect-size at the power threshold and model.
- Return type:
opt_beta (float)
- binary_trait_power_model(n=100, af=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)¶
Power under a case-control GWAS study design.
- Parameters:
n (int) – sample-size of unrelated individuals.
af (float) – allele frequency of variant.
beta (float) – effect-size of variant (in terms of relative-risk).
model (string) – genetic model for effects (additive, recessive, or dominant).
prev (float) – prevalence of the trait in question.
alpha (float) – p-value threshold for detection.
prop_cases (float) – proportion of samples that are cases.
- Returns:
power under the model.
- Return type:
power (float)
- ncp_binary_model(n=100, af=0.1, beta=0.1, model='additive', prev=0.01, alpha=5e-08, prop_cases=0.5)¶
Explore how multiple models affect power in case-control traits.
- class qtl_power.gwas.GwasBinomialTrait(mu=0.5)¶
Bases:
GwasGWAS power calculator for a binomial count trait.
Each individual contributes \(n_i\) Bernoulli trials:
\[Y_i \sim \text{Binomial}(n_i,\; p_i),\quad p_i = \mu + \beta\,(g_i - 2\,\text{af})\]where \(g_i \in \{0, 1, 2\}\) is the additive genotype under HWE and the genotype is mean-centred so that \(\mu = E[p_i]\) is the population mean success probability, invariant when sweeping af.
The score-test non-centrality parameter is:
\[\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1-\text{af}) \cdot \frac{\bar{n}}{\mu(1-\mu)}\]where \(\bar{n}\) is the mean number of trials per individual and \(r^2\) is the LD / imputation-accuracy squared correlation between the causal variant and the typed/imputed tag (\(r^2 = 1\) gives the perfectly-typed case).
For CNV predictors pass
var_g= \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.- static beta_to_log_or(beta, mu)¶
Convert raw probability \(\beta\) to an approximate log-odds ratio.
First-order delta method on the logit transformation:
\[\log\text{OR} \approx \frac{\beta}{\mu(1-\mu)}\]Valid when \(\beta\) is small relative to \(\mu(1-\mu)\). Accuracy degrades when \(\mu\) is near 0 or 1, or when the raw \(\beta\) is large.
- Parameters:
beta (float) – per-allele change in success probability (raw units).
mu (float) – population mean success probability.
- Returns:
approximate log-odds ratio per allele.
- Return type:
log_or (float)
- static beta_to_sd_units(beta, mu, n_mean)¶
Convert raw probability \(\beta\) to phenotypic-SD units.
The per-individual rate \(Y_i/n_i\) has variance \(\mu(1-\mu)/\bar{n}\), so:
\[\beta_\text{SD} = \frac{\beta}{\sqrt{\mu(1-\mu)/\bar{n}}}\]This is exact under the identity-link model. Deeper sequencing (larger \(\bar{n}\)) makes the same raw \(\beta\) appear larger in SD units.
- Parameters:
beta (float) – per-allele change in success probability (raw units).
mu (float) – population mean success probability.
n_mean (float) – mean number of trials per individual \(\bar{n}\).
- Returns:
effect size in units of phenotypic SD.
- Return type:
beta_sd (float)
- binomial_trait_beta_power(n=100, af=0.2, n_mean=10.0, power=0.8, r2=1.0, alpha=5e-08, var_g=None)¶
Minimum detectable \(|\beta|\) at the target power level.
For SNPs (\(g \in \{0,1,2\}\)), \(\beta\) is bounded above so that \(p_i = \mu + \beta(g_i - 2\,\text{af})\) stays in \((0,1)\) for all genotypes:
\[\beta_\max = \min\!\left( \frac{1-\mu}{2(1-\text{af})},\; \frac{\mu}{2\,\text{af}} \right)\]When
var_gis supplied the copy-number range is unknown, so the solver uses \(\beta_\max = \min(\mu,\,1-\mu)\) as a conservative bound (valid at the population mean; may not hold for extreme copy numbers in the tails of the CNV distribution).- Parameters:
n (int) – number of individuals.
af (float) – allele frequency (\(0 < \text{af} < 1\)). Used for \(\beta_\max\) when
var_gisNone.n_mean (float) – mean trials per individual \(\bar{n}\).
power (float) – target power level in \((0, 1)\).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
minimum detectable \(|\beta|\).
- Return type:
opt_beta (float)
- binomial_trait_opt_n(af=0.2, beta=0.05, n_mean=10.0, power=0.8, r2=1.0, alpha=5e-08, var_g=None)¶
Minimum sample size to achieve the target power.
- Parameters:
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele change in success probability.
n_mean (float) – mean trials per individual \(\bar{n}\).
power (float) – target power level in \((0, 1)\).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
- required \(N\) (fractional; take
\(\lceil \cdot \rceil\) in practice).
- Return type:
opt_n (float)
- binomial_trait_power(n=100, af=0.2, beta=0.05, n_mean=10.0, r2=1.0, alpha=5e-08, var_g=None)¶
Power to detect the association under the binomial trait model.
- Parameters:
n (int) – number of individuals.
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele change in success probability.
n_mean (float) – mean trials per individual \(\bar{n}\).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
power in \([0, 1]\).
- Return type:
power (float)
- binomial_trait_power_with_nvar(n=100, af=0.2, beta=0.05, n_mean=10.0, n_var=0.0, n_sigma=1.0, r2=1.0, alpha=5e-08, var_g=None)¶
Power with \(\pm\)
n_sigmauncertainty bands from variable trial counts.- Parameters:
n (int) – number of individuals.
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele change in success probability.
n_mean (float) – mean trials per individual \(\bar{n}\).
n_var (float) – variance of trials per individual.
n_sigma (float) – number of NCP standard deviations for the bands.
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance. Supported only when
n_var= 0; seencp_binomial_sd().
- Returns:
power at \(\lambda - \sigma_\lambda\), \(\lambda\), and \(\lambda + \sigma_\lambda\).
- Return type:
(power_low, power_mid, power_high) (tuple[float, float, float])
- static log_or_to_beta(log_or, mu)¶
Convert a log-odds ratio to an approximate raw probability \(\beta\).
Inverse of
beta_to_log_or()(same small-effect approximation applies):\[\beta \approx \log\text{OR} \cdot \mu(1-\mu)\]- Parameters:
log_or (float) – log-odds ratio per allele.
mu (float) – population mean success probability.
- Returns:
approximate per-allele change in success probability.
- Return type:
beta (float)
- static mu_from_p0(p0, af, beta)¶
Convert baseline probability \(p_0 = \Pr(\text{success} \mid g=0)\) to population mean \(\mu\).
\[\mu = p_0 + 2\,\text{af}\cdot\beta\]- Parameters:
p0 (float) – success probability for the \(g=0\) genotype.
af (float) – allele frequency.
beta (float) – per-allele change in success probability.
- Returns:
population mean success probability.
- Return type:
mu (float)
- ncp_binomial(n=100, af=0.2, beta=0.05, n_mean=10.0, r2=1.0, var_g=None)¶
Non-centrality parameter for the binomial-trait score test.
\[\lambda = r^2 \, N \, \beta^2 \cdot V_g \cdot \frac{\bar{n}}{\mu(1-\mu)}\]where \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) when
var_gis supplied.- Parameters:
n (int) – number of individuals.
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele change in success probability.
n_mean (float) – mean number of Binomial trials per individual \(\bar{n}\).
r2 (float) – LD / imputation-accuracy \(r^2\) between causal and typed variant (\(0 < r^2 \leq 1\)).
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
non-centrality parameter \(\lambda\).
- Return type:
ncp (float)
- ncp_binomial_sd(n=100, af=0.2, beta=0.05, n_mean=10.0, n_var=0.0, r2=1.0, var_g=None)¶
Compute the standard deviation of the NCP due to variable trials.
By the delta method:
\[\text{Var}(\lambda) \approx \lambda^2 \cdot \frac{\text{Var}(\tilde{g}^2 n_i)} {\bigl(E[\tilde{g}^2]\,\bar{n}\bigr)^2 \, N}\]where \(\tilde{g} = g - 2\,\text{af}\). Returns 0.0 when
n_var= 0 (fixed-trial design).The higher-moment calculation requires the full \(\{0,1,2\}\) genotype distribution and is therefore not supported when
var_gis supplied andn_var> 0.- Parameters:
n (int) – number of individuals.
af (float) – allele frequency (\(0 < \text{af} < 1\)).
beta (float) – per-allele change in success probability.
n_mean (float) – mean trials per individual \(\bar{n}\).
n_var (float) – variance of trials per individual (\(\geq 0\)).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
var_g (float, optional) – genotype / copy-number variance. Supported only when
n_var= 0 (fixed-trial design); raisesNotImplementedErrorotherwise.
- Returns:
standard deviation of the realised NCP.
- Return type:
sd (float)
- power_curve(sample_sizes, af=0.2, beta=0.05, n_mean=10.0, r2=1.0, alpha=5e-08, var_g=None)¶
Power as a function of sample size (vectorised).
All NCPs are computed in one pass and a single
ncx2.cdf()call is made — no Python loop over sample sizes.- Parameters:
sample_sizes (array-like) – array of \(N\) values.
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele change in success probability.
n_mean (float) – mean trials per individual \(\bar{n}\).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
power at each sample size.
- Return type:
powers (np.ndarray)
- static sd_units_to_beta(beta_sd, mu, n_mean)¶
Convert SD-unit effect size back to raw probability units.
Inverse of
beta_to_sd_units():\[\beta = \beta_\text{SD} \cdot \sqrt{\frac{\mu(1-\mu)}{\bar{n}}}\]- Parameters:
beta_sd (float) – effect size in units of phenotypic SD.
mu (float) – population mean success probability.
n_mean (float) – mean number of trials per individual \(\bar{n}\).
- Returns:
per-allele change in success probability.
- Return type:
beta (float)
- class qtl_power.gwas.GwasPoisson(mu=1.0, link='log')¶
Bases:
GwasGWAS power calculator for a Poisson count trait.
The outcome \(Y_i \sim \text{Poisson}(\mu_i)\) is linked to the additive genotype \(g_i \in \{0, 1, 2\}\) (HWE) via:
Log link (default, \(\beta\) is a log-rate-ratio per allele):
\[\log(\mu_i) = \log(\mu) + \beta\,(g_i - 2\,\text{af})\]Identity link (\(\beta\) is an absolute rate change per allele):
\[\mu_i = \mu + \beta\,(g_i - 2\,\text{af})\]\(\mu\) is the population mean count at the null. The genotype is mean-centred (\(g_i - 2\,\text{af}\)) so the NCP is symmetric in af.
The score-test non-centrality parameter is:
\[\begin{split}\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1-\text{af}) \cdot \begin{cases} \mu & \text{log link} \\ 1/\mu & \text{identity link} \end{cases}\end{split}\]For CNV predictors pass
var_g= \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.- static beta_to_fold_change(beta)¶
Fold change in rate per allele copy (log link only).
\[\text{fold change} = e^{\beta}\]- Parameters:
beta (float) – log-rate-ratio per allele.
- Returns:
multiplicative rate ratio per allele.
- Return type:
fold_change (float)
- static beta_to_log_rr(beta, mu)¶
Convert an identity-link \(\beta\) to an approximate log-rate-ratio.
First-order delta method on the log transformation:
\[\log\text{RR} \approx \frac{\beta}{\mu}\]Accurate when \(\beta \ll \mu\).
- Parameters:
beta (float) – per-allele rate change (identity-link units).
mu (float) – population mean count.
- Returns:
approximate log-rate-ratio per allele.
- Return type:
log_rr (float)
- static log_rr_to_beta(log_rr, mu)¶
Convert a log-rate-ratio to an approximate identity-link \(\beta\).
Inverse of
beta_to_log_rr()(same small-effect approximation applies):\[\beta \approx \mu \cdot \log\text{RR}\]- Parameters:
log_rr (float) – log-rate-ratio per allele.
mu (float) – population mean count.
- Returns:
approximate per-allele rate change.
- Return type:
beta (float)
- ncp_poisson(n=100, af=0.2, beta=0.1, r2=1.0, var_g=None)¶
Non-centrality parameter for the Poisson-trait score test.
\[\begin{split}\lambda = r^2 \, N \, \beta^2 \cdot V_g \cdot \begin{cases} \mu & \text{log link} \\ 1/\mu & \text{identity link} \end{cases}\end{split}\]where \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) when
var_gis supplied.- Parameters:
n (int) – number of individuals.
af (float) – allele frequency (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
non-centrality parameter.
- Return type:
ncp (float)
- poisson_trait_beta_power(n=100, af=0.2, power=0.8, r2=1.0, alpha=5e-08, var_g=None)¶
Minimum detectable \(|\beta|\) at the target power level.
The solver bracket upper bound is:
Log link: \(\log(100)\) (no analytical bound; cap avoids blowup).
Identity link (SNP): \(\mu / (2\,\text{af})\), the tightest constraint keeping all Poisson means positive (\(\mu_i = \mu + \beta(g_i - 2\,\text{af}) > 0\) at \(g_i = 0\)).
Identity link (CNV): \(\mu\) as a conservative bound when
var_gis supplied and the copy-number range is unknown.
- Parameters:
n (int) – number of individuals.
af (float) – allele frequency. Used for \(\beta_\max\) (identity link) when
var_gisNone.power (float) – target power level.
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
minimum detectable \(\beta\).
- Return type:
opt_beta (float)
- poisson_trait_opt_n(af=0.2, beta=0.1, power=0.8, r2=1.0, alpha=5e-08, var_g=None)¶
Minimum sample size to achieve target power.
- Parameters:
af (float) – allele frequency. Ignored when
var_gis provided.beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).
power (float) – target power level.
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
required \(N\) (fractional; take \(\lceil \cdot \rceil\) in practice).
- Return type:
opt_n (float)
- poisson_trait_power(n=100, af=0.2, beta=0.1, r2=1.0, alpha=5e-08, var_g=None)¶
Power to detect association under the Poisson trait model.
- Parameters:
n (int) – number of individuals.
af (float) – allele frequency. Ignored when
var_gis provided.beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
power in \([0, 1]\).
- Return type:
power (float)
- power_curve(sample_sizes, af=0.2, beta=0.1, r2=1.0, alpha=5e-08, var_g=None)¶
Power as a function of sample size (vectorised).
All NCPs are computed in one pass and a single
ncx2.cdf()call is made — no Python loop over sample sizes.- Parameters:
sample_sizes (array-like) – array of \(N\) values.
af (float) – allele frequency. Ignored when
var_gis provided.beta (float) – per-allele log-rate-ratio (log link) or rate change (identity link).
r2 (float) – LD / imputation-accuracy \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
power at each sample size.
- Return type:
powers (np.ndarray)
- class qtl_power.gwas.GwasQuant¶
Bases:
GwasPower calculations for a GWAS on a standardised quantitative trait.
The linear model is \(Y_i = \mu + \beta\,g_i + \varepsilon_i\) with \(\varepsilon_i \sim N(0, 1)\) and additive genotype \(g_i \in \{0, 1, 2\}\) (HWE). The score-test NCP is:
\[\lambda = r^2 \, N \, \beta^2 \cdot 2\,\text{af}(1 - \text{af})\]where \(r^2\) is the LD / imputation-accuracy squared correlation between the causal variant and the typed tag.
For CNV predictors pass
var_g= \(\text{Var}(C)\) to replace \(2\,\text{af}(1-\text{af})\) with the copy-number variance.- ncp_quant(n=100, af=0.1, beta=0.1, r2=1.0, var_g=None)¶
Non-centrality parameter for a quantitative trait GWAS.
\[\lambda = r^2 \, N \, \beta^2 \cdot V_g\]where \(V_g = 2\,\text{af}(1-\text{af})\) for SNPs or \(V_g = \text{Var}(C)\) when
var_gis supplied.- Parameters:
n (int) – sample-size of unrelated individuals.
af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele effect size in phenotypic-SD units.
r2 (float) – LD \(r^2\) between causal and tagged variant (\(0 < r^2 \leq 1\)).
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
non-centrality parameter \(\lambda\).
- Return type:
ncp (float)
- quant_trait_beta_power(n=100, power=0.9, af=0.1, r2=1.0, alpha=5e-08, var_g=None)¶
Minimum detectable effect size at the target power level.
- Parameters:
n (int) – sample-size of unrelated individuals.
power (float) – target power level in \((0, 1)\).
af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.r2 (float) – LD \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold for GWAS.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
minimum detectable \(|\beta|\).
- Return type:
opt_beta (float)
- quant_trait_opt_n(beta=0.1, power=0.9, af=0.1, r2=1.0, alpha=5e-08, var_g=None)¶
Minimum sample size to achieve the target power.
- Parameters:
beta (float) – per-allele effect size in phenotypic-SD units.
power (float) – target power level in \((0, 1)\).
af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.r2 (float) – LD \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold for GWAS.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
- required \(N\) (fractional; take
\(\lceil \cdot \rceil\) in practice).
- Return type:
opt_n (float)
- quant_trait_power(n=100, af=0.1, beta=0.1, r2=1.0, alpha=5e-08, var_g=None)¶
Power for a quantitative trait association study.
- Parameters:
n (int) – sample-size of unrelated individuals.
af (float) – allele frequency of variant (\(0 < \text{af} < 1\)). Ignored when
var_gis provided.beta (float) – per-allele effect size in phenotypic-SD units.
r2 (float) – LD \(r^2\) (\(0 < r^2 \leq 1\)).
alpha (float) – p-value threshold for GWAS.
var_g (float, optional) – genotype / copy-number variance to use in place of \(2\,\text{af}(1-\text{af})\).
- Returns:
power in \([0, 1]\).
- Return type:
power (float)
qtl_power.rare_variants module¶
Estimating power for rare-variant association methods from PAGEANT.
- class qtl_power.rare_variants.RareVariantBurdenPower¶
Bases:
RareVariantPowerApproximation of power for rare-variant burden tests based on results from Derkach et al (2018).
- ncp_burden_test_model1(n=100, j=30, jd=10, jp=0, tev=0.1)¶
Approximation of the non-centrality parameter under model S1 from Derkach et al.
The key assumption in this case is that there is independence between an alleles effect-size and EV.
- Parameters:
n (int) – total sample size.
j (int) – total number of variants in the gene.
jd (int) – number of disease variants in the gene.
jp (int) – number of protective variants in the gene.
tev (float) – proportion of variance explained by gene.
- Returns:
non-centrality parameter.
- Return type:
ncp (float)
- ncp_burden_test_model2(ws, ps, n=100, jd=10, tev=0.1)¶
Approximation of the non-centrality parameter under model S2 from Derkach et al.
The key assumption in this case is that there is independence between alleles effect-size and its MAF.
- Parameters:
ws (np.array) – numpy array of variant weights
ps (np.array) – numpy array of variant frequencies
n (int) – total sample size.
jd (int) – number of disease variants in the gene.
jp (int) – number of protective variants in the gene.
tev (float) – proportion of variance explained by gene.
- Returns:
non-centrality parameter.
- Return type:
ncp (float)
- ncp_burden_test_model3(ws, ps, n=100, jd=10, tev=0.1, eta=0.1)¶
Approximation of the non-centrality parameter under model S3 from Derkach et al.
The key assumption in this case is that alleles effect-size is strongly coupled to its MAF.
- Parameters:
n (int) – total sample size.
j (int) – total number of variants in the gene.
jd (int) – number of disease variants in the gene.
jp (int) – number of protective variants in the gene.
tev (float) – proportion of variance explained by gene.
- Returns:
non-centrality parameter.
- Return type:
ncp (float)
- opt_n_burden_model1(j=30, tev=0.01, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)¶
Estimate the sample-size required for detection of supplied TEV in a region.
- Parameters:
j (int) – total number of variants in the gene.
tev (float) – proportion of variance explained by gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
alpha (float) – p-value threshold for power.
power (float) – power for detection under the burden model.
- Returns:
TEV required for detection at this rate.
- Return type:
opt_tev (float)
- power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.1, tev=0.1, alpha=1e-06)¶
Estimate the power under a burden model 1 from PAGEANT.
- Parameters:
n (int) – total sample size.
j (int) – total number of variants in the gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
tev (float) – proportion of variance explained by gene.
alpha (float) – p-value threshold for power.
- Returns:
power for detection under the burden model.
- Return type:
power (float)
- power_burden_model1_real(n=100, nreps=10, **kwargs)¶
Estimate power under model 1 from PAGEANT with realistic variants per gene.
- Parameters:
n (int) – number of samples
nreps (int) – number of replicates
- Returns:
array of power estimates based on realistic number of variants.
- Return type:
est_power (np.array)
- power_burden_model2(ws, ps, n=100, j=30, prop_causal=0.8, prop_risk=0.1, tev=0.1, alpha=1e-06)¶
Estimate the power under a burden model 1 from PAGEANT.
- Parameters:
n (int) – total sample size.
j (int) – total number of variants in the gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
tev (float) – proportion of variance explained by gene.
alpha (float) – p-value threshold for power.
- Returns:
power for detection under the burden model.
- Return type:
power (float)
- tev_power_burden_model1(n=100, j=30, prop_causal=0.8, prop_risk=0.5, alpha=1e-06, power=0.8)¶
Estimate the total explained variance by a region for adequate detection at a power threshold.
- Parameters:
n (int) – total sample size.
j (int) – total number of variants in the gene.
prop_causal (float) – proportion of causal variants.
prop_risk (float) – number of protective variants.
alpha (float) – p-value threshold for power.
power (float) – power for detection under the burden model.
- Returns:
TEV required for detection at this rate.
- Return type:
opt_tev (float)
- class qtl_power.rare_variants.RareVariantPower¶
Bases:
objectPower calculator for rare-variant power.
Methods based on derivations from [PAGEANT](https://doi.org/10.1093/bioinformatics/btx770)
- llr_power(alpha=1e-06, df=1, ncp=1, ncp0=0)¶
Power under a non-central chi-squared distribution.
- Parameters:
alpha (float) – p-value threshold for GWAS
df (int) – degrees of freedom
ncp (float) – non-centrality parameter
ncp0 (float) – null non-centrality parameter
- Returns:
power for association
- Return type:
power (float)
- sim_af_weights(j=100, a1=0.1846, b1=11.1248, n=100, clip=True, seed=42, test='SKAT')¶
Simulate allele frequencies from a beta distribution.
Ideally the beta distribution is derived from realized allele frequencies. The current parameters are based on 15k African ancestry individuals. For mimicing a much larger set (112k) of Non-Finnish European individuals, use the parameters a1=0.14311324240262455, b1=26.97369198989023,
- Parameters:
j (int) – number of variants
a1 (float) – shape parameter of the beta distribution
b1 (float) – scale parameter of the beta distribution
n (float) – number of samples
clip (boolean) – perform clipping based on the current sample-size.
seed (int) – random seed.
test (string) – type of test to be performed (SKAT, Calpha, Hotelling)
- Returns:
array of weights per-variant. ps (np.array): array of allele frequencies.
- Return type:
ws (np.array)
- sim_var_per_gene(a=1.47, b=0.0108, seed=42)¶
Simulate the number of variants per-gene.
Parameter values are derived from GnomAD Exonic variants on Chromosome 4 from ~15730 AFR ancestry subjects.
For a Non-Finnish European ancestry setting with larger sample size (~112350), use a=1.44306, b=0.00372.
- Parameters:
a (float) – shape parameter for a gamma distribution
b (float) – scale parameter for a gamma distribution
seed (int) – random seed.
- Returns:
number of variants per-gene.
- Return type:
nvar (int)
- class qtl_power.rare_variants.RareVariantVCPower¶
Bases:
RareVariantPowerApproximation of power for rare-variant variance component tests based on results from Derkach et al (2018).
- match_cumulants_ncp(c1, c2, c3, c4)¶
Obtain the degrees of freedom and non-centrality parameter from cumulants.
- Parameters:
c1 (float) – first cumulant of non-central chi-squared dist.
c2 (float) – second cumulant of non-central chi-squared dist.
c3 (float) – third cumulant of non-central chi-squared dist.
c4 (float) – fourth cumulant of non-central chi-squared dist.
- Returns:
degrees of freedom for test. ncp (float): non-centrality parameter.
- Return type:
df (int)
- ncp_vc_first_order_model1(ws, ps, n=100, tev=0.1)¶
Approximation of the non-centrality parameter under model S1 from Derkach et al.
The key assumption is independence between an alleles effect-size and its MAF, from Table S1 in Derkach et al.
- Parameters:
ws (np.array) – numpy array of weights per-variant
ps (np.array) – numpy array of allele frequencies
n (int) – sample size
tev (float) – total explained variance by a locus
- Returns:
degrees of freedom for variance component test ncp (float): non-centrality parameter
- Return type:
df (float)
- power_vc_first_order_model1(ws, ps, n=100, tev=0.1, alpha=1e-06, df=1)¶
Compute the power for detection under model 1 for a variance component test.
- Parameters:
ws (np.array) – numpy array of weights per-variant
ps (np.array) – numpy array of allele frequencies
n (int) – sample size
tev (float) – total explained variance by a locus
alpha (float) – total significance level for estimation of power
df (float) – degree of freedom for test
- Returns:
estimated power under this variance component model.
- Return type:
power (float)