Title: | Assortative Mating Simulation and Multivariate Bernoulli Variates |
---|---|
Description: | Simulation of phenotype / genotype data under assortative mating. Includes functions for generating Bahadur order-2 multivariate Bernoulli variables with general and diagonal-plus-low-rank correlation structures. Further details are provided in: Border and Malik (2022) <doi:10.1101/2022.10.13.512132>. |
Authors: | Richard Border [aut, cre] |
Maintainer: | Richard Border <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2025-02-15 03:08:29 UTC |
Source: | https://github.com/rborder/rbahadur |
Compute Diagonal plus Low Rank equilibrium covariance structure
am_covariance_structure(beta, AF, r)
am_covariance_structure(beta, AF, r)
beta |
vector of standardized diploid allele-substitution effects |
AF |
vector of allele frequencies |
r |
cross-mate phenotypic correlation |
Vector 'U' such that $D + U U^T$ corresponds to the expected haploid LD-matrix given the specified genetic architecture (encoded by 'beta' and 'AF') and cross-mate phenotypic correlation 'r'. It is assumed that the total phenotypic variance at generation zero is one.
set.seed(1) h2_0 = .5; m = 200; n = 1000; r =.5; min_MAF=.1 betas <- rnorm(m,0,sqrt(h2_0/m)) afs <- runif(m, min_MAF, 1-min_MAF) output <- am_covariance_structure(betas, afs, r)
set.seed(1) h2_0 = .5; m = 200; n = 1000; r =.5; min_MAF=.1 betas <- rnorm(m,0,sqrt(h2_0/m)) afs <- runif(m, min_MAF, 1-min_MAF) output <- am_covariance_structure(betas, afs, r)
Compute heritability ('h2_eq'), genetic variance ('vg_0'), and cross-mate genetic correlation ('rg_eq') at equilibrium under univariate primary-phenotypic assortative mating. These equations can be derived from Nagylaki's results (see below) under the assumption that number of causal variants is large (i.e., taking the limit as the number of causal variants approaches infinity).
h2_eq(r, h2_0) rg_eq(r, h2_0) vg_eq(r, vg_0, h2_0)
h2_eq(r, h2_0) rg_eq(r, h2_0) vg_eq(r, vg_0, h2_0)
r |
cross-mate phenotypic correlation |
h2_0 |
generation zero (panmictic) heritability |
vg_0 |
generation zero (panmictic) additive genetic variance component |
A single numerical quantity representing the equilibrium heritability (h2_eq
),
the equilibrium cross-mate genetic correlation (rg_eq
), or the equilibrium genetic
variance (vg_eq
).
Nagylaki, T. Assortative mating for a quantitative character. J. Math. Biology 16, 57–74 (1982). https://doi.org/10.1007/BF00275161
set.seed(1) vg_0= .6; h2_0 = .5; r =.5 h2_eq(r, h2_0) rg_eq(r, h2_0) vg_eq(r, vg_0, h2_0)
set.seed(1) vg_0= .6; h2_0 = .5; r =.5 h2_eq(r, h2_0) rg_eq(r, h2_0) vg_eq(r, vg_0, h2_0)
Simulate genotype/phenotype data under equilibrium univariate AM.
am_simulate(h2_0, r, m, n, afs = NULL, min_MAF = 0.1, haplotypes = FALSE)
am_simulate(h2_0, r, m, n, afs = NULL, min_MAF = 0.1, haplotypes = FALSE)
h2_0 |
generation zero (panmictic) heritability |
r |
cross-mate phenotypic correlation |
m |
number of biallelic causal variants |
n |
sample size |
afs |
(optional). Allele frequencies to use. If not provided, |
min_MAF |
(optional) minimum minor allele frequency for causal variants.
Ignored if if |
haplotypes |
logical. If TRUE, includes (phased) haploid genotypes in output. Defaults to FALSE |
A list including the following objects:
y
: phenotype vector
g
: heritable component of the phenotype vector
X
: matrix of diploid genotypes
AF
: vector of allele frequencies
beta_std
: standardized genetic effects
beta_raw
: unstandardized genetic effects
H
: matrix of haploid genotypes (returned only if haplotypes
=TRUE)
set.seed(1) h2_0 = .5; m = 200; n = 1000; r =.5 ## simulate genotype/phenotype data sim_dat <- am_simulate(h2_0, r, m, n) str(sim_dat) ## empirical h2 vs expected equilibrium h2 (emp_h2 <- var(sim_dat$g)/var(sim_dat$y)) h2_eq(r, h2_0)
set.seed(1) h2_0 = .5; m = 200; n = 1000; r =.5 ## simulate genotype/phenotype data sim_dat <- am_simulate(h2_0, r, m, n) str(sim_dat) ## empirical h2 vs expected equilibrium h2 (emp_h2 <- var(sim_dat$g)/var(sim_dat$y)) h2_eq(r, h2_0)
Generate second Bahadur order multivariate Bernoulli random variates with Diagonal Plus Low Rank (dplr) correlation structures.
rb_dplr(n, mu, U)
rb_dplr(n, mu, U)
n |
number of observations |
mu |
vector of means |
U |
outer product component matrix |
This generates multivariate Bernoulli (MVB) random vectors with mean
vector 'mu' and correlation matrix where
is a diagonal
matrix with values dictated by 'U'. 'mu' must take values in the open unit interval
and 'U' must induce a valid second Bahadur order probability distribution. That is,
there must exist an MVB probability distribution with first moments 'mu' and
standardized central second moments
such that all higher order central
moments are zero.
An -by-
matrix of binary random variates, where
is
the length of 'mu'.
set.seed(1) h2_0 = .5; m = 200; n = 1000; r =.5; min_MAF=.1 ## draw standardized diploid allele substitution effects beta <- scale(rnorm(m))*sqrt(h2_0 / m) ## draw allele frequencies AF <- runif(m, min_MAF, 1 - min_MAF) ## compute unstandardized effects beta_unscaled <- beta/sqrt(2*AF*(1-AF)) ## generate corresponding haploid quantities beta_hap <- rep(beta, each=2) AF_hap <- rep(AF, each=2) ## compute equilibrium outer product covariance component U <- am_covariance_structure(beta, AF, r) ## draw multivariate Bernoulli haplotypes H <- rb_dplr(n, AF_hap, U) ## convert to diploid genotypes G <- H[,seq(1,ncol(H),2)] + H[,seq(2,ncol(H),2)] ## empirical allele frequencies vs target frequencies emp_afs <- colMeans(G)/2 plot(AF, emp_afs) ## construct phenotype heritable_y <- G%*%beta_unscaled nonheritable_y <- rnorm(n, 0, sqrt(1-h2_0)) y <- heritable_y + nonheritable_y ## empirical h2 vs expected equilibrium h2 (emp_h2 <- var(heritable_y)/var(y)) h2_eq(r, h2_0)
set.seed(1) h2_0 = .5; m = 200; n = 1000; r =.5; min_MAF=.1 ## draw standardized diploid allele substitution effects beta <- scale(rnorm(m))*sqrt(h2_0 / m) ## draw allele frequencies AF <- runif(m, min_MAF, 1 - min_MAF) ## compute unstandardized effects beta_unscaled <- beta/sqrt(2*AF*(1-AF)) ## generate corresponding haploid quantities beta_hap <- rep(beta, each=2) AF_hap <- rep(AF, each=2) ## compute equilibrium outer product covariance component U <- am_covariance_structure(beta, AF, r) ## draw multivariate Bernoulli haplotypes H <- rb_dplr(n, AF_hap, U) ## convert to diploid genotypes G <- H[,seq(1,ncol(H),2)] + H[,seq(2,ncol(H),2)] ## empirical allele frequencies vs target frequencies emp_afs <- colMeans(G)/2 plot(AF, emp_afs) ## construct phenotype heritable_y <- G%*%beta_unscaled nonheritable_y <- rnorm(n, 0, sqrt(1-h2_0)) y <- heritable_y + nonheritable_y ## empirical h2 vs expected equilibrium h2 (emp_h2 <- var(heritable_y)/var(y)) h2_eq(r, h2_0)
Generate Bahadur order-2 multivariate Bernoulli random variates with unstructured correlations.
rb_unstr(n, mu, C)
rb_unstr(n, mu, C)
n |
number of observations |
mu |
vector of means |
C |
correlation matrix |
This generates multivariate Bernoulli (MVB) random vectors with mean vector 'mu' and correlation matrix 'C'. 'mu' must take values in the open unit interval and 'C' must induce a valid second Bahadur order probability distribution. That is, there must exist an MVB probability distribution with first moments 'mu' and standardized central second moments 'C' such that all higher order central moments are zero.
An -by-
matrix of binary random variates, where
is the
length of 'mu'.
set.seed(1) h2_0 = .5; m = 200; n = 500; r =.5; min_MAF=.1 ## draw standardized diploid allele substitution effects beta <- scale(rnorm(m))*sqrt(h2_0 / m) ## draw allele frequencies AF <- runif(m, min_MAF, 1 - min_MAF) ## compute unstandardized effects beta_unscaled <- beta/sqrt(2*AF*(1-AF)) ## generate corresponding haploid quantities beta_hap <- rep(beta, each=2) AF_hap <- rep(AF, each=2) ## compute equilibrium outer product covariance component U <- am_covariance_structure(beta, AF, r) ## construct Correlation matrix S <- diag(1/sqrt(AF_hap*(1-AF_hap))) DPLR <- U%o%U diag(DPLR) <- 1 C <- cov2cor(S%*%DPLR%*%S) ## draw multivariate Bernoulli haplotypes H <- rb_unstr(n, AF_hap, C) ## convert to diploid genotypes G <- H[,seq(1,ncol(H),2)] + H[,seq(2,ncol(H),2)] ## empirical allele frequencies vs target frequencies emp_afs <- colMeans(G)/2 plot(AF, emp_afs) ## construct phenotype heritable_y <- G%*%beta_unscaled nonheritable_y <- rnorm(n, 0, sqrt(1-h2_0)) y <- heritable_y + nonheritable_y ## empirical h2 vs expected equilibrium h2 (emp_h2 <- var(heritable_y)/var(y)) h2_eq(r, h2_0)
set.seed(1) h2_0 = .5; m = 200; n = 500; r =.5; min_MAF=.1 ## draw standardized diploid allele substitution effects beta <- scale(rnorm(m))*sqrt(h2_0 / m) ## draw allele frequencies AF <- runif(m, min_MAF, 1 - min_MAF) ## compute unstandardized effects beta_unscaled <- beta/sqrt(2*AF*(1-AF)) ## generate corresponding haploid quantities beta_hap <- rep(beta, each=2) AF_hap <- rep(AF, each=2) ## compute equilibrium outer product covariance component U <- am_covariance_structure(beta, AF, r) ## construct Correlation matrix S <- diag(1/sqrt(AF_hap*(1-AF_hap))) DPLR <- U%o%U diag(DPLR) <- 1 C <- cov2cor(S%*%DPLR%*%S) ## draw multivariate Bernoulli haplotypes H <- rb_unstr(n, AF_hap, C) ## convert to diploid genotypes G <- H[,seq(1,ncol(H),2)] + H[,seq(2,ncol(H),2)] ## empirical allele frequencies vs target frequencies emp_afs <- colMeans(G)/2 plot(AF, emp_afs) ## construct phenotype heritable_y <- G%*%beta_unscaled nonheritable_y <- rnorm(n, 0, sqrt(1-h2_0)) y <- heritable_y + nonheritable_y ## empirical h2 vs expected equilibrium h2 (emp_h2 <- var(heritable_y)/var(y)) h2_eq(r, h2_0)