Simulations

Extremely naive simulation functions that generate genotype data for illustrating other features of the anhima package.

anhima.sim.simulate_biallelic_genotypes(n_variants, n_samples, af_dist, p_missing=0.1, ploidy=2)

Simulate genotypes at biallelic variants for a population in Hardy-Weinberg equilibrium.

Parameters:

n_variants : int

The number of variants.

n_samples : int

The number of samples.

af_dist : frozen continuous random variable

The distribution from which allele frequencies are drawn.

p_missing : float, optional

The fraction of missing genotype calls.

ploidy : int, optional

The sample ploidy.

Returns:

genotypes : ndarray, int8

An array of shape (n_variants, n_samples, ploidy) where each element of the array is an integer corresponding to an allele index (-1 = missing, 0 = reference allele, 1 = alternate allele).
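The idea behind this kind of simulation can be sketched with NumPy alone. The function below is a hypothetical stand-in, not anhima's implementation: the name sketch_biallelic_genotypes, the seed argument, and the Beta(2, 2) allele-frequency distribution (used in place of af_dist) are all assumptions made for illustration. It also assumes that p_missing acts as an independent per-call probability.

```python
import numpy as np


def sketch_biallelic_genotypes(n_variants, n_samples, p_missing=0.1,
                               ploidy=2, seed=0):
    """Hypothetical illustration only; not the anhima implementation."""
    rng = np.random.default_rng(seed)

    # Assumed allele-frequency distribution: Beta(2, 2) stands in for af_dist.
    af = rng.beta(2.0, 2.0, size=n_variants)

    # Under Hardy-Weinberg equilibrium each allele is an independent
    # Bernoulli draw with the variant's alternate allele frequency.
    alleles = rng.random((n_variants, n_samples, ploidy)) < af[:, None, None]
    genotypes = alleles.astype(np.int8)

    # Assumption: each genotype call is missing independently with
    # probability p_missing; a missing call sets every allele to -1.
    missing = rng.random((n_variants, n_samples)) < p_missing
    genotypes[missing] = -1

    return genotypes


g = sketch_biallelic_genotypes(n_variants=100, n_samples=20)
print(g.shape, g.dtype)
```

The result has the same layout as the array described above: one int8 allele index per variant, sample, and chromosome copy, with -1 marking missing calls.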

anhima.sim.simulate_genotypes_with_ld(n_variants, n_samples, correlation=0.2)

A very simple function to simulate a set of genotypes, where variants are in some degree of linkage disequilibrium with their neighbours.

Parameters:

n_variants : int

The number of variants to simulate data for.

n_samples : int

The number of individuals to simulate data for.

correlation : float, optional

The fraction of samples whose genotypes are copied between neighbouring variants.

Returns:

gn : ndarray, int8

A 2-dimensional array of shape (n_variants, n_samples) where each element is a genotype call coded as a single integer counting the number of non-reference alleles.
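One way to picture the copying scheme described above is the following minimal sketch. It is an assumption-laden illustration rather than anhima's own code: the function name sketch_genotypes_with_ld and the seed argument are hypothetical, and it assumes genotypes start out uniformly random and are copied from the preceding variant only.

```python
import numpy as np


def sketch_genotypes_with_ld(n_variants, n_samples, correlation=0.2, seed=0):
    """Hypothetical illustration only; not the anhima implementation."""
    rng = np.random.default_rng(seed)

    # Start from random calls: 0, 1 or 2 non-reference alleles per sample.
    gn = rng.integers(0, 3, size=(n_variants, n_samples), dtype=np.int8)

    # Number of samples whose genotype is copied at each variant.
    n_copy = int(n_samples * correlation)

    for i in range(1, n_variants):
        # Pick samples at random and copy their call from variant i - 1,
        # which correlates neighbouring variants.
        idx = rng.choice(n_samples, size=n_copy, replace=False)
        gn[i, idx] = gn[i - 1, idx]

    return gn


gn = sketch_genotypes_with_ld(n_variants=50, n_samples=10)
print(gn.shape, gn.dtype)
```

In this sketch, a correlation of 0 leaves all variants independent, while a correlation of 1 makes every variant identical to the first.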

anhima.sim.simulate_relatedness(genotypes, relatedness=0.5, n_iter=1000, copy=True)

Simulate relatedness by randomly copying genotypes between individuals.

Parameters:

genotypes : array_like

An array of shape (n_variants, n_samples, ploidy) where each element of the array is an integer corresponding to an allele index (-1 = missing, 0 = reference allele, 1 = first alternate allele, 2 = second alternate allele, etc.).

relatedness : float, optional

The fraction of variants for which genotypes are copied.

n_iter : int, optional

Number of times to randomly copy genotypes between individuals.

copy : bool, optional

If True (the default), operate on a copy of the input array. If False, modify genotypes in place.

Returns:

genotypes : ndarray, shape (n_variants, n_samples, ploidy)

The genotype array with relatedness simulated.
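The parameters above suggest a donor-and-recipient copying loop, which might look roughly like the sketch below. This is a hypothetical illustration, not the anhima source: the name sketch_relatedness and the seed argument are invented here, and the choice of one random donor and one random recipient per iteration is an assumption.

```python
import numpy as np


def sketch_relatedness(genotypes, relatedness=0.5, n_iter=1000, copy=True,
                       seed=0):
    """Hypothetical illustration only; not the anhima implementation."""
    rng = np.random.default_rng(seed)

    # Work on a copy unless the caller asks for in-place modification.
    genotypes = np.array(genotypes, copy=True) if copy else np.asarray(genotypes)
    n_variants, n_samples = genotypes.shape[:2]

    # Number of variants copied in each iteration.
    n_copy = int(relatedness * n_variants)

    for _ in range(n_iter):
        # Assumption: pick one donor and one recipient sample at random.
        donor, recipient = rng.integers(0, n_samples, size=2)
        # Copy the donor's genotypes at a random subset of variants.
        variants = rng.choice(n_variants, size=n_copy, replace=False)
        genotypes[variants, recipient] = genotypes[variants, donor]

    return genotypes


rng = np.random.default_rng(1)
g = rng.integers(0, 2, size=(200, 10, 2), dtype=np.int8)
related = sketch_relatedness(g, relatedness=0.5, n_iter=100)
print(related.shape)
```

With copy=True the input array g is left untouched and a new array is returned. With copy=False an ndarray input is modified and returned directly.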