Site frequencies

Site frequency spectra.

anhima.sf.site_frequency_spectrum(derived_ac)

Calculate the site frequency spectrum, given derived allele counts for a set of biallelic variant sites.

Parameters:

derived_ac : array_like, int

A 1-dimensional array of shape (n_variants,) where each array element holds the count of derived alleles found for a single variant across some set of samples.

Returns:

sfs : ndarray, int

An array of integers where the value of the kth element is the number of variant sites with k derived alleles.
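The counting described above can be sketched with plain NumPy. This is a minimal illustration and not anhima's implementation; the helper name sfs_sketch and the input values are made up for the example.

```python
import numpy as np


def sfs_sketch(derived_ac):
    """Hypothetical helper: count variant sites per derived allele count."""
    derived_ac = np.asarray(derived_ac, dtype=int)
    # np.bincount returns, at index k, the number of entries equal to k
    return np.bincount(derived_ac)


# six variant sites with 1, 1, 2, 3, 1 and 2 derived alleles respectively
derived_ac = [1, 1, 2, 3, 1, 2]
sfs = sfs_sketch(derived_ac)
print(sfs)  # [0 3 2 1]
```

Here sfs[1] is 3 because three sites carry exactly one derived allele.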

anhima.sf.site_frequency_spectrum_scaled(derived_ac)

Calculate the site frequency spectrum, scaled such that a constant value is expected across the spectrum for neutral variation and a population at constant size.

Parameters:

derived_ac : array_like, int

A 1-dimensional array of shape (n_variants,) where each array element holds the count of derived alleles found for a single variant across some set of samples.

Returns:

sfs_scaled : ndarray, int

An array of integers where the value of the kth element is the number of variant sites with k derived alleles, multiplied by k.

Notes

Under neutrality and constant population size, the scaled site frequency is expected to be constant across the spectrum, and to approximate the value of the population-scaled mutation rate theta.
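The scaling follows from the standard neutral expectation that the number of sites with k derived alleles is proportional to theta / k, so multiplying the kth count by k should give a roughly flat spectrum. A minimal NumPy sketch of that scaling, with a hypothetical helper name and made-up input (not anhima's implementation):

```python
import numpy as np


def sfs_scaled_sketch(derived_ac):
    """Hypothetical helper: unfolded spectrum with element k multiplied by k."""
    sfs = np.bincount(np.asarray(derived_ac, dtype=int))
    # k is the derived allele count associated with each element of sfs
    k = np.arange(sfs.size)
    return sfs * k


sfs_scaled = sfs_scaled_sketch([1, 1, 2, 3, 1, 2])
print(sfs_scaled)  # [0 3 4 3]
```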

anhima.sf.site_frequency_spectrum_folded(biallelic_ac)

Calculate the folded site frequency spectrum, given reference and alternate allele counts for a set of biallelic variants.

Parameters:

biallelic_ac : array_like, int

A 2-dimensional array of shape (n_variants, 2), where each row holds the reference and alternate allele counts for a single biallelic variant across some set of samples.

Returns:

sfs_folded : ndarray, int

An array of integers where the value of the kth element is the number of variant sites with k observations of the minor allele.
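Folding can be illustrated by taking the smaller of the two allele counts in each row and counting how often each value occurs. The sketch below uses only NumPy; the helper name and the counts are invented for the example and it is not anhima's implementation.

```python
import numpy as np


def sfs_folded_sketch(biallelic_ac):
    """Hypothetical helper: count variant sites per minor allele count."""
    ac = np.asarray(biallelic_ac, dtype=int)
    # the minor allele count is the smaller of (ref, alt) in each row
    minor_ac = ac.min(axis=1)
    return np.bincount(minor_ac)


# five biallelic variants, 10 alleles called at each site
biallelic_ac = [[9, 1], [2, 8], [5, 5], [7, 3], [1, 9]]
sfs_folded = sfs_folded_sketch(biallelic_ac)
print(sfs_folded)  # [0 2 1 1 0 1]
```

Rows [9, 1] and [1, 9] both contribute to element 1, which is why the reference or alternate labelling does not matter.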

anhima.sf.site_frequency_spectrum_folded_scaled(biallelic_ac, m=None)

Calculate the folded site frequency spectrum, scaled such that a constant value is expected across the spectrum for neutral variation and a population at constant size.

Parameters:

biallelic_ac : array_like, int

A 2-dimensional array of shape (n_variants, 2), where each row holds the reference and alternate allele counts for a single biallelic variant across some set of samples.

m : int, optional

The total number of alleles observed at each variant site. Equal to the number of samples multiplied by the ploidy. If not provided, it is inferred as the maximum, over all variants, of the sum of reference and alternate allele counts in biallelic_ac.

Returns:

sfs_folded_scaled : ndarray, float

An array where the value of the kth element is the number of variant sites with k observations of the minor allele, multiplied by the scaling factor (k * (m - k) / m). Because of the division by m, values are generally not integers.

Notes

Under neutrality and constant population size, the scaled site frequency is expected to be constant across the spectrum, and to approximate the value of the population-scaled mutation rate theta.

This function is useful where the ancestral and derived status of alleles is unknown.
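The scaling factor can be motivated by the neutral expectation for the folded spectrum, in which the number of sites with minor allele count k is proportional to theta * (1/k + 1/(m - k)) = theta * m / (k * (m - k)). A minimal NumPy sketch of the calculation follows, including the inference of m from the row sums; the helper name and data are hypothetical and this is not anhima's implementation.

```python
import numpy as np


def sfs_folded_scaled_sketch(biallelic_ac, m=None):
    """Hypothetical helper: folded spectrum scaled by k * (m - k) / m."""
    ac = np.asarray(biallelic_ac, dtype=int)
    sfs_folded = np.bincount(ac.min(axis=1))
    if m is None:
        # assume m is the largest number of alleles called at any site
        m = ac.sum(axis=1).max()
    k = np.arange(sfs_folded.size)
    # true division, so the result is a floating point array
    return sfs_folded * k * (m - k) / m


biallelic_ac = [[9, 1], [2, 8], [5, 5], [7, 3], [1, 9]]
scaled = sfs_folded_scaled_sketch(biallelic_ac)
# with m inferred as 10, the values are 0, 1.8, 1.6, 2.1, 0 and 2.5
print(scaled)
```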

anhima.sf.plot_site_frequency_spectrum(sfs, bins=None, m=None, clip_endpoints=True, ax=None, label=None, plot_kwargs=None)

Plot a site frequency spectrum.

Parameters:

sfs : array_like, int

Site frequency spectrum. Can be folded or unfolded, scaled or unscaled.

bins : int or sequence of ints, optional

Number of bins, or bin edges, used to aggregate frequencies. If not given, no binning will be applied.

m : int, optional

The total number of alleles observed at each variant site. Equal to the number of samples multiplied by the ploidy. If given, will be used to scale the X axis as allele frequency instead of allele count.

clip_endpoints : bool, optional

If True, remove the first and last values from the site frequency spectrum.

ax : axes, optional

The axes on which to plot. If not given, a new figure will be created.

label : string, optional

Label for this data series.

plot_kwargs : dict, optional

Passed through to ax.plot().

Returns:

ax : axes

The axes on which the plot was drawn.
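To make the effect of m and clip_endpoints concrete, the sketch below prepares X and Y values in the way the parameter descriptions suggest, without any binning. It is an assumption about the behaviour rather than anhima's actual code, and prepare_sfs_for_plot is a hypothetical helper name.

```python
import numpy as np


def prepare_sfs_for_plot(sfs, m=None, clip_endpoints=True):
    """Hypothetical helper: return (x, y) arrays for an unbinned spectrum."""
    y = np.asarray(sfs)
    # element k of the spectrum corresponds to allele count k
    x = np.arange(y.size)
    if clip_endpoints:
        # drop the first and last values of the spectrum
        x, y = x[1:-1], y[1:-1]
    if m is not None:
        # express the X axis as allele frequency instead of allele count
        x = x / m
    return x, y


x, y = prepare_sfs_for_plot([0, 3, 2, 1, 4], m=4)
print(x)  # [0.25 0.5  0.75]
print(y)  # [3 2 1]
```

The resulting arrays could then be drawn on a matplotlib axes with ax.plot(x, y, label=label), passing any extra keyword arguments in the same call.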