Site frequencies
Site frequency spectra.
See also the examples at:
- anhima.sf.site_frequency_spectrum(derived_ac)
Calculate the site frequency spectrum, given derived allele counts for a set of biallelic variant sites.
Parameters: derived_ac : array_like, int
A 1-dimensional array of shape (n_variants,) where each array element holds the count of derived alleles found for a single variant across some set of samples.
Returns: sfs : ndarray, int
An array of integers where the value of the kth element is the number of variant sites with k derived alleles.
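The counting described above can be illustrated with a minimal NumPy sketch. This is not anhima's source; the helper name `sfs_sketch` and the input values are hypothetical, and it assumes the spectrum is simply a tally of sites per derived allele count.

```python
import numpy as np

def sfs_sketch(derived_ac):
    """Hypothetical helper: tally variant sites by derived allele count."""
    derived_ac = np.asarray(derived_ac, dtype=int)
    # np.bincount returns, at index k, the number of entries equal to k
    return np.bincount(derived_ac)

# Seven variant sites with made-up derived allele counts
derived_ac = [1, 1, 2, 3, 1, 0, 2]
sfs = sfs_sketch(derived_ac)
print(sfs)  # [1 3 2 1]
```

Here one site has 0 derived alleles, three sites have 1, two sites have 2, and one site has 3.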
- anhima.sf.site_frequency_spectrum_scaled(derived_ac)
Calculate the site frequency spectrum, scaled such that a constant value is expected across the spectrum for neutral variation and a population at constant size.
Parameters: derived_ac : array_like, int
A 1-dimensional array of shape (n_variants,) where each array element holds the count of derived alleles found for a single variant across some set of samples.
Returns: sfs_scaled : ndarray, int
An array of integers where the value of the kth element is the number of variant sites with k derived alleles, multiplied by k.
See also
site_frequency_spectrum, site_frequency_spectrum_folded, site_frequency_spectrum_folded_scaled, plot_site_frequency_spectrum
Notes
Under neutrality and constant population size, the scaled site frequency is expected to be constant across the spectrum, and to approximate the value of the population-scaled mutation rate theta.
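The scaling by k can be sketched in a few lines of NumPy. This is an illustration under the assumption that the scaled spectrum is the unscaled count at each k multiplied by k, as the Returns description states; `sfs_scaled_sketch` is a hypothetical name, not part of anhima.

```python
import numpy as np

def sfs_scaled_sketch(derived_ac):
    """Hypothetical helper: site frequency spectrum multiplied by k."""
    # Unscaled spectrum: number of sites with k derived alleles
    sfs = np.bincount(np.asarray(derived_ac, dtype=int))
    # k runs over the possible derived allele counts 0, 1, 2, ...
    k = np.arange(sfs.size)
    return sfs * k

sfs_scaled = sfs_scaled_sketch([1, 1, 2, 3, 1, 0, 2])
print(sfs_scaled)  # [0 3 4 3]
```

The element at k = 0 is always zero after scaling, which is one reason the endpoints are often dropped when plotting.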
- anhima.sf.site_frequency_spectrum_folded(biallelic_ac)
Calculate the folded site frequency spectrum, given reference and alternate allele counts for a set of biallelic variants.
Parameters: biallelic_ac : array_like, int
A 2-dimensional array of shape (n_variants, 2), where each row holds the reference and alternate allele counts for a single biallelic variant across some set of samples.
Returns: sfs_folded : ndarray, int
An array of integers where the value of the kth element is the number of variant sites with k observations of the minor allele.
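Folding can be sketched as taking the smaller of the two allele counts at each site and tallying the result. The following is a minimal illustration with hypothetical names and made-up data, not anhima's own code.

```python
import numpy as np

def sfs_folded_sketch(biallelic_ac):
    """Hypothetical helper: tally variant sites by minor allele count."""
    ac = np.asarray(biallelic_ac, dtype=int)
    # The minor allele count is the smaller of the reference and
    # alternate counts in each row
    minor_ac = ac.min(axis=1)
    return np.bincount(minor_ac)

# Five biallelic sites, 8 alleles observed at each (made-up values)
biallelic_ac = [[7, 1], [2, 6], [4, 4], [1, 7], [8, 0]]
sfs_folded = sfs_folded_sketch(biallelic_ac)
print(sfs_folded)  # [1 2 1 0 1]
```

Because only the minor allele is counted, the sites with counts (7, 1) and (1, 7) fall into the same class, so no ancestral/derived polarisation is needed.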
- anhima.sf.site_frequency_spectrum_folded_scaled(biallelic_ac, m=None)
Calculate the folded site frequency spectrum, scaled such that a constant value is expected across the spectrum for neutral variation and a population at constant size.
Parameters: biallelic_ac : array_like, int
A 2-dimensional array of shape (n_variants, 2), where each row holds the reference and alternate allele counts for a single biallelic variant across some set of samples.
m : int, optional
The total number of alleles observed at each variant site. Equal to the number of samples multiplied by the ploidy. If not provided, will be inferred to be the maximum value of the sum of reference and alternate allele counts present in biallelic_ac.
Returns: sfs_folded_scaled : ndarray, int
An array of integers where the value of the kth element is the number of variant sites with k observations of the minor allele, multiplied by the scaling factor (k * (m - k) / m).
See also
site_frequency_spectrum, site_frequency_spectrum_scaled, site_frequency_spectrum_folded, plot_site_frequency_spectrum
Notes
Under neutrality and constant population size, the scaled site frequency is expected to be constant across the spectrum, and to approximate the value of the population-scaled mutation rate theta.
This function is useful where the ancestral and derived status of alleles is unknown.
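The scaling factor k * (m - k) / m, and the inference of m when it is not given, can be sketched as follows. This is a hedged illustration with a hypothetical helper name. It uses true division, so it returns floats; whether the library rounds or truncates to integers is not shown here.

```python
import numpy as np

def sfs_folded_scaled_sketch(biallelic_ac, m=None):
    """Hypothetical helper: folded spectrum times k * (m - k) / m."""
    ac = np.asarray(biallelic_ac, dtype=int)
    # Folded spectrum: number of sites with k copies of the minor allele
    sfs_folded = np.bincount(ac.min(axis=1))
    if m is None:
        # Infer m as the largest total allele count seen at any site
        m = ac.sum(axis=1).max()
    k = np.arange(sfs_folded.size)
    return sfs_folded * k * (m - k) / m

biallelic_ac = [[7, 1], [2, 6], [4, 4], [1, 7], [8, 0]]
sfs_folded_scaled = sfs_folded_scaled_sketch(biallelic_ac)
print(sfs_folded_scaled)  # values: 0, 1.75, 1.5, 0, 2
```

With m inferred as 8, the class k = 1 holds two sites and is scaled by 1 * 7 / 8, giving 1.75.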
- anhima.sf.plot_site_frequency_spectrum(sfs, bins=None, m=None, clip_endpoints=True, ax=None, label=None, plot_kwargs=None)
Plot a site frequency spectrum.
Parameters: sfs : array_like, int
Site frequency spectrum. Can be folded or unfolded, scaled or unscaled.
bins : int or sequence of ints, optional
Number of bins or bin edges to aggregate frequencies. If not given, no binning will be applied.
m : int, optional
The total number of alleles observed at each variant site. Equal to the number of samples multiplied by the ploidy. If given, will be used to scale the X axis as allele frequency instead of allele count.
clip_endpoints : bool, optional
If True, remove the first and last values from the site frequency spectrum.
ax : axes, optional
The axes on which to plot. If not given, a new figure will be created.
label : string, optional
Label for this data series.
plot_kwargs : dict, optional
Passed through to ax.plot().
Returns: ax : axes
The axes on which the plot was drawn.
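The effect of clip_endpoints and m on the plotted values can be sketched without any plotting library. The helper below is hypothetical and only prepares X and Y values under the assumptions stated in the parameter descriptions; it ignores bins and does not reproduce the function's actual drawing code.

```python
import numpy as np

def sfs_plot_data_sketch(sfs, m=None, clip_endpoints=True):
    """Hypothetical helper: X and Y values for an unbinned spectrum."""
    y = np.asarray(sfs)
    # X defaults to the allele count k for each element of the spectrum
    x = np.arange(y.size)
    if clip_endpoints:
        # Drop the first and last values of the spectrum
        x, y = x[1:-1], y[1:-1]
    if m is not None:
        # Express X as allele frequency rather than allele count
        x = x / m
    return x, y

x, y = sfs_plot_data_sketch([5, 3, 2, 1, 4], m=4)
print(x)  # values: 0.25, 0.5, 0.75
print(y)  # [3 2 1]
```

The resulting arrays could then be passed to any line-plotting call, for example the plot method of a matplotlib axes.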