Linkage disequilibrium

Utilities for calculating and plotting linkage disequilibrium.

See also

pairwise_ld_decay

Notes

Similar to pairwise_ld_decay(), except that only a subset of variant pairs is sampled, which speeds up computation and uses less memory. Variants are divided into non-overlapping windows of size window_size, and genotype LD is calculated for all pairs within each window.
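To make the saving concrete, here is a minimal sketch of the windowed pair sampling described above. The name `windowed_pairs` is a hypothetical helper, not part of anhima, and it assumes that window_size counts variants and that a short trailing window is kept. The documentation does not state either point.

```python
from itertools import combinations


def windowed_pairs(n_variants, window_size):
    """Yield index pairs (i, j), i < j, that share a non-overlapping window."""
    for start in range(0, n_variants, window_size):
        # Assumption: a trailing window shorter than window_size is still used.
        stop = min(start + window_size, n_variants)
        yield from combinations(range(start, stop), 2)


# Seven variants in windows of three: [0, 1, 2], [3, 4, 5], [6].
pairs = list(windowed_pairs(n_variants=7, window_size=3))
n_all_pairs = 7 * 6 // 2  # number of pairs if every pair were compared
```

Here only 6 of the 21 possible pairs are evaluated, and the gap widens as the number of variants grows relative to the window size.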

anhima.ld.plot_ld_decay_by_separation(cor, sep, max_separation=100, percentiles=(5, 95), ax=None, median_plot_kwargs=None, percentiles_plot_kwargs=None)

Plot the decay of linkage disequilibrium with separation between variants.

Parameters:

cor : array_like

A 1-dimensional array of squared correlation coefficients between pairs of variants.

sep : array_like

A 1-dimensional array of separations (in number of variants) between pairs of variants.

max_separation : int, optional

Maximum separation to consider.

percentiles : sequence of integers, optional

Percentiles to plot in addition to the median.

ax : axes, optional

The axes on which to draw. If not provided, a new figure will be created.

median_plot_kwargs : dict, optional

Keyword arguments to pass through when plotting the median line.

percentiles_plot_kwargs : dict, optional

Keyword arguments to pass through when plotting the percentiles.

Returns:

ax : axes

The axes on which the plot was drawn.
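As a rough illustration of what such a plot summarizes, the sketch below groups squared correlations by separation and computes the median and the requested percentiles for each group, using only the standard library. The names `percentile` and `summarize_by_separation` are hypothetical helpers, not part of anhima. The linear-interpolation percentile and the inclusive max_separation cutoff are assumptions about behaviour that the documentation does not specify.

```python
from collections import defaultdict


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize_by_separation(cor, sep, max_separation=100, percentiles=(5, 95)):
    """Map each separation to (median, [percentile values]) of r-squared."""
    groups = defaultdict(list)
    for r2, s in zip(cor, sep):
        if s <= max_separation:  # assumption: the cutoff is inclusive
            groups[s].append(r2)
    return {
        s: (percentile(v, 50), [percentile(v, q) for q in percentiles])
        for s, v in sorted(groups.items())
    }


# Toy data: three pairs at separation 1, two at separation 2, one at 5.
cor = [0.9, 0.7, 0.8, 0.4, 0.2, 0.1]
sep = [1, 1, 1, 2, 2, 5]
summary = summarize_by_separation(cor, sep, max_separation=2)
```

The pair at separation 5 is dropped by the cutoff, so `summary` has entries for separations 1 and 2 only. Drawing the per-separation median as a line, with the percentiles around it, gives the kind of decay curve the function's parameters describe.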

anhima.ld.plot_ld_decay_by_distance(cor, dist, bins, percentiles=(5, 95), ax=None, median_plot_kwargs=None, percentiles_plot_kwargs=None)

Plot the decay of linkage disequilibrium with physical distance between variants.

Parameters:

cor : array_like

A 1-dimensional array of squared correlation coefficients between pairs of variants.

dist : array_like

A 1-dimensional array of physical distances between pairs of variants.

bins : int or sequence of ints

Either the number of bins or a sequence of bin edges, defining the distance bins within which LD is summarized.

percentiles : sequence of integers, optional

Percentiles to plot in addition to the median.

ax : axes, optional

The axes on which to draw. If not provided, a new figure will be created.

median_plot_kwargs : dict, optional

Keyword arguments to pass through when plotting the median line.

percentiles_plot_kwargs : dict, optional

Keyword arguments to pass through when plotting the percentiles.

Returns:

ax : axes

The axes on which the plot was drawn.
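The binning step can be sketched as follows for the case where bins is given as a sequence of edges. The name `median_r2_by_distance` is a hypothetical helper, not part of anhima, and the half-open bins `[edges[k], edges[k + 1])` are an assumption. The integer form of bins is not handled here.

```python
from bisect import bisect_right
from statistics import median


def median_r2_by_distance(cor, dist, edges):
    """Return the median r-squared per distance bin (None for empty bins).

    `edges` must be ascending; bin k covers [edges[k], edges[k + 1]).
    """
    binned = [[] for _ in range(len(edges) - 1)]
    for r2, d in zip(cor, dist):
        k = bisect_right(edges, d) - 1  # index of the bin containing d
        if 0 <= k < len(binned):  # pairs outside the edges are ignored
            binned[k].append(r2)
    return [median(v) if v else None for v in binned]


# Toy data: distances in base pairs, three bins of unequal width.
cor = [0.9, 0.8, 0.5, 0.4, 0.1]
dist = [10, 50, 150, 180, 950]
medians = median_r2_by_distance(cor, dist, edges=[0, 100, 200, 1000])
```

Unequal bin widths, as in this toy example, are one reason to pass explicit edges rather than a bin count: LD typically falls off quickly at short distances, so narrower bins there preserve more detail.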

anhima.ld.ld_prune_pairwise(gn, window_size=100, window_step=10, max_r_squared=0.2)

Given a set of genotypes at biallelic variants, find a subset of the variants which are in approximate linkage equilibrium with each other.

Parameters:

gn : array_like

A 2-dimensional array of shape (n_variants, n_samples) where each element is a genotype call coded as a single integer counting the number of non-reference alleles.

window_size : int, optional

The number of variants to work with at a time.

window_step : int, optional

The number of variants to shift the window by.

max_r_squared : float, optional

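The maximum squared genotype correlation coefficient allowed between variants. A variant whose squared correlation with an already-retained variant in the same window exceeds this value is excluded.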

Returns:

included : ndarray, bool

A boolean array of the same length as the number of variants, where a True value indicates the variant at the corresponding index is included, and a False value indicates the corresponding variant is excluded.

Notes

The algorithm is as follows. A window of window_size variants is taken from the beginning of the genotypes array. The genotype correlation coefficient is calculated between each pair of variants in the window. The first variant in the window is considered, and any other variants in the window whose squared correlation with the first variant exceeds max_r_squared are excluded. The next non-excluded variant in the window is then considered, and so on. The window then shifts along by window_step variants, and the process is repeated.
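The steps above can be sketched in plain Python as follows. This is an illustrative reading of the description, not the library's implementation. The names `r_squared` and `ld_prune_sketch` are hypothetical, invariant variants are assumed to have zero correlation with everything, and missing genotype calls are not handled.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: an invariant variant is treated as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_prune_sketch(gn, window_size=100, window_step=10, max_r_squared=0.2):
    """Return a list of booleans, True where the variant is retained."""
    n_variants = len(gn)
    included = [True] * n_variants
    for start in range(0, n_variants, window_step):
        stop = min(start + window_size, n_variants)
        for i in range(start, stop):
            if not included[i]:
                continue  # excluded variants cannot exclude others
            for j in range(i + 1, stop):
                if included[j] and r_squared(gn[i], gn[j]) > max_r_squared:
                    included[j] = False
    return included


# Rows are variants, columns are samples; values count non-reference alleles.
gn = [
    [0, 1, 2, 0, 1, 2],  # variant 0
    [0, 1, 2, 0, 1, 2],  # identical to variant 0
    [1, 0, 1, 1, 2, 1],  # uncorrelated with variant 0
    [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with variant 0
]
included = ld_prune_sketch(gn, window_size=4, window_step=2)
```

Variants 1 and 3 are dropped because their squared correlation with variant 0 is 1, while variant 2 is kept. Because windows overlap when window_step is smaller than window_size, this sketch recomputes some pairs. A practical implementation would more likely compute each window's correlation matrix once with an array library.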