Doubleton sharing

Doubleton sharing, also known as analysis of f2 variants (variants at which an allele is observed exactly twice across all samples).

See also the examples at:

anhima.f2.count_shared_doubletons(subpops_ac)

Count doubletons shared between each pair of subpopulations, i.e., variants with exactly two copies of the alternate allele where one copy is observed in each subpopulation.

Parameters:

subpops_ac : array_like, int

An array of shape (n_variants, n_subpops) holding alternate allele counts for each subpopulation.

Returns:

counts : ndarray, int or float

A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.
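The counting described above can be sketched in plain Python. This is an illustrative reimplementation, not the library's code: the name `count_shared_doubletons_sketch` is hypothetical, and placing doubletons whose two copies fall in the same subpopulation on the diagonal is an assumption based on the description of the normalisation function.

```python
def count_shared_doubletons_sketch(subpops_ac):
    """Count doubletons for each pair of subpopulations (hypothetical helper).

    subpops_ac holds one row per variant and one column per subpopulation,
    each entry being a non-negative alternate allele count.
    """
    n_subpops = len(subpops_ac[0])
    counts = [[0] * n_subpops for _ in range(n_subpops)]
    for row in subpops_ac:
        # A doubleton has exactly two alternate allele copies in total.
        if sum(row) != 2:
            continue
        carriers = [i for i, ac in enumerate(row) if ac > 0]
        if len(carriers) == 1:
            # Assumption: both copies in one subpopulation go on the diagonal.
            i = carriers[0]
            counts[i][i] += 1
        else:
            # One copy in each of two subpopulations: keep the matrix symmetric.
            i, j = carriers
            counts[i][j] += 1
            counts[j][i] += 1
    return counts


subpops_ac = [
    [1, 1, 0],  # doubleton shared by subpopulations 0 and 1
    [2, 0, 0],  # doubleton confined to subpopulation 0
    [0, 1, 1],  # doubleton shared by subpopulations 1 and 2
    [1, 2, 0],  # three copies in total, not a doubleton
    [0, 0, 1],  # singleton, not a doubleton
]
counts = count_shared_doubletons_sketch(subpops_ac)
print(counts)
```

The library itself returns an ndarray; nested lists are used here only to keep the sketch dependency-free.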

anhima.f2.normalise_doubleton_counts(counts, n_samples, ploidy=2)

Normalise doubleton counts by dividing by the number of distinct pairs of haplotypes in each population comparison.

Parameters:

counts : array_like, int

A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.

n_samples : int or sequence of ints

The number of samples in each subpopulation.

ploidy : int, optional

The sample ploidy.

Returns:

normed_counts : ndarray, float

Normalised counts of shared doubletons.

Notes

This function corrects for the fact that there are fewer pairs of haplotypes when looking for doubletons within a single subpopulation of size n than there are when comparing two different subpopulations of size n.

This function may also help to correct for the case where the number of samples from each subpopulation is not equal. However, note that if this is the case then there may still also be some bias in how doubletons have been ascertained.
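To make the pair counting concrete: a subpopulation with n haplotypes (samples times ploidy) contains n(n - 1)/2 distinct pairs, while two subpopulations with n_i and n_j haplotypes give n_i * n_j pairs. The following is a minimal sketch under those assumptions. The helper name is hypothetical, the exact formula used by the library is not confirmed here, and treating a single int as "the same size for every subpopulation" is also an assumption.

```python
def normalise_doubleton_counts_sketch(counts, n_samples, ploidy=2):
    """Divide each count by the number of haplotype pairs (hypothetical helper)."""
    n_subpops = len(counts)
    if isinstance(n_samples, int):
        # Assumption: a single int applies to every subpopulation.
        n_samples = [n_samples] * n_subpops
    # Number of haplotypes per subpopulation.
    n_haps = [n * ploidy for n in n_samples]
    normed = [[0.0] * n_subpops for _ in range(n_subpops)]
    for i in range(n_subpops):
        for j in range(n_subpops):
            if i == j:
                # Distinct pairs within one subpopulation: n choose 2.
                n_pairs = n_haps[i] * (n_haps[i] - 1) / 2
            else:
                # Pairs with one haplotype drawn from each subpopulation.
                n_pairs = n_haps[i] * n_haps[j]
            normed[i][j] = counts[i][j] / n_pairs
    return normed


# Two subpopulations of 5 diploid samples each, i.e. 10 haplotypes each:
# 45 pairs within a subpopulation, 100 pairs between the two.
counts = [[45, 100], [100, 90]]
normed = normalise_doubleton_counts_sketch(counts, n_samples=5)
print(normed)
```

With equal sample sizes, a within-subpopulation comparison has fewer than half as many pairs as a between-subpopulation comparison, which is why raw counts understate within-subpopulation sharing.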

anhima.f2.plot_shared_doubletons(counts, subpop_labels=None, subpop_colors=u'bgrcmyk', axs=None, figsize_factor=1, ylim=None, relative=False, flip=False)

Plot counts of doubleton sharing between subpopulations as a bar chart.

Parameters:

counts : array_like, int

A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.

subpop_labels : sequence of strings, optional

Labels for the subpopulations.

subpop_colors : sequence of colors, optional

Colors for the subpopulations.

axs : sequence of axes, optional

The axes to use. If not provided, a new figure will be created.

figsize_factor : float, optional

Figure size in inches per subpopulation. Only used if axs is None.

ylim : pair of ints or floats, optional

Limits for the Y axes of all subplots.

relative : bool, optional

If True, normalise counts by dividing by the sum along each row.

flip : bool, optional

If True, invert the Y axis.

Returns:

axs : sequence of axes

The axes on which the plot was drawn.
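As an illustration of what the relative option describes, the sketch below divides each row of a counts matrix by its row sum, so that each subpopulation's bars show proportions rather than raw counts. The helper name is hypothetical, and returning zeros for an all-zero row is an assumption made here to avoid division by zero; how the library handles that case is not stated.

```python
def relative_counts_sketch(counts):
    """Divide each row by its sum (hypothetical helper)."""
    rel = []
    for row in counts:
        total = sum(row)
        # Assumption: an all-zero row stays all zeros.
        rel.append([c / total if total else 0.0 for c in row])
    return rel


counts = [[2, 6], [6, 18]]
rel = relative_counts_sketch(counts)
print(rel)
```

Note that the result is generally no longer symmetric, because each row is scaled by a different total.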

anhima.f2.plot_total_doubletons(counts, subpop_labels=None, width=0.8, orientation=u'vertical', n_samples=None, ax=None, bar_kwargs=None)

Plot total counts of doubletons per subpopulation as a bar chart.

Parameters:

counts : array_like, int

A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.

subpop_labels : sequence of strings, optional

Labels for the subpopulations.

width : float, optional

The relative width of each bar.

orientation : {'vertical', 'horizontal'}, optional

The bar orientation.

n_samples : int or sequence of ints, optional

The number of samples in each subpopulation.

ax : axes, optional

The axes on which to plot. If not provided, a new figure will be created.

bar_kwargs : dict, optional

Keyword arguments passed through to ax.bar().

Returns:

ax : axes

The axes on which the plot was drawn.

anhima.f2.plot_f2_fig(counts, subpop_labels=None, subpop_colors=u'bgrcmyk', fig=None, figsize_factor=1, relative=False, normed=False, n_samples=None, ploidy=2)

Plot a combined figure of shared doubleton counts and total counts per subpopulation.

Parameters:

counts : array_like, int

A square matrix of shape (n_subpops, n_subpops) where the array element at index (i, j) holds the count of shared doubletons between the ith and jth subpopulations.

subpop_labels : sequence of strings, optional

Labels for the subpopulations.

subpop_colors : sequence of colors, optional

Colors for the subpopulations.

fig : figure, optional

The figure to use. If not provided, a new figure will be created.

figsize_factor : float, optional

Figure size in inches per subpopulation. Only used if fig is None.

relative : bool, optional

If True, plot counts relative to the sum along each row.

normed : bool, optional

If True, normalise counts by dividing by the number of possible pairs of haplotypes.

n_samples : int or sequence of ints, optional

The number of samples in each subpopulation.

ploidy : int, optional

The sample ploidy. (Only relevant if normed is True.)

Returns:

fig : figure

The figure on which the plot was drawn.