Locating samples and variants
Utilities for locating samples and variants.
See also the examples at:
- anhima.loc.view_sample(a, selection, all_samples=None)
View a single column from the array a corresponding to a selected sample.
Parameters: a : array_like
An array with 2 or more dimensions, where the second dimension corresponds to samples.
selection : int or object
A sample identifier or column index.
all_samples : sequence, optional
A sequence (e.g., list) of sample identifiers corresponding to the second dimension of a, used to map selection to a column index. If not given, assume selection is a column index.
Returns: b : ndarray
An array obtained from a by taking the column corresponding to the selected sample.
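The column lookup described above can be sketched in plain NumPy. This is a minimal illustration, not anhima's implementation: it assumes that a sample identifier is mapped to its column with `list.index`, and the helper name `view_sample_sketch` and the toy genotype array are hypothetical.

```python
import numpy as np

def view_sample_sketch(a, selection, all_samples=None):
    """Hypothetical stand-in for anhima.loc.view_sample (illustration only)."""
    a = np.asarray(a)
    # Map a sample identifier to its column index if identifiers are given;
    # otherwise treat selection as a column index already (assumption).
    index = list(all_samples).index(selection) if all_samples is not None else selection
    # Basic integer indexing on the second axis returns a view, not a copy.
    return a[:, index]

# Toy genotype array: 3 variants x 2 samples x ploidy 2.
genotypes = np.array([[[0, 0], [0, 1]],
                      [[1, 1], [0, 0]],
                      [[0, 1], [1, 1]]])
b = view_sample_sketch(genotypes, 'bar', all_samples=['foo', 'bar'])
print(b.shape)  # (3, 2)
```

Because `b` is a view, writing into it would also modify `genotypes`.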
- anhima.loc.take_samples(a, selection, all_samples=None)
Extract columns from the array a corresponding to selected samples.
Parameters: a : array_like
An array with 2 or more dimensions, where the second dimension corresponds to samples.
selection : sequence of ints or objects
A sequence of sample identifiers or column indices.
all_samples : sequence, optional
A sequence (e.g., list) of sample identifiers corresponding to the second dimension of a, used to map selection to column indices. If not given, assume selection is a sequence of column indices.
Returns: b : ndarray
An array obtained from a by taking columns corresponding to the selected samples.
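Selecting several columns at once can be illustrated with `numpy.take` along the second axis. This is a hedged sketch, not the library's code: the helper `take_samples_sketch` and the sample names are hypothetical, and it assumes identifiers are translated to indices with `list.index` and that the output columns follow the order of `selection`.

```python
import numpy as np

def take_samples_sketch(a, selection, all_samples=None):
    """Hypothetical stand-in for anhima.loc.take_samples (illustration only)."""
    a = np.asarray(a)
    if all_samples is not None:
        # Translate each sample identifier into a column index.
        all_samples = list(all_samples)
        indices = [all_samples.index(s) for s in selection]
    else:
        indices = list(selection)
    # np.take along axis 1 gathers whole columns into a new array (a copy).
    return np.take(a, indices, axis=1)

genotypes = np.arange(12).reshape(3, 4)  # 3 variants x 4 samples
b = take_samples_sketch(genotypes, ['s3', 's0'], all_samples=['s0', 's1', 's2', 's3'])
```

Unlike a single-column view, the result here is a copy, so modifying it leaves `genotypes` unchanged.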
- anhima.loc.query_variants(expression, variants)
Evaluate expression with respect to the given variants.
Parameters: expression : string
The query expression to apply. The expression will be evaluated by numexpr against the provided variants.
variants : dict-like
The variables to include in scope for the expression evaluation.
Returns: result : ndarray
The result of evaluating expression against variants.
- anhima.loc.compress_variants(a, condition)
Extract rows from the array a corresponding to a boolean condition.
Parameters: a : array_like
An array to extract rows from (e.g., genotypes).
condition : array_like, bool
A 1-D boolean array of the same length as the first dimension of a.
Returns: b : ndarray
An array obtained from a by taking rows corresponding to the selected variants.
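A typical filtering workflow combines a boolean condition over variant fields with row compression. The sketch below writes the condition directly in NumPy, as the plain-NumPy counterpart of an expression string such as `'(POS > 150) & (QUAL >= 30)'` that query_variants would evaluate with numexpr, and uses `numpy.compress` in place of compress_variants. The field names `POS` and `QUAL` and all data are hypothetical.

```python
import numpy as np

# Hypothetical variant fields, one entry per variant (row).
variants = {
    'POS': np.array([100, 150, 200, 250]),
    'QUAL': np.array([50.0, 10.0, 40.0, 20.0]),
}
genotypes = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])  # 4 variants x 2 samples

# Plain-NumPy equivalent of the expression '(POS > 150) & (QUAL >= 30)'.
condition = (variants['POS'] > 150) & (variants['QUAL'] >= 30)

# Keep only rows (axis 0) where the condition is True; the result is a copy.
b = np.compress(condition, genotypes, axis=0)
print(b)  # [[0 0]]
```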
- anhima.loc.take_variants(a, indices, mode='raise')
Extract rows from the array a corresponding to indices.
Parameters: a : array_like
An array to extract rows from (e.g., genotypes).
indices : sequence of integers
The variant indices to extract.
mode : {‘raise’, ‘wrap’, ‘clip’}, optional
Specifies how out-of-bounds indices will behave.
Returns: b : ndarray
An array obtained from a by taking rows corresponding to the selected variants.
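The three mode values share their names with the `mode` parameter of `numpy.take`. Assuming take_variants behaves the same way along the variant axis, the sketch below uses `numpy.take` directly to show what each mode does with an out-of-bounds index; the genotype array is hypothetical.

```python
import numpy as np

genotypes = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])  # 4 variants x 2 samples

# Select variant rows 0 and 2 along the first axis.
b = np.take(genotypes, [0, 2], axis=0)

# 'clip' replaces an out-of-bounds index with the last valid one (3).
clipped = np.take(genotypes, [1, 10], axis=0, mode='clip')

# 'wrap' reduces an out-of-bounds index modulo the axis length (10 % 4 == 2).
wrapped = np.take(genotypes, [1, 10], axis=0, mode='wrap')

# The default, 'raise', rejects out-of-bounds indices with an IndexError.
try:
    np.take(genotypes, [10], axis=0)
    raised = False
except IndexError:
    raised = True
```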
- anhima.loc.locate_position(pos, p)
Locate the index of position p within the sorted array of genomic positions pos.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig, with no duplicates.
p : int
The position to locate.
Returns: index : int or None
The index of p in pos if present, else None.
- anhima.loc.view_position(a, pos, p)
View a slice along the first dimension of a corresponding to a genome position.
Parameters: a : array_like
The array to extract from.
pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig, with no duplicates.
p : int
The position to locate.
Returns: b : ndarray
A view of a obtained by slicing along the first dimension.
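Because pos is sorted and free of duplicates, a single position can be found by binary search. The sketch below is one way to get the documented behaviour (an index, or None when p is absent) with `numpy.searchsorted`; it is not claimed to be anhima's implementation, and `locate_position_sketch` and the data are hypothetical. The last line indexes a companion array with the result, in the spirit of view_position.

```python
import numpy as np

def locate_position_sketch(pos, p):
    """Hypothetical stand-in for anhima.loc.locate_position (illustration only)."""
    pos = np.asarray(pos)
    # Binary search: index of the first element that is >= p.
    index = int(np.searchsorted(pos, p))
    # p is present only if that slot exists and holds exactly p.
    if index < len(pos) and pos[index] == p:
        return index
    return None

pos = np.array([3, 6, 11, 20, 35])
genotypes = np.array([[0, 0], [0, 1], [1, 1], [0, 1], [1, 0]])  # one row per position

i = locate_position_sketch(pos, 11)        # 2
missing = locate_position_sketch(pos, 12)  # None
row = genotypes[i]  # basic indexing returns a view of the matching row
```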
- anhima.loc.locate_interval(pos, start_position=0, stop_position=None)
Locate the start and stop indices of the slice of pos that contains all positions in the range from start_position to stop_position.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
start_position : int, optional
Start position of the interval.
stop_position : int, optional
Stop position of the interval.
Returns: loc : slice
A slice object with the start and stop indices that include all positions within the interval.
- anhima.loc.view_interval(a, pos, start_position, stop_position)
View a contiguous slice along the first dimension of a corresponding to a genome interval defined by start_position and stop_position.
Parameters: a : array_like
The array to extract from.
pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
start_position : int
Start position of the interval.
stop_position : int
Stop position of the interval.
Returns: b : ndarray
A view of a obtained by slicing along the first dimension.
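An interval lookup reduces to two binary searches on the sorted pos array. The sketch below assumes both interval ends are inclusive (left search for the start, right search for the stop) and that a missing stop_position means "to the end"; these are assumptions, and `locate_interval_sketch` is a hypothetical helper rather than anhima's code. Slicing a companion array with the returned slice gives a view, as view_interval describes.

```python
import numpy as np

def locate_interval_sketch(pos, start_position=0, stop_position=None):
    """Hypothetical stand-in for anhima.loc.locate_interval (illustration only).

    Assumes both ends of the interval are inclusive.
    """
    pos = np.asarray(pos)
    # First index whose position is >= start_position.
    start_index = int(np.searchsorted(pos, start_position, side='left'))
    if stop_position is None:
        stop_index = len(pos)
    else:
        # One past the last index whose position is <= stop_position.
        stop_index = int(np.searchsorted(pos, stop_position, side='right'))
    return slice(start_index, stop_index)

pos = np.array([3, 6, 11, 20, 35])
genotypes = np.arange(10).reshape(5, 2)  # one row per position

loc = locate_interval_sketch(pos, 6, 20)  # slice(1, 4, None): positions 6, 11, 20
view = genotypes[loc]  # basic slicing returns a view, as in view_interval
```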
- anhima.loc.locate_positions(pos1, pos2)
Find the intersection of two sets of positions.
Parameters: pos1, pos2 : array_like
Sorted 1-dimensional arrays of genomic positions from a single chromosome/contig, each with no duplicates.
Returns: cond1 : ndarray, bool
An array of the same length as pos1 where an element is True if the corresponding item in pos1 is also found in pos2.
cond2 : ndarray, bool
An array of the same length as pos2 where an element is True if the corresponding item in pos2 is also found in pos1.
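The two returned masks can be reproduced with `numpy.isin`, which tests each element of one array for membership in the other. This sketch does not exploit sortedness (a sorted-input implementation could walk both arrays once instead), and it is not claimed to be how anhima computes the result; the position arrays are hypothetical.

```python
import numpy as np

pos1 = np.array([3, 6, 11, 20, 35])
pos2 = np.array([1, 6, 20, 40])

# assume_unique=True is valid here because neither array has duplicates.
cond1 = np.isin(pos1, pos2, assume_unique=True)  # True where pos1 is in pos2
cond2 = np.isin(pos2, pos1, assume_unique=True)  # True where pos2 is in pos1

shared = pos1[cond1]  # positions 6 and 20
```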
- anhima.loc.locate_intervals(pos, start_positions, stop_positions)
Locate items within the pos array that fall within any of the intervals given by start_positions and stop_positions.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
start_positions : array_like, int
Start positions of the intervals.
stop_positions : array_like, int
Stop positions of the intervals.
Returns: cond1 : ndarray, bool
An array of the same length as pos where an element is True if the corresponding item in pos is also found in any of the intervals.
cond2 : ndarray, bool
An array of the same length as the number of intervals, where an element is True if the corresponding interval contains one or more positions in pos.
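Both masks follow from one pair of binary searches per interval. The sketch below assumes intervals are inclusive at both ends and may overlap; `locate_intervals_sketch` is a hypothetical helper illustrating the documented outputs, not anhima's implementation.

```python
import numpy as np

def locate_intervals_sketch(pos, start_positions, stop_positions):
    """Hypothetical stand-in for anhima.loc.locate_intervals (illustration only).

    Assumes intervals are inclusive at both ends.
    """
    pos = np.asarray(pos)
    starts = np.asarray(start_positions)
    stops = np.asarray(stop_positions)
    # For each interval, the half-open index range [lo, hi) of pos inside it.
    lo = np.searchsorted(pos, starts, side='left')
    hi = np.searchsorted(pos, stops, side='right')
    cond1 = np.zeros(len(pos), dtype=bool)
    for i, j in zip(lo, hi):
        cond1[i:j] = True  # mark positions covered by this interval
    # An interval contains at least one position when its range is non-empty.
    cond2 = hi > lo
    return cond1, cond2

pos = np.array([3, 6, 11, 20, 35])
cond1, cond2 = locate_intervals_sketch(pos, [5, 12, 30], [11, 19, 40])
```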
- anhima.loc.plot_variant_locator(pos, step=1, ax=None, start_position=None, stop_position=None, flip=False, line_args=None)
Plot lines indicating the physical genome location of variants. By default the top x axis is in variant index space, and the bottom x axis is in genome position space.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
step : int, optional
Plot a line for every step-th variant.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
flip : bool, optional
Flip the plot upside down.
line_args : dict-like, optional
Additional keyword arguments passed through to plt.Line2D.
Returns: ax : axes
The axes on which the plot was drawn.
- anhima.loc.windowed_variant_counts(pos, window_size, start_position=None, stop_position=None)
Count variants in non-overlapping windows over the genome.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
Returns: counts : ndarray, int
The number of variants in each window.
bin_edges : ndarray, int
The edge positions of each window. Note that this has length len(counts)+1. To determine bin centers use (bin_edges[:-1] + bin_edges[1:]) / 2. To determine bin widths use np.diff(bin_edges).
See also
plot_windowed_variant_counts, windowed_variant_density
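Counting variants in non-overlapping windows is a histogram over fixed-width bins. The sketch below assumes that omitted bounds default to the first and last variant and that the final window is truncated at stop_position (which is consistent with the advice to compute bin widths with np.diff); both are assumptions, and `windowed_counts_sketch` is a hypothetical helper.

```python
import numpy as np

def windowed_counts_sketch(pos, window_size, start_position=None, stop_position=None):
    """Hypothetical illustration of windowed variant counting."""
    pos = np.asarray(pos)
    # Assumed defaults: span from the first to the last variant.
    if start_position is None:
        start_position = pos[0]
    if stop_position is None:
        stop_position = pos[-1]
    # Edges every window_size bp; the final window ends at stop_position.
    bin_edges = np.append(np.arange(start_position, stop_position, window_size),
                          stop_position)
    # np.histogram counts p in [edge_i, edge_i+1); the last bin is closed on the right.
    counts, bin_edges = np.histogram(pos, bins=bin_edges)
    return counts, bin_edges

pos = np.array([2, 5, 9, 12, 18, 19, 23])
counts, bin_edges = windowed_counts_sketch(pos, 10, start_position=0, stop_position=25)
widths = np.diff(bin_edges)  # the last window is only 5 bp wide
```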
- anhima.loc.plot_windowed_variant_counts(pos, window_size, start_position=None, stop_position=None, ax=None, plot_kwargs=None)
Plot windowed variant counts.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
plot_kwargs : dict-like, optional
Additional keyword arguments passed through to plt.plot.
Returns: ax : axes
The axes on which the plot was drawn.
See also
windowed_variant_counts, plot_windowed_variant_density
- anhima.loc.windowed_variant_density(pos, window_size, start_position=None, stop_position=None)
Calculate per-base-pair density of variants in non-overlapping windows over the genome.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
Returns: density : ndarray, float
The number of variants per base pair in each window.
bin_edges : ndarray, int
The edge positions of each window. Note that this has length len(density)+1. To determine bin centers use (bin_edges[:-1] + bin_edges[1:]) / 2. To determine bin widths use np.diff(bin_edges).
See also
plot_windowed_variant_density, windowed_variant_counts
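Per-base-pair density is the count in each window divided by that window's width. The sketch below assumes the divisor is the actual bin width from np.diff rather than window_size, which matters only for a truncated final window; the positions and window edges are hypothetical.

```python
import numpy as np

pos = np.array([2, 5, 9, 12, 18, 19, 23])
# Hypothetical window edges; the final window (20 to 25) is only 5 bp wide.
bin_edges = np.array([0, 10, 20, 25])

counts, _ = np.histogram(pos, bins=bin_edges)
# Per-base-pair density: variants in each window divided by its width in bp.
density = counts / np.diff(bin_edges)
```

The division produces floating-point values even though the counts are integers.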
- anhima.loc.plot_windowed_variant_density(pos, window_size, start_position=None, stop_position=None, ax=None, plot_kwargs=None)
Plot windowed variant density.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
ax : axes, optional
The axes on which to draw. If not provided, a new figure will be created.
plot_kwargs : dict-like, optional
Additional keyword arguments passed through to plt.plot.
Returns: ax : axes
The axes on which the plot was drawn.
See also
windowed_variant_density, plot_windowed_variant_counts
- anhima.loc.windowed_statistic(pos, values, window_size, start_position=None, stop_position=None, statistic='mean')
Calculate a statistic for values binned in non-overlapping windows over the genome.
Parameters: pos : array_like
A sorted 1-dimensional array of genomic positions from a single chromosome/contig.
values : array_like
A 1-D array of the same length as pos.
window_size : int
The size in base-pairs of the windows.
start_position : int, optional
The start position for the region over which to work.
stop_position : int, optional
The stop position for the region over which to work.
statistic : string or function, optional
The function to apply to values in each bin (default 'mean').
Returns: stats : ndarray
The values of the statistic within each bin.
bin_edges : ndarray
The edge positions of each window. Note that this has length len(stats)+1. To determine bin centers use (bin_edges[:-1] + bin_edges[1:]) / 2. To determine bin widths use np.diff(bin_edges).
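For the default statistic='mean', binning values by window can be sketched with NumPy alone. The helper below is hypothetical: it takes precomputed bin edges, closes the last window on the right to match np.histogram, and returns NaN for empty windows (an assumption). For a general string-or-callable statistic, scipy.stats.binned_statistic offers the same kind of interface, though this document does not say whether anhima uses it.

```python
import numpy as np

def windowed_mean_sketch(pos, values, bin_edges):
    """Hypothetical illustration of windowed_statistic(..., statistic='mean')."""
    pos = np.asarray(pos)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges)
    # Window index i means bin_edges[i] <= p < bin_edges[i + 1].
    idx = np.searchsorted(bin_edges, pos, side='right') - 1
    # Close the final window on the right, as np.histogram does.
    idx[pos == bin_edges[-1]] = len(bin_edges) - 2
    # Empty windows stay NaN; positions outside all windows are ignored.
    stats = np.full(len(bin_edges) - 1, np.nan)
    for i in range(len(stats)):
        in_window = values[idx == i]
        if in_window.size:
            stats[i] = in_window.mean()
    return stats

pos = np.array([2, 5, 9, 12, 18, 19, 23])
values = np.array([1, 2, 3, 4, 5, 6, 7])
stats = windowed_mean_sketch(pos, values, [0, 10, 20, 25, 30])
```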
- anhima.loc.evenly_downsample_variants(a, k)
Evenly downsample an array along the first dimension to length k (or as near as possible), assuming the first dimension corresponds to variants.
Parameters: a : array_like
The array to downsample.
k : int
The target number of variants.
Returns: b : array_like
A downsampled view of a.
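Even downsampling that returns a view can be done with a strided slice. This sketch is an assumption about the approach, not anhima's code: it takes every step-th row with step = len(a) // k, so the result length is only approximately k. The helper `evenly_downsample_sketch` is hypothetical.

```python
import numpy as np

def evenly_downsample_sketch(a, k):
    """Hypothetical stand-in for anhima.loc.evenly_downsample_variants."""
    a = np.asarray(a)
    # Take every step-th row; slicing with a step returns a view, not a copy.
    step = max(1, len(a) // k)
    return a[::step]

a = np.arange(10)
b = evenly_downsample_sketch(a, 5)  # rows 0, 2, 4, 6, 8
c = evenly_downsample_sketch(a, 3)  # rows 0, 3, 6, 9: four rows, as near as possible
```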
- anhima.loc.randomly_downsample_variants(a, k)
Randomly downsample an array along the first dimension to length k, assuming the first dimension corresponds to variants.
Parameters: a : array_like
The array to downsample.
k : int
The target number of variants.
Returns: b : array_like
A downsampled copy of a.
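Random downsampling can be sketched by drawing k distinct row indices without replacement. The sketch sorts the indices (an assumption) so that the sample keeps genomic order, and gathers rows with `numpy.take`, which yields a copy. The `rng` parameter and the helper name are hypothetical additions for reproducibility and do not appear in the documented signature.

```python
import numpy as np

def randomly_downsample_sketch(a, k, rng=None):
    """Hypothetical stand-in for anhima.loc.randomly_downsample_variants."""
    a = np.asarray(a)
    rng = np.random.default_rng() if rng is None else rng
    # Choose k distinct row indices, then sort them to preserve genomic order.
    indices = np.sort(rng.choice(len(a), size=k, replace=False))
    # Gathering rows by an index array returns a copy, not a view.
    return np.take(a, indices, axis=0)

pos = np.arange(100, 200, 10)  # 10 hypothetical variant positions
sample = randomly_downsample_sketch(pos, 4, rng=np.random.default_rng(42))
```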