Miscellaneous utilities
Miscellaneous utilities.
- anhima.util.block_apply(f, dataset, block_size=None, out=None)
Apply function f to dataset, split along the first axis into contiguous slices of block_size items. The result should be equivalent to calling f(dataset) directly, but may require less total memory, especially if dataset is an HDF5 dataset.
Parameters: f : function
The function to apply.
dataset : array_like or HDF5 dataset
The input dataset.
block_size : int, optional
The size (in number of items along the first axis) of the blocks passed to f.
out : array_like or HDF5 dataset, optional
If given, used to store the output.
Returns: out : ndarray
The result of applying f to dataset blockwise.
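The blockwise behaviour described above can be sketched with plain NumPy. This is an illustrative re-implementation, not anhima's source: the name block_apply_sketch is hypothetical, the fallback of a single block when block_size is None is an assumption, and the sketch assumes a non-empty dataset. The result only matches f(dataset) when f acts independently on each item along the first axis.

```python
import numpy as np


def block_apply_sketch(f, dataset, block_size=None, out=None):
    """Hypothetical sketch of blockwise application along the first axis."""
    n = dataset.shape[0]
    if block_size is None:
        # Assumption: process everything as one block; the library's
        # actual default may differ.
        block_size = max(1, n)
    results = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Slicing loads only this block into memory (relevant for HDF5).
        block = np.asarray(dataset[start:stop])
        res = f(block)
        if out is None:
            results.append(res)
        else:
            out[start:stop] = res
    if out is None:
        out = np.concatenate(results, axis=0)
    return out


# Usage: row sums computed three rows at a time.
data = np.arange(20).reshape(10, 2)
row_sums = block_apply_sketch(lambda block: block.sum(axis=1), data, block_size=3)
```

Passing a preallocated out (for example an HDF5 dataset) avoids holding the full result in memory as well as the full input.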
- anhima.util.block_take2d(dataset, row_indices, col_indices=None, block_size=None)
Select rows and optionally columns from a NumPy array or HDF5 dataset with 2 or more dimensions.
Parameters: dataset : array_like or HDF5 dataset
The input dataset.
row_indices : sequence of ints
The indices of the selected rows. N.B., will be sorted in ascending order.
col_indices : sequence of ints, optional
The indices of the selected columns. If not provided, all columns will be returned.
block_size : int, optional
The size (in number of rows) of the block of data to process at a time.
Returns: out : ndarray
An array containing the selected rows and columns.
Notes
This function is mainly a work-around for the fact that fancy indexing via h5py is currently slow, and fancy indexing along more than one axis is not supported. The function works by reading the entire dataset in blocks of block_size rows, and processing each block in memory using numpy.
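The approach in the note above (read the dataset in blocks of rows, then index each block in memory) might look roughly like the following. This is a sketch under stated assumptions, not anhima's implementation: block_take2d_sketch is a hypothetical name, the default of a single block when block_size is None is an assumption, and the dataset is assumed to be non-empty.

```python
import numpy as np


def block_take2d_sketch(dataset, row_indices, col_indices=None, block_size=None):
    """Hypothetical sketch of blockwise row/column selection by index."""
    # Row indices are sorted in ascending order, as the parameter docs state.
    row_indices = np.sort(np.asarray(row_indices, dtype=int))
    if col_indices is not None:
        col_indices = np.asarray(col_indices, dtype=int)
    n_rows = dataset.shape[0]
    if block_size is None:
        # Assumption: one block covering the whole dataset.
        block_size = max(1, n_rows)
    selected = []
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        # Plain slicing is fast for HDF5; fancy indexing happens in memory.
        block = np.asarray(dataset[start:stop])
        in_block = (row_indices >= start) & (row_indices < stop)
        rows = block[row_indices[in_block] - start]
        if col_indices is not None:
            rows = rows[:, col_indices]
        selected.append(rows)
    return np.concatenate(selected, axis=0)


# Usage: rows 4, 1, 3 (returned in sorted order) and columns 0 and 2.
data = np.arange(30).reshape(6, 5)
subset = block_take2d_sketch(data, [4, 1, 3], col_indices=[0, 2], block_size=2)
```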
- anhima.util.block_compress2d(dataset, row_condition, col_condition=None, block_size=None)
Select rows and optionally columns, using boolean conditions, from a NumPy array or HDF5 dataset with 2 or more dimensions.
Parameters: dataset : array_like or HDF5 dataset
The input dataset.
row_condition : array_like, bool
A boolean array indicating the selected rows.
col_condition : array_like, bool, optional
A boolean array indicating the selected columns. If not provided, all columns will be returned.
block_size : int, optional
The size (in number of rows) of the block of data to process at a time.
Returns: out : ndarray
An array containing the selected rows and columns.
Notes
This function is mainly a work-around for the fact that fancy indexing via h5py is currently slow, and fancy indexing along more than one axis is not supported. The function works by reading the entire dataset in blocks of block_size rows, and processing each block in memory using numpy.
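The same blockwise strategy with boolean conditions can be sketched using numpy.compress. As before, this is an illustration rather than the library's code: block_compress2d_sketch is a hypothetical name, the single-block default for block_size=None is an assumption, and the dataset is assumed to be non-empty.

```python
import numpy as np


def block_compress2d_sketch(dataset, row_condition, col_condition=None,
                            block_size=None):
    """Hypothetical sketch of blockwise row/column selection by boolean mask."""
    row_condition = np.asarray(row_condition, dtype=bool)
    if col_condition is not None:
        col_condition = np.asarray(col_condition, dtype=bool)
    n_rows = dataset.shape[0]
    if block_size is None:
        # Assumption: one block covering the whole dataset.
        block_size = max(1, n_rows)
    selected = []
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)
        # Read a contiguous block of rows, then filter it in memory.
        block = np.asarray(dataset[start:stop])
        rows = np.compress(row_condition[start:stop], block, axis=0)
        if col_condition is not None:
            rows = np.compress(col_condition, rows, axis=1)
        selected.append(rows)
    return np.concatenate(selected, axis=0)


# Usage: keep rows whose first value is even, and the first and third columns.
data = np.arange(30).reshape(6, 5)
row_mask = data[:, 0] % 2 == 0
col_mask = np.array([True, False, True, False, False])
subset = block_compress2d_sketch(data, row_mask, col_mask, block_size=4)
```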