Generic_Util package#

Subpackages#

Submodules#

Generic_Util.benchmarking module#

Functions covering typical code-timing scenarios, such as a “with” statement context, an n-executions timer, and a convenient function for comparing and summarising n execution times of different implementations of the same function.

Generic_Util.benchmarking.compare_implementations(fs_with_shared_args: dict[str, Callable], n=200, wait=1, verbose=True, fs_with_own_args: dict[str, tuple[Callable, list, dict]] | None = None, args: list | None = None, kwargs: dict | None = None)#

Benchmark multiple implementations of the same function, each called n times (with the same args and kwargs), with a break of wait seconds between functions. Recommended later output view if verbose is False: print(table.to_markdown(index = False)).

Parameters:

fs_with_own_args – alternative to the fs_with_shared_args, args and kwargs arguments: meant for additional functions taking different *args and **kwargs

Generic_Util.benchmarking.merge_bench_tables(*tables, verbose=True)#

Generic_Util.benchmarking.time_context(name: str = None)#

“with” statement context for timing the execution of the enclosed code block, i.e. with time_context('Name of code block'): ...

Generic_Util.benchmarking.time_n(f: Callable, n=2, *args, **kwargs)#

Run f (with the given arguments) n times and return the execution intervals.
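As a rough illustration (not the library's actual implementation), the behaviour described above can be sketched with time.perf_counter; the name time_n_sketch is hypothetical:

```python
import time
from typing import Callable

def time_n_sketch(f: Callable, n: int = 2, *args, **kwargs) -> list[float]:
    """Run f(*args, **kwargs) n times, returning each run's duration in seconds."""
    intervals = []
    for _ in range(n):
        start = time.perf_counter()  # monotonic, high-resolution timer
        f(*args, **kwargs)
        intervals.append(time.perf_counter() - start)
    return intervals
```

For example, time_n_sketch(sorted, 3, [3, 1, 2]) returns a list of three durations.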

Generic_Util.iter module#

Iterable-focussed functions, covering multiple varieties of flattening, iterable combining, grouping and predicate/property-based processing (including topological sorting), element-comparison-based operations, value interspersal, and finally batching.

Generic_Util.iter.all_combinations(xs: Sequence, min_size=1, max_size=None) list#

Produce all combinations of elements of xs between min_size and max_size subset size (i.e. the powerset elements of certain sizes).

Generic_Util.iter.batch_iter(n: int, xs: Iterable[_a]) Generator[_a, None, None]#

Batch an iterable into batches of size n (possibly except the last). If len(xs) is knowable, use batch_seq instead.
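A minimal sketch of this batching pattern using itertools.islice (the name batch_iter_sketch is hypothetical; the library's generator may differ in detail):

```python
from itertools import islice
from typing import Generator, Iterable

def batch_iter_sketch(n: int, xs: Iterable) -> Generator[list, None, None]:
    """Yield consecutive batches of up to n elements; only the last may be shorter."""
    it = iter(xs)
    # islice consumes up to n items per call; an empty slice signals exhaustion
    while batch := list(islice(it, n)):
        yield batch
```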

Generic_Util.iter.batch_iter_by(by: Callable[[_a], float], size: int, xs: Iterable[_a], keep_totals=False) Generator[_a, None, None]#

Batch an iterable by the sum of some weight/score/cost of each element, with no batch total exceeding size.

Parameters:

keep_totals – if True, the function returns tuples of (batch_total_value, batch) instead of just batches
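One plausible (greedy, order-preserving) sketch of this behaviour; this is an assumption about the strategy, not the library's actual algorithm, and it does not address elements whose individual weight already exceeds size:

```python
from typing import Callable, Generator, Iterable

def batch_iter_by_sketch(by: Callable, size: float, xs: Iterable,
                         keep_totals: bool = False) -> Generator:
    """Greedily accumulate elements, closing a batch before its total would exceed size."""
    batch, total = [], 0.0
    for x in xs:
        w = by(x)
        if batch and total + w > size:  # adding x would overflow: emit the current batch
            yield (total, batch) if keep_totals else batch
            batch, total = [], 0.0
        batch.append(x)
        total += w
    if batch:
        yield (total, batch) if keep_totals else batch
```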

Generic_Util.iter.batch_seq(n: int, xs: Sequence[_a], n_is_number_of_batches=False) Generator[_a, None, None]#

Batch an iterable of knowable length into batches of size n (possibly except the last).

Parameters:

n_is_number_of_batches – if True, divide into n batches of equal length (possibly except the last) rather than into batches of n elements

Generic_Util.iter.batch_seq_by_into(by: Callable[[_a], float], k: int, xs: Sequence[_a], keep_totals=False, optimal_but_reordered=False) Generator[_a, None, None]#

Batch an iterable of knowable length into k batches whose sums of some weight/score/cost (by) of their elements are roughly equal.

Parameters:
  • keep_totals – if True, the function returns tuples of (batch_total_value, batch) instead of just batches

  • optimal_but_reordered – if True, the function produces the optimal (hence not order-preserving) batching of summable xs into the given number of batches containing roughly equal totals; if False, the process retains order but may return more batches than requested

Generic_Util.iter.complement_intervals(intervals: ndarray[Any, dtype[int]], true_length: int, closed_interval_ends=True) ndarray[Any, dtype[int]]#

Invert a series of index intervals (in the form of an (n,2)-array), i.e. return an (m,2)-array of intervals starting after the ends of the given ones and ending before their starts.

Parameters:
  • true_length – 1 more than the final index for the overall range these intervals are within (which might be intervals[-1,1]); if coming from isin_sorted_intervals, then simply len(xs)

  • closed_interval_ends – whether BOTH input and output intervals are (and will be) closed, i.e. their end-index is INCLUDED in the interval, rather than being the first index after it

Generic_Util.iter.deep_extract(xss_: Iterable[Iterable], *key_path) Generator#

Given a nested combination of iterables and a path of keys in it, return a deep_flatten-ed list of entries under said path. Note: deep_extract(xss_, *key_path) == deep_flatten(Generic_Util.operator.get_nested(xss_, *key_path))

Generic_Util.iter.deep_flatten(xss_: Iterable) Generator#

Flatten any nested combination of iterables to its leaves. The flattening ignores keys of Mappings and stops at tuples, strings, bytes or non-Iterables. Same as a recursive chain.from_iterable but handling values of Mappings instead of keys.
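The recursion described above (descending into Mapping values and other Iterables, but treating tuples, strings, bytes and non-Iterables as leaves) can be sketched as follows; deep_flatten_sketch is a hypothetical reference implementation, not the library's code:

```python
from collections.abc import Iterable, Mapping
from typing import Generator

def deep_flatten_sketch(xss_) -> Generator:
    """Recursively yield leaves: recurse into Mapping values and any other Iterable,
    but stop at tuples, strings, bytes and non-Iterable values."""
    for x in (xss_.values() if isinstance(xss_, Mapping) else xss_):
        if isinstance(x, Iterable) and not isinstance(x, (tuple, str, bytes)):
            yield from deep_flatten_sketch(x)
        else:
            yield x  # a leaf per the stopping rule above
```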

Generic_Util.iter.diff(xs: Iterable[_a], ys: Iterable[_a]) list[_a]#

Difference of iterables.

Notes

  • not a set difference, so strictly removing as many xs duplicate entries as there are in ys

  • preserves order in xs
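Given the two notes above, the behaviour amounts to a multiset difference that preserves the order of xs; a sketch using collections.Counter (which assumes hashable values, an assumption the library version may not make):

```python
from collections import Counter
from typing import Iterable

def diff_sketch(xs: Iterable, ys: Iterable) -> list:
    """Remove from xs exactly as many copies of each value as appear in ys, keeping order."""
    to_remove = Counter(ys)  # how many copies of each value still need removing
    out = []
    for x in xs:
        if to_remove[x] > 0:
            to_remove[x] -= 1  # drop this occurrence
        else:
            out.append(x)
    return out
```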

Generic_Util.iter.eq_elems(xs: Iterable[_a], ys: Iterable[_a]) bool#

Equality of iterables by their elements.

Generic_Util.iter.first(c: Callable[[_a], bool], xs: Iterable[_a], default: _a | None = None) _a#

Return the first value in xs which satisfies condition c.

Generic_Util.iter.first_i(c: Callable[[_a], bool], xs: Sequence[_a], default: _a | None = None) _a#

Return the index of the first value in xs which satisfies condition c.

Generic_Util.iter.flatten(xss: Iterable[Iterable], as_generator=False) list | Generator#

Standard flattening (returning a list by default).

Generic_Util.iter.foldq(f: Callable[[_b, _a], _b], g: Callable[[_b, _a, list[_a]], list[_a]], c: Callable[[_a], bool], xs: Sequence[_a], acc: _b) tuple[_b, list[_a]]#

Fold-like higher-order function where xs is traversed by consumption conditional on c, and remaining xs are updated by g (therefore consumption order is not known a priori):

  • the first/next item to be ingested is the first in the remaining xs to fulfil condition c

  • at every x ingestion, the item is removed from (a copy of) xs, and all the remaining ones are potentially modified by function g

  • this function always returns a tuple of (acc, remaining_xs), unlike the stricter foldq_, which raises an exception for leftover xs

    Note: fold(f, xs, acc) == foldq(f, lambda acc, x, xs: xs, lambda x: True, xs, acc).

    Sequence of suitable names leading to the current one: consumption_fold, condition_update_fold, cu_fold, q_fold, qfold or foldq

Parameters:
  • f – ‘Traditional’ fold function :: acc -> x -> acc

  • g – ‘Update’ function for all remaining xs at every iteration :: acc -> x -> xs -> xs

  • c – ‘Condition’ function to select the next x (first which satisfies it) :: x -> Bool

  • xs – Structure to consume

  • acc – Starting value for the accumulator

Returns:

(acc, remaining_xs)
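The consumption loop described above can be sketched as follows (foldq_sketch is a hypothetical reference implementation of the documented contract, not the library's code):

```python
from typing import Callable, Sequence

def foldq_sketch(f: Callable, g: Callable, c: Callable, xs: Sequence, acc):
    """Repeatedly consume the first remaining x satisfying c, fold it into acc with f,
    then let g rewrite the remaining xs; stop when no remaining x satisfies c."""
    remaining = list(xs)  # work on a copy, as documented
    while True:
        i = next((j for j, x in enumerate(remaining) if c(x)), None)
        if i is None:  # no candidate left: return accumulator and leftovers
            return acc, remaining
        x = remaining.pop(i)
        acc = f(acc, x)
        remaining = g(acc, x, remaining)  # g :: acc -> x -> xs -> xs
```

With g returning xs unchanged and c always True, this degenerates to an ordinary fold, matching the note above.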

Generic_Util.iter.foldq_(f: Callable[[_b, _a], _b], g: Callable[[_b, _a, list[_a]], list[_a]], c: Callable[[_a], bool], xs: list[_a], acc: _b) _b#

Stricter version of foldq (see its description for details); only returns the accumulator.

Raises:

ValueError – on leftover xs

Generic_Util.iter.group_by(f: Callable[[_a], _b], xs: Iterable[_a]) dict[_b, list[_a]]#

Generalisation of partition to a key function with any output type.

Notes

  • ‘Retrieval’ functions from the operator package are typical f values (itemgetter(...), attrgetter(...) or methodcaller(...))

  • This is NOT Haskell’s groupBy function
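A minimal sketch of this grouping behaviour with collections.defaultdict (group_by_sketch is a hypothetical name, not the library's implementation):

```python
from collections import defaultdict
from typing import Callable, Iterable

def group_by_sketch(f: Callable, xs: Iterable) -> dict:
    """Map each key f(x) to the list of elements producing it, in encounter order."""
    groups = defaultdict(list)
    for x in xs:
        groups[f(x)].append(x)
    return dict(groups)
```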

Generic_Util.iter.intersperse(xs: Sequence[_a], ys: list[_a], n: int, prepend=False, append=False) Sequence[_a]#

Intersperse elements of ys every n elements of xs.

Generic_Util.iter.intersperse_val(xs: Sequence[_a], y: _a, n: int, prepend=False, append=False) Sequence[_a]#

Intersperse y every n elements of xs.
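The single-value case can be sketched as below; intersperse_val_sketch is a hypothetical reference implementation, and the exact edge-case behaviour of prepend/append in the library may differ:

```python
from typing import Sequence

def intersperse_val_sketch(xs: Sequence, y, n: int,
                           prepend: bool = False, append: bool = False) -> list:
    """Insert y after every n elements of xs, optionally also at the start and end."""
    out = [y] if prepend else []
    for i, x in enumerate(xs):
        if i and i % n == 0:  # a full group of n elements has just been emitted
            out.append(y)
        out.append(x)
    if append:
        out.append(y)
    return out
```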

Generic_Util.iter.isin_sorted(xs: ndarray[Any, dtype[_a]], ys: ndarray[Any, dtype[_a]]) ndarray[Any, dtype[bool_]]#

Optimised (10x faster) 1D version of np.isin assuming BOTH xs and ys are (ascendingly) sorted arrays of the same type (both int or both float): return a boolean array of the same length as xs indicating whether the corresponding element at that index is present in ys.

Generic_Util.iter.isin_sorted_intervals(xs: ndarray[Any, dtype[_a]], ys: ndarray[Any, dtype[_a]], strict=True) ndarray[Any, dtype[int64]]#

Assuming BOTH xs and ys are (ascendingly) sorted arrays of the same type (both int or both float): return a 2D array of the intervals (in terms of indices of xs) of subsequences shared with ys.

Notes

  • The interval-end indices are of the last matching value, not of the next (non-matching) one; run res[:,1] += 1 to switch the behaviour

  • If the desired intervals are the opposite of the matching ones, call complement_intervals(intervals, len(xs), closed_interval_ends = True)

Parameters:

strict – Whether xs and ys subsequences need to match exactly (e.g. 1223 will not match 123 and vice-versa if strict). If the strictness of the assumption is not respected, the function will produce extra len-1 intervals for each duplicate.

Generic_Util.iter.partition(p: Callable[[_a], bool], xs: Iterable[_a]) tuple[Iterable[_a], Iterable[_a]]#

Haskell’s partition function, partitioning xs by some boolean predicate p: partition p xs == (filter p xs, filter (not . p) xs).
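The Haskell equation above translates directly to a single-pass Python sketch (partition_sketch is a hypothetical name; the library version may return lazier Iterables, per its signature):

```python
from typing import Callable, Iterable

def partition_sketch(p: Callable, xs: Iterable) -> tuple[list, list]:
    """Split xs into (matching, non-matching) lists by predicate p, preserving order."""
    yes, no = [], []
    for x in xs:
        (yes if p(x) else no).append(x)
    return yes, no
```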

Generic_Util.iter.topological_sort(nodes_incoming_edges_tuples: Iterable[tuple[_a, list[_b]]]) list[_a]#

Topological sort, i.e. sort (non-uniquely) DAG nodes by directed path, e.g. sort packages by dependency order.
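Given the input shape (node, list of incoming edges), the sort can be sketched with Kahn's algorithm; this is a standard-technique illustration, not necessarily the library's algorithm:

```python
def topological_sort_sketch(nodes_incoming_edges_tuples) -> list:
    """Kahn's algorithm over (node, [nodes it depends on]) pairs:
    repeatedly emit the nodes with no unmet dependencies."""
    pending = {node: set(deps) for node, deps in nodes_incoming_edges_tuples}
    order = []
    while pending:
        ready = [n for n, deps in pending.items() if not deps]
        if not ready:
            raise ValueError('cycle detected: no dependency-free node remains')
        for n in ready:
            order.append(n)
            del pending[n]
        for deps in pending.values():
            deps.difference_update(ready)  # emitted nodes no longer block anyone
    return order
```

For the package-dependency example, ('app', ['lib']) means app depends on lib, so lib sorts before app.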

Generic_Util.iter.unique(xs: Iterable[_a]) list[_a]#

Order-preserving uniqueness. If the values of xs are hashable, use unique_a instead.

Generic_Util.iter.unique_a(xs: ndarray[Any, dtype[ScalarType]]) ndarray[Any, dtype[ScalarType]]#

Order-preserving uniqueness for an array (of hashable objects). Much faster than a compiled version of the non-hashing algorithm, even including the final sort.

Generic_Util.iter.unique_by(f: Callable[[_a], Any], xs: Iterable[_a]) list[_a]#

Order-preserving uniqueness by some property f. If the values of xs are hashable, it is recommended to use unique_a.

Generic_Util.iter.unzip(list_of_ntuples: Iterable[Iterable], as_generator=False) list[list] | Generator[list, None, None]#

Standard unzip (i.e. zip(*…)) but with concretised outputs (as lists).

Generic_Util.iter.update_dict_with(d0: dict, d1: dict, f: Callable[[_a, _b], _a | _b]) dict[Any, Union[_a, _b]]#

Update a dictionary’s entries with those of another using a given function, e.g. appending (operator.add is ideal for this). NOTE: This modifies d0, so a prior copy or deepcopy is advisable.
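A sketch of this merge-with-function pattern (update_dict_with_sketch is a hypothetical name; like the documented function, it mutates d0 in place):

```python
import operator

def update_dict_with_sketch(d0: dict, d1: dict, f) -> dict:
    """For keys present in both dicts set d0[k] = f(d0[k], d1[k]); otherwise copy d1's entry.
    NOTE: mutates d0 in place."""
    for k, v in d1.items():
        d0[k] = f(d0[k], v) if k in d0 else v
    return d0
```

With list values, operator.add appends the new entries to the old, as the docstring suggests.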

Generic_Util.iter.zip_maps(*maps: list[Mapping[_a, _b]]) Generator[tuple[_a, tuple], None, None]#

Generic_Util.misc module#

Functions with a less generic purpose than those above; currently mostly min/max-based operations.

Generic_Util.misc.interval_overlap(ab: tuple[float, float], cd: tuple[float, float]) float#

Compute the overlap of two intervals (expressed as tuples of start and end values)

Generic_Util.misc.min_max(xs: Sequence[_a]) tuple[_a, _a]#

Mathematically most efficient joint identification of min and max (minimum number of comparisons: 3n/2 - 2).

Notes

  • This function is numba-compilable, e.g. as njit(nTup(f8,f8)(f8[::1]))(min_max) (see Generic_Util.numba.types for the nTup shorthand)

  • If using numpy arrays, min and max are cached for O(1) lookup, and one would imagine this is the algorithm used
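The 3n/2 - 2 bound comes from processing elements in pairs: one comparison orders the pair, then only the smaller is checked against the running min and only the larger against the running max (3 comparisons per 2 elements). A sketch of this classic technique (min_max_sketch is a hypothetical name, not the library's code):

```python
_SENTINEL = object()  # marks iterator exhaustion without excluding any real value

def min_max_sketch(xs):
    """Find (min, max) in ~3n/2 comparisons via pairwise processing."""
    it = iter(xs)
    lo = hi = next(it)
    while True:
        a = next(it, _SENTINEL)
        if a is _SENTINEL:
            break
        b = next(it, _SENTINEL)
        if b is _SENTINEL:  # odd leftover element: compare against both bounds
            if a < lo: lo = a
            elif a > hi: hi = a
            break
        small, big = (a, b) if a <= b else (b, a)  # 1 comparison orders the pair
        if small < lo: lo = small                  # smaller only vs running min
        if big > hi: hi = big                      # larger only vs running max
    return lo, hi
```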

Generic_Util.operator module#

Functions regarding item retrieval, and syntactic sugar for patterns of function application.

Generic_Util.operator.fst(ab: tuple[_a, _b]) _a#

Same as operator.itemgetter(0), but intended (and type-annotated) specifically for 2-tuple use

Generic_Util.operator.get_nested(xss_: Iterable[Iterable], *key_path) Generator#

Follow a path of keys for a nested combination of iterables

Generic_Util.operator.on(f: Callable, xs: Iterable[_a], g: Callable[[_a, ...], _b], *args, **kwargs)#

Transform xs by element-wise application of g and call f with the results as its arguments, e.g. on(operator.gt, (a, b), len).

Notes

  • *args, **kwargs are for g, not f

  • ‘Retrieval’ functions from the operator package are reasonable g values (itemgetter(...), attrgetter(...) or methodcaller(...)), BUT on_a and on_m are shorthands for the attribute and method cases
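The pattern reduces to one line; on_sketch is a hypothetical reference implementation of the description above, not the library's code:

```python
import operator
from typing import Callable, Iterable

def on_sketch(f: Callable, xs: Iterable, g: Callable, *args, **kwargs):
    """Apply g (with *args/**kwargs) to each element of xs, then call f on the results."""
    return f(*(g(x, *args, **kwargs) for x in xs))
```

For instance, on_sketch(operator.gt, (["a", "b", "c"], ["d"]), len) compares the two lengths 3 and 1.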

Generic_Util.operator.on_a(f: Callable, xs: Iterable, a: str)#

Extract attribute a from xs elements and call f with them as its arguments. E.g. on_a(operator.eq, [a, b], '__class__')

Generic_Util.operator.on_m(f: Callable, xs: Iterable, m: str, *args, **kwargs)#

Call method m on xs elements and call f with their results as its arguments, e.g. on_m(operator.gt, [a, b], 'count', 'hello').

Notes

  • *args, **kwargs are for method m, not f

Generic_Util.operator.snd(ab: tuple[_a, _b]) _b#

Same as operator.itemgetter(1), but intended (and type-annotated) specifically for 2-tuple use

Module contents#

Copyright (c) 2023 Thomas Fletcher. All rights reserved.

Generic-Util: Convenient functions not found in Python’s Standard Library (regarding benchmarking, iterables, functional tools, operators and numba compilation)