mt.pandas.series

Extra functions to augment pandas.Series.

Functions

  • Series4json(): Converts a json-like object into a pandas.Series.

  • json4Series(): Converts a pandas.Series into a json-like object.

  • to_categorical(): Converts a string series representing a categorical field into a zero-indexed categorical field and a one-hot field in json format.

  • series_apply(): Applies a function on every item of a pandas.Series, optionally with a progress bar.

  • series_parallel_apply(): Parallel-applies a function on every item of a pandas.Series, optionally with a progress bar.

  • stats(): Computes basic statistics (min, max, mean and standard deviation) of a scalar series.

mt.pandas.series.Series4json(obj)

Converts a json-like object into a pandas.Series.

Parameters:

obj (json_like) – a json-like object

Returns:

output Series

Return type:

pandas.Series

Notes

A json-like object contains two array_likes, each of length K. The first is the index array and the second is the value array.
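The conversion described in the Notes can be sketched with plain pandas. This is a hypothetical helper (series_from_json_like is not part of mt.pandas), and it assumes the json-like object is a two-element sequence [index_array, value_array]; the actual container type accepted by Series4json may differ.

```python
import pandas as pd


def series_from_json_like(obj):
    """Hypothetical sketch of Series4json; not the mt.pandas implementation."""
    # Assumption: obj is a 2-element sequence [index_array, value_array].
    index, values = obj
    if len(index) != len(values):
        raise ValueError("index and value arrays must both have length K")
    # Pair the k-th index entry with the k-th value entry.
    return pd.Series(list(values), index=list(index))


s = series_from_json_like([["a", "b"], [1, 2]])
print(s["b"])  # label-based lookup on the restored index
```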

mt.pandas.series.json4Series(obj)

Converts a pandas.Series into a json-like object.

Parameters:

obj (pandas.Series) – input Series

Returns:

a json-like object

Return type:

json_like

Notes

A json-like object contains two array_likes, each of length K. The first is the index array and the second is the value array.
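The reverse direction can be sketched as follows, under the same assumption that the json-like object is a list [index_array, value_array]. The helper name json_like_from_series is hypothetical. Using tolist() converts NumPy scalars to built-in Python types, which keeps the result serializable with the standard json module.

```python
import json

import pandas as pd


def json_like_from_series(s):
    """Hypothetical sketch of json4Series; not the mt.pandas implementation."""
    # Both arrays have length K == len(s); tolist() yields plain Python scalars.
    return [s.index.tolist(), s.tolist()]


obj = json_like_from_series(pd.Series([1.5, 2.5], index=["x", "y"]))
print(json.dumps(obj))  # the result can be dumped to JSON text directly
```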

mt.pandas.series.to_categorical(series, value_list, missing_values='raise_exception', logger=None)

Converts a string series representing a categorical field into a zero-indexed categorical field and a one-hot field in json format.

Parameters:
  • series (pandas.Series) – an indexed series whose values are strings representing categories

  • value_list (list) – list of accepted values to be converted into integers. The first value is converted into 0, the second value into 1, and so on. The length of the list is the number of categories.

  • missing_values ({'raise_exception', 'remove_and_warn', 'remove_in_silence'}) – policy for handling missing values when they are encountered. If ‘raise_exception’ is specified, a ValueError is raised at the first missing value. If ‘remove_and_warn’ is specified, every missing value is reported as a warning to the logger and the records containing missing values are removed from the output. If ‘remove_in_silence’ is specified, all records containing missing values are silently removed from the output.

  • logger (logging.Logger or equivalent, optional) – logger for debugging purposes

Returns:

df – indexed dataframe with the same index as series, except that records containing missing values may have been removed. Field cat_id contains the converted zero-indexed categorical id integer. Field one_hot contains a numpy 1d array holding the one-hot representation of the categorical id; its length equals the length of value_list.

Return type:

pandas.DataFrame(columns=[‘cat_id’, ‘one_hot’])

Raises:

ValueError – if missing_values is ‘raise_exception’ and a missing value is encountered, or if something else goes wrong
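The conversion and the three missing-value policies can be illustrated with a minimal sketch. to_categorical_sketch is a hypothetical re-implementation, not the mt.pandas source; it assumes that any value not found in value_list (including NaN) counts as missing, and it falls back to a module-level logging.Logger when no logger is given.

```python
import logging

import numpy as np
import pandas as pd


def to_categorical_sketch(series, value_list, missing_values="raise_exception", logger=None):
    """Hypothetical sketch of to_categorical; not the mt.pandas implementation."""
    if missing_values not in ("raise_exception", "remove_and_warn", "remove_in_silence"):
        raise ValueError(f"Unknown missing_values policy: {missing_values!r}")
    # Assumption: any value absent from value_list (including NaN) is "missing".
    lookup = {value: i for i, value in enumerate(value_list)}
    n_cats = len(value_list)
    log = logger if logger is not None else logging.getLogger(__name__)

    kept_index, cat_ids, one_hots = [], [], []
    for key, value in series.items():
        cat_id = lookup.get(value)
        if cat_id is None:
            if missing_values == "raise_exception":
                raise ValueError(f"Missing value {value!r} at index {key!r}.")
            if missing_values == "remove_and_warn":
                log.warning("Removing record %r with missing value %r.", key, value)
            continue  # the record is dropped under both 'remove_*' policies
        one_hot = np.zeros(n_cats)
        one_hot[cat_id] = 1.0
        kept_index.append(key)
        cat_ids.append(cat_id)
        one_hots.append(one_hot)

    # Store each one-hot vector as a single cell of an object column.
    one_hot_col = np.empty(len(one_hots), dtype=object)
    for i, vec in enumerate(one_hots):
        one_hot_col[i] = vec
    index = pd.Index(kept_index, name=series.index.name)
    return pd.DataFrame({"cat_id": cat_ids, "one_hot": one_hot_col}, index=index)


s = pd.Series(["red", "blue", "green"], index=[10, 11, 12])
df = to_categorical_sketch(s, ["red", "green"], missing_values="remove_in_silence")
print(df)  # record 11 ("blue") is dropped; cat_id is 0 for "red", 1 for "green"
```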

mt.pandas.series.series_apply(s: Series, func, bar_unit='it', logger: IndentedLoggerAdapter | None = None, func_uses_logger: bool = False) → Series

Applies a function on every item of a pandas.Series, optionally with a progress bar.

Parameters:
  • s (pandas.Series) – a series

  • func (function) – a function to map each item of the series to something

  • bar_unit (str, optional) – unit name to be passed to the progress bar. If None is provided, no bar is displayed.

  • logger (mt.logg.IndentedLoggerAdapter, optional) – logger for debugging purposes. Only used if bar_unit is not None.

  • func_uses_logger (bool) – whether or not the function takes logger as an additional keyword argument. Valid only if logger is used.

Returns:

output series obtained by invoking s.apply, with a progress bar shown if requested.

Return type:

pandas.Series
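The interplay between bar_unit, logger and func_uses_logger can be sketched as below. series_apply_sketch is hypothetical and deliberately swaps the progress bar for a simpler technique: it logs one "done/total unit" line per item through the logger, so it only reports progress when both bar_unit and logger are given. The real function may display a bar even without a logger.

```python
import functools
import logging

import pandas as pd


def series_apply_sketch(s, func, bar_unit="it", logger=None, func_uses_logger=False):
    """Hypothetical stand-in for series_apply; logs progress instead of drawing a bar."""
    if bar_unit is None or logger is None:
        # No progress reporting requested (or nowhere to report it): plain apply.
        return s.apply(func)
    if func_uses_logger:
        # Forward the logger to func as an extra keyword argument.
        func = functools.partial(func, logger=logger)

    total = len(s)
    done = 0

    def tracked(x):
        nonlocal done
        result = func(x)
        done += 1
        logger.info("%d/%d %s", done, total, bar_unit)
        return result

    return s.apply(tracked)


def add_one(x, logger=None):
    # A func that accepts the logger as a keyword argument.
    return x + 1


out = series_apply_sketch(pd.Series([1, 2, 3]), add_one,
                          logger=logging.getLogger("demo"), func_uses_logger=True)
```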

mt.pandas.series.series_parallel_apply(s: Series, func, n_cores: int = -1, parallelism: str = 'multiprocess', logger: IndentedLoggerAdapter | None = None, scoped_msg: str | None = None) → Series

Parallel-applies a function on every item of a pandas.Series, optionally with a progress bar.

The function wraps pandas_parallel_apply.SeriesParallel. Progress bars are shown if and only if a logger is provided.

Parameters:
  • s (pandas.Series) – a series

  • func (function) – a function to map each item of the series to something. It must be picklable for parallel processing.

  • n_cores (int) – number of CPUs to use. Passed as-is to pandas_parallel_apply.SeriesParallel.

  • parallelism ({'multithread', 'multiprocess'}) – multi-threading or multi-processing. Passed as-is to pandas_parallel_apply.SeriesParallel.

  • logger (mt.logg.IndentedLoggerAdapter, optional) – logger for debugging purposes.

  • scoped_msg (str, optional) – if provided, the message of a logger.scoped_info scope that encloses the progress bars. Only valid if a logger is provided.

Returns:

output series, equivalent to invoking s.apply(func) but computed in parallel.

Return type:

pandas.Series

See also

pandas_parallel_apply.SeriesParallel

the wrapped class that performs the parallel apply
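To illustrate what a parallel apply over a Series involves, here is a sketch that swaps in the standard library's concurrent.futures for pandas_parallel_apply.SeriesParallel. series_parallel_apply_sketch is hypothetical, omits progress bars and logging, and assumes a non-positive n_cores means "use every available CPU".

```python
import concurrent.futures as cf
import os

import pandas as pd


def series_parallel_apply_sketch(s, func, n_cores=-1, parallelism="multiprocess"):
    """Hypothetical stand-in built on concurrent.futures, not pandas_parallel_apply."""
    # Assumption: a non-positive n_cores means "use every available CPU".
    workers = n_cores if n_cores > 0 else (os.cpu_count() or 1)
    if parallelism == "multithread":
        executor_cls = cf.ThreadPoolExecutor
    elif parallelism == "multiprocess":
        # func and every item must be picklable to cross process boundaries.
        executor_cls = cf.ProcessPoolExecutor
    else:
        raise ValueError(f"Unknown parallelism: {parallelism!r}")

    with executor_cls(max_workers=workers) as pool:
        # Executor.map yields results in input order, so they align with s.index.
        results = list(pool.map(func, s.tolist()))
    return pd.Series(results, index=s.index, name=s.name)


out = series_parallel_apply_sketch(pd.Series([1, 2, 3], index=["a", "b", "c"]),
                                   lambda x: x * x, n_cores=2, parallelism="multithread")
```

The multithread path accepts a lambda because nothing is pickled. With parallelism='multiprocess', func must be a module-level function, and on platforms that start worker processes with spawn the call should sit under an `if __name__ == "__main__":` guard.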

mt.pandas.series.stats(s: Series) → dict

Computes basic statistics (min, max, mean and standard deviation) of a scalar series.

Parameters:

s (pandas.Series) – a scalar series of non-null numeric values

Returns:

an output dictionary containing keys ‘min’, ‘max’, ‘mean’, ‘std’

Return type:

dict
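The returned dictionary can be approximated with built-in pandas reductions. stats_sketch is hypothetical; in particular, it uses pandas' default sample standard deviation (ddof=1), and whether stats() uses ddof=1 or the population variant (ddof=0) is not stated above.

```python
import pandas as pd


def stats_sketch(s):
    """Hypothetical equivalent of stats(); assumes pandas' default ddof=1 for std."""
    return {
        "min": s.min(),
        "max": s.max(),
        "mean": s.mean(),
        "std": s.std(),  # sample standard deviation (ddof=1)
    }


print(stats_sketch(pd.Series([1.0, 2.0, 3.0])))
```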