mt.pandas.pdh5

Loading and saving dataframes in the column-based pdh5 format.

Functions

mt.pandas.pdh5.save_pdh5(filepath: str, df: DataFrame, file_mode: int | None = 436, show_progress: bool = False, **kwargs)

Saves a dataframe into a .pdh5 file.

Parameters:
  • filepath (str) – path to the file to be written to

  • df (pandas.DataFrame) – the dataframe to write from

  • file_mode (int, optional) – file mode to apply to the newly written file; the default, 436, is octal 0o664

  • show_progress (bool) – show a progress spinner in the terminal
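
The default file_mode is easier to read in octal. The sketch below uses only the standard library and assumes the value is interpreted as POSIX permission bits, which the parameter description suggests but does not state outright. It does not call save_pdh5 itself.

```python
import stat

# The documented default for file_mode, written in decimal.
FILE_MODE = 436

# 436 in decimal is 0o664 in octal.
octal_text = oct(FILE_MODE)

# Render the bits as a permission string for a regular file:
# read/write for owner and group, read-only for others.
mode_string = stat.filemode(stat.S_IFREG | FILE_MODE)

print(octal_text)   # 0o664
print(mode_string)  # -rw-rw-r--
```

To request a different mode, it is usually clearer to pass an octal literal such as 0o600 than its decimal equivalent.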

async mt.pandas.pdh5.load_pdh5_asyn(filepath: str, show_progress: bool = False, file_read_delayed: bool = False, max_rows: int | None = None, context_vars: dict = {}, **kwargs) → DataFrame

Loads a dataframe from a .pdh5 file.

Parameters:
  • filepath (str) – path to the file to be read from

  • show_progress (bool) – show a progress spinner in the terminal

  • file_read_delayed (bool) – If True, columns of dftype 'json', 'ndarray', 'Image' and 'SparseNdarray' are proxied for later reading, and their cells are returned as Pdh5Cell instances. If False, these columns are read in full, which can be slow.

  • max_rows (int, optional) – limit the maximum number of rows to be read from the file

  • context_vars (dict) – a dictionary of context variables within which the function runs. It must include context_vars['async'], which tells whether the function is to be invoked asynchronously. Ignored for the '.pdh5' format.

Returns:

df – the loaded dataframe

Return type:

pandas.DataFrame
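
Because load_pdh5_asyn is declared async, it has to be awaited or driven by an event loop. The sketch below shows that calling pattern only. To stay runnable without the library, it uses a hypothetical stand-in named load_rows_stub that mirrors the documented parameters and returns a plain list of row ids instead of a dataframe. The stub and the file name are assumptions made for illustration, not part of mt.pandas.pdh5.

```python
import asyncio


async def load_rows_stub(filepath, show_progress=False, file_read_delayed=False,
                         max_rows=None, context_vars=None):
    """Hypothetical stand-in for an async loader; it does not read any file."""
    rows = list(range(5))  # pretend the file holds five rows
    # Mimic max_rows: None means no limit, otherwise keep the first max_rows rows.
    return rows if max_rows is None else rows[:max_rows]


async def main():
    # Pass a freshly built dict instead of relying on a shared default,
    # and include the 'async' key that the parameter description requires.
    ctx = {"async": True}
    return await load_rows_stub("example.pdh5", max_rows=3, context_vars=ctx)


# asyncio.run starts an event loop, runs the coroutine and returns its result.
rows = asyncio.run(main())
print(rows)  # [0, 1, 2]
```

With the real function, the same pattern would apply with the stub call replaced by load_pdh5_asyn, assuming the library is installed and the file exists.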

Classes

  • Pdh5Cell: A read-only cell of a pdh5 column.

class mt.pandas.pdh5.Pdh5Cell(col: Pdh5Column, row_id: int)

A read-only cell of a pdh5 column.
