Models

SearchResult

class malva_client.models.SearchResult(raw_data, client)[source]

Bases: MalvaDataFrame

Container for search results that extends MalvaDataFrame for direct analysis.

Data loading is lazy: if raw_data contains no ‘results’ (e.g. it is the initial POST response or a lightweight status response), the DataFrame is populated on first access to .df by fetching /api/expression-data/<job_id>/results from the server.
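The lazy-loading behaviour described above can be pictured with a small stand-in class. This is a sketch only: LazyResults, fake_fetch, and the payload shape are hypothetical and are not part of malva_client, and a plain list takes the place of the pandas DataFrame that the real class exposes.

```python
from typing import Any, Callable, Dict, List, Optional


class LazyResults:
    """Hypothetical stand-in that mimics lazy population of .df."""

    def __init__(self, raw_data: Dict[str, Any],
                 fetch_results: Callable[[str], List[dict]]):
        self._raw = raw_data
        self._fetch = fetch_results  # stands in for the HTTP request
        self._rows: Optional[List[dict]] = None

    @property
    def df(self) -> List[dict]:
        # Populate on first access only; later accesses reuse the cache.
        if self._rows is None:
            if "results" in self._raw:
                self._rows = list(self._raw["results"])
            else:
                self._rows = self._fetch(self._raw["job_id"])
        return self._rows


calls: List[str] = []


def fake_fetch(job_id: str) -> List[dict]:
    # Records each call so the caching behaviour is visible.
    calls.append(job_id)
    return [{"cell_type": "neuron", "rel": 0.5}]


lazy = LazyResults({"job_id": "job-1", "status": "pending"}, fake_fetch)
first = lazy.df   # triggers the single fetch
second = lazy.df  # served from the cache
```

Accessing the property twice triggers one fetch, which is the practical consequence of lazy loading: constructing the object is cheap, and the cost is paid on first use.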

__init__(raw_data, client)[source]

property job_id: str

Get the job ID for this search

property status: str

Get the current status of the search

property results: Dict[str, Any]

Get the raw search results

property total_cells: int

Get total number of cells found

enrich_with_metadata()[source]

Enrich the search results with sample metadata

Return type:

SearchResult

Returns:

Self with enriched metadata

MalvaDataFrame

class malva_client.models.MalvaDataFrame(expr_df, client, sample_metadata=None)[source]

Bases: object

Wrapper around pandas DataFrame with sample metadata enrichment and analysis methods

__init__(expr_df, client, sample_metadata=None)[source]

property df: DataFrame

Access the underlying pandas DataFrame

filter_by(**kwargs)[source]

Filter data by any combination of metadata fields (optimized for large datasets). Uses simplified field names (e.g., ‘organ’ instead of ‘specimen_from_organism.organ’).

Parameters:

**kwargs – Field=value pairs for filtering

Return type:

MalvaDataFrame

Returns:

New MalvaDataFrame with filtered data

Example

df.filter_by(organ='brain', disease='normal', cell_type='neuron')
df.filter_by(species='Homo sapiens', study='BrainDepressiveDisorder')
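To make the simplified-name idea concrete, here is a minimal sketch that works on plain dictionaries rather than a MalvaDataFrame. Only the 'organ' mapping comes from the description above; the SIMPLE_TO_FULL table, the filter_rows helper, and the assumption that all field=value pairs must match at once are illustrative and may differ from the library's behaviour.

```python
from typing import Any, Dict, List

# Only the 'organ' mapping is taken from the documentation above; the
# lookup table itself is an illustrative assumption.
SIMPLE_TO_FULL = {"organ": "specimen_from_organism.organ"}


def filter_rows(rows: List[Dict[str, Any]], **kwargs: Any) -> List[Dict[str, Any]]:
    """Keep rows matching every field=value pair (AND semantics assumed)."""
    resolved = {SIMPLE_TO_FULL.get(k, k): v for k, v in kwargs.items()}
    return [r for r in rows
            if all(r.get(col) == val for col, val in resolved.items())]


rows = [
    {"specimen_from_organism.organ": "brain", "disease": "normal", "rel": 0.4},
    {"specimen_from_organism.organ": "brain", "disease": "glioma", "rel": 0.9},
    {"specimen_from_organism.organ": "liver", "disease": "normal", "rel": 0.1},
]
kept = filter_rows(rows, organ="brain", disease="normal")
```

The helper returns a new list and leaves the input untouched, mirroring the documented behaviour of returning a new MalvaDataFrame.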

aggregate_by(group_by, agg_func='mean', expr_column='rel')[source]

Aggregate expression data by specified grouping variables

Parameters:
  • group_by (Union[str, List[str]]) – Column name(s) to group by

  • agg_func (str) – Aggregation function (‘mean’, ‘median’, ‘sum’, ‘count’, ‘std’)

  • expr_column (str) – Expression column to aggregate

Return type:

DataFrame

Returns:

DataFrame with aggregated results

Example

df.aggregate_by('cell_type')
df.aggregate_by(['organ', 'cell_type'])
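The grouping logic can be sketched without pandas. The aggregate_rows helper below is hypothetical and covers only a subset of the listed aggregation functions ('std' is omitted); the real method returns a pandas DataFrame rather than a dictionary.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Any, Dict, List, Tuple, Union

# Subset of the documented agg_func names; 'std' is left out of this sketch.
AGG_FUNCS = {"mean": mean, "median": median, "sum": sum, "count": len}


def aggregate_rows(rows: List[Dict[str, Any]],
                   group_by: Union[str, List[str]],
                   agg_func: str = "mean",
                   expr_column: str = "rel") -> Dict[Tuple, float]:
    """Group rows by one or more columns and aggregate expr_column."""
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[expr_column])
    return {key: AGG_FUNCS[agg_func](vals) for key, vals in groups.items()}


rows = [
    {"organ": "brain", "cell_type": "neuron", "rel": 0.25},
    {"organ": "brain", "cell_type": "neuron", "rel": 0.75},
    {"organ": "brain", "cell_type": "astrocyte", "rel": 0.5},
]
by_type = aggregate_rows(rows, "cell_type")
by_both = aggregate_rows(rows, ["organ", "cell_type"], agg_func="count")
```

Group keys are tuples in this sketch so that single-column and multi-column grouping share one code path.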

plot_expression_by(group_by, limit=None, sort_by='mean', ascending=False, **kwargs)[source]

Plot expression levels grouped by a metadata field

Parameters:
  • group_by (str) – Column to group by for plotting

  • limit (Optional[int]) – Maximum number of categories to show (shows top N)

  • sort_by (str) – How to sort categories (‘mean’, ‘median’, ‘count’, ‘alphabetical’)

  • ascending (bool) – Whether to sort in ascending order (False shows highest first)

  • **kwargs – Additional arguments passed to matplotlib

plot_expression_summary(group_by, limit=10, plot_type='box', **kwargs)[source]

to_pandas()[source]

Convert to regular pandas DataFrame

Return type:

DataFrame

available_fields()[source]

Get categorized list of available fields for filtering/grouping

Return type:

Dict[str, List[str]]

Returns:

Dictionary with categorized field names

available_filter_fields()[source]

Get simplified list of commonly used fields for filtering

Return type:

List[str]

field_info()[source]

Get detailed information about available fields

Return type:

DataFrame

Returns:

DataFrame with field names, types, unique values count, and examples

unique_values(field, limit=20)[source]

Get unique values for a specific field

Parameters:
  • field (str) – Field name to get unique values for

  • limit (int) – Maximum number of values to return

Return type:

List

Returns:

Sorted list of unique values

value_counts(field, limit=10)[source]

Get value counts for a field

Return type:

Series

CellExpressionMatrixResult

class malva_client.models.CellExpressionMatrixResult(archive_path=None, export_record=None, client=None, *, cells=None, features=None, matrix_entries=None, normalization_factors=None, sample_metadata=None, barcodes=None, job_id=None, source='direct')[source]

Bases: object

Per-cell result returned by MalvaClient.retrieve_cells().

New results are built directly from search and metadata endpoints. The older ZIP-backed constructor remains supported for compatibility.

__init__(archive_path=None, export_record=None, client=None, *, cells=None, features=None, matrix_entries=None, normalization_factors=None, sample_metadata=None, barcodes=None, job_id=None, source='direct')[source]

property cells: DataFrame

Rows of the matrix: row_index, sample_id, cell_id.

property features: DataFrame

Columns of the matrix: feature_index, job_id, feature, label, and source.

property normalization_factors: DataFrame

Per-cell size factors aligned by row_index, when available.

property sample_metadata: DataFrame

Sample metadata table keyed by sample_id, when available.

property matrix_entries: DataFrame

Sparse matrix entries as row_index, feature_index, value.

Values are raw per-cell expression or k-mer hit counts. Missing row/feature pairs are zero.
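The statement that missing row/feature pairs are zero can be illustrated by expanding the triplets into a dense grid. This is a sketch under assumptions: densify is a hypothetical helper, indices are assumed to be zero-based, and plain tuples stand in for the rows of the matrix_entries DataFrame.

```python
from typing import List, Tuple


def densify(entries: List[Tuple[int, int, float]],
            n_rows: int, n_features: int) -> List[List[float]]:
    """Expand (row_index, feature_index, value) triplets into a dense grid."""
    # Every position starts at zero, so absent pairs stay zero.
    dense = [[0.0] * n_features for _ in range(n_rows)]
    for row_index, feature_index, value in entries:
        dense[row_index][feature_index] = value
    return dense


entries = [(0, 0, 3.0), (2, 1, 1.0)]
dense = densify(entries, n_rows=3, n_features=2)
```

For large retrievals a dense grid wastes memory, which is presumably why the result is exposed in sparse form.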

get_cell_ids(sample_ids=None)[source]

Return sample_id/cell_id pairs from the matrix rows.

Parameters:

sample_ids (Union[int, List[int], None]) – Optional encoded sample ID or list of sample IDs.

Return type:

DataFrame

positive_cells(feature=None, sample_ids=None)[source]

Return cells with non-zero expression, either in one given feature or in any retrieved feature.

Parameters:
  • feature (Union[str, int, None]) – Feature label/name or feature_index. If omitted, returns all cells that are present in the retrieved matrix rows.

  • sample_ids (Union[int, List[int], None]) – Optional encoded sample ID or list of sample IDs.

Return type:

DataFrame
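The feature argument can be read as a switch between two selections. The sketch below follows the parameter description literally: with no feature, every retrieved row is returned; with a feature, only rows holding a non-zero entry for it. positive_rows is a hypothetical helper operating on row indices and triplets, not the library's implementation.

```python
from typing import List, Optional, Tuple


def positive_rows(all_rows: List[int],
                  entries: List[Tuple[int, int, float]],
                  feature_index: Optional[int] = None) -> List[int]:
    """Return row indices of positive cells, optionally for one feature."""
    if feature_index is None:
        # No feature given: all rows present in the retrieved matrix.
        return sorted(all_rows)
    return sorted({r for r, f, v in entries
                   if f == feature_index and v != 0})


all_rows = [0, 1, 2]
entries = [(0, 0, 3.0), (1, 1, 2.0), (2, 0, 1.0)]
any_feature = positive_rows(all_rows, entries)
feature_zero = positive_rows(all_rows, entries, feature_index=0)
```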

to_dataframe(normalized=False, include_sample_metadata=False)[source]

Convert the sparse matrix to a long DataFrame.

Parameters:
  • normalized (bool) – Add normalized_value = value / size_factor when normalization factors are available.

  • include_sample_metadata (bool) – Merge sample metadata by sample_id.

Return type:

DataFrame
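The normalized option applies the formula normalized_value = value / size_factor per cell. A minimal sketch of that arithmetic follows; add_normalized is a hypothetical helper, and leaving normalized_value out when a factor is missing or zero is an assumption of this sketch, since the documentation only says the column is added when normalization factors are available.

```python
from typing import Any, Dict, List


def add_normalized(entries: List[Dict[str, Any]],
                   size_factors: Dict[int, float]) -> List[Dict[str, Any]]:
    """Attach normalized_value = value / size_factor where a factor exists."""
    out = []
    for entry in entries:
        row = dict(entry)  # copy so the input is not modified
        factor = size_factors.get(entry["row_index"])
        if factor:  # assumption: skip rows with a missing or zero factor
            row["normalized_value"] = entry["value"] / factor
        out.append(row)
    return out


entries = [
    {"row_index": 0, "feature_index": 0, "value": 4.0},
    {"row_index": 1, "feature_index": 0, "value": 3.0},
]
size_factors = {0: 2.0}  # keyed by row_index; row 1 has no factor
long_rows = add_normalized(entries, size_factors)
```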

for_sample(sample_id, normalized=False, include_sample_metadata=False)[source]

Return long matrix entries for one encoded sample ID.

Return type:

DataFrame

to_single_cell_result(feature=None, sample_ids=None, normalized=False)[source]

Convert one retrieved feature to the legacy SingleCellResult shape.

This is useful for existing downstream code that expects columns cell_id, expression, and sample_id.

Return type:

SingleCellResult

project(dataset_id, sample_ids=None, feature=None, **kwargs)[source]

Project retrieved positive cells onto a coexpression index.

Parameters:
  • dataset_id (str) – Coexpression index or dataset identifier.

  • sample_ids (Union[int, List[int], None]) – Optional encoded sample ID or IDs to restrict cells.

  • feature (Union[str, int, None]) – Optional feature to restrict to cells positive for that feature. If omitted, uses all retrieved positive cells.

  • **kwargs – Additional coexpression parameters, such as top_n_genes.

Return type:

CoexpressionResult

SingleCellResult

class malva_client.models.SingleCellResult(results_data, client=None)[source]

Bases: object

Represents search results at the single cell level (not aggregated by cell type)

__init__(results_data, client=None)[source]

Initialize SingleCellResult

Parameters:
  • results_data (Dict[str, Any]) – Raw results from the API

  • client (Optional[MalvaClient]) – MalvaClient instance for metadata enrichment

property is_completed: bool

Check if the search is completed

property is_pending: bool

Check if the search is still pending

property has_results: bool

Check if results are available

property cell_count: int

Get total number of cells in results

property sample_count: int

Get number of unique samples in results

property query_gene: str | None

Get the queried gene symbol or sequence

to_dataframe()[source]

Convert results to a pandas DataFrame

Return type:

DataFrame

Returns:

DataFrame with columns cell_id, expression, sample_id

get_cell_ids()[source]

Get list of all cell IDs

Return type:

List[int]

get_expression_values()[source]

Get list of all expression values

Return type:

List[float]

get_sample_ids()[source]

Get list of all sample IDs

Return type:

List[int]

filter_by_expression(min_expression=0, max_expression=inf)[source]

Filter results by expression thresholds

Parameters:
  • min_expression (float) – Minimum expression value

  • max_expression (float) – Maximum expression value

Return type:

SingleCellResult

Returns:

New SingleCellResult with filtered data

filter_by_samples(sample_ids)[source]

Filter results to specific samples

Parameters:

sample_ids (List[int]) – List of sample IDs to keep

Return type:

SingleCellResult

Returns:

New SingleCellResult with filtered data

get_top_expressing_cells(n=100)[source]

Get top N expressing cells

Parameters:

n (int) – Number of top cells to return

Return type:

SingleCellResult

Returns:

New SingleCellResult with top expressing cells
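As a rough picture of what filter_by_expression and get_top_expressing_cells do, the sketch below applies the same two operations to plain records with the documented columns cell_id, expression, and sample_id. The helper functions are hypothetical stand-ins, and treating both thresholds as inclusive is an assumption.

```python
import math
from typing import Any, Dict, List


def filter_by_expression(cells: List[Dict[str, Any]],
                         min_expression: float = 0,
                         max_expression: float = math.inf) -> List[Dict[str, Any]]:
    """Keep cells whose expression lies within the (assumed inclusive) bounds."""
    return [c for c in cells
            if min_expression <= c["expression"] <= max_expression]


def top_expressing(cells: List[Dict[str, Any]], n: int = 100) -> List[Dict[str, Any]]:
    """Return the n cells with the highest expression, highest first."""
    return sorted(cells, key=lambda c: c["expression"], reverse=True)[:n]


cells = [
    {"cell_id": 1, "expression": 0.2, "sample_id": 10},
    {"cell_id": 2, "expression": 5.0, "sample_id": 10},
    {"cell_id": 3, "expression": 1.5, "sample_id": 11},
]
expressed = filter_by_expression(cells, min_expression=1.0)
top_one = top_expressing(cells, n=1)
```

Both helpers return new lists, in line with the documented methods returning a new SingleCellResult instead of modifying the original.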

aggregate_by_sample()[source]

Aggregate expression data by sample

Return type:

DataFrame

Returns:

DataFrame with sample-level statistics

get_expression_stats()[source]

Get basic statistics about expression values

Return type:

Dict[str, float]

Returns:

Dictionary with expression statistics

enrich_with_metadata(sample_metadata=True)[source]

Enrich results with metadata from the client

Parameters:

sample_metadata (bool) – Whether to include sample metadata

Return type:

DataFrame

Returns:

DataFrame with enriched metadata

save_to_csv(filename, include_metadata=True)[source]

Save results to CSV file

Parameters:
  • filename (str) – Output filename

  • include_metadata (bool) – Whether to include metadata enrichment

CoverageResult

class malva_client.models.CoverageResult(raw_data, client=None)[source]

Bases: object

Represents genomic coverage data from the Malva genome browser.

Coverage data is organized as a matrix with genomic positions as rows and cell types as columns. Each matrix entry holds a coverage value.

__init__(raw_data, client=None)[source]

Initialize CoverageResult

Parameters:
  • raw_data (Dict[str, Any]) – Raw coverage data from the API

  • client (Optional[MalvaClient]) – MalvaClient instance for follow-up requests

to_dataframe()[source]

Convert coverage data to a pandas DataFrame.

Return type:

DataFrame

Returns:

DataFrame with positions as index and cell types as columns

get_filter_options()[source]

Get available filter options for this coverage result.

Return type:

Dict[str, Any]

Returns:

Dictionary with filter options

download_wig(output_path, **filters)[source]

Download coverage data as a WIG file.

Parameters:
  • output_path (str) – Path to save the WIG file

  • **filters – Optional metadata filters

Return type:

str

Returns:

Path to the saved file

plot(cell_types=None, **kwargs)[source]

Plot coverage across the genomic region.

Parameters:
  • cell_types (Optional[List[str]]) – Specific cell types to plot (default: all)

  • **kwargs – Additional arguments passed to matplotlib

CoexpressionResult

class malva_client.models.CoexpressionResult(raw_data, client=None)[source]

Bases: object

Full coexpression analysis result from the Malva coexpression API.

Wraps the response from POST /api/coexpression/query-by-job and provides DataFrame conversions, top-gene retrieval, and plotting helpers for correlated genes, GO enrichment, and UMAP scores.

__init__(raw_data, client=None)[source]

Initialize CoexpressionResult.

Parameters:
  • raw_data (Dict[str, Any]) – Raw response from the coexpression endpoint

  • client (Optional[MalvaClient]) – MalvaClient instance for follow-up requests

genes_to_dataframe()[source]

Convert correlated genes to a DataFrame.

Return type:

DataFrame

Returns:

DataFrame with columns gene, correlation, p_value (plus any extra fields returned by the server)

scores_to_dataframe()[source]

Convert UMAP scores to a DataFrame.

Return type:

DataFrame

Returns:

DataFrame with metacell-level score data

umap_to_dataframe()[source]

Convert UMAP score data to a DataFrame with x/y coordinates.

Falls back to scores_to_dataframe() when coordinates are embedded in the scores payload.

Return type:

DataFrame

Returns:

DataFrame with UMAP coordinates and scores

go_to_dataframe()[source]

Convert GO enrichment results to a DataFrame.

Return type:

DataFrame

Returns:

DataFrame with columns such as go_id, name, fdr, etc.

cell_type_enrichment_to_dataframe()[source]

Convert cell-type enrichment to a DataFrame.

Return type:

DataFrame

Returns:

DataFrame with cell-type enrichment data

tissue_breakdown_to_dataframe()[source]

Convert tissue breakdown to a DataFrame.

Return type:

DataFrame

Returns:

DataFrame with tissue breakdown data

get_top_genes(n=20, min_correlation=0.0)[source]

Get the top n correlated gene names.

Parameters:
  • n (int) – Number of genes to return

  • min_correlation (float) – Minimum correlation to include

Return type:

List[str]

Returns:

List of gene names
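The selection can be sketched as a threshold filter followed by a descending sort. The top_genes helper and the gene names below are placeholders, and the assumption that ranking is by descending correlation with an inclusive threshold is not confirmed by the documentation.

```python
from typing import Any, Dict, List


def top_genes(genes: List[Dict[str, Any]], n: int = 20,
              min_correlation: float = 0.0) -> List[str]:
    """Return up to n gene names with correlation >= min_correlation."""
    kept = [g for g in genes if g["correlation"] >= min_correlation]
    kept.sort(key=lambda g: g["correlation"], reverse=True)
    return [g["gene"] for g in kept[:n]]


# Placeholder records using the documented columns gene, correlation, p_value.
genes = [
    {"gene": "GENE_A", "correlation": 0.9, "p_value": 0.001},
    {"gene": "GENE_B", "correlation": 0.2, "p_value": 0.2},
    {"gene": "GENE_C", "correlation": 0.6, "p_value": 0.01},
]
names = top_genes(genes, n=2, min_correlation=0.5)
```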

plot_umap(color_by='positive_fraction', point_size=None, cmap='viridis', figsize=(10, 8))[source]

Scatter plot of UMAP coordinates coloured by a score column.

Parameters:
  • color_by (str) – Column in the scores data to use for colouring

  • point_size (Optional[float]) – Marker size (auto-scaled if None)

  • cmap (str) – Matplotlib colormap name

  • figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure

plot_top_genes(n=20, figsize=(8, 6))[source]

Horizontal bar chart of the top correlated genes.

Parameters:
  • n (int) – Number of genes to show

  • figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure

plot_go_enrichment(n=15, figsize=(8, 6))[source]

Bar chart of GO enrichment results (−log10 FDR).

Parameters:
  • n (int) – Number of GO terms to show

  • figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure
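The bar lengths in a plot of this kind are the values of −log10(FDR), so smaller FDR values give longer bars. The sketch below computes those lengths only; neg_log10_fdr is a hypothetical helper, the GO identifiers are placeholders, and ranking terms by ascending FDR is an assumption.

```python
import math
from typing import Any, Dict, List, Tuple


def neg_log10_fdr(terms: List[Dict[str, Any]], n: int = 15) -> List[Tuple[str, float]]:
    """Return (name, -log10(fdr)) for the n terms with the smallest FDR."""
    # Assumes every FDR is strictly positive; log10(0) is undefined.
    ranked = sorted(terms, key=lambda t: t["fdr"])[:n]
    return [(t["name"], -math.log10(t["fdr"])) for t in ranked]


# Placeholder records using the documented fields go_id, name, fdr.
terms = [
    {"go_id": "GO:placeholder-1", "name": "term one", "fdr": 0.01},
    {"go_id": "GO:placeholder-2", "name": "term two", "fdr": 0.001},
]
bars = neg_log10_fdr(terms, n=2)
```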

UMAPCoordinates

class malva_client.models.UMAPCoordinates(raw_data, client=None)[source]

Bases: object

Lightweight container for UMAP coordinates from the coexpression API.

Wraps the compact parallel-array format returned by GET /api/coexpression/umap/<dataset_id> and provides conversion to a pandas DataFrame and a simple scatter-plot method.

__init__(raw_data, client=None)[source]

Initialize UMAPCoordinates.

Parameters:
  • raw_data (Dict[str, Any]) – Raw response from the UMAP endpoint

  • client (Optional[MalvaClient]) – MalvaClient instance for follow-up requests

to_dataframe()[source]

Convert to a pandas DataFrame.

Return type:

DataFrame

Returns:

DataFrame with columns x, y, metacell_id, n_cells, sample, cluster
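A parallel-array payload stores one list per column, with position i across all lists describing the same metacell. The sketch below turns such a payload into one record per metacell. It assumes the payload keys match the documented column names, which the documentation does not confirm; arrays_to_records is a hypothetical helper.

```python
from typing import Any, Dict, List

COLUMNS = ["x", "y", "metacell_id", "n_cells", "sample", "cluster"]


def arrays_to_records(payload: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Zip equal-length column arrays into one dict per point."""
    lengths = {len(payload[c]) for c in COLUMNS}
    if len(lengths) != 1:
        raise ValueError("parallel arrays must have equal length")
    return [dict(zip(COLUMNS, values))
            for values in zip(*(payload[c] for c in COLUMNS))]


# Assumed payload shape: one array per column, aligned by position.
payload = {
    "x": [0.1, 1.2],
    "y": [2.0, -0.5],
    "metacell_id": [0, 1],
    "n_cells": [30, 12],
    "sample": ["s1", "s2"],
    "cluster": ["c0", "c1"],
}
records = arrays_to_records(payload)
```

The parallel layout avoids repeating column names for every point, which keeps the response compact for large embeddings.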

plot(color_by='cluster', point_size=None, cmap='tab20', figsize=(10, 8))[source]

Scatter plot of UMAP coordinates.

Parameters:
  • color_by (str) – Column to color points by (default 'cluster')

  • point_size (Optional[float]) – Marker size (auto-scaled if None)

  • cmap (str) – Matplotlib colormap name

  • figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure