Models

SearchResult

class malva_client.models.SearchResult(raw_data, client)

Bases: MalvaDataFrame

Container for search results that extends MalvaDataFrame for direct analysis.

Data loading is lazy: if raw_data contains no 'results' key (for example, when it is the initial POST response or a lightweight status response), the DataFrame is populated on first access to .df by fetching /api/expression-data/<job_id>/results from the server.
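The lazy loading described here follows a familiar fetch-on-first-access pattern. The sketch below is a hypothetical stand-in and not the library's code. LazyResults is an illustrative name, and the injected fetch callable stands in for the request to /api/expression-data/<job_id>/results. The fake results payload is made up.

```python
from functools import cached_property
from typing import Any, Callable, Dict


class LazyResults:
    """Hypothetical illustration of SearchResult-style lazy loading (not the real class)."""

    def __init__(self, raw_data: Dict[str, Any], fetch: Callable[[str], Dict[str, Any]]):
        self.raw_data = raw_data
        # Hypothetical callable standing in for GET /api/expression-data/<job_id>/results
        self._fetch = fetch

    @cached_property
    def results(self) -> Dict[str, Any]:
        # Reuse embedded results when present; otherwise fetch once and cache the answer.
        if "results" in self.raw_data:
            return self.raw_data["results"]
        return self._fetch(self.raw_data["job_id"])["results"]


# Usage with a fake fetcher that records each call (made-up payload)
calls = []


def fake_fetch(job_id: str) -> Dict[str, Any]:
    calls.append(job_id)
    return {"results": {"neuron": 12, "astrocyte": 5}}


status_only = LazyResults({"job_id": "job-123", "status": "completed"}, fake_fetch)
status_only.results  # first access triggers the fetch
status_only.results  # second access is served from the cache
```

Because the property is cached, a status-only response costs exactly one extra request, and only when the data is actually needed.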

__init__(raw_data, client)

property job_id: str

Get the job ID for this search

property status: str

Get the current status of the search

property results: Dict[str, Any]

Get the raw search results

property total_cells: int

Get total number of cells found

enrich_with_metadata()

Enrich the search results with sample metadata

Return type: SearchResult
Returns: Self with enriched metadata

MalvaDataFrame

class malva_client.models.MalvaDataFrame(expr_df, client, sample_metadata=None)

Bases: object

Wrapper around a pandas DataFrame that adds sample metadata enrichment and analysis methods

__init__(expr_df, client, sample_metadata=None)

property df: DataFrame

Access the underlying pandas DataFrame

filter_by(**kwargs)

Filter data by any combination of metadata fields (optimized for large datasets). Uses simplified field names (e.g., 'organ' instead of 'specimen_from_organism.organ').

Parameters: **kwargs – Field=value pairs for filtering
Return type: MalvaDataFrame
Returns: New MalvaDataFrame with filtered data

Example

df.filter_by(organ='brain', disease='normal', cell_type='neuron')
df.filter_by(species='Homo sapiens', study='BrainDepressiveDisorder')
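As a rough model of filter_by, the sketch below resolves simplified names to full metadata paths and keeps the rows that match every pair. Two assumptions are made. First, multiple pairs combine with logical AND. Second, records are plain dicts rather than DataFrame rows. The alias table is hypothetical apart from the 'organ' mapping quoted above.

```python
from typing import Any, Dict, List

# Hypothetical alias table: only 'organ' -> 'specimen_from_organism.organ' comes from the docs.
FIELD_ALIASES = {"organ": "specimen_from_organism.organ"}


def filter_rows(rows: List[Dict[str, Any]], **kwargs: Any) -> List[Dict[str, Any]]:
    """Keep rows whose fields equal every given value (assumed AND semantics)."""
    wanted = {FIELD_ALIASES.get(name, name): value for name, value in kwargs.items()}
    return [row for row in rows if all(row.get(f) == v for f, v in wanted.items())]


rows = [
    {"specimen_from_organism.organ": "brain", "disease": "normal", "cell_type": "neuron"},
    {"specimen_from_organism.organ": "brain", "disease": "glioma", "cell_type": "neuron"},
    {"specimen_from_organism.organ": "liver", "disease": "normal", "cell_type": "hepatocyte"},
]
brain_normal = filter_rows(rows, organ="brain", disease="normal")
```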

aggregate_by(group_by, agg_func='mean', expr_column='rel')

Aggregate expression data by specified grouping variables

Parameters:

group_by (Union[str, List[str]]) – Column name(s) to group by
agg_func (str) – Aggregation function ('mean', 'median', 'sum', 'count', 'std')
expr_column (str) – Expression column to aggregate

Return type:

DataFrame

Returns:

DataFrame with aggregated results

Example

df.aggregate_by('cell_type')
df.aggregate_by(['organ', 'cell_type'])
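The grouping above can be sketched with the standard library alone. This is a hypothetical model over plain dict records, not the method's implementation. It returns a dict keyed by group tuples instead of a DataFrame, and by default it reads the 'rel' column to mirror expr_column.

```python
import statistics
from collections import defaultdict
from typing import Any, Dict, List, Tuple, Union

# Map the documented agg_func names to plain-Python equivalents.
AGG_FUNCS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
    "count": len,
    "std": statistics.stdev,  # sample std (ddof=1), which is also the pandas default
}


def aggregate_rows(
    rows: List[Dict[str, Any]],
    group_by: Union[str, List[str]],
    agg_func: str = "mean",
    expr_column: str = "rel",
) -> Dict[Tuple[Any, ...], float]:
    keys = [group_by] if isinstance(group_by, str) else list(group_by)
    groups: Dict[Tuple[Any, ...], List[float]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[expr_column])
    func = AGG_FUNCS[agg_func]
    # Note: statistics.stdev raises StatisticsError for groups with fewer than two values.
    return {key: func(values) for key, values in groups.items()}


rows = [
    {"organ": "brain", "cell_type": "neuron", "rel": 2.0},
    {"organ": "brain", "cell_type": "neuron", "rel": 4.0},
    {"organ": "brain", "cell_type": "astrocyte", "rel": 1.0},
]
by_type = aggregate_rows(rows, "cell_type")
by_organ_type = aggregate_rows(rows, ["organ", "cell_type"], agg_func="count")
```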

plot_expression_by(group_by, limit=None, sort_by='mean', ascending=False, **kwargs)

Plot expression levels grouped by a metadata field

Parameters:

group_by (str) – Column to group by for plotting
limit (Optional[int]) – Maximum number of categories to show (shows top N)
sort_by (str) – How to sort categories ('mean', 'median', 'count', 'alphabetical')
ascending (bool) – Whether to sort in ascending order (False shows highest first)
**kwargs – Additional arguments passed to matplotlib
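The sketch below shows one plausible way limit, sort_by, and ascending could interact. It computes a per-category statistic, sorts by it, and then keeps the first limit categories. The function name is hypothetical. Treating 'alphabetical' as always A to Z, regardless of ascending, is also an assumption and is not documented behavior.

```python
import statistics
from typing import Dict, List, Optional

# Statistics for the non-alphabetical sort_by options listed in the docs.
STATS = {"mean": statistics.mean, "median": statistics.median, "count": len}


def order_categories(
    values_by_group: Dict[str, List[float]],
    sort_by: str = "mean",
    ascending: bool = False,
    limit: Optional[int] = None,
) -> List[str]:
    if sort_by == "alphabetical":
        # Assumption: alphabetical order is always A to Z.
        order = sorted(values_by_group)
    else:
        stat = STATS[sort_by]
        order = sorted(
            values_by_group,
            key=lambda group: stat(values_by_group[group]),
            reverse=not ascending,  # ascending=False puts the highest values first
        )
    # Slicing with limit=None keeps every category.
    return order[:limit]


groups = {"astrocyte": [1.0, 1.0], "neuron": [3.0, 5.0], "microglia": [2.0]}
top_two = order_categories(groups, limit=2)
```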

plot_expression_summary(group_by, limit=10, plot_type='box', **kwargs)

to_pandas()

Convert to regular pandas DataFrame

Return type: DataFrame

available_fields()

Get categorized list of available fields for filtering/grouping

Return type: Dict[str, List[str]]
Returns: Dictionary with categorized field names

available_filter_fields()

Get simplified list of commonly used fields for filtering

Return type: List[str]

field_info()

Get detailed information about available fields

Return type: DataFrame
Returns: DataFrame with field names, types, unique values count, and examples

unique_values(field, limit=20)

Get unique values for a specific field

Parameters:

field (str) – Field name to get unique values for
limit (int) – Maximum number of values to return

Return type:

List

Returns:

Sorted list of unique values

value_counts(field, limit=10)

Get value counts for a field

Return type: Series

CellExpressionMatrixResult

class malva_client.models.CellExpressionMatrixResult(archive_path=None, export_record=None, client=None, *, cells=None, features=None, matrix_entries=None, normalization_factors=None, sample_metadata=None, barcodes=None, job_id=None, source='direct')

Bases: object

Per-cell result returned by MalvaClient.retrieve_cells().

New results are built directly from search and metadata endpoints. The older ZIP-backed constructor remains supported for compatibility.

__init__(archive_path=None, export_record=None, client=None, *, cells=None, features=None, matrix_entries=None, normalization_factors=None, sample_metadata=None, barcodes=None, job_id=None, source='direct')

property cells: DataFrame

Rows of the matrix: row_index, sample_id, cell_id.

property features: DataFrame

Columns of the matrix: feature_index, job_id, feature, label, and source.

property normalization_factors: DataFrame

Per-cell size factors aligned by row_index, when available.

property sample_metadata: DataFrame

Sample metadata table keyed by sample_id, when available.

property matrix_entries: DataFrame

Sparse matrix entries as row_index, feature_index, value.

Values are raw per-cell expression or k-mer hit counts. Missing row/feature pairs are zero.
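To make the sparse layout concrete, the sketch below expands (row_index, feature_index, value) triplets into a dense nested list. As described above, absent pairs are filled with zero. The sketch assumes both indices are zero-based and contiguous, which the documentation does not state. For real data, scipy.sparse.coo_matrix((values, (rows, cols)), shape=...) builds the same matrix without densifying it.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[int, int, int]  # (row_index, feature_index, value)


def densify(entries: Iterable[Entry], n_rows: int, n_features: int) -> List[List[int]]:
    """Expand sparse triplets into a dense matrix; missing pairs stay zero."""
    dense = [[0] * n_features for _ in range(n_rows)]
    for row_index, feature_index, value in entries:
        dense[row_index][feature_index] = value
    return dense


# Two non-zero counts in a 3-cell x 2-feature matrix
matrix = densify([(0, 1, 3), (2, 0, 1)], n_rows=3, n_features=2)
```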

get_cell_ids(sample_ids=None)

Return sample_id/cell_id pairs from the matrix rows.

Parameters: sample_ids (Union[int, List[int], None]) – Optional encoded sample ID or list of sample IDs.
Return type: DataFrame

positive_cells(feature=None, sample_ids=None)

Return cells with non-zero expression in any retrieved feature, or in a single feature when one is given.

Parameters:

feature (Union[str, int, None]) – Feature label/name or feature_index. If omitted, returns all cells that are present in the retrieved matrix rows.
sample_ids (Union[int, List[int], None]) – Optional encoded sample ID or list of sample IDs.

Return type:

DataFrame
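The selection performed by positive_cells can be modeled on plain records, as in the hypothetical sketch below. This is not the library's code. Several choices are assumptions based on the parameter descriptions: the dict-based tables, matching a string feature against both the feature and label columns, and wrapping a single int sample ID in a list.

```python
from typing import Any, Dict, List, Union

Record = Dict[str, Any]


def positive_cells(
    cells: List[Record],
    features: List[Record],
    entries: List[Record],
    feature: Union[str, int, None] = None,
    sample_ids: Union[int, List[int], None] = None,
) -> List[Record]:
    # Resolve a feature label/name to its feature_index (KeyError if unknown).
    if isinstance(feature, str):
        by_name: Dict[str, int] = {}
        for f in features:
            by_name[f["feature"]] = f["feature_index"]
            by_name[f["label"]] = f["feature_index"]
        feature = by_name[feature]
    # Rows with any non-zero value, optionally restricted to one feature.
    positive_rows = {
        e["row_index"]
        for e in entries
        if e["value"] != 0 and (feature is None or e["feature_index"] == feature)
    }
    if isinstance(sample_ids, int):
        sample_ids = [sample_ids]
    return [
        c for c in cells
        if c["row_index"] in positive_rows and (sample_ids is None or c["sample_id"] in sample_ids)
    ]


cells = [
    {"row_index": 0, "sample_id": 10, "cell_id": 100},
    {"row_index": 1, "sample_id": 10, "cell_id": 101},
    {"row_index": 2, "sample_id": 11, "cell_id": 200},
]
features = [
    {"feature_index": 0, "feature": "GFAP", "label": "GFAP"},
    {"feature_index": 1, "feature": "ACGTACGTAC", "label": "probe_1"},
]
entries = [
    {"row_index": 0, "feature_index": 0, "value": 3},
    {"row_index": 2, "feature_index": 1, "value": 1},
]
gfap_cells = positive_cells(cells, features, entries, feature="GFAP")
```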

to_dataframe(normalized=False, include_sample_metadata=False)

Convert the sparse matrix to a long DataFrame.

Parameters:

normalized (bool) – Add a normalized_value column (value / size_factor) when normalization factors are available.
include_sample_metadata (bool) – Merge sample metadata by sample_id.

Return type:

DataFrame
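The long-format conversion is sketched below over plain dicts. It covers both normalized_value = value / size_factor and the merge on sample_id. The input shapes are hypothetical and chosen for illustration: size_factors maps row_index to a factor, and sample_metadata maps sample_id to a dict of fields. Rows without a size factor simply get no normalized_value.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def to_long(
    cells: List[Record],
    entries: List[Record],
    size_factors: Optional[Dict[int, float]] = None,
    sample_metadata: Optional[Dict[int, Record]] = None,
) -> List[Record]:
    cell_by_row = {c["row_index"]: c for c in cells}
    long_rows = []
    for entry in entries:
        # Join each sparse entry to its cell (row_index, sample_id, cell_id).
        row = {**cell_by_row[entry["row_index"]], **entry}
        if size_factors is not None and entry["row_index"] in size_factors:
            row["normalized_value"] = entry["value"] / size_factors[entry["row_index"]]
        if sample_metadata is not None:
            # Merge per-sample fields by sample_id (assumed dict-of-dicts shape).
            row.update(sample_metadata.get(row["sample_id"], {}))
        long_rows.append(row)
    return long_rows


cells = [{"row_index": 0, "sample_id": 10, "cell_id": 100}]
entries = [{"row_index": 0, "feature_index": 0, "value": 3}]
long_rows = to_long(cells, entries, size_factors={0: 1.5}, sample_metadata={10: {"organ": "brain"}})
```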

for_sample(sample_id, normalized=False, include_sample_metadata=False)

Return long matrix entries for one encoded sample ID.

Return type: DataFrame

to_single_cell_result(feature=None, sample_ids=None, normalized=False)

Convert one retrieved feature to the legacy SingleCellResult shape.

This is useful for existing downstream code that expects columns cell_id, expression, and sample_id.

Return type: SingleCellResult

project(dataset_id, sample_ids=None, feature=None, **kwargs)

Project retrieved positive cells onto a coexpression index.

Parameters:

dataset_id (str) – Coexpression index or dataset identifier.
sample_ids (Union[int, List[int], None]) – Optional encoded sample ID or IDs to restrict cells.
feature (Union[str, int, None]) – Optional feature to restrict to cells positive for that feature. If omitted, uses all retrieved positive cells.
**kwargs – Additional coexpression parameters, such as top_n_genes.

Return type:

CoexpressionResult

SingleCellResult

class malva_client.models.SingleCellResult(results_data, client=None)

Bases: object

Represents search results at the single-cell level (not aggregated by cell type)

__init__(results_data, client=None)

Initialize SingleCellResult

Parameters:

results_data (Dict[str, Any]) – Raw results from the API
client (Optional[MalvaClient]) – MalvaClient instance for metadata enrichment

property is_completed: bool

Check if the search is completed

property is_pending: bool

Check if the search is still pending

property has_results: bool

Check if results are available

property cell_count: int

Get total number of cells in results

property sample_count: int

Get number of unique samples in results

property query_gene: str | None

Get the queried gene symbol or sequence

to_dataframe()

Convert results to a pandas DataFrame

Return type: DataFrame
Returns: DataFrame with columns cell_id, expression, sample_id

get_cell_ids()

Get list of all cell IDs

Return type: List[int]

get_expression_values()

Get list of all expression values

Return type: List[float]

get_sample_ids()

Get list of all sample IDs

Return type: List[int]

filter_by_expression(min_expression=0, max_expression=inf)

Filter results by expression thresholds

Parameters:

min_expression (float) – Minimum expression value
max_expression (float) – Maximum expression value

Return type:

SingleCellResult

Returns:

New SingleCellResult with filtered data
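The threshold filter amounts to a range check on each cell's expression value. The sketch assumes both bounds are inclusive. It also assumes records carry the cell_id, expression, and sample_id fields listed for to_dataframe(). The default upper bound, math.inf, mirrors the inf in the signature.

```python
import math
from typing import Any, Dict, List

Record = Dict[str, Any]


def filter_by_expression(
    records: List[Record], min_expression: float = 0, max_expression: float = math.inf
) -> List[Record]:
    # Assumption: both bounds are inclusive.
    return [r for r in records if min_expression <= r["expression"] <= max_expression]


records = [
    {"cell_id": 1, "expression": 0.0, "sample_id": 7},
    {"cell_id": 2, "expression": 2.5, "sample_id": 7},
    {"cell_id": 3, "expression": 9.0, "sample_id": 8},
]
moderate = filter_by_expression(records, min_expression=1.0, max_expression=5.0)
```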

filter_by_samples(sample_ids)

Filter results to specific samples

Parameters: sample_ids (List[int]) – List of sample IDs to keep
Return type: SingleCellResult
Returns: New SingleCellResult with filtered data

get_top_expressing_cells(n=100)

Get top N expressing cells

Parameters: n (int) – Number of top cells to return
Return type: SingleCellResult
Returns: New SingleCellResult with top expressing cells
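Selecting the top n cells is a partial sort. The standard library's heapq.nlargest does this without sorting the whole list. The minimal sketch below uses the same hypothetical record shape (cell_id, expression, sample_id) and is not the method's actual implementation.

```python
import heapq
from typing import Any, Dict, List

Record = Dict[str, Any]


def top_expressing(records: List[Record], n: int = 100) -> List[Record]:
    # Same result as sorted(records, key=..., reverse=True)[:n], in O(len * log n).
    return heapq.nlargest(n, records, key=lambda r: r["expression"])


records = [
    {"cell_id": 1, "expression": 0.5, "sample_id": 7},
    {"cell_id": 2, "expression": 4.0, "sample_id": 7},
    {"cell_id": 3, "expression": 2.0, "sample_id": 8},
]
top_two = top_expressing(records, n=2)
```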

aggregate_by_sample()

Aggregate expression data by sample

Return type: DataFrame
Returns: DataFrame with sample-level statistics

get_expression_stats()

Get basic statistics about expression values

Return type: Dict[str, float]
Returns: Dictionary with expression statistics

enrich_with_metadata(sample_metadata=True)

Enrich results with metadata from the client

Parameters: sample_metadata (bool) – Whether to include sample metadata
Return type: DataFrame
Returns: DataFrame with enriched metadata

save_to_csv(filename, include_metadata=True)

Save results to CSV file

Parameters:

filename (str) – Output filename
include_metadata (bool) – Whether to include metadata enrichment

CoverageResult

class malva_client.models.CoverageResult(raw_data, client=None)

Bases: object

Represents genomic coverage data from the Malva genome browser.

Coverage data is organized as a matrix with genomic positions as rows and cell types as columns. Each entry holds the coverage value for one position in one cell type.

__init__(raw_data, client=None)

Initialize CoverageResult

Parameters:

raw_data (Dict[str, Any]) – Raw coverage data from the API
client (Optional[MalvaClient]) – MalvaClient instance for follow-up requests

to_dataframe()

Convert coverage data to a pandas DataFrame.

Return type: DataFrame
Returns: DataFrame with positions as index and cell types as columns
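The orientation described for CoverageResult, with genomic positions as rows and cell types as columns, can be sketched as follows. The inputs positions, cell_types, and values are hypothetical, because the raw API payload format is not documented here. The sketch only shows how a row-per-position table is laid out.

```python
from typing import Dict, List


def coverage_table(
    positions: List[int], cell_types: List[str], values: List[List[float]]
) -> Dict[int, Dict[str, float]]:
    """Map each genomic position to {cell_type: coverage} (hypothetical input layout)."""
    if len(values) != len(positions):
        raise ValueError("expected one row of values per position")
    return {pos: dict(zip(cell_types, row)) for pos, row in zip(positions, values)}


table = coverage_table(
    positions=[1000, 1010, 1020],
    cell_types=["neuron", "astrocyte"],
    values=[[5.0, 0.0], [7.0, 1.0], [6.0, 2.0]],
)
```

With pandas, pd.DataFrame(values, index=positions, columns=cell_types) produces the same positions-by-cell-types layout.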

get_filter_options()

Get available filter options for this coverage result.

Return type: Dict[str, Any]
Returns: Dictionary with filter options

download_wig(output_path, **filters)

Download coverage data as a WIG file.

Parameters:

output_path (str) – Path to save the WIG file
**filters – Optional metadata filters

Return type:

str

Returns:

Path to the saved file

plot(cell_types=None, **kwargs)

Plot coverage across the genomic region.

Parameters:

cell_types (Optional[List[str]]) – Specific cell types to plot (default: all)
**kwargs – Additional arguments passed to matplotlib

CoexpressionResult

class malva_client.models.CoexpressionResult(raw_data, client=None)

Bases: object

Full coexpression analysis result from the Malva coexpression API.

Wraps the response from POST /api/coexpression/query-by-job and provides DataFrame conversions, top-gene retrieval, and plotting helpers for correlated genes, GO enrichment, and UMAP scores.

__init__(raw_data, client=None)

Initialize CoexpressionResult.

Parameters:

raw_data (Dict[str, Any]) – Raw response from the coexpression endpoint
client (Optional[MalvaClient]) – MalvaClient instance for follow-up requests

genes_to_dataframe()

Convert correlated genes to a DataFrame.

Return type: DataFrame
Returns: DataFrame with columns gene, correlation, p_value (plus any extra fields returned by the server)

scores_to_dataframe()

Convert UMAP scores to a DataFrame.

Return type: DataFrame
Returns: DataFrame with metacell-level score data

umap_to_dataframe()

Convert UMAP score data to a DataFrame with x/y coordinates.

Falls back to scores_to_dataframe() when coordinates are embedded in the scores payload.

Return type: DataFrame
Returns: DataFrame with UMAP coordinates and scores

go_to_dataframe()

Convert GO enrichment results to a DataFrame.

Return type: DataFrame
Returns: DataFrame with columns such as go_id, name, and fdr

cell_type_enrichment_to_dataframe()

Convert cell-type enrichment to a DataFrame.

Return type: DataFrame
Returns: DataFrame with cell-type enrichment data

tissue_breakdown_to_dataframe()

Convert tissue breakdown to a DataFrame.

Return type: DataFrame
Returns: DataFrame with tissue breakdown data

get_top_genes(n=20, min_correlation=0.0)

Get the top n correlated gene names.

Parameters:

n (int) – Number of genes to return
min_correlation (float) – Minimum correlation to include

Return type:

List[str]

Returns:

List of gene names
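A plausible model of get_top_genes drops genes below min_correlation, ranks the rest by correlation, and keeps the first n names. The sketch assumes ranking on the signed correlation rather than its absolute value. It uses the gene and correlation fields listed for genes_to_dataframe(), and the gene values are made up. The real method may behave differently.

```python
from typing import Any, Dict, List


def top_genes(genes: List[Dict[str, Any]], n: int = 20, min_correlation: float = 0.0) -> List[str]:
    # Assumption: filter and rank on the signed correlation, highest first.
    kept = [g for g in genes if g["correlation"] >= min_correlation]
    kept.sort(key=lambda g: g["correlation"], reverse=True)
    return [g["gene"] for g in kept[:n]]


# Made-up correlations for illustration
genes = [
    {"gene": "GFAP", "correlation": 0.91, "p_value": 1e-12},
    {"gene": "AQP4", "correlation": 0.85, "p_value": 1e-9},
    {"gene": "SNAP25", "correlation": -0.40, "p_value": 1e-3},
]
strong = top_genes(genes, n=5, min_correlation=0.5)
```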

plot_umap(color_by='positive_fraction', point_size=None, cmap='viridis', figsize=(10, 8))

Scatter plot of UMAP coordinates coloured by a score column.

Parameters:

color_by (str) – Column in the scores data to use for colouring
point_size (Optional[float]) – Marker size (auto-scaled if None)
cmap (str) – Matplotlib colormap name
figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure

plot_top_genes(n=20, figsize=(8, 6))

Horizontal bar chart of the top correlated genes.

Parameters:

n (int) – Number of genes to show
figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure

plot_go_enrichment(n=15, figsize=(8, 6))

Bar chart of GO enrichment results (−log10 FDR).

Parameters:

n (int) – Number of GO terms to show
figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure
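The bars in plot_go_enrichment are described as −log10 FDR. The sketch below computes that transform for the n most significant terms, taking the smallest FDR first. It uses the go_id, name, and fdr columns mentioned for go_to_dataframe(). The selection order is an assumption, and the go_id and name values are placeholders.

```python
import math
from typing import Any, Dict, List, Tuple


def go_bar_heights(terms: List[Dict[str, Any]], n: int = 15) -> List[Tuple[str, float]]:
    # Smallest FDR = most significant term; -log10 turns it into the tallest bar.
    # An FDR of exactly 0 would raise ValueError here and would need clipping first.
    ranked = sorted(terms, key=lambda t: t["fdr"])[:n]
    return [(t["name"], -math.log10(t["fdr"])) for t in ranked]


# Placeholder terms, not real GO annotations
terms = [
    {"go_id": "GO:example1", "name": "term A", "fdr": 0.01},
    {"go_id": "GO:example2", "name": "term B", "fdr": 0.001},
]
heights = go_bar_heights(terms)
```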

UMAPCoordinates

class malva_client.models.UMAPCoordinates(raw_data, client=None)

Bases: object

Lightweight container for UMAP coordinates from the coexpression API.

Wraps the compact parallel-array format returned by GET /api/coexpression/umap/<dataset_id> and provides conversion to a pandas DataFrame and a simple scatter-plot method.

__init__(raw_data, client=None)

Initialize UMAPCoordinates.

Parameters:

raw_data (Dict[str, Any]) – Raw response from the UMAP endpoint
client (Optional[MalvaClient]) – MalvaClient instance for follow-up requests

to_dataframe()

Convert to a pandas DataFrame.

Return type: DataFrame
Returns: DataFrame with columns x, y, metacell_id, n_cells, sample, cluster
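A parallel-array format stores one list per column, so turning it into row records is a zip over those lists. The sketch assumes the payload keys match the column names listed for to_dataframe(), which is a guess about the API response. With pandas, pd.DataFrame(payload) on a dict of equal-length lists gives the equivalent table.

```python
from typing import Any, Dict, List

COLUMNS = ["x", "y", "metacell_id", "n_cells", "sample", "cluster"]


def parallel_arrays_to_rows(payload: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    # Assumption: the payload has one equal-length list per column name.
    lengths = {len(payload[c]) for c in COLUMNS}
    if len(lengths) != 1:
        raise ValueError("parallel arrays must all have the same length")
    return [dict(zip(COLUMNS, values)) for values in zip(*(payload[c] for c in COLUMNS))]


payload = {
    "x": [0.1, 2.3],
    "y": [1.0, -0.5],
    "metacell_id": [0, 1],
    "n_cells": [40, 25],
    "sample": ["s1", "s2"],
    "cluster": [3, 3],
}
rows = parallel_arrays_to_rows(payload)
```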

plot(color_by='cluster', point_size=None, cmap='tab20', figsize=(10, 8))

Scatter plot of UMAP coordinates.

Parameters:

color_by (str) – Column to color points by (default 'cluster')
point_size (Optional[float]) – Marker size (auto-scaled if None)
cmap (str) – Matplotlib colormap name
figsize (Tuple[int, int]) – Figure size as (width, height)

Returns:

matplotlib Figure