Query Parameters

How Search Works

Malva indexes every sample with fixed-length k-mers (k = 24 nucleotides). When you submit a query sequence, the engine:

  1. Extracts all k-mers from the query.

  2. Looks up each k-mer in the index and identifies which cells contain it.

  3. Aggregates per-cell hit counts and normalises them into expression scores.
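The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `index` dictionary is a toy stand-in for the real index, `extract_kmers` and `score_cells` are hypothetical helper names, and dividing by the number of query k-mers is an assumed normalisation, not necessarily the one Malva applies.

```python
from collections import Counter

K = 24  # k-mer length used by the index


def extract_kmers(sequence, k=K):
    """Return every overlapping k-mer in the sequence, left to right."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def score_cells(query, index, k=K):
    """Count k-mer hits per cell and normalise by the number of query k-mers.

    `index` is a toy mapping from k-mer to the set of cells containing it.
    The normalisation is an assumption made for this sketch.
    """
    kmers = extract_kmers(query, k)
    hits = Counter()
    for kmer in kmers:
        for cell in index.get(kmer, ()):
            hits[cell] += 1
    return {cell: count / len(kmers) for cell, count in hits.items()}
```

A query of length n yields n - k + 1 k-mers, so a 48 nt query produces 25 k-mers at k = 24, and a query shorter than 24 nt produces none.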

The three user-facing parameters let you control which k-mers participate in the search and whether both strands are considered.

Parameters

min_kmer_presence

Exclude k-mers that appear in fewer than this many cells across the entire database.

  • 0 (default): no lower filter; all k-mers are used

  • 10 to 100: removes k-mers that are extremely rare (likely sequencing errors or ultra-low-coverage regions)

max_kmer_presence

Exclude k-mers that appear in more than this many cells.

  • 100000 (default): retains nearly all k-mers

  • 10000 to 50000: removes highly repetitive or ubiquitous k-mers that would inflate scores for unrelated cell types
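The combined effect of the two presence thresholds can be pictured as a simple range filter. The sketch below assumes a hypothetical `presence` dictionary mapping each k-mer to the number of cells that contain it; `filter_kmers` is an illustrative helper, not part of the Malva client.

```python
def filter_kmers(presence, min_kmer_presence=0, max_kmer_presence=100000):
    """Keep k-mers whose cell count lies within the two thresholds.

    A k-mer is dropped if it appears in fewer than `min_kmer_presence`
    cells or in more than `max_kmer_presence` cells.
    """
    return [
        kmer
        for kmer, n_cells in presence.items()
        if min_kmer_presence <= n_cells <= max_kmer_presence
    ]


# Toy presence counts (made-up values, for illustration only).
presence = {"rare_kmer": 3, "typical_kmer": 2500, "ubiquitous_kmer": 90000}
kept = filter_kmers(presence, min_kmer_presence=10, max_kmer_presence=50000)
```

With these toy counts only `typical_kmer` survives; under the defaults all three k-mers are kept.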

Strandedness

By default only the forward strand is searched. Set stranded=False to also include reverse-complement k-mers.

This is useful for ambiguous queries or when the orientation of your sequence is unknown.
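As a rough picture of what the unstranded mode adds, the sketch below builds the k-mer set for both settings. It is an assumption about the general approach, not the server's implementation; `reverse_complement` and `kmers_for_search` are hypothetical helpers.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers_for_search(sequence, k=24, stranded=True):
    """List the k-mers that would take part in a search.

    stranded=True  -> forward-strand k-mers only
    stranded=False -> forward k-mers plus their reverse complements
    """
    forward = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if stranded:
        return forward
    return forward + [reverse_complement(kmer) for kmer in forward]
```

In this sketch the unstranded mode doubles the number of k-mers looked up, which is why it suits queries of unknown orientation.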

Probe Design Guidelines

For gene expression queries, use the full coding sequence or a representative exon. The default parameters work well.

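# Query by gene name with the default parameters.
# `client` is assumed to be an already-initialised Malva client.
results = client.search("BRCA1")

# Or with explicit filtering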
results = client.search_sequences(
    "ATCGATCGATCG" * 20,
    max_kmer_presence=50000,
)

For a splice junction, design a probe of ~48 nt centred on the junction (24 nt from each exon). Lowering max_kmer_presence filters out k-mers that also occur widely elsewhere, so that mainly junction-specific k-mers contribute.

junction = "ACGTACGT" * 6  # 48 nt spanning junction
results = client.search_sequences(
    junction,
    max_kmer_presence=10000,
)
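To see why a 48 nt probe suits k = 24, it can help to count which k-mers actually cross the junction. The sketch below is illustrative only; `junction_spanning_kmers` is a hypothetical helper, not part of the Malva client, and the probe is a placeholder sequence.

```python
def junction_spanning_kmers(probe, junction_index, k=24):
    """Return the k-mers with at least one base on each side of the junction.

    `junction_index` is the 0-based index of the first base after the junction.
    """
    spanning = []
    for start in range(len(probe) - k + 1):
        end = start + k
        # The k-mer covers probe[start:end]; it spans the junction when
        # it begins before the junction and ends after it.
        if start < junction_index < end:
            spanning.append(probe[start:end])
    return spanning


# Placeholder probe: 24 nt of "exon 1" followed by 24 nt of "exon 2".
probe = "A" * 24 + "C" * 24
spanning = junction_spanning_kmers(probe, junction_index=24)
```

For a 48 nt probe with the junction in the middle, 23 of the 25 k-mers span the junction. The first and last k-mers lie entirely within one exon, which is one reason a lower max_kmer_presence can help.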

The same principle applies to back-splice junctions (BSJ): design a 48 nt probe that spans the back-splice junction.

bsj_probe = "CTAG" * 12  # 48 nt across BSJ
results = client.search_sequences(
    bsj_probe,
    max_kmer_presence=10000,
)
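A back-splice joins the 3' end of the circularised region to its own 5' end, so a BSJ probe can be assembled from the two ends of that region. The sketch below assumes you already have the linear sequence of the circularised exons, written from the back-splice acceptor to the back-splice donor; `build_bsj_probe` is a hypothetical helper.

```python
def build_bsj_probe(circ_sequence, flank=24):
    """Join the last `flank` bases to the first `flank` bases.

    `circ_sequence` is the linear sequence of the circularised region,
    from back-splice acceptor (5' end) to back-splice donor (3' end).
    """
    if len(circ_sequence) < 2 * flank:
        raise ValueError("sequence is too short for the requested flank length")
    return circ_sequence[-flank:] + circ_sequence[:flank]


# Placeholder sequence standing in for a circularised exon.
circ = "G" * 30 + "ACGT" * 10 + "T" * 30
bsj_probe = build_bsj_probe(circ)
```

The resulting 48 nt probe has the junction between positions 24 and 25, matching the layout used for linear splice junctions.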

For a single-nucleotide variant, centre a 48 nt probe on the variant position.

# Example 48 nt probe with the variant centred in the sequence.
snv_probe = "ACGTACGTACGTACGTACGTACGT" + "T" + "CGTACGTACGTACGTACGTACGT"
results = client.search_sequences(
    snv_probe,
    max_kmer_presence=50000,
)
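When the variant sits inside a longer reference sequence, the probe can be cut out programmatically. This is a minimal sketch; `build_snv_probe` is a hypothetical helper and the reference below is a placeholder, not a real transcript.

```python
def build_snv_probe(reference, variant_index, alt_base, length=48):
    """Cut a probe of `length` nt with the alternate base placed near the centre.

    `variant_index` is the 0-based position of the variant in `reference`.
    For length=48 this takes 24 nt upstream, the alternate base, and 23 nt
    downstream.
    """
    upstream = length // 2
    downstream = length - upstream - 1
    start = variant_index - upstream
    end = variant_index + 1 + downstream
    if start < 0 or end > len(reference):
        raise ValueError("variant is too close to a sequence end for this probe length")
    return reference[start:variant_index] + alt_base + reference[variant_index + 1:end]


reference = "ACGT" * 20  # placeholder 80 nt reference
snv_probe = build_snv_probe(reference, variant_index=40, alt_base="T")
```

With k = 24, the variant at probe index 24 is covered by 24 of the probe's 25 k-mers; only the first k-mer (indices 0 to 23) misses it.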

Warning

3’-biased protocols — Many scRNA-seq protocols (10x Chromium, Drop-seq) capture only the 3’ end of transcripts. If your query targets a region far from the 3’ UTR, you may observe lower signal or no hits even when the gene is expressed. Consider designing probes against the 3’ end of the transcript when working with 3’-biased datasets.

Default Behaviour

When min_kmer_presence, max_kmer_presence, or stranded are omitted, the server applies the defaults shown above (0, 100000, forward only). These work well for the majority of expression queries. Adjust them when you need to reduce noise from rare k-mers (min_kmer_presence) or filter out repetitive/ubiquitous k-mers (max_kmer_presence).