SPLADE Reference¶

SPLADE is covered in depth in Retrieval Methods. This page is a compact math and mechanics reference.

Scoring Function¶

Score(t) = max_over_positions [ log(1 + ReLU(logit_t)) ]

ReLU — removes negative scores (enforces non-negativity)
log(1 + x) — compresses large activations, controls sparsity
max pooling — takes the highest score across all token positions for term t

Pipeline at a Glance¶

Input text
  → BERT/MLM encoder
  → For each token position: score all V vocabulary terms
  → ReLU + log + max-pool across positions
  → Sparse vector {term: weight, ...}  (most weights = 0)
  → Store in standard inverted index

Term Expansion Example¶

Input	Active terms after expansion
`"car repair"`	car, repair, automobile, mechanic, engine, maintenance
`"better search models"`	search, retrieval, ranking, model, information, query

Documents containing "automobile maintenance" match query "car repair" even with zero word overlap.

SPLADE vs BM25 vs Dense¶

	BM25	SPLADE	Dense bi-encoder
Term weights	Hand-crafted formula	Learned by transformer	N/A (dense vector)
Semantic expansion	None	Yes (learned)	Implicit
Index type	Inverted index	Inverted index	Vector index (ANN)
Training needed	No	Yes	Yes
Interpretable	High	Moderate	Low

Mental model: SPLADE = BM25 with neural term weights and learned vocabulary expansion.