Context Distillation
1. Overview
Context distillation is a technique that internalizes the behavior induced by a system prompt directly into a model's weights, removing the need to include that prompt at inference time. It is used by Bai et al. (2022) as a component of the Constitutional AI pipeline.
Core idea: A model conditioned on an alignment prompt (e.g., "Be helpful, harmless, and honest") produces better outputs than the same model without it. Context distillation fine-tunes the model on those better outputs — but without the prompt as input — so the model learns to behave as if the alignment prompt is always present.
This is distinct from knowledge distillation (compressing a larger model into a smaller one). In context distillation, the teacher and student are the same base model; the only difference is whether the alignment prompt is in the input.
2. How It Works
Step 1 — Generate aligned responses:
Sample responses from the model conditioned on an alignment context \(c\) (a system prompt encoding desired behavior):
\[
y \sim p_{\theta_0}(\,\cdot \mid c, x),
\]
where \(p_{\theta_0}\) is the base model and \(x\) is the raw user input.
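A minimal sketch of this sampling step, assuming a Hugging Face causal LM; the model name, the wording of the alignment context, and the prompt format are illustrative placeholders rather than the setup from the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # illustrative stand-in for the base model being distilled
ALIGNMENT_CONTEXT = "Be helpful, harmless, and honest.\n\n"  # the alignment context c

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def sample_with_context(user_input: str, max_new_tokens: int = 128) -> str:
    """Sample y ~ p(. | c, x): condition the base model on the alignment context."""
    prompt = ALIGNMENT_CONTEXT + user_input
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            do_sample=True,
            temperature=0.7,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens: these are the teacher response y.
    response_ids = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(response_ids, skip_special_tokens=True)

# Distillation dataset: (raw input x, context-prompted response y) pairs.
distillation_pairs = [(x, sample_with_context(x)) for x in ["How do I stay safe online?"]]
```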
Step 2 — Fine-tune without the context:
Train the model to produce those same responses given only the raw input \(x\):
\[
\mathcal{L}_{\mathrm{SFT}}(\theta) = -\,\mathbb{E}_{(x,\,y)}\big[\log p_{\theta}(y \mid x)\big].
\]
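Continuing the sketch above, one supervised fine-tuning step on a single \((x, y)\) pair; the alignment context is deliberately absent from the input, and the loss is masked to the response tokens (optimizer settings are illustrative assumptions):

```python
import torch
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=1e-5)

def sft_step(user_input: str, teacher_response: str) -> float:
    """One gradient step of -log p_theta(y | x); the alignment context is NOT in the input."""
    prompt_ids = tokenizer(user_input, return_tensors="pt")["input_ids"]
    response_ids = tokenizer(teacher_response, return_tensors="pt")["input_ids"]
    input_ids = torch.cat([prompt_ids, response_ids], dim=1)

    # Mask the prompt positions so the loss covers only the response tokens y.
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100

    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

for x, y in distillation_pairs:
    sft_step(x, y)
```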
Equivalently, the objective can be written as minimizing the KL divergence between the prompted and unprompted distributions:
\[
\mathcal{L}_{\mathrm{KL}}(\theta) = \mathbb{E}_{x}\Big[D_{\mathrm{KL}}\big(p_{\theta_0}(\cdot \mid c, x)\,\big\|\,p_{\theta}(\cdot \mid x)\big)\Big].
\]
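A sketch of the per-example KL term, computed token by token over the response. In this toy version the same `model` object plays both roles; in practice the teacher logits would come from a frozen copy of the base model (or be precomputed) so the target distribution does not drift during training:

```python
import torch
import torch.nn.functional as F

def context_distillation_kl(user_input: str, response: str) -> torch.Tensor:
    """Mean token-level KL( p_teacher(. | c, x) || p_student(. | x) ) over the response tokens."""
    response_ids = tokenizer(response, return_tensors="pt")["input_ids"]

    def response_logits(prefix: str, grad: bool) -> torch.Tensor:
        prefix_ids = tokenizer(prefix, return_tensors="pt")["input_ids"]
        ids = torch.cat([prefix_ids, response_ids], dim=1)
        with torch.set_grad_enabled(grad):
            logits = model(input_ids=ids).logits
        # Positions whose next-token prediction corresponds to each response token.
        return logits[0, prefix_ids.shape[1] - 1 : ids.shape[1] - 1]

    # Teacher: the prompted distribution p(. | c, x); ideally a frozen copy of the base model.
    teacher_logits = response_logits(ALIGNMENT_CONTEXT + user_input, grad=False)
    # Student: the model being fine-tuned, conditioned on x alone.
    student_logits = response_logits(user_input, grad=True)

    teacher_logprobs = F.log_softmax(teacher_logits, dim=-1)
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student), averaged over response positions; gradients flow to the student only.
    return F.kl_div(student_logprobs, teacher_logprobs, log_target=True, reduction="batchmean")
```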
Step 3 — Deploy without the context prompt:
The fine-tuned model produces aligned outputs without needing \(c\) in every request.
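Continuing the sketch, inference is then ordinary generation on the raw input, with no alignment context prepended:

```python
def distilled_generate(user_input: str, max_new_tokens: int = 128) -> str:
    """Query the fine-tuned model on the raw input x only; no alignment context is prepended."""
    inputs = tokenizer(user_input, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            do_sample=True,
            temperature=0.7,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(
        output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```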
3. Role in Constitutional AI
In Bai et al. (2022), context distillation is used in the SL-CAI (Supervised Learning — Constitutional AI) stage:
- Generate responses using a helpful-only model prompted with a set of principles (the "constitution")
- Ask the model to critique and revise those responses (also prompted)
- Fine-tune the model on the final revised responses without the constitution in the input
The result is a model that follows constitutional principles in its weights, not just its context window. This supervised stage precedes the RLHF stage in the full CAI pipeline.
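A compressed sketch of that data-generation loop, reusing `sample_with_context` from the earlier example. The constitution snippets, prompt templates, and single critique-revision pass per principle are illustrative stand-ins, not the paper's exact principles or wording:

```python
# Illustrative constitution snippets; the real constitution in Bai et al. (2022) is longer.
CONSTITUTION = [
    "Choose the response that is least likely to be harmful or unethical.",
    "Choose the response that is most honest and helpful.",
]

def generate_sl_cai_pair(user_input: str) -> tuple[str, str]:
    """Return (x, final revision): the pair used to fine-tune WITHOUT the constitution in the input."""
    response = sample_with_context(user_input)  # initial response from the prompted model
    for principle in CONSTITUTION:
        critique = sample_with_context(
            f"{user_input}\n\nResponse: {response}\n\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = sample_with_context(
            f"{user_input}\n\nResponse: {response}\n\nCritique: {critique}\n\n"
            "Rewrite the response to address the critique."
        )
    return user_input, response

# The SL-CAI fine-tuning set: revised responses paired with the bare user inputs.
sl_cai_pairs = [generate_sl_cai_pair(x) for x in ["How do I stay safe online?"]]
```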
4. Advantages and Limitations
Advantages:
- Inference efficiency: No prompt overhead on every request — reduces latency and token cost
- Consistency: Behavioral constraints live in the weights, not in a prompt that can be truncated, dropped, or overridden
- Reduced prompt injection risk: Safety behaviors encoded in the weights are much harder for attackers to override by manipulating the context
Limitations:
- Limited generalization: The student model only learns what the teacher's prompted distribution covers, so the distilled behavior may not generalize to rare or out-of-distribution queries
- Static alignment: Changes to the desired behavior require retraining; prompt-based systems can be updated instantly
- Capability risk: SFT on a narrow distribution can degrade the model's broader capabilities if the dataset isn't diverse enough
- No explicit reward signal: Unlike RLHF, there is no mechanism to push responses beyond the quality ceiling of the prompted teacher outputs
5. Comparison with Related Techniques
| Technique | Alignment signal | Prompt at inference | Iterative improvement |
|---|---|---|---|
| Prompt engineering | System prompt | Yes | No |
| Context distillation | System prompt → SFT | No | No |
| RLHF | Human preferences → RM | Optional | Yes (RL loop) |
| Constitutional AI | AI self-critique + SFT/RLHF | No | Yes |
Context distillation sits between pure prompting (fragile, and its effect disappears once the prompt is removed or overridden) and full RLHF (expensive and iterative). It is most useful as a preprocessing step that produces a well-initialized policy before RLHF, or as a standalone technique when RLHF is too expensive.
Source: Bai et al. (2022) — Constitutional AI: Harmlessness from AI Feedback [arXiv:2212.08073]