Evaluation workflow¶
This page is a practical checklist for comparing routers on the same query set. For the underlying concepts (metrics, supervision, trade-offs), see Training and evaluation.
1) Prepare an evaluation file¶
Use JSONL and include stable IDs if you plan to join with labels later:
{"query_id":"q1","query":"What is machine learning?"}
{"query_id":"q2","query":"Explain transformers."}
2) Run routing-only for each router¶
Route-only runs are fast and avoid API costs:
llmrouter infer --router knnrouter --config configs/model_config_test/knnrouter.yaml --input eval.jsonl --output knn.jsonl --output-format jsonl --route-only
llmrouter infer --router svmrouter --config configs/model_config_test/svmrouter.yaml --input eval.jsonl --output svm.jsonl --output-format jsonl --route-only
3) Aggregate results¶
At minimum, you can compute:

- routing distribution (how often each model is selected)
- disagreement between routers
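A minimal aggregation sketch is shown below. It assumes each route-only output line is a JSON object that preserves query_id alongside the selected model_name; treat the exact output schema of your router configs as something to verify first.

import json
from collections import Counter

def load_routes(path):
    """Map query_id -> selected model_name from a route-only output file."""
    with open(path) as f:
        return {r["query_id"]: r["model_name"] for r in map(json.loads, f)}

knn = load_routes("knn.jsonl")
svm = load_routes("svm.jsonl")

# Routing distribution: how often each model is selected by each router.
print("kNN distribution:", Counter(knn.values()))
print("SVM distribution:", Counter(svm.values()))

# Disagreement: fraction of shared queries routed to different models.
shared = knn.keys() & svm.keys()
disagree = sum(knn[q] != svm[q] for q in shared) / max(len(shared), 1)
print(f"Disagreement on {len(shared)} queries: {disagree:.2%}")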
If you also have routing labels (for example, a best_model per query), you can compute simple accuracy by comparing model_name against that label.
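As a sketch of that comparison, assume a labels file (here labels.jsonl) with one JSON object per line containing query_id and best_model; the filename and field names are illustrative.

import json

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

# Join router decisions with labels on query_id and score exact matches.
labels = {r["query_id"]: r["best_model"] for r in load_jsonl("labels.jsonl")}
routes = {r["query_id"]: r["model_name"] for r in load_jsonl("knn.jsonl")}

scored = [q for q in routes if q in labels]
accuracy = sum(routes[q] == labels[q] for q in scored) / max(len(scored), 1)
print(f"Routing accuracy on {len(scored)} labeled queries: {accuracy:.2%}")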
4) (Optional) Full inference¶
Full inference can be useful for end-to-end validation, but it is slower and requires API credentials.
llmrouter infer --router knnrouter --config configs/model_config_test/knnrouter.yaml --input eval.jsonl --output knn_full.jsonl --output-format jsonl
Next¶
- Notebook workflows live under main/notebooks: https://github.com/ulab-uiuc/LLMRouter/tree/main/notebooks
- Router capabilities and READMEs: Routers