Batch inference

Batch inference is the easiest way to compare routers on many queries and to generate routing logs for analysis.

Input and output schemas

See Data formats for supported --input formats and output fields.
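For example, a minimal queries.jsonl might look like the lines below (one JSON object per line; query is the field the CLI expects, per the Troubleshooting notes further down, and any additional fields are described in Data formats):

{"query": "What is the capital of France?"}
{"query": "Summarize the plot of Hamlet in two sentences."}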

Route-only batch run

A route-only run is fast and requires no API keys: the router only selects a model for each query and writes the decision to the output file, without calling any model.

llmrouter infer --router knnrouter --config configs/model_config_test/knnrouter.yaml --input queries.jsonl --output routed.jsonl --output-format jsonl --route-only
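To get a quick feel for how queries were distributed, you can tally the routing decisions with jq. The field name model below is an assumption for illustration; substitute whichever decision field Data formats lists for your version.

jq -r '.model' routed.jsonl | sort | uniq -c | sort -rn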

Full inference batch run

This routes each query and then calls the selected model via an OpenAI-compatible endpoint.

Prereqs:

  • API_KEYS is set (see Installation).
  • Each candidate model in llm_data has api_endpoint and model, or a global api_endpoint is set in the YAML (see Config reference); a minimal sketch follows below.
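For illustration, a per-model entry can look roughly like this (the exact nesting is an assumption; only the llm_data, model, and api_endpoint names come from this page, so check the Config reference for the authoritative schema, and treat the model name and URL as placeholders):

llm_data:
  gpt-4o-mini:
    model: gpt-4o-mini                       # model identifier sent to the endpoint
    api_endpoint: https://api.openai.com/v1  # OpenAI-compatible endpoint for this model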

llmrouter infer --router knnrouter --config configs/model_config_test/knnrouter.yaml --input queries.jsonl --output results.jsonl --output-format jsonl
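Once the run finishes, a quick sanity check is to count successes and failures. This assumes JSONL output as in the command above; success is the same output field referenced under Troubleshooting below.

jq -r '.success' results.jsonl | sort | uniq -c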

Tips for large runs

  • Prefer JSONL output: --output-format jsonl (stream-friendly, appendable; see the sketch after this list).
  • If you only need routing decisions, add --route-only so no model calls are made.
  • Use --verbose when debugging; it prints more context for each error.
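For very large inputs, one pattern is to split the file, route each chunk separately, and concatenate the per-chunk JSONL outputs. The sketch below uses only standard Unix tools plus the flags already shown on this page; the chunk size and file names are arbitrary.

split -l 10000 queries.jsonl chunk_                 # 10k queries per chunk (arbitrary size)
for f in chunk_*; do
  llmrouter infer --router knnrouter --config configs/model_config_test/knnrouter.yaml \
    --input "$f" --output "out_$f.jsonl" --output-format jsonl --route-only
done
cat out_chunk_*.jsonl > routed_all.jsonl            # JSONL files concatenate cleanly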

Troubleshooting

  • API endpoint not found: add api_endpoint to llm_data[model_name] or set api_endpoint in the YAML.
  • Many success=false: check your input file format and make sure each record has a query field (a sketch for isolating the failed records follows this list).
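To inspect the failures on their own, you can filter them into a separate file; this again assumes JSONL output and relies only on the success field mentioned above.

jq -c 'select(.success == false)' results.jsonl > failures.jsonl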