Architecture¶
LLMRouter is organized around a small set of core modules: a unified CLI, router implementations, trainer implementations, and a config-driven data loader.
Note
Source links below point to the main branch (the docs site itself is built from the website branch).
Key source files (main branch)¶
- CLI entrypoint: llmrouter/cli/router_main.py
- Inference registry + helpers: llmrouter/cli/router_inference.py
- Training registry + helpers: llmrouter/cli/router_train.py
- Chat UI (Gradio): llmrouter/cli/router_chat.py
- Router base class: llmrouter/models/meta_router.py
- Data loader: llmrouter/data/data_loader.py
- API calling (LiteLLM): llmrouter/utils/api_calling.py
- Plugin discovery: llmrouter/plugin_system.py
Design principles (why the code is shaped this way)¶
LLMRouter is designed to make it easy to compare routing algorithms under a shared interface and shared evaluation story.
- A stable routing contract
    - Every router implements route_single/route_batch and returns a model choice via model_name (or an alias key).
    - This lets you swap from KNN to GNN to LLM-based routing without changing the surrounding CLI/workflow.
- Separation of decision vs. execution
    - Routing is "which model should answer?"
    - Execution is "call that model (or your gateway) and get a response."
    - --route-only exists so you can evaluate routing decisions without coupling to provider endpoints or API keys.
- Config-driven reproducibility
    - Data paths, model paths, and hyperparameters live in YAML configs.
    - This makes experiments easy to version, share, and rerun.
- Model metadata as an interface boundary
    - llm_data decouples routing decisions (model_name) from provider-specific identifiers (model) and endpoints (api_endpoint); see the sketch after this list.
    - This is what allows the same router policy to be reused across different backends.
- Extensibility without forking
    - The plugin system registers custom routers into the same registries used by built-in routers.
    - This is intentional: new algorithms should be first-class citizens, not a special case.
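For concreteness, a single llm_data entry ties the three names together. This is only an illustration: the field names (the model_name key, model, api_endpoint) come from this page, but the values and the plain-dict representation are assumptions; the real loader may use a different container.

```python
# Hypothetical llm_data entry (values are made up; field names come from this page).
llm_data = {
    "fast-small": {                                          # model_name: what the router decides on
        "model": "openai/gpt-4o-mini",                       # provider-specific identifier sent to the API
        "api_endpoint": "https://llm-gateway.example.com/v1" # optional per-model endpoint
    },
}

# Swapping backends means editing this mapping; the router policy itself is untouched.
```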
High-level lifecycle¶
- A YAML config defines data paths, model paths, and router hyperparameters.
- The CLI resolves a router name into a concrete router class (and trainer class, if training).
- The router loads config and data (if initialized with yaml_path) and exposes route_single/route_batch.
- Trainers implement the training loop for trainable routers.
- Inference either returns a routing decision (--route-only) or calls an external LLM endpoint.
Configuration-driven router initialization¶
Most routers inherit from MetaRouter. When you construct a router with yaml_path:
- YAML is loaded into self.cfg.
- DataLoader attaches datasets as attributes on the router instance (for example query_data_train, llm_data).
- Missing files are logged as warnings, so routers can still initialize even if some data is optional.
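A rough sketch of that initialization flow. The router class and config path below are placeholders; only yaml_path, cfg, query_data_train, and llm_data are names taken from this page.

```python
# Hypothetical example: "KNNRouter" and the config path are stand-ins for a real router/config.
from llmrouter.models import KNNRouter  # illustrative import; check the installed package for real paths

router = KNNRouter(yaml_path="configs/knn_router.yaml")  # triggers YAML loading + DataLoader attachment

print(router.cfg)               # parsed YAML config
print(router.llm_data)          # candidate-model metadata attached by DataLoader
print(router.query_data_train)  # training queries, if the config points to them
```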
See Data and metrics for what those datasets look like.
Registries and naming¶
LLMRouter uses registries to map CLI router names to implementations:
- ROUTER_REGISTRY (inference): maps the --router name to a router class.
- ROUTER_TRAINER_REGISTRY (training): maps the --router name to (router_class, trainer_class).
- Aliases are handled by registering multiple keys pointing to the same implementation (for example routerdc and dcrouter).
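Conceptually the registries are plain dictionaries keyed by the CLI name. The classes below are placeholder stand-ins (only the routerdc/dcrouter alias pair is taken from this page); they are here just to show the shape of the mapping.

```python
class DCRouter: ...          # placeholder stand-ins for real router/trainer classes
class DCRouterTrainer: ...

ROUTER_REGISTRY = {
    "routerdc": DCRouter,
    "dcrouter": DCRouter,    # alias: two keys, one implementation
}

ROUTER_TRAINER_REGISTRY = {
    "routerdc": (DCRouter, DCRouterTrainer),
    "dcrouter": (DCRouter, DCRouterTrainer),
}

router_cls = ROUTER_REGISTRY["routerdc"]  # roughly what the CLI does with --router routerdc
```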
Use llmrouter list-routers to see what your environment can actually load (especially when optional dependencies are involved).
Inference flow¶
Route-only (--route-only)¶
- route_single runs and returns a routing result.
- The CLI extracts model_name from one of the expected keys.
- Output includes a routing_result object and the selected model_name.
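The exact output serialization isn't specified here; conceptually it carries both the raw router output and the extracted choice. A hedged sketch:

```python
# Hypothetical shape of a --route-only result (all values are made up).
output = {
    "routing_result": {"model_name": "fast-small", "score": 0.82},  # whatever route_single returned
    "model_name": "fast-small",                                     # the choice extracted by the CLI
}
```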
Full inference (default)¶
LLMRouter routes and then calls the selected model via an OpenAI-compatible endpoint:
- route_single runs and returns a model_name.
- The API model id is resolved from llm_data[model_name].model (fallback: model_name).
- The endpoint base URL is resolved in this order: llm_data[model_name].api_endpoint (per model), else api_endpoint in the YAML config (see the sketch after this list).
- call_api sends the request via LiteLLM and returns response.
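A minimal sketch of that resolution order, assuming llm_data entries behave like the plain dict from the earlier sketch (the real loader may use attribute-style access) and that cfg holds the parsed YAML; the helper name is made up.

```python
def resolve_call_target(model_name: str, llm_data: dict, cfg: dict) -> tuple:
    """Hypothetical helper mirroring the precedence described above."""
    entry = llm_data.get(model_name, {})
    api_model_id = entry.get("model", model_name)                    # fallback: model_name itself
    api_base = entry.get("api_endpoint") or cfg.get("api_endpoint")  # per-model endpoint first, then YAML config
    return api_model_id, api_base
```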
Tip
If you only want routing decisions (no API calls, no secrets), always use --route-only.
Multi-round routers and answer_query¶
Some routers expose answer_query for an end-to-end multi-round pipeline. In that case, inference uses answer_query instead of calling route_single + call_api.
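A hedged sketch of that dispatch (not the actual CLI code; route_single and answer_query are assumed to take a single query string, and the real key extraction may differ):

```python
def run_inference(router, query: str):
    """Illustrative dispatch between the two inference paths described above."""
    if hasattr(router, "answer_query"):
        # Multi-round routers own the whole pipeline end to end.
        return router.answer_query(query)
    # Otherwise: decide first, execute second (route_single + call_api in the real flow).
    decision = router.route_single(query)
    model_name = (decision.get("model_name")
                  or decision.get("predicted_llm")
                  or decision.get("predicted_llm_name"))
    return model_name  # the real flow would now hand this to call_api / LiteLLM
```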
RouterR1 special case¶
router_r1 requires credentials inside YAML (hparam.api_base and hparam.api_key) and does not support --route-only.
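The required fields, shown here as the parsed config rather than raw YAML; only hparam.api_base and hparam.api_key are taken from this page, and the surrounding layout of a real config may differ.

```python
# Placeholder values; substitute your own gateway and key.
cfg = {
    "hparam": {
        "api_base": "https://api.example.com/v1",
        "api_key": "YOUR_API_KEY",
    },
}
```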
Routing contract (what route_single should return)¶
Inference extracts the chosen model from one of these keys:
- model_name
- predicted_llm
- predicted_llm_name
If you write a custom router, make sure route_single returns one of the keys above.
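A minimal custom router could look like the sketch below. MetaRouter's exact constructor and required overrides aren't spelled out on this page, and the query/return signatures shown here are assumptions, so treat this as an illustration of the contract rather than a template.

```python
from llmrouter.models.meta_router import MetaRouter  # base class listed under key source files

class LengthRouter(MetaRouter):
    """Toy router: picks a model purely by query length (model names are made up)."""

    def route_single(self, query: str) -> dict:
        # Any of the accepted keys works; model_name is the most common.
        chosen = "fast-small" if len(query) < 200 else "big-accurate"
        return {"model_name": chosen}

    def route_batch(self, queries) -> list:
        return [self.route_single(q) for q in queries]
```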