RouteProfile: Elucidating the Design Space of LLM Profiles for Routing

1University of Illinois at Urbana-Champaign
2Nanyang Technological University

Abstract

As the large language model (LLM) ecosystem expands, individual models exhibit varying capabilities across queries, benchmarks, and domains, motivating the development of LLM routing. While prior work has largely focused on router mechanism design, LLM profiles, which capture model capabilities, remain underexplored. In this work, we ask: How does LLM profile design affect routing performance across different routers? Addressing this question helps clarify the role of profiles in routing, disentangle profile design from router design, and enable fairer comparison and more principled development of routing systems. To this end, we view LLM profiling as a structured information integration problem over heterogeneous interaction histories and develop a general design space of LLM profiles, named RouteProfile, along four key dimensions: organizational form, representation type, aggregation depth, and learning configuration. Through systematic evaluation across three representative routers under both standard and new-LLM generalization settings, we show that structured profiles consistently outperform flat ones, query-level signals are more reliable than coarse domain-level signals, and generalization to newly introduced models benefits most from structured profiles under trainable configurations. Overall, our work highlights LLM profile design as an important direction for future routing research.

RouteProfile Design Space

We view LLM profiling as a structured information integration problem over heterogeneous interaction histories spanning model family metadata, domain coverage, benchmark evaluations, and query-level instances. RouteProfile characterizes the design space of LLM profiles along four key dimensions:

Organizational Form specifies how interaction histories are organized before integration. In a flat form, available information is directly concatenated into plain text or a single vector. In a structured form, relational information among models, tasks, domains, and queries is explicitly modeled through a graph, enabling richer relational reasoning.
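The contrast between the two organizational forms can be sketched in a few lines of Python. The interaction records, node naming scheme, and field names below are illustrative assumptions, not the paper's actual schema:

```python
# Hypothetical interaction records: (model, task, query, observed score).
records = [
    {"model": "llm-a", "task": "qa",   "query": "q1", "score": 0.82},
    {"model": "llm-a", "task": "code", "query": "q2", "score": 0.40},
    {"model": "llm-b", "task": "qa",   "query": "q1", "score": 0.55},
]

def flat_profile(model, records):
    """Flat form: concatenate a model's history into one plain-text string."""
    lines = [f"{r['task']}:{r['query']}={r['score']}"
             for r in records if r["model"] == model]
    return "; ".join(lines)

def structured_profile(records):
    """Structured form: an undirected adjacency map over model, task,
    and query nodes, making relations explicit for downstream reasoning."""
    graph = {}
    for r in records:
        m = f"model:{r['model']}"
        t = f"task:{r['task']}"
        q = f"query:{r['query']}"
        for u, v in [(m, t), (t, q), (m, q)]:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph
```

The flat form discards which task or query a score came from once concatenated; the structured form preserves those relations as edges, which is what enables the relational reasoning described above.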

Representation Type determines the information fusion mechanism. Text representations are produced by prompting an LLM to summarize neighborhood information into natural language descriptions. Embedding representations are dense vectors computed through neural message passing, capturing fine-grained semantic signals.

Aggregation Depth controls the scope of information integration within the graph, ranging from local (0-hop, no neighborhood) to multi-hop aggregation that incorporates higher-order context from the interaction graph.
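A minimal sketch of how aggregation depth changes the scope of integration, using plain mean aggregation over a toy graph (the mixing weights, features, and graph are assumptions for illustration; actual profiles would use learned aggregation):

```python
def aggregate(features, graph, hops):
    """Return node features after `hops` rounds of mean-neighbor mixing.

    hops=0 keeps each node's local feature (no neighborhood);
    each additional hop folds in higher-order context from the graph.
    """
    h = dict(features)
    for _ in range(hops):
        # Build the next layer from the previous one simultaneously.
        h = {
            node: [
                0.5 * x + 0.5 * (sum(h[n][i] for n in nbrs) / len(nbrs))
                for i, x in enumerate(h[node])
            ]
            for node, nbrs in graph.items()
        }
    return h

# Toy interaction graph: one model node connected to two query nodes.
graph = {"m1": ["q1", "q2"], "q1": ["m1"], "q2": ["m1"]}
features = {"m1": [1.0], "q1": [0.0], "q2": [0.0]}
```

With `hops=0` the model node keeps only its own feature; with one hop it already mixes in query-level context. Repeated hops push all nodes toward similar values, which is the over-smoothing effect discussed in the experiments.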

Learning Configuration indicates whether the profiling process is training-free or trainable. In a trainable setting, the aggregation function can be optimized via self-supervised masked reconstruction on the interaction graph, learning to produce more discriminative model representations.
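The masked-reconstruction idea can be illustrated with a deliberately simple objective: hide some observed model–query scores and optimize embeddings to predict them back. The dot-product decoder, SGD update, and hyperparameters below are toy assumptions, not the paper's training recipe:

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def masked_reconstruction_step(model_emb, query_emb, scores,
                               mask_rate=0.5, lr=0.1):
    """One SGD step on MSE over a random subset of masked (model, query) scores."""
    masked = [k for k in scores if random.random() < mask_rate] or list(scores)
    loss = 0.0
    for m, q in masked:
        pred = dot(model_emb[m], query_emb[q])
        err = pred - scores[(m, q)]
        loss += err * err
        for i in range(len(model_emb[m])):
            # Gradients of (pred - y)^2, computed before either update.
            gm = 2 * err * query_emb[q][i]
            gq = 2 * err * model_emb[m][i]
            model_emb[m][i] -= lr * gm
            query_emb[q][i] -= lr * gq
    return loss / len(masked)
```

Repeated steps drive the reconstruction loss down, shaping model embeddings that discriminate between models based on which interactions they can predict.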

Figure: Overview of the RouteProfile design space. LLM profiles are constructed from interaction histories along four dimensions: organizational form, representation type, aggregation depth, and learning configuration.

Instantiated Design Choices

We present four concrete instantiations of the RouteProfile design space, each varying along the four dimensions. These profiles serve as representative configurations for systematic evaluation across routing settings.

Method               Org. Form    Repr. Type   Agg. Depth   Learning
Flat Aggregation     Flat         Text         0            Training-free
Embedding-Based GNN  Structured   Embedding    Multi-hop    Training-free
Text-Based GNN       Structured   Text         Multi-hop    Training-free
Trainable GNN        Structured   Embedding    Multi-hop    Trainable

Experiments

We systematically evaluate the proposed design space across three representative routers — SimRouter, MLPRouter, and GraphRouter — under both standard routing and new-LLM generalization (cold-start) settings. Experiments are designed to answer three research questions:
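To ground how a profile feeds a router, here is a minimal similarity-based router in the spirit of SimRouter: each query is routed to the candidate whose profile embedding is closest. The profile vectors and query embedding are toy assumptions; the actual routers learn richer scoring functions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def route(query_emb, profiles):
    """Return the model whose profile is most similar to the query."""
    return max(profiles, key=lambda m: cosine(query_emb, profiles[m]))

# Hypothetical 2-d profile embeddings for two candidate LLMs.
profiles = {"llm-a": [0.9, 0.1], "llm-b": [0.1, 0.9]}
```

Because the router only ever sees the profiles, any information lost or distorted during profiling is unrecoverable at routing time, which is why profile design matters independently of the router mechanism.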

RQ1: Is routing quality limited by how candidate LLMs are profiled, rather than by router choice alone?

Figure: Routing performance across different profile designs (RQ1). Structured profiles consistently outperform flat ones across all routers.

Figure: Effect of aggregation hop on routing performance. Depth helps overall, but its value depends on the profile design and router.

Routing performance depends strongly on how candidate models are profiled. Structured profiles consistently outperform flat baselines across all three routers for both text-based and embedding-based representations, demonstrating that routing quality is constrained not only by the router mechanism, but also by the quality of the LLM profiles themselves.

The effect of aggregation depth depends on profile design and router. In the training-free setting, increasing aggregation hops generally improves performance across both text-based and embedding-based profiles. However, in the trainable setting, additional hops benefit SimRouter while degrading performance in MLPRouter and GraphRouter — an effect we attribute to over-smoothing, which is more pronounced in discriminative routing contexts.

RQ2: Which data sources truly improve LLM profiles and which instead introduce noise?

Figure: Effect of data source on routing performance (RQ2). Query-level signal is a more reliable source than domain-level signal.

Query-level signal is a more reliable source than domain-level signal. Including query-level nodes yields more consistent improvements across all profile configurations and routers. The strongest results for both Text-GNN and Emb-GNN are achieved when task and query signals are combined, confirming that fine-grained interaction signals are more informative than coarse domain summaries.

Domain evidence is not a stable gain source. Adding domain nodes does not reliably improve routing and can even weaken otherwise strong profiles, suggesting that coarse-grained domain information is harder to integrate effectively and may introduce noise rather than useful discrimination.

RQ3: How do different profile designs generalize to newly introduced models under cold-start conditions?

Figure: Routing performance under the new-LLM (cold-start) setting (RQ3). Trainable GNNs achieve the strongest cold-start performance across routers.

Generalization to new LLMs requires structured and trainable profiles. Flat profiles yield near-zero cold-start performance across all routers, indicating that without relational structure, profiles fail to generalize to unseen models. Structured profiles outperform flat baselines on both average and cold-start performance, and trainable configurations further amplify this advantage — achieving substantially higher cold-start scores that are critical for new-model generalization.

Generalization depends on profile–router co-design. Cold-start gains are not realized uniformly across routers: GraphRouter achieves the strongest overall cold-start performance, while different structured profile families interact differently with SimRouter and MLPRouter. This suggests that effective routing on newly introduced models depends not only on having stronger profiles, but also on pairing them with routers that can exploit those profiles effectively.

BibTeX

@article{RouteProfile,
  author    = {Jingjun Xu and Hongji Pu and Tao Feng and Haozhen Zhang and Jiaxuan You and Ge Liu},
  title     = {RouteProfile: Elucidating the Design Space of LLM Profiles for Routing},
  journal   = {arXiv preprint},
  year      = {2025},
}