PersonalizedRouter: A Personalized LLM Router Based on User Preferences

Zhongjie Dai*, Tao Feng*, Jiaxuan You
University of Illinois Urbana-Champaign, Urbana, IL, USA
*Indicates Equal Contribution

Abstract

The growing number of Large Language Models (LLMs) with diverse capabilities and response styles provides users with a wider range of choices, but also presents challenges in selecting an appropriate LLM, since user preferences vary across performance, cost, and response style. Current LLM selection methods typically optimize for a single fixed objective, such as performance, cost, or a trade-off between them, and fail to learn user preferences from interaction data that includes the task context, queries, candidate LLMs, and user decisions. To address these limitations, we propose PersonalizedRouter, a graph-based framework that models multiple users and performs personalized LLM selection by leveraging interaction data. To capture contextual information between user queries and the optimal LLMs, PersonalizedRouter converts the interaction data into a heterogeneous graph, where edges represent the relationships between different types of nodes. To further assess adaptability across multiple users, we design two strategies to simulate different user interaction data: a multi-cost-efficiency simulation strategy and an LLM-as-a-Judge strategy. Experimental results under both simulation settings demonstrate that PersonalizedRouter outperforms existing LLM selection methods, surpassing the strongest baseline by a large margin of 16.97%. Moreover, PersonalizedRouter exhibits few-shot learning capabilities, adapting effectively to new users and achieving at least 71.30% of the performance of the fully trained model.

Introduction

In recent years, the rapid growth of model scale and advances in training techniques have fueled the explosive emergence of large language models (LLMs), offering users diverse choices such as ChatGPT, Gemini, and LLaMA. Although large-scale language models have shown remarkable performance on many tasks, they tend to be inefficient when dealing with simple problems; in some scenarios, small-scale language models can achieve comparable performance while requiring fewer resources. Moreover, different LLMs excel at different tasks, exhibiting varying performance and cost efficiency on specific applications, and some domain-specific expert models achieve superior results on specialized tasks. In addition to differences in response quality and cost, LLMs also exhibit diverse response styles, which influence how users understand the responses to their queries. In multi-user scenarios, users often have distinct preferences that are difficult to model directly, making it challenging for a single LLM to serve all users consistently. Therefore, our paper aims to raise attention to this pressing research question: Given multiple user preferences, how can we design an LLM router that is personalized for each individual user?

To comprehensively evaluate the adaptability of LLM selection methods in multi-user scenarios, we design two simulation strategies that model diverse user behaviors and generate the corresponding interaction data. The multi-cost-efficiency simulation strategy calculates a reward score for each response based on each user's trade-off between performance and inference cost. The LLM-as-a-Judge strategy leverages a set of system prompts to instruct an LLM to simulate user groups with different subjective preferences and select the best response among the candidates.
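The multi-cost-efficiency idea can be sketched as a simple scoring function. This is a minimal illustration, not the paper's exact formula: the preference coefficient `alpha` and the function names are assumptions introduced here for clarity.

```python
# Hypothetical sketch: each simulated user weighs response performance
# against inference cost with a preference coefficient alpha in [0, 1].
# alpha close to 1 models a quality-focused user; close to 0, a
# cost-focused one. (alpha and these names are illustrative assumptions.)

def reward(performance: float, cost: float, alpha: float) -> float:
    """Reward for one response under a user's performance/cost trade-off."""
    return alpha * performance - (1.0 - alpha) * cost

def best_llm(responses: dict[str, tuple[float, float]], alpha: float) -> str:
    """Pick the candidate LLM whose response maximizes this user's reward."""
    return max(responses, key=lambda name: reward(*responses[name], alpha))

# Two candidates: (performance, normalized cost).
candidates = {"large-llm": (0.95, 0.80), "small-llm": (0.70, 0.05)}
```

Under this sketch, a quality-focused user (`alpha=0.9`) is routed to the large model, while a cost-focused user (`alpha=0.1`) is routed to the small one, yielding distinct interaction data per simulated user.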

PersonalizedRouter

Overview of project

We first use the candidate LLMs to generate responses on the multi-task dataset. Next, under the two simulation strategies, we obtain the corresponding interaction data. As illustrated in the middle part, PersonalizedRouter transforms the user interaction data into a graph, where nodes represent users, tasks, queries, and LLMs, and edges capture the relationships between different node types. In the right part, we leverage a GNN to embed both node and edge features, updating node representations to capture each user's hidden preferences. Finally, we select the optimal LLM from the predicted probability distribution.
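The graph-construction step above can be sketched in plain Python. The node types follow the description (user, task, query, LLM), but the specific edge relations and record format here are illustrative assumptions, not the paper's exact schema.

```python
# Minimal sketch (names and edge relations are assumptions): convert
# interaction records into a heterogeneous graph with typed nodes
# (user, task, query, llm) and typed edges between them.

from collections import defaultdict

def build_interaction_graph(records):
    """records: iterable of (user, task, query, chosen_llm) tuples."""
    nodes = defaultdict(set)   # node type -> set of node ids
    edges = defaultdict(list)  # (src_type, relation, dst_type) -> edge list
    for user, task, query, llm in records:
        nodes["user"].add(user)
        nodes["task"].add(task)
        nodes["query"].add(query)
        nodes["llm"].add(llm)
        edges[("user", "issues", "query")].append((user, query))
        edges[("query", "belongs_to", "task")].append((query, task))
        edges[("user", "selects", "llm")].append((user, llm))
    return nodes, edges

records = [("u1", "qa", "q1", "gpt"), ("u1", "code", "q2", "llama")]
nodes, edges = build_interaction_graph(records)
```

In practice such a typed node/edge structure would be handed to a heterogeneous GNN (e.g. a framework's hetero-graph container) that embeds node and edge features and outputs a probability distribution over the candidate LLMs; this sketch only shows the data layout.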

Experiments


We conduct experiments on interaction datasets generated by these two strategies. The results show that PersonalizedRouter significantly outperforms existing LLM selection methods, achieving at least a 16.97% advantage over the best baseline. To simulate real-world scenarios where new users continuously join the system, we introduce a new-user experimental setting in which interaction data from new users is excluded from training. The results demonstrate that PersonalizedRouter also generalizes well to new users: it outperforms the best baseline by at least 75.00% and achieves 71.30% of the performance of the fully trained model, validating the effectiveness of our framework.

BibTeX

TODO:BibTex Code