Chatbot Arena (developed by LMSYS, the Large Model Systems Organization) is a platform for benchmarking large language models (LLMs) by comparing their responses side-by-side in real time. It lets users interact with two anonymous models in parallel and judge which one performs better on a given task.
Only after the user votes on which response is better are the models' names revealed, providing transparency and insight into the capabilities of both open-source and proprietary models.
Chatbot Arena uses an Elo rating system, similar to the one used to rank chess players, to rank the models. Each battle contributes to the models' ratings, building a leaderboard of the best-performing LLMs.
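To make the ranking idea concrete, here is a minimal sketch of a classic Elo update for a single pairwise "battle". The function names, the K-factor of 32, and the starting ratings are illustrative assumptions, not Chatbot Arena's actual implementation; the live leaderboard's methodology is more sophisticated than plain Elo.

```python
# Illustrative Elo update for one model-vs-model battle.
# Constants and names are assumptions for this sketch, not Chatbot Arena's code.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: both models start at 1000 and model A wins one battle.
a, b = update_elo(1000.0, 1000.0, score_a=1.0)
print(a, b)  # A gains 16 points, B loses 16
```

Because each vote nudges the winner's rating up and the loser's down by an amount that depends on how surprising the result was, many crowdsourced battles gradually converge toward a stable ranking.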
All this has made Chatbot Arena a valuable resource for developers and AI enthusiasts to understand which models excel in specific tasks — such as conversation, coding, or complex problem-solving.
Beyond traditional chatbot-style interactions, Chatbot Arena also evaluates models for other tasks, such as red-teaming and coding. Moreover, it provides an opportunity for users to test both closed-source and open-source models — including popular ones like ChatGPT and Claude. The system is continuously updated, with new models and features being added regularly.
While Chatbot Arena offers a fun way to compare LLMs, it also serves a serious purpose in AI development. By crowdsourcing evaluations, the platform helps democratize AI testing and provides insight into real-world model performance, which should ultimately mean better AI experiences for everyone.