Test large language models (LLMs) by comparing their performance side by side, in real time
Formerly called Chatbot Arena, LMArena is a platform for benchmarking large language models (LLMs) by comparing their responses side by side in real time. Users interact with two anonymous models in parallel and judge which one performs better on a given task.
After each interaction, the names of the models are revealed — providing transparency and insight into the AI capabilities of both open-source and proprietary models.
LMArena uses an Elo rating system — which is similar to what is used in chess — to rank the models. Each interaction contributes to the models’ ratings, helping establish a leaderboard that showcases the best-performing LLMs.
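To make the rating idea concrete, here is a minimal sketch of a textbook Elo update applied to a single head-to-head comparison. The K-factor of 32, the 400-point scale, and the starting rating of 1000 are conventional chess-style values assumed here for illustration, and the function names are hypothetical; LMArena's actual computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k (assumed value, not LMArena's) controls how far one result moves a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start level; the user prefers model A.
    print(update_elo(1000.0, 1000.0, score_a=1.0))  # (1016.0, 984.0)
```

In this sketch an upset (a low-rated model beating a high-rated one) moves both ratings more than an expected result does, which is why many individual votes can settle into a stable ranking.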
All this has made LMArena a valuable resource for developers and AI enthusiasts to understand which models excel in specific tasks — such as conversation, coding, or complex problem-solving.
Beyond traditional chatbot-style interactions, LMArena also evaluates models for other tasks, such as red-teaming and coding. Moreover, it provides an opportunity for users to test both closed-source and open-source models — including popular ones like ChatGPT and Claude. The system is continuously updated, with new models and features being added regularly.
While LMArena offers a fun way to compare LLMs, it also serves a serious purpose in AI development. By crowdsourcing evaluations, the platform helps democratize AI testing and provides insights into model performance. That, in turn, should lead to better AI experiences for everyone.
ChatGPT
All-round AI assistant generating human-like responses to user queries and tasks
Gemini
Generates responses from text, images, audio, and video inputs using advanced multimodal AI
Claude
Assists users in reasoning, coding, writing, and analyzing data with advanced AI models
Grok
Delivers witty, real-time AI responses with advanced reasoning and image generation
DeepSeek
Delivers advanced AI models for coding and reasoning at low cost
Perplexity
Delivers cited AI answers from web searches instantly