
Formerly known as Chatbot Arena, LMArena is a platform for benchmarking large language models (LLMs) by comparing their performance side by side in real time. Users interact with two anonymous models in parallel and judge which one performs better on a given task.
After each interaction, the models' names are revealed, providing transparency and insight into the capabilities of both open-source and proprietary models.
LMArena uses an Elo rating system, the same scheme used in chess, to rank the models. Each head-to-head vote updates the competing models' ratings, building a leaderboard of the best-performing LLMs.
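To make the ranking mechanics concrete, here is a minimal sketch of a standard Elo update applied to a single head-to-head comparison. The K-factor of 32 and the 1,000-point starting rating are illustrative chess-style defaults, not LMArena's actual parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one user vote.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    K=32 is a common chess default, used here purely for illustration.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# A user votes that model A beat model B; both start at 1000.
a, b = elo_update(1000.0, 1000.0, score_a=1.0)
print(round(a), round(b))  # 1016 984: the winner gains what the loser sheds
```

Because each vote moves the ratings only a little, a model's leaderboard position stabilizes as thousands of crowdsourced comparisons accumulate.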
All this has made LMArena a valuable resource for developers and AI enthusiasts who want to understand which models excel at specific tasks, such as conversation, coding, or complex problem-solving.
Beyond traditional chatbot-style interactions, LMArena also evaluates models on other tasks, such as red-teaming and coding. It gives users a chance to test both closed-source and open-source models, including popular ones like ChatGPT and Claude, and it is continuously updated, with new models and features added regularly.
While LMArena offers a fun way to compare LLMs, it also serves a serious purpose in AI development. By crowdsourcing evaluations, the platform helps democratize AI testing and provide insights into model performance. And that should deliver better AI experiences for everyone… which is a good thing.
Hoody AI
Provides anonymous access to multiple LLMs in one dashboard.
Sidekick AI
Automates meeting bookings via email forwarding and smart pages.
Private LLM
Runs local AI chatbots offline on Apple devices, ensuring privacy.
Chat 4O AI
Generates images and videos, and assists with AI-powered chats and tasks.
AI Perfect Assistant
Integrates AI into Office apps for instant content generation and editing.
PromptLayer
Streamlines prompt engineering through visual management and evaluations.