Sonar by Perplexity is an API that integrates real-time, AI-powered web search into applications, returning grounded answers with citations. Built on Meta's Llama 3.3 70B and served on Cerebras' inference infrastructure, it processes queries at around 1,200 tokens per second for near-instant responses. The API comes in two tiers: Sonar, a cost-effective option for quick queries, and Sonar Pro, designed for complex, multi-step questions, with a 200,000-token context window and roughly double the citations of the base tier. Both tiers support customizable source filters and a JSON output mode, making the API developer-friendly. On the SimpleQA factuality benchmark, Sonar Pro achieved an F-score of 0.858, which Perplexity reports as outperforming OpenAI's GPT-4 and Anthropic's Claude.
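As a concrete starting point, the sketch below builds a minimal chat-completions request against the API's OpenAI-compatible endpoint. The endpoint URL and the "sonar"/"sonar-pro" model names follow Perplexity's published interface; the helper names, system prompt, and environment variable are illustrative assumptions.

```python
import json
import os
import urllib.request

# Perplexity exposes an OpenAI-compatible chat-completions endpoint.
API_URL = "https://api.perplexity.ai/chat/completions"

def build_sonar_request(question: str, model: str = "sonar") -> dict:
    """Build a chat-completions payload for the Sonar API.

    Use model="sonar" for quick queries, "sonar-pro" for multi-step research.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Be precise and cite sources."},
            {"role": "user", "content": question},
        ],
    }

def ask_sonar(question: str, api_key: str, model: str = "sonar") -> dict:
    """Send the request and return the parsed JSON response.

    The answer is in choices[0].message.content; cited source URLs
    are returned alongside it.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_sonar_request(question, model)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Requires a real key, e.g.:
# answer = ask_sonar("Who won the most recent Nobel Prize in Physics?",
#                    os.environ["PPLX_API_KEY"])
```

Because the interface is OpenAI-compatible, the same payload also works through an OpenAI client library pointed at Perplexity's base URL.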
The base Sonar tier is priced at $5 per 1,000 searches plus $1 per 750,000 input or output words, while Sonar Pro costs $3 per 750,000 input words and $15 per 750,000 output words. That pricing is reportedly up to seven times cheaper than comparable offerings. Companies like Zoom use Sonar Pro to provide real-time answers during video calls, enhancing user productivity. The API integrates with tools like LangChain and is also available as an MCP (Model Context Protocol) server for agentic workflows. Recent updates add three search modes (High, Medium, Low) that trade search depth against cost.
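Since billing mixes a per-search fee with per-word rates, a back-of-the-envelope estimator helps when comparing tiers. The sketch below uses only the prices quoted above; Sonar Pro's per-search fee is not quoted here, so it is left out, and Pro's tendency to run multiple searches per query means real costs can run higher.

```python
# Prices quoted in this article (USD). Sonar Pro's per-search fee is not
# given above, so it defaults to zero here -- treat Pro estimates as a floor.
PRICES = {
    "sonar": {"per_1k_searches": 5.00, "input_per_750k": 1.00, "output_per_750k": 1.00},
    "sonar-pro": {"input_per_750k": 3.00, "output_per_750k": 15.00},
}

def estimate_cost(tier: str, input_words: int, output_words: int,
                  searches: int = 0) -> float:
    """Estimate the USD cost of a workload on a given Sonar tier."""
    p = PRICES[tier]
    cost = (input_words / 750_000) * p["input_per_750k"]
    cost += (output_words / 750_000) * p["output_per_750k"]
    cost += (searches / 1_000) * p.get("per_1k_searches", 0.0)
    return round(cost, 6)

# 1,000 searches with 750k words each way on base Sonar:
# $5 (searches) + $1 (input) + $1 (output) = $7
print(estimate_cost("sonar", 750_000, 750_000, searches=1_000))  # 7.0
```

The same volume of tokens on Sonar Pro costs $18 before any search fees, which is why the article's advice to reserve Pro for in-depth research matters at scale.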
Because Sonar has real-time web access, its answers stay current, unlike models limited to static training data. It suits industries such as healthcare and finance, where accuracy and compliance are critical. However, it can struggle with niche queries, returning vague responses, and Sonar Pro's multi-search approach can make per-query costs unpredictable. User feedback on platforms like Reddit praises its speed and readability but notes occasional gaps in specialized research. Detailed documentation lets developers start integrating within minutes.
For best results, use base Sonar for quick factual queries and Sonar Pro for in-depth research. Monitor token usage to control costs, and use source filters to restrict answers to trusted domains.
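That advice can be sketched as request options. The `search_domain_filter` and `web_search_options.search_context_size` parameter names below reflect my understanding of Perplexity's API documentation; the domain list and question are illustrative, and both parameter names should be verified against the current docs before use.

```python
def build_filtered_request(question: str, domains: list[str],
                           search_mode: str = "low") -> dict:
    """Build a Sonar Pro payload with a domain allow-list and a search mode.

    search_mode maps to the Low/Medium/High search modes: "low" minimizes
    search cost, "high" maximizes retrieval depth.
    """
    return {
        "model": "sonar-pro",
        "messages": [{"role": "user", "content": question}],
        # Restrict retrieval to trusted sources (assumed parameter name).
        "search_domain_filter": domains,
        # Pick the search mode per query (assumed parameter name).
        "web_search_options": {"search_context_size": search_mode},
    }

payload = build_filtered_request(
    "Summarize current FDA guidance on AI in medical devices.",
    ["fda.gov", "nih.gov"],
)
```

Keeping these options in one builder function makes it easy to default compliance-sensitive queries to an allow-list while dialing the search mode up only when a question genuinely needs deep research.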