Diffbot is an AI-powered platform that extracts structured data from websites and maintains a Knowledge Graph with over 2 billion entities. It uses machine learning and computer vision to analyze web pages, offering APIs for data extraction, crawling, and natural language processing. Designed for developers, researchers, and businesses, it serves over 400 organizations, including Sequoia Capital and BuzzFeed.
The Knowledge Graph is Diffbot’s flagship feature, containing 246 million organizations, 1.6 billion articles, 3 million retail products, and more, with detailed fields like revenue, locations, and sentiment. The “Search” API allows querying this database for real-time data feeds, while the “Enhance” feature enriches existing datasets with additional details. The “Extract” API processes individual URLs, returning structured data without requiring custom rules. The “Crawl” API automates website scraping, transforming entire sites into structured databases. The NLP API extracts entities and relationships from unstructured text, supporting tasks like sentiment analysis.
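To make the "Extract" workflow concrete, here is a minimal sketch of requesting structured data for a single URL. It assumes the v3 "analyze" endpoint at api.diffbot.com and the query parameters `token` and `url`, so confirm both against Diffbot's current documentation before relying on them. The function names `build_extract_url` and `fetch_extracted_page` are hypothetical helpers for illustration, not part of any Diffbot SDK.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint path and parameter names; verify in Diffbot's docs.
EXTRACT_ENDPOINT = "https://api.diffbot.com/v3/analyze"


def build_extract_url(token: str, page_url: str) -> str:
    """Build the request URL, percent-encoding the target page URL."""
    query = urlencode({"token": token, "url": page_url})
    return f"{EXTRACT_ENDPOINT}?{query}"


def fetch_extracted_page(token: str, page_url: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response as a dict.

    This performs a real network call and consumes credits on a live account.
    """
    with urlopen(build_extract_url(token, page_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_extracted_page("YOUR_TOKEN", "https://example.com/post")` would return whatever JSON the service sends back. The response layout is not shown here because it depends on the page type Diffbot detects.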
Pricing operates on a credit-based system, with plans ranging from free to enterprise tiers. Each API call consumes credits, such as one credit per extracted page or 25 credits per Knowledge Graph record. Free plans include limited credits, while higher tiers offer discounted rates for larger volumes. Documentation is comprehensive, but the platform assumes technical expertise, which may challenge non-developers.
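The credit arithmetic above can be sketched as a small estimator. It uses only the two rates quoted in this article (one credit per extracted page, 25 credits per Knowledge Graph record). Actual rates may differ by plan or change over time, and `estimate_credits` is a hypothetical helper, not a Diffbot function.

```python
# Rates as quoted in this article; treat them as assumptions and check
# the current pricing page before budgeting.
CREDITS_PER_EXTRACTED_PAGE = 1
CREDITS_PER_KG_RECORD = 25


def estimate_credits(pages: int, kg_records: int) -> int:
    """Estimate total credits for a mix of page extractions and KG records."""
    if pages < 0 or kg_records < 0:
        raise ValueError("counts must be non-negative")
    return pages * CREDITS_PER_EXTRACTED_PAGE + kg_records * CREDITS_PER_KG_RECORD


# Example: 500 extracted pages plus 40 Knowledge Graph records.
print(estimate_credits(500, 40))  # 500 * 1 + 40 * 25 = 1500
```

At these rates, Knowledge Graph lookups dominate the budget quickly: 40 records cost twice as much as 500 page extractions.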
Competitors include Scrapy, an open-source scraping framework, and Octoparse, a user-friendly scraping tool. Scrapy is free but requires coding, while Octoparse offers a visual interface but lacks Diffbot’s Knowledge Graph scale. Some users report that credit costs are hard to estimate without contacting sales, and that complex Knowledge Graph queries can be difficult to master.
To use Diffbot effectively, start with the free plan to test the APIs. Use the “Extract” or “Crawl” features for small projects, and explore the Knowledge Graph for broader research. Review the documentation thoroughly to understand credit usage and query syntax before committing to a paid tier.