Published by Dusan Belic on July 24, 2025

Diffbot

Extracts structured data from websites using AI and machine learning

Diffbot is an AI-powered platform that extracts structured data from websites and maintains a Knowledge Graph with over 2 billion entities. It uses machine learning and computer vision to analyze web pages, offering APIs for data extraction, crawling, and natural language processing. Designed for developers, researchers, and businesses, it serves over 400 organizations, including Sequoia Capital and BuzzFeed.

The Knowledge Graph is Diffbot’s flagship feature, containing 246 million organizations, 1.6 billion articles, 3 million retail products, and more, with detailed fields like revenue, locations, and sentiment. Five APIs sit on top of it. The “Search” API queries this database and can drive real-time data feeds. The “Enhance” API enriches existing datasets with additional details from the Knowledge Graph. The “Extract” API processes individual URLs and returns structured data without requiring custom rules. The “Crawl” API automates website scraping, transforming entire sites into structured databases. The NLP API extracts entities and relationships from unstructured text, supporting tasks like sentiment analysis.
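To make the “Extract” idea concrete, here is a minimal sketch of how a request for a single URL might be assembled. The base URL, the `analyze` and `article` endpoint names, and the `token` and `url` query parameters reflect our understanding of Diffbot's v3 API and should be treated as assumptions to confirm against the official documentation. The helper name `build_extract_url` is hypothetical. The sketch only builds the request URL and does not call the service.

```python
from urllib.parse import urlencode

# Assumption: base URL of the v3 Extract endpoints; confirm in Diffbot's docs.
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_extract_url(page_url: str, token: str, api: str = "analyze") -> str:
    """Build a GET URL asking an Extract endpoint to process one page.

    `api` is the endpoint name (assumed values: "analyze", "article",
    "product"). `token` is the API token issued with your account.
    """
    # urlencode percent-encodes the target URL so it is safe as a query value.
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_BASE}/{api}?{query}"


request_url = build_extract_url(
    "https://example.com/post", "YOUR_TOKEN", api="article"
)
print(request_url)
```

Sending this URL with any HTTP client should return JSON describing the page, though the exact response fields depend on the endpoint and are not shown here.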

Pricing operates on a credit-based system, with plans ranging from free to enterprise tiers. Each API call consumes credits, such as one credit per extracted page or 25 credits per Knowledge Graph record. Free plans include limited credits, while higher tiers offer discounted rates for larger volumes. Documentation is comprehensive, but the platform assumes technical expertise, which may challenge non-developers.
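A quick way to budget a project is to turn the two example rates above into a small estimator. This is only a sketch: it uses the figures quoted in this review (one credit per extracted page, 25 credits per Knowledge Graph record), ignores volume discounts and any other billable operations, and the function name `estimate_credits` is our own.

```python
# Rates quoted in this review; verify against Diffbot's current pricing page.
CREDITS_PER_EXTRACTED_PAGE = 1
CREDITS_PER_KG_RECORD = 25


def estimate_credits(pages: int = 0, kg_records: int = 0) -> int:
    """Estimate credits for a job made of page extractions and KG records."""
    if pages < 0 or kg_records < 0:
        raise ValueError("counts must be non-negative")
    return (pages * CREDITS_PER_EXTRACTED_PAGE
            + kg_records * CREDITS_PER_KG_RECORD)


# 1,000 extracted pages plus 200 Knowledge Graph records:
# 1000 * 1 + 200 * 25 = 6000 credits.
print(estimate_credits(pages=1000, kg_records=200))
```

At these rates, Knowledge Graph records dominate the bill quickly, so it is worth checking how many records a query returns before running it at scale.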

Competitors include Scrapy, an open-source scraping framework, and Octoparse, a user-friendly scraping tool. Scrapy is free but requires coding, while Octoparse offers a visual interface but has nothing comparable in scale to Diffbot’s Knowledge Graph. Some users report that credit costs are hard to work out without contacting sales, and that complex Knowledge Graph queries take time to master.

To use Diffbot effectively, start with the free plan to test APIs. Focus on the “Extract” or “Crawl” features for small projects, and explore the Knowledge Graph for broader research. Review the documentation thoroughly to understand credit usage and query syntax.

What are the key features? ⭐

Knowledge Graph: Contains over 2 billion entities, including 246M organizations and 1.6B articles, for querying structured data.
Extract API: Analyzes URLs to return structured data like article details or product information without rules.
Crawl API: Transforms entire websites into structured databases of products, articles, or discussions.
Natural Language API: Extracts entities, relationships, and sentiment from unstructured text.
Enhance API: Enriches existing datasets with additional details from the Knowledge Graph.

Who is it for? 🤔

Diffbot is best for developers, data analysts, and businesses needing structured web data for applications, research, or market intelligence. It suits those building AI-driven apps, conducting competitive analysis, or tracking news and sentiment, particularly in tech, finance, and media sectors.

Examples of what you can use it for 💭

Market Researcher: Uses Knowledge Graph to track company revenue and news for competitive analysis.
E-commerce Developer: Employs Crawl API to extract product details from retail sites for price monitoring.
Journalist: Queries articles via Search API to analyze sentiment on trending topics.
Data Scientist: Applies NLP API to extract entities from forum posts for sentiment studies.
CRM Manager: Utilizes Enhance API to enrich client data with organizational details.