Twelve Labs is a multimodal AI technology designed to understand video content at a level approaching human comprehension. The technology supports a range of applications, including searching vast video libraries using natural language, generating descriptive text about video content, and classifying videos into categories quickly and accurately.
By providing APIs for intelligent video applications, Twelve Labs serves as a powerful tool for developers and businesses looking to harness the full potential of their video content — making it easier to find, analyze, and categorize videos based on their visual and auditory elements.
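To give a sense of what working with such an API might look like, here is a minimal sketch of a natural-language video search request. The endpoint path, header name, request fields, and response shape below are illustrative assumptions rather than a verified reproduction of the Twelve Labs API reference, which should be consulted for the exact contract.

```python
import os
import requests

# Minimal sketch of a natural-language video search call.
# NOTE: the endpoint path, header name, and field names are assumptions
# for illustration; check the official Twelve Labs API docs for the
# current, exact contract.
API_KEY = os.environ["TWELVE_LABS_API_KEY"]   # hypothetical environment variable
BASE_URL = "https://api.twelvelabs.io/v1.3"   # assumed base URL and version

payload = {
    "index_id": "<your-index-id>",                        # index holding your videos
    "query_text": "a person scoring a goal in the rain",  # plain-language query
    "search_options": ["visual", "audio"],                # assumed option names
}

resp = requests.post(
    f"{BASE_URL}/search",
    headers={"x-api-key": API_KEY},
    json=payload,
    timeout=30,
)
resp.raise_for_status()

# Each result is assumed to reference a video and a matching time range.
for clip in resp.json().get("data", []):
    print(clip.get("video_id"), clip.get("start"), clip.get("end"), clip.get("score"))
```

The key point of the workflow is that the query is ordinary text; the service resolves it against the visual and auditory content of previously indexed videos and returns the matching segments.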
At the core of Twelve Labs’ offerings are its state-of-the-art video foundation models, Marengo and Pegasus, which create rich video embeddings that enable downstream tasks such as search, text generation, and classification. These models can handle massive video libraries and are customizable, allowing users to fine-tune them for specific content domains.
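To make the embedding idea concrete, the sketch below shows how video embeddings, once obtained, can back a similarity search. The retrieval of the embeddings themselves is omitted; the vectors, their dimensionality, and the use of cosine similarity are generic illustrative assumptions, not a description of how Marengo or Pegasus work internally.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, one per video clip, standing in for vectors a
# video foundation model might produce. Shapes and values are made up.
clip_embeddings = {
    "clip_001": np.random.rand(1024),
    "clip_002": np.random.rand(1024),
    "clip_003": np.random.rand(1024),
}

# A query embedding, e.g. from projecting a text query into the same space.
query_embedding = np.random.rand(1024)

# Rank clips by similarity to the query: the essence of embedding-based search.
ranked = sorted(
    clip_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
for clip_id, emb in ranked:
    print(clip_id, round(cosine_similarity(query_embedding, emb), 3))
```

The same embeddings can feed other downstream tasks: nearest-neighbor lookups for search, clustering or classifiers for categorization, and conditioning signals for text generation.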
This advanced AI is designed to replace traditional, time-consuming manual video tagging and analysis, offering scalability, accuracy, and security. By enabling more intuitive and effective use of video data, Twelve Labs aims to transform how businesses and developers interact with video content, providing tools that can quickly interpret complex visual and auditory information.