Using recent advances in connecting text and images, we helped a large media production company implement effective video search without the need for metadata.
Our customer is a large media production company with a platform for creating and storing content. The platform had existing functionality for searching content, but it all relied on tags and metadata. With the advent of CLIP, we wanted to implement scalable video search that can query thousands of videos quickly.
The most challenging aspect of this project was the amount of data. Edisen is an established media production company that stores an enormous number of videos in its system. Processing everything with a complex model like CLIP requires substantial computation, and searching works by vector multiplication, which in turn demands a lot of memory: a text query is embedded into the same vector space as the video frames, and frames are ranked by how similar their embeddings are to the query embedding.
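To make the search mechanism concrete, here is a minimal sketch of embedding-based ranking. The embeddings, frame identifiers, and dimensions below are illustrative toys (CLIP actually produces 512-dimensional vectors from its encoders); this is not Edisen's production code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, frame_embeddings, top_k=3):
    """Rank stored frame embeddings by similarity to the query embedding."""
    scored = [
        (frame_id, cosine_similarity(query_embedding, emb))
        for frame_id, emb in frame_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 4-dimensional embeddings standing in for CLIP frame embeddings.
frames = {
    "video1_frame10": [0.9, 0.1, 0.0, 0.1],
    "video2_frame05": [0.1, 0.8, 0.2, 0.0],
    "video3_frame22": [0.85, 0.2, 0.1, 0.05],
}
# Embedding of a text query, e.g. "sunset over water" (hypothetical values).
query = [1.0, 0.1, 0.0, 0.1]
results = search(query, frames, top_k=2)
```

Ranking every stored frame this way is a brute-force scan, which is exactly why memory and compute become the bottleneck at scale.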
We delivered two microservices that were easy to deploy into the existing platform: one for the model and its post-processing, and one for the search engine. To deal with the challenge of scale, we implemented dynamic sampling, storing results only for relevant frames, together with approximate nearest neighbor search.
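The dynamic sampling idea can be sketched as follows: skip a frame when its embedding is nearly identical to the last stored one, so a static shot contributes a single entry while scene cuts are always kept. The similarity threshold and tiny embeddings here are assumptions for illustration, not Edisen's actual parameters, and a production index would serve lookups through an approximate nearest neighbor structure rather than the linear pass shown earlier.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def sample_frames(frame_embeddings, threshold=0.95):
    """Keep a frame only if it differs enough from the last stored frame.

    `threshold` is the cosine similarity above which two frames count as
    near-duplicates (an assumed tuning parameter).
    """
    kept = []
    last = None
    for idx, emb in enumerate(frame_embeddings):
        if last is None or cosine_similarity(emb, last) < threshold:
            kept.append(idx)
            last = emb
    return kept

# A static shot (three near-identical frames) followed by a scene cut.
frames = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],  # nearly identical to the stored frame: skipped
    [1.0, 0.02, 0.01],  # still the same shot: skipped
    [0.0, 1.0, 0.0],    # scene cut: stored
]
kept = sample_frames(frames)
```

Only the kept frames' embeddings enter the search index, which cuts both the one-off CLIP compute and the memory held by the search engine.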