What is Which Frame??
This application lets users search video content by describing the visual elements they want to find, leveraging CLIP, a state-of-the-art neural network that relates natural language to visual data. Users enter a textual query, such as "a person with sunglasses and earphones", and the system analyzes the video frames to identify the one(s) that best match the description, presenting the relevant timestamp(s) to the user.
Highlights
- Semantic video search using natural language queries
- CLIP-based neural network to understand and match visual content
- Ability to locate specific objects, people, or scenes within video
- Supports a wide range of query types, from simple object descriptions to more complex visual scenarios
- Provides timestamp information to quickly navigate to the relevant sections of the video
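The matching step behind these highlights can be sketched as follows. This is a minimal illustration, not the app's actual implementation: it assumes frame embeddings have already been extracted with CLIP at some sampling rate, and ranks frames by cosine similarity to the text query embedding (the embedding dimension, sampling, and function names here are illustrative assumptions).

```python
import numpy as np

def top_frames(query_emb, frame_embs, timestamps, k=3):
    """Rank frames by cosine similarity to the text query embedding.

    query_emb:  (d,) CLIP text embedding of the query
    frame_embs: (n, d) CLIP image embeddings, one per sampled frame
    timestamps: (n,) second offsets of the sampled frames
    Returns the k best (timestamp, similarity) pairs, best first.
    """
    q = query_emb / np.linalg.norm(query_emb)
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    sims = f @ q                       # cosine similarity per frame
    order = np.argsort(-sims)[:k]      # indices of the k best matches
    return [(float(timestamps[i]), float(sims[i])) for i in order]

# Toy example with random vectors standing in for real CLIP features:
# the query is constructed to be close to frame 4, so frame 4 ranks first.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 512))
query = frames[4] + 0.01 * rng.normal(size=512)
print(top_frames(query, frames, timestamps=np.arange(10)))
```

In the real app, `query_emb` would come from CLIP's text encoder and `frame_embs` from its image encoder over sampled frames; the returned timestamps are what get surfaced to the user.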
Platforms
- Web

