Auto Sizzle Reel
Introduction
Back in 2020 I was working with the HBO Max product innovation team on a research project to figure out whether we could use AI to automatically generate a sizzle reel from a full feature film. I felt this was a great opportunity to work with a real customer to solve a real problem. I had been a big fan of HBO growing up and was very excited to work with them.
At the time I was working at Warner Media as a Principal Architect and had been given the opportunity to build an AI/ML platform called ContentAI with John Ritsema. You could say it was a bit ahead of its time. In this blog post I'll share some of the key insights and learnings from the project.
The Idea
How can we use AI to help automatically generate a sizzle reel from full feature films?
Goals
- Automatically generate a sizzle reel from a feature film
- Give the content creative services team metadata to easily find the scenes/moments they can use to stitch together the sizzle reel themselves.
Requirements
There are a few building blocks that make a good sizzle reel for our use case. The sizzle reels we wanted to generate had the following requirements:
Metadata
- Total duration should be around 60 seconds
- Shots should be no longer than 3 seconds
- Shots should have the following characteristics:
  - a celebrity in it
  - a lot of movement
  - no lip flap (actors talking)
  - various shot types (establishing shots, medium shots, close-ups, etc.)
  - various types of high visual effects
  - various types of tags (kissing, family, action, etc.)
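To make these requirements concrete, here is a minimal sketch of how they could be applied as filters over per-shot metadata. It is an illustration, not the code we ran: the Shot fields, the motion score, the min_motion threshold and the greedy strategy are all assumptions, and the variety requirements (shot types, visual effects, tags) are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # All fields are hypothetical; they mirror the requirements listed above.
    start: float              # seconds from the start of the film
    end: float                # seconds from the start of the film
    has_celebrity: bool
    motion: float             # assumed motion score in the range 0..1
    has_lip_flap: bool
    shot_type: str            # e.g. "establishing", "medium", "close-up"
    tags: frozenset = frozenset()

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_shots(shots, target_seconds=60.0, max_shot_seconds=3.0, min_motion=0.5):
    """Greedily pick eligible shots without exceeding the target duration."""
    eligible = [
        s for s in shots
        if s.duration <= max_shot_seconds
        and s.has_celebrity
        and not s.has_lip_flap
        and s.motion >= min_motion
    ]
    # Consider the shots with the most movement first.
    eligible.sort(key=lambda s: s.motion, reverse=True)

    picked, total = [], 0.0
    for s in eligible:
        if total + s.duration > target_seconds:
            continue  # this shot would overshoot the target; try a shorter one
        picked.append(s)
        total += s.duration

    # Return the picks in film order.
    picked.sort(key=lambda s: s.start)
    return picked
```

Treating the celebrity, movement and lip flap requirements as hard filters is a simplification. A scoring approach that weighs them against each other would also work, and would make it easier to add the variety requirements later.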
Sizzle Reel Construction
When designing a sizzle reel, we need to consider the following:
- The length of the sizzle reel
- The number of shots
- The type of scenes
- The order of scenes
- The transitions between scenes
We need to tell a story with the scenes. We want to keep the viewer engaged and wanting more. We do this by selecting scenes that build tension and then resolving that tension in subsequent scenes.
In the future we could tailor sizzle reel generation to a specific viewer's tastes.
The Extracted Metadata
We start with a full feature film and no metadata that carries a time code for each shot. We needed to figure out what we could use to extract that data from the film.
- OpenCV
- Azure Video Indexer
- Amazon Rekognition
- Google Video Intelligence
We could use off-the-shelf models, or existing managed AI services from AWS, GCP, or Azure.
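Whichever option is used, the first piece of metadata needed is a time code for each shot. As a rough illustration of what shot detection involves, here is a minimal sketch that marks a cut wherever two consecutive frames have very different color histograms. It assumes the per-frame histograms have already been computed (for example with a library such as OpenCV) and normalized to sum to 1. The function name, the threshold value and the histogram-difference approach are assumptions for illustration, not a description of what any of the services above do internally.

```python
def detect_shots(histograms, fps, threshold=0.5):
    """Split a film into shots from per-frame histograms.

    histograms: one list of floats per frame, each normalized to sum to 1.
    fps: frames per second of the source video.
    Returns a list of (start_seconds, end_seconds) tuples, one per shot.
    """
    if not histograms:
        return []

    cuts = [0]  # frame indices where a new shot begins
    for i in range(1, len(histograms)):
        prev, cur = histograms[i - 1], histograms[i]
        # Half the L1 distance between two normalized histograms lies in [0, 1]:
        # 0 means identical color distributions, 1 means no overlap at all.
        diff = 0.5 * sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
    cuts.append(len(histograms))

    # Convert frame indices to time codes in seconds.
    return [(start / fps, end / fps) for start, end in zip(cuts, cuts[1:])]
```

A fixed threshold like this only catches hard cuts. Fades and dissolves change the histogram gradually and need a different approach, which is one reason to consider a managed service instead.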