The Spigot is our AI newsletter. This week we look at lip syncing with AI, and Apple's Keyframer AI tool for designers.
In this issue:
Continuing the theme of the last newsletter, I’m still investigating character consistency. Pika’s new lip sync feature sounds promising as part of a pipeline like this:
Since we focus on stylized animation here at the Spigot, I tried our Panda from the previous newsletter.
On the first try, Pika said it couldn’t detect a face. So it seems that their facial landmarking is trained on realistic faces only. Fair enough.
I tried a few different pandas, but it’s clear that what it really wants is a photorealistic person.
So I went back to Midjourney and asked it to make an image of a person wearing a panda hat instead, just to keep it on theme. Here she is:
I prompted Pika with this image and the result was much better. It detected her face well, except for the areas near the fuzzy edges of the costume.
The result of this was much cleaner. But it was a bit unsettling for two reasons:
So then I took the Midjourney still into DaVinci Resolve and animated a slow dolly-in. Pika smartly accepts a video prompt, and feeding it my dolly shot yielded much better results. Not only is the dolly more cinematic, but the face is a lot more animated.
Pika's lip sync is really effective at what it claims to do: lip syncing. It doesn't animate emotions or the rest of the face, and it doesn't claim to handle stylized characters.
We are in the era of stacking AI solutions. With more time, I could stack tools like Runway to get a strong input video that already has some emotion and movement to it, and then use Pika just for the lip sync pass. You can see the potential here for certain applications, even if the early versions aren't quite polished.
“Keyframer is an LLM-powered application we developed for creating animations from static images. Leveraging the code-generation capabilities of LLMs, along with the semantic structure of Scalable Vector Graphics (SVGs)…”
As you may know, SVGs are vector-based and stored as XML, so an entire image can be described in plain text.
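To make that concrete, here's a minimal, hypothetical SVG of my own (not from Keyframer): the whole image is just a few lines of XML text, which is exactly the kind of data an LLM can read and rewrite.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <!-- A simple panda-ish face described entirely in text: two ears and a head -->
  <circle cx="60"  cy="50"  r="30" fill="#222"/>  <!-- left ear -->
  <circle cx="140" cy="50"  r="30" fill="#222"/>  <!-- right ear -->
  <circle cx="100" cy="110" r="70" fill="#fff" stroke="#222" stroke-width="4"/>  <!-- head -->
</svg>
```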
Most popular AI image generators produce a brand-new image with each prompt, so creative control consists of prompting and re-prompting. It's a very unreliable way to tune visual output.
But what Apple did is take natural language input and have the LLM edit the underlying XML to produce variations of an SVG image. They all match the style of the original exactly, because the original XML text is carried through. And as the name implies, it can create animated sequences.
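As a rough sketch of the idea (illustrative markup of my own, not Keyframer's actual output), the LLM can leave the original shapes untouched and only add an animation block that targets them by id, so the style of the artwork never drifts. A prompt like "make the sun glow on and off" might yield something like:

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <style>
    /* Hypothetical LLM-generated addition: the shapes below are unchanged,
       only this CSS animation was added in response to the prompt. */
    @keyframes glow {
      from { opacity: 1; }
      to   { opacity: 0.4; }
    }
    #sun { animation: glow 2s ease-in-out infinite alternate; }
  </style>
  <rect width="200" height="200" fill="#aee3ff"/>             <!-- sky (original) -->
  <circle id="sun" cx="100" cy="60" r="40" fill="#ffd34d"/>   <!-- sun (original) -->
</svg>
```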
Over here at Aquifer, we are bullish on the hybrid approach: using AI to drive renderers that then produce consistent output. This is a novel and useful way to leverage an LLM to create super-consistent images for websites and interfaces.
The Spigot is also sent as a newsletter; you can join here to keep current on AI in cinema and animation:
Check out the latest Aquifer announcements
In today's competitive digital landscape, where audience attention is fleeting, you need to be able to get eye-catching content out quickly. That’s why Aquifer provides advanced camera and video settings that take just seconds to select and export.
Read More
Generate complete emotive facial expressions and lip sync with just audio and selective tuning in just minutes.
Read More