The Spigot - Vol.2 - Making characters talk in AI, and Apple's experiment with AI-assisted design

The Spigot is our AI newsletter. This week we look at lip syncing with AI, and Apple's Keyframer AI tool for designers.

August 6, 2024

In this issue:

  • Lip Sync: We use a stack of AI models to make a character talk!
  • Icons Easily Made in AI: Apple’s creative experiment with LLMs and design.

THE HOSE

Lip Syncing meets AI

Continuing the theme of the last newsletter, I’m still investigating character consistency. Pika’s new lip sync feature sounds promising as part of a pipeline like this (there’s a rough code sketch of the flow right after the list):

  1. Prompt Midjourney with an image to get a consistent character across several images.
  2. Optional: take those images into a video editor and make simple dolly moves, and export as a video.
  3. Prompt Pika with this image or video to make that character talk in multiple shots.
  4. Edit those shots back into a movie to get something consistent.
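
Here is that rough sketch. Midjourney and Pika are driven through their own interfaces (Discord and the web app) rather than code, so every function below is a hypothetical stand-in I’m using only to show how assets flow from step to step; none of these are real APIs.

# Hypothetical sketch of the pipeline above, just to show the data flow.
# Midjourney and Pika are used through their own interfaces (Discord / web app),
# so every function here is a placeholder, not a real API.
from pathlib import Path

def generate_character_stills(reference_image: Path, shot_prompts: list[str]) -> list[Path]:
    # Step 1: image-prompt Midjourney with the reference so the character stays consistent.
    raise NotImplementedError("done by hand in Midjourney")

def add_dolly_move(still: Path) -> Path:
    # Step 2 (optional): bring the still into a video editor, add a slow dolly, export as video.
    raise NotImplementedError("done by hand in a video editor such as DaVinci Resolve")

def lip_sync(clip: Path, dialogue_audio: Path) -> Path:
    # Step 3: prompt Pika's lip sync with the image/video plus the dialogue audio.
    raise NotImplementedError("done by hand in Pika")

def assemble_scene(shots: list[Path]) -> Path:
    # Step 4: cut the talking shots back together into one consistent scene.
    raise NotImplementedError("done by hand in the edit")

def make_talking_scene(reference: Path, prompts: list[str], audio_tracks: list[Path]) -> Path:
    stills = generate_character_stills(reference, prompts)
    clips = [add_dolly_move(s) for s in stills]
    synced = [lip_sync(c, a) for c, a in zip(clips, audio_tracks)]
    return assemble_scene(synced)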

Panda Lip Sync

Since we focus on stylized animation here at the Spigot, I tried our Panda from the previous newsletter.

On the first try, Pika said it couldn’t detect a face. So it seems that their facial landmarking is trained on realistic faces only. Fair enough.

I tried a few different pandas, but it’s clear that what it really wants is a photorealistic person.

Person Lip Sync w Panda Traits!

So I went back to Midjourney and asked it to make an image of a person wearing a panda hat instead, just to keep it on theme. Here she is:

I prompted Pika with this image and the result was much better. It detected her face well, except for the areas near the fuzzy edges of the costume.

The result of this was much cleaner. But it was a bit unsettling for two reasons:

  1. Nothing moved in the scene other than the lips, not even the parts of the face that normally accompany talking, like the jaw and cheeks.
  2. The lips flapped at impossibly fast speeds.

Let’s move the camera!

So then I took the Midjourney still into DaVinci Resolve and animated a slow dolly-in. Pika smartly allows a video prompt, and prompting it with my dolly shot yielded much better results. Not only is the dolly more cinematic, but the face is a lot more animated.

Pika’s lip sync is really effective at what it claims to do: lip syncing. It doesn’t animate emotions or the rest of the face, and it doesn’t claim to handle stylized characters.

We are in the era of stacking AI solutions. With more time, I could stack tools like Runway to get an input video that already has some emotion and movement to it, and then use Pika just for the lip sync pass. You can see the potential here for certain applications, even if the early versions aren’t quite polished.

DRIPS & DROPS

Apple Keyframer

“Keyframer is an LLM-powered application we developed for creating animations from static images. Leveraging the code-generation capabilities of LLMs, along with the semantic structure of Scalable Vector Graphics (SVGs)…”

As you may know, SVGs are vector graphics described in XML, so an entire image can be written and edited as plain text.

Most popular AI image generators produce a brand-new image with every prompt, so creative control consists of prompting and re-prompting. It’s a very unreliable way to tune visual output.

What Apple did instead was take a natural language input and have the LLM edit the underlying XML to produce variations of an SVG image. Because only the XML text changes, every variation matches the style of the original exactly. And as the name implies, Keyframer can also generate animated sequences.
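
To make that concrete, here’s a minimal sketch of the underlying idea (my own illustration, not Apple’s code): because an SVG is just XML text, a style variation or an animation is nothing more than a string edit, which is exactly the kind of output an LLM can generate from a natural language request.

# Minimal illustration of the idea behind Keyframer (not Apple's implementation):
# an SVG is plain XML text, so variations and animation are just text edits.
base_svg = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle id="dot" cx="50" cy="50" r="20" fill="#f5a623"/>
</svg>"""

# A color variation is a single text substitution; everything else in the file
# is untouched, which is why the variation matches the original's style exactly.
variation = base_svg.replace('fill="#f5a623"', 'fill="#4a90d9"')

# Animation works the same way: inject a CSS keyframes block that targets the
# element by id. In Keyframer, text like this is what the LLM writes in response
# to a request such as "make the dot pulse".
pulse = """<style>
  @keyframes pulse { 50% { transform: scale(1.3); } }
  #dot { animation: pulse 2s ease-in-out infinite; transform-origin: 50px 50px; }
</style>
"""
animated = base_svg.replace("</svg>", pulse + "</svg>")

for name, svg in [("base.svg", base_svg), ("variation.svg", variation), ("animated.svg", animated)]:
    with open(name, "w") as f:
        f.write(svg)

Open animated.svg in a browser and the circle pulses. The point is that every one of these “renders” is deterministic text, not a fresh diffusion sample, which is where the consistency comes from.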

Over here at Aquifer, we are bullish on this hybrid approach: using AI to drive renderers that then produce consistent output. Keyframer is a novel and useful way to leverage an LLM to create highly consistent images for websites and interfaces.

The Spigot is also sent as a newsletter. You can join here to keep current on AI in cinema and animation:

Join The Spigot for free
