There's No Such Thing As Letters in Speech...
Notice how I kept putting the word
'sound' in quotes above? That's because a common mistake for beginners is to
associate LETTERS with SOUNDS.
Principle #1: Letters are not sounds. Sounds are not
letters. There are NO letters in lipsync animation.
They serve similar roles, but in wildly divergent forms. LETTERS
are representative symbols on a page (with a corresponding, arbitrarily assigned sound)
that, when strung together to form words, communicate a thought. But letters aren't made for speech. They're for
writing. And we're not animating writing, but speech. SOUNDS are utterances (with a corresponding arbitrarily assigned letter value used to transcribe the sound)
that, when interpreted as understood words, communicate a thought. Sounds are for speech, but serve no use in writing.
See the similarities and differences? So when you animate speech, don't
animate letters. There are no letters in speech, only sounds, and the shape
our faces take to make those sounds.
I know this sounds like an argument in semantics, but trust me, the distinction is very real. And when you learn to
approach lipsync animation from the perspective of animating sound shapes instead of letters, your world will
be a much brighter place.
So What Does that Mean For Animation?
Let's take a look at an example: the line "you hafta get" from the
November 2001 soundtrack takes about 25
frames to say. At first look, it seems like there ought to be the following
keys for the phrase:
Y (a pucker shape)
That is a very literal interpretation of what it takes to show a person
saying "you hafta get".
But if you go ahead and keyframe the lipsync that way, you'll soon realize that this
will result in a very poppy mouth when animated. Some of those
poses will be onscreen for only a single frame, which is too much information and not enough time for the viewer to interpret it.
A quick analysis will show that you go
from one mouth shape that is quite open (Ah in hafta) to a pretty closed one (the F in hafta)
and then back open again (for the end of hafta). The result is the mouth popping from
open to closed back to open in just 3 frames. That's not fun to watch, folks.
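If you'd like to catch this kind of pop before you ever hit play, here's a minimal sketch in Python. It assumes you can sample a per-frame "mouth openness" value from your rig (0.0 for closed, 1.0 for wide open). The function name `find_pops`, the 0.5 threshold and the sample values are all made up for illustration; they don't come from any particular animation package.

```python
from typing import List


def find_pops(openness: List[float], threshold: float = 0.5) -> List[int]:
    """Return the start frame of every open -> closed -> open pop.

    A pop is flagged when, across three consecutive frames, the middle
    frame is at least `threshold` more closed than both of its neighbors.
    """
    pops = []
    for start in range(len(openness) - 2):
        before, middle, after = openness[start:start + 3]
        if before - middle >= threshold and after - middle >= threshold:
            pops.append(start)
    return pops


# Hypothetical samples: a literal pass that slams shut for one frame,
# and a softer pass that eases through the same stretch.
literal = [0.3, 0.9, 0.1, 0.8, 0.3]
smoothed = [0.3, 0.9, 0.5, 0.4, 0.8, 0.3]

print(find_pops(literal))   # [1]
print(find_pops(smoothed))  # []
```

The threshold is a judgment call. Tune it to how far your character's jaw actually travels.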
But What About My Letter..um... I Mean "Sound" Shapes?
Beginners will make a 'phoneme' that is an exact replication of one's face
saying that single 'letter' in isolation. So we make E phonemes saying E by itself.
And we model "K" phonemes based off our own face in a mirror saying "kuh". At first that seems more than logical enough.
The problem with that is that
when you say the "t" sound by itself ('tuh'), your face doesn't look at all like it would
if you say something like "skate". And that "t" in 'skate' gives a face shape that is completely different
than the "t" sound shapes in "petstore". And THAT "t" is very different from the "t"
shape you make when you say "goatee".
Principle #2: Mouth Shapes for Sounds Must Be Animated In Context
By context I mean this:
The preceding sound shape affects the current sound shape.
Likewise, the following sound shape is anticipated in the current sound shape.
So the shapes shown must all be in context
with the shape/sound that precedes it and follows it. When you get stuck on
the idea of making all the "t" sounds in a soundtrack the same shape,
regardless of the prior or following sound/shape context in the dialogue,
then you're setting yourself up for a very poppy mouth when animated.
Remember Principle #1: animating speech is not animating letters. It's animating the *flow* of
shapes that are needed to make the present sounds within what's being said.
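One way to picture "in context" is as a weighted mix of the current shape with its neighbors. The sketch below is only an illustration of that idea, not a recipe from any real rig: the shape names, the two controls (`jaw_open`, `lip_wide`), their values and the 25% carry/anticipate weights are all assumptions I've invented for the example.

```python
from typing import Dict

Shape = Dict[str, float]  # control name -> weight, 0.0 to 1.0


def in_context(prev: Shape, current: Shape, nxt: Shape,
               carry: float = 0.25, anticipate: float = 0.25) -> Shape:
    """Blend the current shape with the one before it and the one after it.

    `carry` is how much of the previous shape lingers; `anticipate` is how
    much of the next shape is already showing. All three shapes are assumed
    to use the same control names.
    """
    keep = 1.0 - carry - anticipate
    return {
        name: keep * current[name] + carry * prev[name] + anticipate * nxt[name]
        for name in current
    }


# Hypothetical isolated shapes.
T = {"jaw_open": 0.1, "lip_wide": 0.3}
AY = {"jaw_open": 0.6, "lip_wide": 0.7}
OH = {"jaw_open": 0.5, "lip_wide": 0.1}
EE = {"jaw_open": 0.2, "lip_wide": 0.9}
REST = {"jaw_open": 0.0, "lip_wide": 0.2}

t_in_skate = in_context(AY, T, REST)   # "t" after "ay", before silence
t_in_goatee = in_context(OH, T, EE)    # "t" after "oh", before "ee"

print(t_in_skate)
print(t_in_goatee)
```

Same isolated "t", two different results, because the neighbors are different. In practice you'd do this by eye rather than by formula, but the principle is the same.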
OK, Mr. Fancypants. So Just How Should I Animate Lipsync?
The better approach is to interpret speech, to grasp the essential elements of the communication
as recorded in the sound track. To "squint your ears" and try and pick up the overall feel of the speech.
Let's take a look at art history.
For years up until the late 19th century, the effort in Renaissance art was the
meticulous and accurate recreation of reality. Realism was the goal, and literalism
in interpreting a painting was the norm. Then a bunch of artists got an idea about
capturing just the overall sense of an image. They became less interested in capturing
every leaf on a tree, but began to focus on how the light and shadow
and color hues projected that tree into another realm. This new realm of seeing was an interpretive realm where leaves
didn't matter as much as form, color, tone and contrast. At first these guys were
derided as lazy artists, too shiftless to bother with the details. But soon the world
got hold of these new paintings and was amazed to see such life and beauty where before
there were just leaves. The age of Impressionism was born, and we're all
the better off for it.
So how does that apply to us and lipsync?
Here's how: Just as the impressionist
painters got away from a literal realism in capturing a picture, we too need
to get impressionistic when it comes to lipsync animation.
Principle #3: Interpret the Lipsync Animation Like an Impressionist
If in your animation you can just get the major
impressions across you can let the little stuff slide if you
want. Just like the impressionist would hint at a cluster of leaves with a single daub of his brush,
you too should let words and sound shapes slur into the next word or sound shape. Mix the target
facial weights together to show a flow. Get away from showing leaves and start showing contrast and form.
Talking is more of a flowing thought than
an alliterative function of letters.
Impressionism Applied To Real Live LipSync...
Let's look again at our example phrase- "you hafta get".
A more impressionistic interpretation would be to
emphasize the following major accents:
Go ahead and say that out loud. "Ooo" as in "scoop", "aaFF" as in "after" and "Eh"
as in "pet".
Sounds a lot like "you hafta get", doesn't it?
Now go one further.
Grab a handheld mirror.
Now, comfortably (i.e., don't play-act
or overemphasize it), just say "you hafta get".
Watch how your mouth
looks as you say it again.
Now, say "oo-aaFF-eh" a few times.
See how very close the two are in how they look?
You want another example of this?
Say to your mirror "I love you".
Then say to it "Elephant
You never knew that the connection between l'amour and pachydermal
podiatry was this close!
The Devil is in the Details...
Let's take an even closer look at this from a lipsync animation point of view.
For the phrase "you
hafta get" there is one special pose along with two major open poses and two
major closed poses.
The special pose is the pucker/ooo at the beginning of You.
The first major open is the "aa" at the beginning of Hafta.
The second major open pose is the "Eh" of Get.
Likewise, the first major closed pose is
the FF of Hafta.
The second closed pose is the T in Get. (It's
not a true closed pose, but it's close enough for us to define it as such
because it is more closed than open.)
Anyhow, by choosing to do nothing more
than hit these opens and closes you can get nearly all you need. (Heck, the Muppets have gotten by on that for 30+ years!)
These main target points
are like the broad brushes in an impressionist painting. They define shape, contrast,
form, direction. The details of texture come later with the specific choices you make on top of the
broad brushed open and closed pose shapes and timings. The
opens and closes are the foundation of your more specific choices.
Principle #4: Get the Opens and Closes Done Right and Build On Those
Even if all you ever do is properly hit
the opens and closes and wide shapes of the mouth at the right time you are
already more than 75% of the way to great lipsync. You can get a lot out of
very little lipsync animation. And if you doubt it, animated properties with projected texture map mouths
like "Veggietales" have proven that this is indeed true.
Here's a breakdown of some specific choices...
You'll want to start by
letting the "Yuh" of You flow into the more open "aa" at the beginning of
Hafta. Skip the specific "ooo" at the end of You because it is not very
strong. It's there, but it gets said while the mouth is transitioning into
the beginning of hafta. Basically it slurs into the next word.
The H of
Hafta is buried in the back of the throat, so the lips don't really need to
show it. So skip showing a specific H target for it.
Picking up from the
moderately strong "aa" of hafta, hit the F for two frames to let it read.
It's the major closed point of the phrase, so that needs to line up and read.
Then skip the ending "ah" of hafta altogether, as well as the G of
Get. Both happen under the breath; they're slurred under the transition from
FF to the Eh accent of Get.
Hit that last open pose of Eh.
Then end with an appropriately shaped nearly
closed mouth to catch the idea of a T.
You've basically now animated
Ooo-aaFF-Eht. And you know what? It's enough. And the best part is it flows, it feels
natural, and it doesn't pop.
There's Gotta be More. What about those T's and Stuff?
The short answer to this question is: don't sweat it unless you really need to.
I haven't at all addressed the tongue in any of this. But if your character has a tongue,
then you can get all the inner mouth sound shapes you need with that. The inner mouth
sound shapes are:
So add your tongue work in here, keeping it as impressionistic as everything else,
and you can handle the 'little stuff' quite easily. A good tip is to keep tongue movements
very quick. Don't have the tongue take longer than 2 frames to get from a position back to another,
unless you have a specific reason. Otherwise it will look for all the world like your
character is saying the "LL" sound. The word "bad" turns into "bald". "Good" becomes "gold".
Keep the tongue light and quick, just like your wits.
Miscellaneous Tips & Tricks & Principles...
1) Don't go from wide open to closed in one frame and vice versa. Definitely
don't go from open to closed to open in 3 frames.
2) Don't hold a mouth shape static. An "Ah" shape should shift into a
slightly different "Ah" as it's being held.
3) Keep M's and F's for 2 frames. If it's tight, steal from the previous
4) Keep an eye on your targets and make sure they're not too linear in
going from one sound shape to the next.
5) Hit the sound shape at least 2 frames before the sound is heard. Even if you're
right on the nose, it will feel late when played at full speed. Humans see things faster
than they hear them, so we pick up our cues from the shape before the sound.
6) Break up the mouth angles. Shift the mouth up and down, tilt it left or right, get some snarls in there. Show emotion
as the character speaks. We can speak and smile, speak and frown, speak and yawn at the same time.
Build rigs that allow you to keep that kind of life in your lipsync animation.
7) Upper teeth do not move. They're nailed to your skull.
8) Jaws rotate, not slide, in characters with clearly defined head/neck areas.
9) When building your sound shapes and facial controls, don't forget the cheeks and the nose!
The cheeks move when we speak, as does our nose. The cheeks and nose are the great connectors
in facial animation, crossing the bridge from mouth animation to eye and brow animation.
By keeping your nose and cheeks in the action you tie together the entire face of the character,
creating a far more believable character who can act.
10) Don't be afraid to go extreme. Avoid the Princess Fiona Final Fantasy Syndrome(tm).
Keep the energy of the sound track in mind when you're doing the mouth shapes. Louder sounds
with more energy should be shown with the mouth open wider, sound shapes more extreme. Watch TV
announcers talk. Those faces are movin' baby!
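A couple of these tips are mechanical enough to script. Here's a rough Python sketch of tips 3 and 5: it shifts every key two frames ahead of where the sound is heard, then flags any M or F that ends up held for less than two frames. The `(frame, shape)` key format, the function names and the frame numbers are hypothetical, and I'm assuming a key is "held" until the next key begins. Adapt it to however your package stores keys.

```python
from typing import List, Sequence, Tuple

Key = Tuple[int, str]  # (frame, shape name)


def lead_keys(keys: List[Key], lead: int = 2) -> List[Key]:
    """Shift every key `lead` frames earlier, never going below frame 0."""
    return [(max(0, frame - lead), shape) for frame, shape in keys]


def short_holds(keys: List[Key], end_frame: int,
                shapes: Sequence[str] = ("M", "F"),
                min_hold: int = 2) -> List[Key]:
    """Return the keys for `shapes` that last fewer than `min_hold` frames.

    A key is treated as held until the next key starts (or until
    `end_frame` for the last key).
    """
    problems = []
    for i, (frame, shape) in enumerate(keys):
        next_frame = keys[i + 1][0] if i + 1 < len(keys) else end_frame
        if shape in shapes and next_frame - frame < min_hold:
            problems.append((frame, shape))
    return problems


# Hypothetical frames where each sound is heard in "you hafta get".
heard = [(3, "Ooo"), (9, "aa"), (13, "F"), (14, "Eh"), (22, "T")]

early = lead_keys(heard)
print(early)                             # [(1, 'Ooo'), (7, 'aa'), (11, 'F'), (12, 'Eh'), (20, 'T')]
print(short_holds(early, end_frame=25))  # [(11, 'F')]
```

Here the F only gets one frame, so per tip 3 you'd give it a second frame, stealing from a neighbor if the timing is tight.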
Before You Go...
I hope this has helped some. We've broken down one phrase for this paper and I'm sure
it all makes perfect sense now- for that one phrase. :o)
Now the trick for you is to learn how
to adapt this impressionist kind of thinking into other phrases, other animations,
other characters. Just try to keep in mind my four "Principles" that I've stated. If you
can keep those in mind then you're
well on your way to animating lipsync in a convincing, flowing manner that
will feel natural and have life.
Last of all, the best thing I can suggest is that you keep
practicing. My breakdown can get you going in the right direction, but
experience is the best teacher.