Virtual Body Language
by Jeffrey Ventrella
|In the summer of 1997, Will Harvey sent me a fax from Switzerland describing a vision he had for a virtual world. He was on a ski trip, and that day he had been riding on a ski-lift and chatting with a charming young woman. The whole sensorium—the magnificent view, the fun, bouncy mechanics of the ski-lift, and of course, the charming companion—left an impression in Will's mind. Will is a game developer and computer scientist who had his first success at a young age and went on to become a Silicon Valley entrepreneur. In 1995 Will hired me to work at Rocket Science Games in San Francisco. The dot com bubble was quivering, and two years later Rocket Science went out of business. That’s when Will had his vision on the slopes, and that’s when he asked me to join him in prototyping a virtual world that was later named "There". In the following years, Will and I were to have many creative coding sessions and illuminating conversations about avatars, virtual-world physics, and emergent gameplay.|
I recall sitting down to write the very first line of code
for the prototype. My first goal was to get something to
show up on the screen that represented the user.
"Start simple", I thought. "This is going to be a long
I started by whipping together a 3D
environment—the proverbial flat horizontal grid against
a black background—a rather cliché image, symbolizing
the sleek geometrical perfection of cyberspace. I had
always considered those vast flat grids extending
out into infinity as symbolizing the loneliness of
virtual reality. In the back of my mind, I had a feeling
that this lonely grid would be temporary.
My First Avatar
To represent the user, I created an upright wireframe
cylinder standing on the horizontal grid (it was so
crude that it had only six sides!). I set the point of
view (the virtual camera) in this 3D world at a
location next to the cylinder, slightly raised and
aimed slightly downward—your basic third-person view.
You may ask: how much is there to say about moving a
cylinder around on a flat plane? Actually,
representing a user as a cylinder on a horizontal
surface comes with a load of design considerations.
For instance, how fat? How tall? How should the user
tell the cylinder to move forward, or turn?
If it is controlled with the mouse, what mouse
motions should control moving forward versus turning
around? And how fast should these motions occur as a function of mouse movement? Can the cylinder tilt on its vertical axis? Should the viewpoint stay fixed in space or should it follow the cylinder? If so, how closely? These were fundamental questions that played an important role in making the cylinder “feel like me”. I wanted it to be easy and natural to move around in the world. These fundamental first principles of navigation, it turns out, remained just as important as the avatar evolved into a human form in the years that followed.
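The questions above about mapping mouse motion to cylinder motion can be made concrete with a small sketch. This is a hypothetical mapping, with made-up gain values, not the prototype's actual code: vertical mouse drag becomes forward speed along the heading, and horizontal drag becomes turning.

```python
import math

def update_avatar(x, y, heading, mouse_dx, mouse_dy,
                  turn_gain=0.01, move_gain=0.05):
    """Map mouse deltas to avatar motion: horizontal drag turns the
    cylinder in place, vertical drag moves it along its heading.
    The two gains answer 'how fast should these motions occur?'"""
    heading += mouse_dx * turn_gain        # turn in place
    speed = -mouse_dy * move_gain          # dragging up moves forward
    x += math.sin(heading) * speed
    y += math.cos(heading) * speed
    return x, y, heading
```

Tuning `turn_gain` and `move_gain` is exactly the kind of adjustment that makes the cylinder "feel like me", or not.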
Let’s drill down into this a bit more. Why might one start with a cylinder as opposed to some other simple shape, such as a sphere or a cube? Think about the human body for a moment. We are bipedal mammals who occupy a mostly vertical volume when we stand or walk. A vertical volume…with the possible exception of people who consume sufficient numbers of McDonald’s hamburgers so as to approach spherehood. Human eyes and brains understand the world most easily with a roughly horizontal line of sight, head upright. The three semicircular canals of each inner ear are oriented approximately orthogonal to each other, and each corresponds to one of three axes of head rotation. The lateral (horizontal) canal becomes perpendicular to gravity and parallel to the ground when the head is pitched down 30 degrees, as if looking at the ground a few meters in front of the feet while walking or running. Just another one of the many reminders that humans are walking machines.
A vertically-oriented navigation system comes naturally to bipedal mammals. A cylinder represents this verticality, as well as having rotational affordance, which is manifest in the roundness about the vertical axis.
The Social Life of a Cylinder
I’m not ready to come back to the slopes of the Matterhorn and the charming ski-lift companion just yet. First I want to do a thought experiment to examine the limits and possibilities of avatar body language. For this thought experiment I want to greatly reduce the degrees of freedom of the avatar in order to point out the many ways of generating body language, even with a limited vocabulary of movement. This is to help prepare us to explore the entire realm of body language, and it will also provide the context for reviewing a bit of avatar history.
Consider a virtual world with text chat as in a typical instant-messaging application. The avatars are cylinders, which have “noses” (or some feature that serves to indicate where the front side of the avatar is as it rotates).
Locomotion affordance in the sphere, cylinder, and cylinder with “nose”
You can move these cylinders around on a horizontal surface and turn them in place—as if they were being rolled around on casters. Given this constraint on avatar movement, what kinds of body language can we possibly create? At first you might think, “None! How can you have body language if you have no moving parts?” But in fact, there is quite a bit of fundamental body language that you could create, such as:
1. moving up to another avatar and facing it while you are chatting with it
2. NOT facing the avatar while you are chatting with it (to express aloofness, shyness, indirection, discreetness, etc.)
3. standing next to the avatar you are chatting with but facing in the same direction as the other avatar, as if you were both watching the same scene (a behavior that is seen in men more than in women)
4. turning completely in the opposite direction from the avatar you are chatting with and deliberately facing away
5. standing in place and rotating (perhaps to express that you are completely out of your mind)
6. revolving around another avatar (to annoy, get attention, or make that avatar feel trapped)
7. repeatedly moving towards and away from the avatar you are chatting with
8. standing abnormally far away from the avatar you are chatting with, or conversely, standing extremely close
9. occasionally turning and facing a third avatar that you are talking about as a form of referencing
10. standing in place and continually adjusting your rotation so that you always face a distant avatar as it moves along, as if you were watching it
As you can see, not having an articulated body doesn't mean that you can’t have nonverbal communication. In fact, we could probably come up with more examples than the ones listed. Furthermore, arbitrary behaviors can easily take on new meaning above and beyond the natural signals that animals create with bodily proximity, relative motion, direction, etc. And so the possibilities are essentially endless.
I have just described a fundamental base level vocabulary of body language that can be used no matter what kind of avatar you have (as long as it has a horizontal heading and as long as you can move it and rotate it around its vertical axis). A lot of this stuff falls within the realm of Proxemics: the study of measurable distances between people and the associated interactions and effects. The birth of shared virtual worlds has added a whole new dimension—and experimental playing field—to the study of proxemics. Some virtual world researchers have observed a sensitivity to personal space—the degree of comfort that users feel based on proximity to other avatars, and how close an avatar has to be in order to make a user uneasy. Gender and culture have been found to be factors in a phenomenon called “invasion anxiety” (Nassiri et al. 2008).
Back in 1995 one of the only online virtual worlds being publicly used was Worlds Chat. Spontaneous forms of body language were invented by its dial-up users. There were no articulated gestures; avatars were like chess pieces. As Bruce Damer reminisces, “My first ‘in-world’ experience was ever so timidly moving up to another group of avatars in the corner of the hub just after teleporting in. I kept back from them and typed a few words to engage in conversation. I had no idea what the rules of body contact were or whether I would be rude to interject my virtual body in between conversants. Others had no idea either so this was an interesting circumstance. Soon people (probably younger than me) were whizzing around, passing right through others' avatars (there was no collision active). So the social conventions were established for body language: i.e., it was OK to move through other people but not pausing your avatar right up front of someone's view (all perspective was first person so this would occlude someone's view completely)” (Damer 2010). Right from the start, as Damer points out, users were working out conventions of body language, basically figuring it out as they went along.
Not surprisingly, gross-motor body language was used widely in these early worlds. In OnLive Traveler (now Digital Space Traveler), a virtual world that included voice chat, avatars were originally represented as floating heads, and users communicated using real-time voice. The audio signal of the collection of users’ voices was received in stereo, and the positioning of avatars in conversation groups affected both the visual and the aural experience. Users could move these heads forward and back using the up and down arrow keys on the keyboard, and turn left and right using the left and right arrow keys. They could also pitch up and down using page-up and page-down keys. Several forms of body language emerged spontaneously in the community, including coming together in chat circles, standing close or far apart, expressing yes and no (by alternating page-up/page-down and left/right keys), and even turning the head completely upside down to say “my user is away”.
Floating avatar heads in OnLive Traveler
Inner Ear Disease and Avatar Motion
As mentioned earlier, the three semicircular canals in the inner ear correspond to the three rotational axes commonly used in computer graphics and engineering (x, y, z—often referred to as yaw, pitch, and roll). Each canal is sensitive to a specific axis of rotation in the head. The fluid in these canals gets swished around in complicated ways, and the tiny hairs sense the flow of the fluid. The system works pretty much all the time, and so it is usually invisible to us, thanks to the tight integration of our vestibular systems with our eyes and our motor systems. We humans are accustomed to turning our heads left and right (shaking the head no), and pitching the head up and down (nodding yes). The vestibular system is well-equipped to process these two rotations. These are salient head rotations—nearly universal in human expression of the binary opposites. These rotations are a bit more universal than waggling the top of the head side-to-side like an inverted pendulum—used widely in Indian culture to express something like “sure…okay”. If you are Indian, you may argue that all three rotations are equally important for body language.
One morning, after a night of too much drinking, and a wicked, week-long sinus infection, I turned around in my bed and felt something strange. I woke up to find that the light fixture hanging from the ceiling was jumping to the left wall, sweeping across the ceiling to the right wall, and then repeating, again, and again, and again. I watched in terror until the motion subsided. The repetitive sweeping was caused by nystagmus (eye-beating), an involuntary behavior triggered by signals from the inner ear—the same thing that causes dizziness after spinning in place. What I had was BPPV (benign paroxysmal positional vertigo), a harmless and, fortunately for me, temporary inner-ear malfunction. It is caused by tiny crystals (otoconia) getting dislodged, which then slosh around against the hairs of the semicircular canals. The cure came in the form of an amazingly low-tech procedure called the Epley maneuver, which basically turns your head into a Rubik’s Cube (specific changes in head orientation, done in just the right order, cause the otoconia to settle back into their usual home in the labyrinth). The doctor who performed the maneuver instructed me to avoid nodding “yes” for a week afterward, as this would disturb the otoconia. This can be a difficult chore for someone who strives to be a positive person.
I learned quite a lot about the vestibular system from this experience. I didn’t realize it at the time, but this experience had an effect on the way I coded avatar motions. The next stage of evolution from the cylinder-avatar was what Will and I affectionately called “Ice Cream Cone Man”, initially inspired by some of the characters that Will had developed for a game years before. Ice Cream Cone Man had a pointy bottom, and a wider top; he was an inverted cone. And on top of the cone was a sphere: the head. The pointy bottom represented a new degree of freedom—the avatar could tilt away from its vertical axis. We had a special interest in this degree of freedom because we wanted to be able to move fast in the world and to “lean-in” to our turns, lurching and banking. And my own personal inner ear experience made me acutely aware of the fact that throughout our lives, our bodies and heads are constantly moving, turning and tilting, and we manage to not fall down constantly. For me, Ice Cream Cone Man's new degree of freedom represented an important part of bodily experience.
I gave Ice Cream Cone Man an angular spring force that caused it to pop back upright if it was tilted over. A familiar meme comes to mind for a popular children’s toy: “Weebles wobble but they don’t fall down”. To increase wobbliness, I could adjust this force to be weak (making it act drunk—slow to correct itself), or I could adjust it to be strong (upright and uptight). Tuning this force to be somewhere in-between made it just right. After adding a few of these degrees of freedom, and adjusting the forces, things started to get a lot more physical and fun for our dear little avatar.
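That angular spring can be sketched in a few lines. This is a simplified one-axis version with made-up constants, not the actual There.com code; the stiffness setting is the "drunk versus uptight" dial described above.

```python
def weeble_step(tilt, tilt_vel, stiffness=8.0, damping=0.9, dt=0.016):
    """One step of the upright-restoring angular spring.
    Low stiffness: drunk and wobbly. High stiffness: upright and uptight."""
    tilt_vel += -stiffness * tilt * dt   # spring torque back toward vertical
    tilt_vel *= damping                  # friction, so the wobble settles
    tilt += tilt_vel * dt
    return tilt, tilt_vel

# a tilted avatar pops back upright over a few hundred frames
tilt, vel = 0.5, 0.0
for _ in range(600):
    tilt, vel = weeble_step(tilt, vel)
```

Tuning `stiffness` and `damping` somewhere in-between the extremes is what made the avatar feel "just right".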
Super Mario 64
Will brought in a Nintendo 64 game console, and we studied the user navigation and 3D camera behavior for Super Mario 64 (this game was very influential in our thinking, as well as for many 3D game developers at the time). Our goal in building an avatar was to make it easy and fun to zoom around in a make-believe world (like Super Mario).
Super Mario 64 (© Nintendo)
Ice Cream Cone Man experienced another spike in evolutionary history—rudimentary Newtonian Physics. I turned the cone into a torso-sized ovoid and raised it above the grid plane, as if it had compressible legs beneath, pushing it up so it would stay at the level of the pelvis. It still moved forward, turned left and right, and tilted. But now, when walking or running, it had a subtle up-down bobbing motion, roughly corresponding to gait. I also added more degrees of physical simulation to the avatar’s body. It now had a full 3D position and a velocity, and responded to frictions and gravity. Moving forward now meant applying a linear force in the avatar's forward direction. Turning left and right meant applying an angular force around the avatar's local up axis (with a bit of Ice Cream Cone Man sideways tilt—if moving forward, just for fun).
Primitive avatar locomotion acquires more degrees of freedom
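The rudimentary Newtonian physics described above can be sketched as a tiny point-mass integrator. The class name, gains, and friction constant here are illustrative assumptions, not There.com's actual code; the essential idea is that user input becomes forces, and position follows from velocity.

```python
import math

GRAVITY = -9.8    # pulls the body down toward the grid plane
FRICTION = 0.95   # bleeds off linear and angular speed each frame

class AvatarBody:
    """A point-mass avatar with 3D position, velocity, and a heading."""
    def __init__(self):
        self.x = self.y = self.z = 0.0       # z is up
        self.vx = self.vy = self.vz = 0.0
        self.heading = 0.0                   # rotation about the up axis
        self.turn_vel = 0.0

    def step(self, forward_force, turn_force, dt=0.016):
        # moving forward means a linear force along the avatar's heading
        self.vx += math.sin(self.heading) * forward_force * dt
        self.vy += math.cos(self.heading) * forward_force * dt
        # turning means an angular force about the local up axis
        self.turn_vel += turn_force * dt
        self.vz += GRAVITY * dt
        self.vx *= FRICTION; self.vy *= FRICTION; self.turn_vel *= FRICTION
        self.x += self.vx * dt; self.y += self.vy * dt; self.z += self.vz * dt
        if self.z < 0.0:                     # the ground plane holds us up
            self.z = 0.0
            self.vz = 0.0
        self.heading += self.turn_vel * dt
```

A bobbing offset timed to an imaginary gait can simply be added to `z` before rendering.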
The bobbing motions on the torso were to create the effect as if there were walking legs underneath, and the motions were timed to an imaginary gait. Rather than try to simulate the actual physics of legs walking (which is extremely complex), I animated simple walking legs using a technique called inverse kinematics: a method for adjusting joint angles (like knees) so that specific endpoints (like feet) move to specific locations (like the ground). The feet stayed roughly underneath the body, and took steps when the avatar moved horizontally, and the hips stayed attached to the torso, and, most importantly, because of inverse-kinematics, the knees bent just right so as to keep everything else looking natural.
|The tall, gangly Abraham Lincoln was once asked, "How long are your legs?" His answer: "Long enough to reach the ground". This is how the legs worked for the avatar. They weren't part of the physical simulation—they were merely animated geometry. The reason for this abstraction was to stay focused on the overall sensation and goals of moving around in the world—which usually doesn't include minutia of legged ambulation. We wanted to keep our simulation goal-oriented. This idea will be revisited later in the chapter, “Seven Hundred Puppet Strings”.|
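The knee-bending trick can be sketched as a two-bone, law-of-cosines solver. This is a simplified side-view version with assumed bone lengths, not There.com's actual implementation, for the case where the foot target sits directly below the hip.

```python
import math

def leg_ik(hip_height, foot_height, thigh=0.45, shin=0.45):
    """Two-bone analytic inverse kinematics for a leg: given how far
    the foot target is below the hip, return the hip angle (away from
    the hip-to-foot line) and the knee bend that plant the foot exactly
    on target. Like Lincoln's legs, they always reach the ground."""
    # clamp the hip-to-foot distance to what the leg can actually reach
    d = max(1e-6, min(hip_height - foot_height, thigh + shin))
    # law of cosines gives the interior angle at the knee for distance d
    cos_knee = (thigh**2 + shin**2 - d**2) / (2.0 * thigh * shin)
    knee_bend = math.pi - math.acos(max(-1.0, min(1.0, cos_knee)))
    # ... and the thigh's deviation from the straight hip-to-foot line
    cos_hip = (thigh**2 + d**2 - shin**2) / (2.0 * thigh * d)
    hip_angle = math.acos(max(-1.0, min(1.0, cos_hip)))
    return hip_angle, knee_bend
```

When the hip sits exactly one leg-length above the foot, both angles come out zero (a straight leg); lower the hip and the knee bends just enough to keep the foot planted.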
|I gave the head special treatment, as far as body parts go. One technique I used (which is also used in other avatar systems) causes the head to turn before the body turns. I used a simple physics algorithm for this: when the user rotates the avatar, it doesn’t immediately start turning. The avatar has a bit of rotational inertia, and so it ramps up its rotation within the first fraction of a second when the user starts rotating, and after the user stops rotating, it ramps down for a fraction of a second. (The same applies to the avatar’s position in space: translational inertia.) These effects come for free when a physics model is used, and the effects of friction and mass can be delicately tweaked to make avatar navigation intuitive and satisfying (again, inspired by Mario).|
Now consider that the head is much lighter than the body, and that it is also where the brain and eyes are. So it makes sense to give the head a bit less simulated mass (or none at all), so it can turn more quickly and responsively when the user changes rotation via the mouse or keyboard. The net effect is that the avatar seems to “anticipate” the turns. This was but one of the many techniques I used to imbue the avatar with some sentience.
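The head-leads-the-body effect can be sketched with two copies of the same chase function, differing only in responsiveness. The function and its constants are illustrative assumptions, not the actual code; the point is that the light head converges on the target heading faster than the heavy torso.

```python
def chase(current, target, responsiveness, dt=0.016):
    """Move `current` toward `target` at a rate set by responsiveness.
    A light head gets a high value, a heavy body a low one, so the head
    ramps into (and out of) turns ahead of the body."""
    return current + (target - current) * min(1.0, responsiveness * dt)

body_heading, head_heading, target = 0.0, 0.0, 1.0
for _ in range(30):
    body_heading = chase(body_heading, target, 3.0)   # sluggish torso
    head_heading = chase(head_heading, target, 12.0)  # eager head
```

After half a second of turning, the head is already nearly facing the target while the body is still catching up, which reads as anticipation.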
Since I didn’t want my dear little avatar to experience vertigo, I applied a stabilizing force to the avatar's head. As the avatar zoomed around the world (banking, bobbing, weebling and wobbling), its head stayed upright, as if it were always trying to keep a stable view of the world. Imagine what happens when you pick up a doll or an action figure, like GI Joe or Barbie. If you wag the doll around in the air, the head stays rigidly oriented on the torso. But if the doll were alive, it might tilt its head so as to keep its view of the world stable as you wobbled it around, as shown at the bottom of this illustration:
Keeping a level head creates the illusion of sentience
Holding a doll that adjusts its head as if it were alive might seem a tad macabre, like a scene from the Twilight Zone. But this image provides an example of one technique for making animated characters appear more sentient. A feature of living things is the desire to keep a stable perspective on the world. The way an avatar holds its head can create an illusion of sentience (which is a prerequisite for expressiveness).
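The head-leveling trick reduces to counter-rotating the neck against the body's tilt, up to some plausible joint limit. This is a one-axis sketch with an assumed neck limit, not the actual implementation:

```python
def stabilized_head_tilt(body_tilt, max_neck_bend=0.6):
    """Counter-rotate the head against the body's tilt so the view stays
    level, within what the neck can plausibly do. Returns the head's
    tilt relative to the torso, and its tilt as seen from outside."""
    neck = -body_tilt                                  # cancel the lean
    neck = max(-max_neck_bend, min(max_neck_bend, neck))
    world_tilt = body_tilt + neck                      # leftover tilt
    return neck, world_tilt
```

For moderate banking the head stays perfectly level; in an extreme lurch the neck hits its limit and the head tips along with the body, which also looks right.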
|Holding the head upright is a variation on the theme of holding the body up, which is a variation of the erect verticality of the human posture. Slithering horizontally like a lizard is one way to come across as non-human-like. Slumping in the presence of others is bad form. The postures of attractive, persuasive, and charismatic people are often perky and vertical. Thus, the visual language of verticality, perkiness, and upright posture can be designed into avatar systems, just as they are in film character animation, to create different personalities and moods.|
But enough with this heady discussion. It’s time to move on to the next stage of avatar evolution, and to bring the avatar into the realm of standard character animation.
Full Body Articulation
The last important phase of anatomical evolution was creating a way to represent the motions of the parts of the avatar with one overarching scheme. The avatar already had ambulating legs and an articulated head. Soon our avatar had acquired two simple arms, and a few spine joints. Now the avatar had joined company with others in the world of standard character animation: it acquired a hierarchical skeleton. In hierarchical modeling, a “root node” or “parent node” provides the mathematical coordinate system from which all the “child nodes” rotate, in trickle-out fashion. So you could characterize the entire configuration of your body at any particular time as a list of the rotations of all your joints.
The typical modern avatar has a root at the pelvis, plus left hip, left knee, left ankle, right hip, right knee, right ankle, torso, chest, neck, head, left clavicle, left shoulder, left elbow, left wrist, right clavicle, right shoulder, right elbow, right wrist. That makes a total of 19 joints. Some systems have fewer joints, some have more. The Second Life avatar uses this set of 19. The segments between these joints are often called “bones”. Standard character animation is accomplished by dynamically translating and rotating the root joint in the global coordinate system, and also dynamically rotating all the joints in their local coordinate systems (parent-relative). So, for instance, when you walk your avatar around, you are translating and rotating the pelvis joint in the world coordinate system. As the avatar walks, the various joints branching off of the pelvis are rotated in their local coordinate systems according to a looping walk cycle.
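The trickle-out scheme can be illustrated with a toy two-dimensional skeleton. Real rigs use full 3D rotation matrices or quaternions; this sketch keeps only the essential idea, that each joint's world transform accumulates its parent's.

```python
import math

def forward_kinematics(joints):
    """Walk the hierarchy root-first, accumulating each joint's
    parent-relative rotation and offset to get world positions.
    `joints` is a list of (name, parent, bone_length, local_angle),
    with every parent listed before its children."""
    world = {}
    for name, parent, length, angle in joints:
        if parent is None:
            world[name] = (0.0, 0.0, angle)          # root at the origin
        else:
            px, py, pa = world[parent]
            a = pa + angle                           # parent-relative rotation
            world[name] = (px + math.cos(a) * length,
                           py + math.sin(a) * length, a)
    return world

# a three-joint spine: rotating the pelvis carries torso and head along
skeleton = [
    ("pelvis", None,     0.0, math.pi / 2),   # root, facing "up"
    ("torso",  "pelvis", 0.5, 0.0),
    ("head",   "torso",  0.3, 0.0),
]
pose = forward_kinematics(skeleton)
```

The pose of the whole body is just the list of local angles; change the pelvis angle and every child joint swings with it.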
But as we will see in chapters to come, it is not quite enough to use plain hierarchical modeling to represent the many kinds of motion required for sentience, expressivity, and interaction with the environment. Head stabilizing and leg inverse kinematics are just two examples of procedural animation: the various techniques that employ running software to animate, adjust, and modify motion. This is all for the sake of laying the groundwork for building a virtual human that is not only responsive to the environment, but also responsive to other virtual humans.
For many computer game or virtual world designers, it would be considered overkill to simulate the minute mechanisms of eyeballs and eyelids in their characters. It may seem a bit much to give each eyeball a full rotation matrix, or to control eyelids with variable levels of openness, or to map eyelid geometry to conform to a radius slightly greater than that of the eyeball, or to render the iris and pupil positions and rotations according to their corresponding eyeball rotations. On the other hand, this is completely justified for any avatar system that aims to be socially oriented, especially if it involves camera close-up views. For the There.com avatar prototype, the mathematics of eyeball rotation was implemented because direction of eye gaze creates a visual signal that is easily detected, and critical for clear communication. In fact, from the standpoint of avatar expressivity, an avatar skeleton representation could easily include eyeballs as joints (and this is sometimes done in character animation systems). As skeletal joints go, eyeballs would rank pretty darn high on the expressivity scale.
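The mathematics of aiming an eyeball reduces to two angles. This is a minimal sketch assuming a y-up, z-forward head coordinate frame; in a full rig these angles would drive the eyeball's rotation matrix, and the iris and pupil would be rendered accordingly.

```python
import math

def eye_gaze_angles(eye_pos, target_pos):
    """Yaw and pitch (in radians) that aim an eyeball at a target,
    in the head's frame: x right, y up, z forward."""
    dx = target_pos[0] - eye_pos[0]
    dy = target_pos[1] - eye_pos[1]
    dz = target_pos[2] - eye_pos[2]
    yaw = math.atan2(dx, dz)                     # swivel left/right
    pitch = math.atan2(dy, math.hypot(dx, dz))   # look up/down
    return yaw, pitch
```

The same two angles, clamped to a small range, are what make eye contact readable in a camera close-up.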
The facial rendering of the avatar for There.com went through several iterations. In design meetings, I advocated a more cartoony, flat-shaded style, so that the facial features important for expression would be easy to read. Also, cartoon faces can do expressive things that real faces cannot.
Prototype avatars with cartoon-shading
But my prototype avatars were so cartoonlike that others found it hard to "inhabit" the avatar as themselves. Some developers experimented with making avatars that were much more realistic, with full texturing and shading. We settled on a compromise. In a style similar to the one made popular by Pixar’s films, we allowed some shading, but ‘ambient light’ was turned way up, which flattened the rendering a bit, allowing the communicative facial delineators like eyebrows, eyelashes, lips, etc. to show up as nice crisp features.
A Colorful Evolution
There.com’s avatar had a very colorful evolution—so many detours down side roads, procedural meanderings into ragdoll hell, cross-eyed bugs, exploding hair physics, and spring-loaded penises. There were also many explorations into sentience enhancements and algorithms for holding hands and other forms of avatar-to-avatar contact. Many of these inventions never made it into the final product. Such is the nature of prototyping. Below are some screenshots taken throughout the avatar’s many phases.
Prototype avatars using spring physics for There.com
Examples of prototype avatars for There.com
Rigging Avatars for Embodied Communication
At There.com we had debates like: "Are we building a game? Or are we building an open-ended virtual world?" Where the debate settled was that it wasn't a game: it was to be an open-ended virtual world that was socially-focused. My early prototype work in avatar expression was part of the motivation for this decision. Tom Melcher (the CEO during the company’s high-growth years) used the term "Avatar-Centric Communication" to distinguish what we were doing from the existing paradigm of online virtual world communication. The trend at the time was to use disembodied text chat, with little or no connection to the 3D world that the avatars occupied. We were aiming for something new and different.
Richard Bartle, author of Designing Virtual Worlds, co-wrote the first multi-user-dungeon (MUD), a text-based virtual world, back in 1978. He has been a prominent authority on virtual worlds, especially with regard to the text-based variety. In describing the field of Computer-Mediated Communication, he says that to a specialist in the field, virtual worlds are important “…because they are media, not because they are places” (Bartle 2004). Good point…however, I’m not quite sure about the “places” part. Now that visual virtual worlds have become established, and input devices like the Wii are bringing our bodies more intimately into these immersive spaces, virtual embodied communication is bringing “place” onto center stage. For someone like myself who came of age working on fully visual virtual worlds, the sense of place is central to experience design. This is why Will and I named our virtual world “There”. It is a virtual place where the spatial aspects of bodily interaction are critical to the experience.
Avatar scholar Ralph Schroeder claims that text communication does not qualify as a form of Virtual Reality because it does not enhance—but rather, detracts from—the sense of presence and copresence. However, he acknowledges that text-chat is such a widely used mode of communication in online virtual worlds that it cannot be ignored in research on avatar communication. For the same reason, I had decided to tackle the problem of text chat in virtual worlds. Text chat was not going away; virtual worlds appeared to be growing around, or in partnership with, the already existing modalities of instant messaging and text-based virtual worlds. So the problem became: how can we build an avatar system that merges the verbal dynamics of text conversation with the sensory matrix of the 3D world in an intuitive way?
I do agree with Bartle’s claim that text-based virtual worlds are more about imagination, and that graphical virtual worlds are more about the senses (Bartle 2004). I will always be imaginatively stimulated by a great novel, and possibly even more so in an online, object-oriented text-based virtual world with all the setting, character, and plot of a great novel except that it is happening in realtime all around me and I am one of the characters. But complications arise when a visual medium starts to build itself around a textual medium. To handle this media convergence, some design expertise can help.
Enter Chuck Clanton. In his busy life, Chuck has been a psychologist with a Ph.D., a medical doctor, a game designer, and a marble sculptor. Chuck and I became the two primary designers of Avatar-Centric Communication. In April 2003, Chuck and I gave a lecture at Stanford University (Clanton and Ventrella 2003), which was part of the Human-Computer Interaction Seminar Series, hosted by Terry Winograd. A few of the early Linden Lab engineers who created the avatar system in Second Life attended the lecture. After our presentation, they complimented us on our design ideas. They had implemented their own variations in Second Life. (In those days, the memes were flying between developers of competing startup companies like locust swarms). Since Linden Lab was culturally averse to in-house interaction design (the ugly underbelly of their successful open-ended user-generated philosophy), they didn’t invest a lot of design time into these features. As a consequence, many of the avatar expression features were left undone, or under-developed. But despite this fact, and even though we at There had developed such expressivity in our avatars, Second Life had begun to build steam. Their virtual economic business model and the ability for users to customize the world to such a great extent were compelling prospects for potential success.
At There, Chuck and I had developed several techniques for adding layers of non-verbal communication to our socially-focused virtual world. I contributed several components including the automatic formation of chat groups based on the spatial arrangement of avatars. Chat balloons appeared over the avatars’ heads as a way to make communication more embodied. I also developed techniques for controlling gaze between avatars in the group. Below is an illustration from an early prototype demonstrating gaze, as well as the early version of the chat balloon system.
An early prototype for Avatar-Centric Communication
The illustration also shows a user interface referencing the function keys of a standard keyboard, for triggering expressions. These keys were adapted as puppeteering triggers for body language. I had hoped that we could define a standard for using special keys on the keyboards for puppeteering avatars. This was inspired by a music education software package I once saw that came with an overlay for the computer keyboard that turned it into a simple piano. We had also explored the idea of using different “palettes” of expressions that the user could switch between for different conversational modes.
Will had hired Ken Duda to become Chief Technology Officer, and Ken started to build a team to develop the networking code, using the prototype that I was building and re-implementing it as a distributed simulation. Meanwhile, I kept prototyping new ideas. I developed some “helper” code that allowed an avatar to wander in the world as if it were being controlled by a user, using a simple AI program. In doing so, I could set my avatars wandering about, stopping in front of each other, and emitting make-believe chats for each other. This allowed me to start testing out various socializing tools that I began to work on with Chuck. These were the beginnings of our Avatar-Centric Communication System.
Analogous to the creation of Chat Rooms in traditional instant messaging systems, we wanted to allow users to create chat groups implicitly, simply by walking up to each other, facing each other, and starting to type text. These actions would be picked up by the system, and a chat group would be automatically created. Similar designs have been described by other developers and researchers, such as the Conversational Circles of Salem and Earle (2000), shown here.
Conversation Circles (Salem and Earle, 2000)
On the next page is another illustration of my prototype, in which you can see several graphical elements. These were used to help design the system and see all the inner workings so that we could debug as we went along. The existence of a chat group is indicated by a “hula hoop” that encircles the group. A two-person group can (barely) be seen in the distance. My avatar (I’m the black woman in the foreground, to the left) has just joined the group of five avatars in the foreground. A small bit of my chat group’s hula hoop can be seen behind my avatar. If my avatar were to move away from the center of my group, the hula hoop would grow a bit to accommodate my change in position, but only up to a point; if I moved far enough away, I would pop out of the chat group, and the hula hoop would shrink back to a smaller size, still encompassing the remaining five members of the group.
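The grow-pop-shrink behavior of the hula hoop can be sketched in a few lines of code. This is a hypothetical reconstruction for illustration only, not the actual There.com source; the function names and the radius thresholds are invented.

```python
import math

# Illustrative thresholds (world units); the real values are unknown.
MIN_RADIUS = 2.0   # the hoop never shrinks below this
MAX_RADIUS = 5.0   # an avatar farther than this from the centroid pops out

def centroid(positions):
    """Average of a list of (x, y) positions."""
    n = len(positions)
    return (sum(x for x, _ in positions) / n, sum(y for _, y in positions) / n)

def update_chat_group(positions):
    """One membership update: drop avatars that strayed past MAX_RADIUS,
    then size the hoop to enclose whoever remains.
    Returns (members_kept, hoop_radius)."""
    cx, cy = centroid(positions)
    kept = [p for p in positions
            if math.hypot(p[0] - cx, p[1] - cy) <= MAX_RADIUS]
    if kept:
        cx, cy = centroid(kept)
        radius = max(MIN_RADIUS,
                     max(math.hypot(p[0] - cx, p[1] - cy) for p in kept))
    else:
        radius = MIN_RADIUS
    return kept, radius
```

Called once per frame with the current avatar positions, a function like this would make the hoop stretch as a member drifts outward, then snap back once that member pops out of the group.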
Prototype for chat group dynamics
But why did we want the software to automatically keep track of chat groups? We wanted to help the conversation along. We had two ways of doing this: camera view, and chat balloon positioning. Refer to the illustration again, in which my avatar has just joined a chat group. As soon as my avatar joined, my camera viewpoint shifted so that the heads of all the avatars in that group were in view. Also, the chat balloons of the avatars automatically lined up from left to right, and became more legible. This is explained further in the chapter, “Ectoplasm”.
A dotted white line is shown indicating the fact that my avatar’s gaze is fixed to the head of another avatar. This avatar is my “lookat target”. I would trigger this gaze behavior by passing my mouse cursor over the head of that avatar (which causes a circle to appear) and then clicking on the circle. My avatar’s gaze would become “fixed” to that avatar’s head. Both that avatar and my avatar could then move around within the chat group, and my head and eyes would remain fixated on that avatar’s head (as long as my head didn’t have to turn too much). I will explain more of this interaction and some of the considerations and consequences of virtual gaze later in this book, in the chapter called “The Three-Dimensional Music of Gaze”.
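The "as long as my head didn't have to turn too much" constraint on gaze can be sketched as follows. This is a toy reconstruction, not There.com's actual code; the function name and the 70-degree limit are my own assumptions.

```python
import math

# Assumed comfortable head-turn limit; the real value is unknown.
MAX_HEAD_TURN = math.radians(70)

def head_yaw_toward(avatar_pos, avatar_body_yaw, target_pos):
    """Head yaw (radians, relative to the body) needed to face the lookat
    target, or None when the required turn exceeds the limit and the
    gaze fixation is released."""
    dx = target_pos[0] - avatar_pos[0]
    dy = target_pos[1] - avatar_pos[1]
    desired = math.atan2(dy, dx)
    # Signed angular difference, wrapped to [-pi, pi].
    turn = math.atan2(math.sin(desired - avatar_body_yaw),
                      math.cos(desired - avatar_body_yaw))
    return turn if abs(turn) <= MAX_HEAD_TURN else None
```

Run every frame, this lets both avatars wander within the group while the fixated head keeps tracking, and gracefully gives up when the target moves too far behind.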
The illustration is full of information. It shows how many of the aspects of Avatar-Centric Communication came together in an early stage. Because embodied communication is naturally multimodal, we realized that we had to solve many of these problems together, simultaneously, in an integrated way. This kind of design problem cannot be solved in a linear, piecemeal fashion.
On the next page is another image from the prototype, showing a different arrangement of chat balloons and avatar gaze.
Prototype for chat group dynamics
In most 3D virtual worlds and computer games, there is a virtual camera that follows your avatar around and makes sure it is always in view: the third-person view. In most virtual worlds, we spend a lot of time looking at our avatars’ backsides. That doesn’t happen so much in films when we’re watching the protagonist go through some adventure. As I suggested earlier, this viewpoint is a carry-over of the navigation style of many 3D games in which the player runs around in a 3D environment. In an action adventure game it’s more important to look where you are going, and what lies ahead, than to watch your own face. At There, we were looking for ways to escape from some of these constraints, and the most obvious place to look was cinema. The virtual camera is an ever-present point of view on the world. It is usually unnoticed until it does something wrong—like getting stuck behind a tree or a wall, which obscures your view of the avatar. Virtual cameras in games and virtual worlds were initially slow to evolve cinematic intelligence, while other aspects (such as 3D modeling, texturing, and lighting) advanced at a good clip. But cinematic intelligence is now becoming more sophisticated. This is partly due to the third-person camera and the need to see your avatar as it does more complicated things than just run through a dungeon and kill monsters.
One technique developed in the early days of There.com was two-person chat camera behavior. It works like this: when the avatar’s camera detects the initiation of a two-person chat, it dollies over to take a place perpendicular to the social link between the chatting avatars. This essentially places the chatters to either side of the field of view, and allows their chat balloons to take positions side-by-side.
The camera shifts to a perpendicular view for two-person chat
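The dolly move just described amounts to placing the camera on the perpendicular bisector of the line between the two chatters. A minimal sketch, with an assumed camera distance and invented function name:

```python
import math

CAMERA_DISTANCE = 4.0  # assumed offset from the midpoint (world units)

def two_person_chat_camera(a, b):
    """Given 2D positions of two chatting avatars, return
    (camera_position, look_at_point) for the two-person chat shot."""
    mx, my = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy) or 1.0  # guard against coincident avatars
    # Unit vector perpendicular to the social link between the avatars.
    px, py = -dy / length, dx / length
    camera = (mx + px * CAMERA_DISTANCE, my + py * CAMERA_DISTANCE)
    return camera, (mx, my)
```

With the camera on the perpendicular and aimed at the midpoint, both avatars land at either side of the frame and their chat balloons can stack side by side.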
Easy enough to deal with two avatars, but how should the camera catch the correct view on a large group of avatar chatters? This presents a more complex problem to the camera. Entire Ph.D. dissertations have been written on autonomous, intelligent camera behavior—it ain’t easy. I was determined to make the camera do the right thing so it could be invisible to the user—like cameras are supposed to be. While prototyping the automatic chat group system, I struggled with the following problem: when an avatar joined a chat group, how could I make the avatar’s camera shift from rear-view to a view that would reveal as many faces in the group as possible, and not have any of them overlap? In coming up with a solution, I tried to make the camera shift as little as possible so as to maintain visual context (not swinging 180 degrees to the other side, for example!) Also, the algorithm was designed to try to catch as many front views of faces as possible. This scheme worked like a charm when the number of avatars in the group was small (up to about five), but things became overwhelmingly complex when the number of avatars became larger—especially since avatars tend to shift around constantly.
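The two competing goals just mentioned (shift the camera as little as possible, and catch as many front views of faces as possible) suggest a simple scoring scheme. The sketch below is my own guess at such an approach, not the algorithm actually used; the weight and the candidate sampling are invented.

```python
import math

def choose_group_camera_angle(current_angle, avatar_headings, num_candidates=36):
    """Pick a camera angle (radians, orbiting the group centroid) that
    favors front views of faces while penalizing large shifts from the
    current angle. Headings are each avatar's facing direction."""
    SHIFT_WEIGHT = 0.5  # illustrative trade-off between the two goals
    best_angle, best_score = current_angle, -math.inf
    for i in range(num_candidates):
        angle = 2.0 * math.pi * i / num_candidates
        # An avatar's face is visible when it points roughly toward the
        # camera's direction from the centroid: cos(heading - angle) > 0.
        front_views = sum(1 for h in avatar_headings
                          if math.cos(h - angle) > 0.0)
        # Angular distance from the current camera angle, wrapped to [0, pi].
        shift = abs(math.atan2(math.sin(angle - current_angle),
                               math.cos(angle - current_angle)))
        score = front_views - SHIFT_WEIGHT * shift
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Even this toy version hints at the combinatorial trouble: with many avatars shuffling around, the best-scoring angle keeps changing, and the camera starts chasing it.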
During play testing, some of the subjects were already familiar with third-person view cameras, having played many 3D computer games. They were in the habit of shifting their avatars around—which caused their cameras to move in response. They had developed their own scheme for finding a good view. But the camera system I wrote was supposed to take care of this automatically! Since moving their avatars had become so ingrained, the users would unwittingly trigger the camera to re-adapt—often messing up their view :( This caused a continual fight between my camera algorithm and the user. As is the case with most fights between an artificial intelligence and a human intelligence, the human won. We removed this feature.
Chuck contributed many solutions to these problems I have been describing, based on his background in applying cinematic technique to games—such as using camera close-ups on the face. He developed some techniques that allowed the close-up face view to switch between avatars as they triggered expressions while chatting in a group.
So, now we had camera controls, avatar expressions, chat balloons, the formation of chat groups, and other techniques in our bag of tricks. After coming up with this hodge-podge of chat-related tools and algorithms, we decided that it would be best to bundle them into specific places and times when conversation was a priority (and not let them get in the way when, for example, a user is more interested in navigating a hoverboard over the hills). We called our scheme “chat props”. A chat prop is a designated environment where you expect your avatar to do conversational things. It could take the form of a couch, a couple of stumps for seats, a stage, a café bar, or a park bench.
In The Psychology of Cyberspace, John Suler discusses the utility of virtual props for conversation in The Palace (an early virtual world originally developed by Jim Bumgardner for Time Warner Interactive in the mid ‘90s). “…props make interacting easier and more efficient by providing a visual means to express oneself. They are very useful communication tools. On the simplest level, they act as conversation pieces. If you can think of nothing else to say, express an interest in someone's prop. Talking about props is one of the most common topics of discussion at The Palace. It greases the social interaction, especially with people whom you are meeting for the first time. It's like discussing the weather—except people are more personally invested in their props than they are in whether it's rainy or sunny” (Suler).
Imagine walking down a street alone on your way to a party. You are not socializing while you are walking, although you may throw an occasional smile or “hello” to a passerby on the street. You arrive at the party and look around. Perhaps you exchange a few words with some friendly strangers, but nothing especially conversational is happening. Then you notice two friends sitting on a couch eating chips, and so you sit down with them. You choose to sit next to a friend that you particularly like. Some small talk starts up along with some starter body language. Some new brain chemistry is probably stirring around in your skull.
This regime is what we implemented in code to help structure the avatar communication activities. After your avatar is placed into a chat prop, your avatar metamorphoses into a social butterfly. Various forms of body language are created when you enter text or type out emoticons. The user can control the avatar’s gaze, but also the body gaze, or “directional pose”, of the entire body (like looking at someone except using the whole body—which, by the way, can be the opposite of your head gaze, if you want, to form variations of emotional engagement). The user can also trigger various gestures, facial expressions, and even the formation of extra-body visual symbols, called “moodicons”.
Our first chat prop was the “loveseat”: a two-person bench where we imagined a couple sitting, talking, flirting, arguing, etc.
Camera views associated with body pose changes in the Love Seat Chat Prop
Aspects of the chatting activities caused the camera to cut to specific positions as the avatars changed their poses. If you sat facing your partner, the camera shot would emphasize togetherness. If you turned away, it would emphasize separation. There was an “over the shoulder” shot for each avatar, a “far away view”, and various views aimed at the two loveseat sitters.
While Chuck was building the scripting tool for designing chat props, he included several parameters for configuring camera behavior like those used in the loveseat. My failed attempts at automatically orienting the camera for the best view on a conversation inspired camera settings that were opened up for the user.
Regarding some of these parameters, Chuck says, “We also gave users control over the camera so they could accept the default view on joining a conversational group, which shows everyone but is quite distant, or they could rotate and zoom the camera in to better see what they are interested in. So, for example, when seated in the audience at a stage, you can choose to have a close-up camera view of the people on the stage or of the audience or of yourself and your nearest neighbors. In some games, audience members may need to talk among themselves, which is best done with one camera, and then call out answers to someone on a stage, which is best viewed with a different camera” (Isbister 2006).
For the same reason that close-up shots are used in film (greater emotional impact), we explored using camera cuts to avatar faces when making certain expressions. This was very cool, but only up to a point. Some play-testers found that this was too jarring, and so we had to tone down the number of face-cuts. Perhaps there is no way an algorithmic cinematographer, no matter how greatly imbued with AI, can ever guess the right view on matters to match a user’s needs, expectations, and communications.
Some key components of Avatar-Centric Communication
Away From Keyboard
Let’s return again to the theme of presence. If a user has to step away from his or her computer, some avatar systems provide a way to indicate that he or she is “away from the keyboard” (AFK). In There.com, a visual symbol on your avatar’s head (the goggles on the avatar’s face in the next illustration) makes this readily apparent to other users. It can be set directly by the user, or automatically when the user’s cursor leaves the There.com window.
The problem of representing the fact that you are AFK is familiar to most users of virtual worlds and online games. The indicators of AFK can take on many forms—some quite amusing. AFK takes on a whole new meaning when playing a game such as World of Warcraft, where critical, highly-focused gameplay has to be interrupted by a mundane need to visit the bathroom (this has come to be known as “bio-break”, sometimes expressed in a chat message as “brb bio”).
Avatar goggles signify that the user is busy
In Second Life, if a user has not touched the keyboard for a while, the avatar slumps over forward as if to fall asleep while standing—or, depending on interpretation, as if it were about to lose its dinner. Here is an example of a well-intentioned idea using what some may argue is a bad choice in visual indicators. This is parodied in a video found on YouTube (Lavenac 2007) shown below, where real people act out the antics of Second Life avatars. Also shown is a parody of the default animation that plays when a user is typing text chat; the avatar acts as if she is typing at an invisible keyboard.
Video satire of avatar antics in Second Life (Lavenac, 2007)
What is the best indicator for AFK? Well, it really depends on the context, and the nature of the game or virtual world. It’s one thing for the user to switch on an explicit indicator of AFK, with an indication of reason (“gone fishin’”, “bio-break”, “astral projecting”, “computer crashed”, “reading the manual”, etc.) before departing the virtual world to attend some other world—this is a deliberate mode change. But having the avatar system decide automatically when the user has not touched the keyboard long enough to automatically set this mode—that’s not so clear. What’s the right length of time? No good answer, unfortunately.
Let me pause here for a moment and insert some narrative so as to avoid confusion. Before starting this book I was told by a potential agent that I should write a juicy page-turner about the shady goings-on during the history of There.com. I told him that I didn’t want to write that kind of book. I would prefer to just write a Horror novel. Without going into details, I’ll just tell you that two years after working at There.com, I joined Linden Lab, makers of Second Life. So, in this chapter, and in chapters to come, I will continue telling stories about the adventures of avatar development…in the context of both of these virtual world companies. Aside from a few jabs here and there, I hope these accounts will come across as generally ☺☺cheery☺☺. And in some instances these stories may be strange and amusing, as in the case of the avatar who slumped while gesticulating—read on.
Subtleties of AFK
Both There and Second Life use avatar code that detects when the user has not pressed any keys on the keyboard for a specified duration—and when that duration passes, the avatar goes into AFK mode. While working on voice-activated gesticulation for Second Life (more details to come in the chapter, “Voice as Puppeteer”), I encountered an amusing yet annoying problem. When using voice chat as opposed to text chat, users tend not to be typing into the keyboard as much (obviously). And so, while the users were talking, their avatars would fall into slump-mode quite frequently—even though there were voices coming out of them! And since I was working on the gesticulation system, it got even weirder: these avatars would gesticulate even while slumping!—shoulders tilting, hands flapping, and heads bobbling. I was already not fond of Second Life’s slump pose—but when I saw this anomaly, I came to dislike it even more. The solution, of course, was to add an extra bit of code that detected when there had been no voice signal for a specified duration, in addition to no keyboard activity.
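The fix amounts to tracking the most recent activity on either channel and only slumping when both have gone quiet. A tiny sketch, with invented names and an assumed timeout (not the actual Second Life code):

```python
# Assumed idle threshold in seconds; the real value is unknown.
AFK_TIMEOUT = 120.0

class AfkDetector:
    """Enter AFK mode only when BOTH the keyboard and the voice channel
    have been idle longer than AFK_TIMEOUT."""

    def __init__(self, now):
        self.last_key_time = now
        self.last_voice_time = now

    def on_key_press(self, now):
        self.last_key_time = now

    def on_voice_signal(self, now):
        self.last_voice_time = now

    def is_afk(self, now):
        last_activity = max(self.last_key_time, self.last_voice_time)
        return (now - last_activity) > AFK_TIMEOUT
```

With only the keyboard timer, a user chatting by voice goes "AFK" mid-sentence; folding the voice signal into the same timer removes the slumping-while-talking anomaly.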
In the movie Avatar, Jake Sully controls his genetically-engineered Na’vi avatar while lying motionless in a sleep-like state. It’s a familiar science-fiction avatar puppeteering scenario: total brain interface; we’ve seen it in The Matrix and in several other films and sci-fi novels. In this case, avatar control requires no body movement at all. In fact, any body movement that Jake Sully made seemed to spell trouble: if he stirred and woke up, his avatar collapsed into a heap! In this film, the equivalent of being AFK is pretty easy to detect. That’s actually integral to the plot of the film. But it may not be the best AFK affordance to use in most virtual worlds or games. What is the deep solution to the AFK problem? There is no single solution. And at the same time, there are several solutions, each of which depends on context. This is just one of those problems that comes with having an avatar with an identity leash longer than zero.
Body language can either be deliberate and conscious, or automatic and unconscious (which is usually the case). Body language habits that are unconscious can be made conscious, and later, acted out deliberately. This fact about our real selves is played out in the cloudy, chaotic world of avatar design. Here’s a question: should the gaze behavior of avatars be automatic? For me, the answer is subtle and controversial. At There.com, the extended Avatar-Centric Communication team decided to add some automatic lookat features. For instance, when your avatar joins a chat group, all the other avatars automatically look at your avatar.
Conversational gaze in There.com
This tends to make users feel acknowledged and welcome. There.com avatars do these nearly subconscious social acts as part of their autonomic behavior, and it makes conversations feel much more natural. Also, if you mention the name of one of the avatars in your text chat, that avatar will automatically look at your avatar. And if you are doing most of the chatting, the other avatars will end up looking at you more than others. If you use the word “yes” or “no” in your chat, as well as other words with clearly-associated expressions, your avatar will generate those expressions nonverbally.
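The "yes"/"no" behavior just described is essentially a keyword-to-expression lookup over the outgoing chat text. Here is a toy sketch; the table entries and names are invented for illustration and are not There.com's actual vocabulary.

```python
# Hypothetical keyword-to-expression table (illustrative entries only).
KEYWORD_EXPRESSIONS = {
    "yes": "head_nod",
    "no": "head_shake",
    "hello": "wave",
    "lol": "laugh",
}

def expressions_for_chat(text):
    """Return the nonverbal expressions triggered by one line of chat,
    in the order the triggering words appear."""
    triggered = []
    for word in text.lower().split():
        expr = KEYWORD_EXPRESSIONS.get(word.strip(".,!?"))
        if expr:
            triggered.append(expr)
    return triggered
```

A scheme like this runs on each chat line before it is sent, queuing the matched animations on the speaking avatar alongside the text balloon.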
These automatic behaviors lend a sense of ease and social connectedness to the There experience. But in fact I have always been just a bit uneasy about these features, because the net effect, while I am operating my avatar, is that I am not entirely in control of my avatar’s body language. The real issue in avatar autonomic design is not what level of body language is exactly right (they’re all appropriate for different reasons). Rather, what is important is the ability to easily move between automatic and manual body language modalities—to dynamically change the length of DiPaola’s leash. At the end of the day—as far as I’m concerned—a user must always be able to set his or her avatar gaze to anything or anyone, and at any time—to override the autonomic system.
The End of an Era: the Beginning of an Era
As I write these words (2010), There.com is officially closed. When a company owns a world, that world vanishes when the company folds—along with the rich virtual lives built by its residents over the years. The phenomenon of large virtual communities losing their world, and being forced into exodus to new virtual worlds, is covered by Celia Pearce in Communities of Play (2009). The shutdown of There.com was not the first time such a tragedy had occurred. In fact, in 2004 There.com itself hosted a group of nearly 500 refugees who migrated from the defunct Myst-based game Uru. Pearce chronicles this migration in her book.
Avatar-Centric Communication is an integrated design solution addressing the problem of text communication using an avatar. Little did I know when I made that first crude cylinder moving around on a grid that things would get so complicated, and so fast. And yet, at There.com, we had only scratched the surface in terms of designing tools for embodied communication in virtual worlds.
Bartle, R. 2004. Designing Virtual Worlds. New Riders Publishing.
Clanton, C., Ventrella, J. 2003. Avatar-centric Communication in There. Presentation at the People, Computers and Design seminar, Stanford University. http://hci.stanford.edu/courses/cs547/abstracts/02-03/030404-clanton.html.
Damer, B. 2010. Personal email correspondence, January 2010.
Isbister, K. 2006. Better Game Characters by Design: A Psychological Approach. Elsevier.
Lavenac, E. 2007. YouTube video: Second Life. Draftfcb/paris: http://www.youtube.com/watch?v=flkgNn50k14.
Nassiri, N., Powell, N., and Moore, D. 2004. “Avatar gender and personal space invasion anxiety level in desktop collaborative virtual environments”. Virtual Reality. Volume 8, Number 2. Springer.
Pearce, C. 2009. Communities of Play – Emergent Cultures in Multiplayer Games and Virtual Worlds. MIT Press.
Salem, B., Earle, N. 2000. “Designing a non-verbal language for expressive avatars”. Proceedings of the Third International Conference on Collaborative Virtual Environments. ACM.
Suler, J. The Psychology of Cyberspace. A hypertext book online at: http://www-usr.rider.edu/~suler/psycyber/psycyber.html. Department of Psychology, Science and Technology Center, Rider University.
(The text above is a chapter from Virtual Body Language, by Jeffrey Ventrella)
A Creation of Many Great Minds
So many amazing people passed through the doors of There and contributed to the vision and execution - some of them stayed on for many years and became pillars. Tim Nufire came on soon after the company was founded. He helped prototype "ThereTV" - an idea for projecting real-life events in There. Tim didn't stay too long. Soon after that, Amy Morris joined to help with administrative matters - she ended up staying quite a while and also helped foster a social culture at There. Ken Duda (with more brain-power between his ears than a hundred programmers), came aboard as the CTO, right around the same time as Brett Durrett. Both of them were friends and ex-colleagues of Will. Brett and Ken were two of the most solid and hard-working people at There, for many years. Then came Mel Guymon, a computer graphics guru, who helped build the art production line, and provided brain-power to many complex problems.
I could go on. And on. But at this point I've already forgotten the chronological order of people joining, and I have to stop somewhere! Here's a list of as many people as I can come up with (with a little help from Google). In alphabetical order:
Sonny Abello, Darren Allarde, Alejandro Samina Arif, Stacey Artandi, Doug Banks, Stephanie Barrientos, Floyd Bates, John Beckwith, Eileen Belton, Bruce Benda, Howard Berkey III, Karuna Bhavnani, James Birchler, Roland Blanton,
(if I forgot anyone, let me know! - Jeffrey@Ventrella.com. Also, let me know if you have any links I can add to a name).
-yours truly and Therely, Jeffrey