
Conversation Design

A voice assistant in self-driving cars that helps people plan their travels in new cities

auto-driving cars, UI/UX Design, explore

截屏2023-06-12 下午12.39.21.png


Over the years, many car functions have become automated. Recent advances have led to the automation of the main driving task. We think that even if there is still a long road ahead, motor vehicles with self-driving functions are increasingly shaping the mobility of the future.

My teammate and I wanted to explore possible future uses of self-driving cars and how a voice assistant can bring a better user experience on the road. We imagined a future “Fully Automated Driving” situation in which the car can handle the majority of driving situations independently.

My Role

Slot Design
Interface Design
Script Design
Intent Design
Utterance Design

Research Methods

Conversation models
User Journey Map
In-depth Interviews
Literature Review



Voiceflow
Lucidchart

Problem space

People get tangled up in apps before making a single plan

When traveling to a new city, people struggle to make plans: “Where should I go now?”; “How should I schedule my time?”; “What is this magnificent building we just passed?”; “What would be the best restaurant nearby?” …

Today, to plan a day exploring a new city, people need to check their schedule in a calendar, search for the most famous sights and food on Google and Yelp, and find their route on a map... Have you ever felt annoyed switching back and forth between so many apps?

Final Solution


A multi-modal tour guide

In an age where everything demands your time and energy, we give you a multi-modal tour guide, a voice assistant paired with a screen-based helper in the car, that explores the city together with you. In-car voice assistants have the potential to become the perfect substitute for human tour guides or Google, providing users with a collaborative, delightful, and interesting experience when exploring the city.



I. First time designing for autonomous vehicles

Challenge: Experts have defined five levels in the evolution of autonomous driving. Each level describes the extent to which a car takes over tasks and responsibilities from its driver, and how the car and driver interact. However, this was our first time designing for autonomous vehicles. What do the levels mean? Which level should we choose?

Design decision: After consulting industry experts and doing in-depth research, we realized that the main constraints, in terms of regulations, laws, and insurance, lie in Levels 2 and 3. Humans have to be ready to take over, yet carmakers aren’t particularly confident in drivers’ ability to do so at a millisecond’s notice. As a result, for this project we chose to innovate by jumping to Level 4 autonomy, where no driver involvement is necessary in almost all cases.


II. Building trust is an important concern for autonomous cars

Challenge: We struggled to decide on the main interaction modality that would give users the best experience. At first glance, visual interaction made more sense: it is more intuitive and familiar to users, and self-driving vehicles free up the time and attention users would otherwise spend on driving. However, during the interviews we realized that lack of trust in the vehicle, a key concern when users monitor the driving environment from inside the car, was hard to mitigate with visual interaction.

Design decision: Through a literature review and consultation with academic experts, we found that in-car voice assistants could play an unprecedented role in building trust between humans and cars, so we decided to move the main modality from visual to voice. Moreover, after consulting expert Paul Pangaro and studying his four-modes theory in depth, we found that collaboration mode, one of the four modes, could be the key to building trust between users and cars.

Overall process

截屏2023-06-12 下午12.56.37.png

Context settings

Identify the purpose and capability

Based on 10+ literature reviews and 3 rounds of user testing, we brainstormed the main purpose and capability of our virtual tour guide. We explored three basic patterns to generalize all potential scenarios in the process.

截屏2023-06-12 下午12.56.49.png

Who is our user?

After interviewing people of various ages and industries, we realized that users have various requirements for voice assistants and hold different views on autonomous cars. To cover multiple patterns and more constraints, we selected a young person who loves traveling and is willing to try the latest technology as our final persona.

截屏2023-06-12 下午12.56.57.png

What does she experience?

In the journey, we set three stops for Jenny and assigned each a different pattern:

  • Coffee shop: Pattern 02. Users have a specific topic in mind

  • Lady M: Pattern 03. Users have a specific place in mind

  • Vintage store: Pattern 01. Users would like to explore the city without a specific preference
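The mapping from a request to one of the three patterns can be sketched in a few lines. This is only an illustration under our own assumptions: the `Pattern` enum, the `choose_pattern` function, and the rule "a named place beats a topic, and a topic beats no preference" are hypothetical and not taken from the project's actual flow.

```python
from enum import Enum
from typing import Optional


class Pattern(Enum):
    # Labels follow the three patterns listed above.
    EXPLORE = "Pattern 01: no specific preference"
    TOPIC = "Pattern 02: specific topic in mind"
    PLACE = "Pattern 03: specific place in mind"


def choose_pattern(place: Optional[str] = None, topic: Optional[str] = None) -> Pattern:
    """Pick the most specific pattern the request supports (hypothetical rule)."""
    if place:
        return Pattern.PLACE
    if topic:
        return Pattern.TOPIC
    return Pattern.EXPLORE


# Jenny's three stops, in the order listed above.
stops = [
    choose_pattern(topic="coffee shop"),  # she only knows she wants coffee
    choose_pattern(place="Lady M"),       # she names the exact place
    choose_pattern(),                     # she has no preference
]
```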

Storyline: It was Jenny’s last day in New York for spring break, and her flight back to LA was at 8:00 PM from JFK airport. Jenny checked out of the hotel at 10:00 AM and rented an autonomous vehicle from the hotel front desk. She wanted to explore the city and buy a birthday cake for her friends. However, Lady M didn’t open until 11:00 AM, so the voice assistant recommended a coffee shop where she could have brunch first. After she bought the cake, there was still plenty of time and Jenny didn’t have a solid plan, so she let the voice assistant help her explore the city, and they went to several vintage stores.

截屏2023-06-12 下午12.57.03.png

Design and Iteration

Draft: preliminary flow


Now that we had a clear picture of who’s communicating and what they’re communicating about, we started writing sample dialogs. We were looking for a quick, low-fidelity sense of the “sound-and-feel” of the interaction we were designing. We role-played with 3+ people to create the initial script and generalized it into a preliminary draft flow.

First iteration: Improve the logic of the flow

After conducting multiple user interviews, we drew out several insights and focused this round of iteration on exploring more adaptive flows for each pattern according to these insights.

INSIGHT 1. Users want more flexible flows in communications.

Findings: In interviews, users did not always follow the flow but jumped into the middle of the flow directly.

Design decision: We thought about the potential starting points users might jump to and added them to our flow. Here is an example; click here to view the whole picture.

截屏2023-06-12 下午12.57.08.png

INSIGHT 2. Users expect voice assistants to remember some information for them.

Findings: Users expect the CUI to remember some important information after each round of conversation.

Design decision: We then asked ourselves, “What kind of information does our CUI need to remember?” We figured out that the final destination and the time to arrive there are important to users, and that it would be great if the CUI could keep that information in mind during the journey.

截屏2023-06-12 下午12.57.13.png
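One way to picture this kind of memory is a small per-trip context object that stores the final destination and arrival deadline and can answer "how much free time is left?". The sketch below is an assumption for illustration only: `TripContext`, its fields, and the 6:00 PM arrival time and one-hour drive are hypothetical values, not part of the original design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TripContext:
    """Hypothetical per-trip memory; names are illustrative."""
    final_destination: Optional[str] = None
    arrive_by: Optional[datetime] = None

    def remember(self, destination: str, arrive_by: datetime) -> None:
        # Store the two facts the insight says users expect the CUI to keep.
        self.final_destination = destination
        self.arrive_by = arrive_by

    def time_left(self, now: datetime, travel_time: timedelta) -> Optional[timedelta]:
        """Free time before the car must leave for the final destination."""
        if self.arrive_by is None:
            return None
        return self.arrive_by - travel_time - now


ctx = TripContext()
# Assumed values: arrive at JFK by 6:00 PM, with a one-hour drive.
ctx.remember("JFK airport", datetime(2023, 6, 12, 18, 0))
free = ctx.time_left(now=datetime(2023, 6, 12, 10, 0), travel_time=timedelta(hours=1))
print(free)  # 7:00:00
```

With the context kept across turns, later rounds of conversation can check a proposed stop against `time_left` without asking the user for the destination again.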

INSIGHT 3. Instead of picking destinations carefully, users would like to make decisions quickly.

Findings: In the preliminary draft, we sequenced the three patterns to help users narrow down the scope and find a destination. However, after several interviews, we realized that, instead of narrowing down step by step, it’s better to offer some recommendations or let users explore the city at random.

Design decision: In this iteration, we redesigned the flow of the pattern for users who don’t have preferences. We provided straightforward recommendations and also a choice called “random exploration” to help users narrow down their choices without compromising their active exploration.

截屏2023-06-12 下午12.57.17.png

Design and Iteration

Second iteration: Make conversations more human-like

After testing with more users and consulting academic experts in the field, we realized that human-like emotion was an important part missing from our first iteration. One of the most important reasons is that keeping the exchange closer to a human-to-human conversation can help build trust between voice assistants and users. According to a paper by Paul Pangaro, there are four basic modes in human conversations: controlling, delegating, collaborating, and guiding.

INSIGHT 1. Humans shift conversation modes when they talk.

Findings: Consulting with academic experts and conducting a literature review, we realized that human beings combine different modes of conversation and shift among different modes when they talk.

Design decision: With human-like, natural conversation in mind, we segmented the flow into smaller rounds and assigned each round a specific mode according to its goals and means.

截屏2023-06-12 下午12.57.23.png

INSIGHT 2. Collaborating mode can help build trust between users and voice assistants.

Findings: In collaborating mode, users and voice assistants work together to define a goal for users. For example, a user would like to set a time to go to the airport. In the traditional controlling mode, the voice assistant would simply follow what the user says. However, if the voice assistant can suggest a better time and discuss it with the user to settle on a time together, the user begins to build trust in the assistant.

Design decision: After consulting with academic experts, we redesigned three main parts of the flow to be more collaborative. Here is an example that sets a time to go to the airport. The whole flow can be viewed here.

截屏2023-06-12 下午12.57.28.png
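The difference between controlling and collaborating mode in the airport example can be sketched as a reply function that checks the requested departure time and, when it looks tight, proposes an alternative instead of simply obeying. Everything here is an assumption for illustration: the function name `respond_to_departure`, the two-hour airport buffer, and the one-hour drive are hypothetical, and the wording of the replies is not from the project's script.

```python
from datetime import datetime, timedelta


def respond_to_departure(requested: datetime, flight: datetime,
                         travel: timedelta, buffer: timedelta) -> str:
    """Return a collaborative reply instead of silently accepting the request."""
    # Latest departure that still leaves the buffer before the flight.
    latest_safe = flight - buffer - travel
    if requested <= latest_safe:
        return f"Sounds good, we'll leave at {requested:%H:%M}."
    # Collaborating mode: explain the concern and offer a counter-proposal.
    return (f"Leaving at {requested:%H:%M} may be tight for your flight. "
            f"How about {latest_safe:%H:%M} instead?")


flight = datetime(2023, 6, 12, 20, 0)  # Jenny's 8:00 PM flight
reply = respond_to_departure(
    requested=datetime(2023, 6, 12, 18, 0),
    flight=flight,
    travel=timedelta(hours=1),   # assumed drive time
    buffer=timedelta(hours=2),   # assumed airport buffer
)
print(reply)
```

The counter-proposal is phrased as a question so that the user still makes the final call, which is the point of settling the time together.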

Design intents, utterances and slots

Guided by the finalized conversation flow, we rewrote the sample dialogs into a more detailed script. After that, we annotated each sentence users might say and the voice assistant might answer by identifying the utterances, intents, entities, and slots in each turn.

截屏2023-06-12 下午12.57.34.png
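As a rough sketch of what such an annotation looks like as data, each turn can pair an utterance with its intent and the slots filled so far, which also shows what the assistant still needs to ask. The class names, the `set_departure_time` intent, and its slot names below are hypothetical examples, not the project's actual annotation scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    name: str
    sample_utterances: List[str]
    required_slots: List[str]


@dataclass
class AnnotatedTurn:
    utterance: str
    intent: Intent
    slots: Dict[str, str] = field(default_factory=dict)

    def missing_slots(self) -> List[str]:
        """Slots the assistant still has to ask for before acting."""
        return [s for s in self.intent.required_slots if s not in self.slots]


# Hypothetical intent with two required slots.
set_departure = Intent(
    name="set_departure_time",
    sample_utterances=["I need to get to the airport", "Take me to JFK at six"],
    required_slots=["destination", "time"],
)

turn = AnnotatedTurn(
    utterance="I need to get to the airport",
    intent=set_departure,
    slots={"destination": "airport"},
)
print(turn.missing_slots())  # ['time']
```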

Designing the interface


When designing the conversation flow, we realized that visual interaction would be more efficient and understandable for users in certain scenarios. With the patterns we had designed in mind, we iterated the flow from a conversation-only modality to a combination of conversation and visual modalities. In later testing sessions with more than 3 users, we found that organically combining voice and visual interfaces could improve communication between users and the voice assistant.
