Header image of Sam, the proactive home DJ

Sam: The Proactive Home DJ

Research Project Advised by Spotify HCI

January 2021 - May 2021

I developed a conversational design framework for Sam – the proactive smart speaker. It curates personalized music-listening experiences for people in their homes based on activities, moods, and interpersonal dynamics.

Research Goal

How might we design a voice assistant that makes music-listening more enjoyable for multi-person households?

Voice assistants struggle to curate music experiences, especially when trying to account for multiple people’s tastes. Without any contextual awareness, they rely too heavily on user input.

Solution Benefits

We presented our interim and final outcomes to Spotify's HCI Research team. Our most compelling outcomes include:

  1. It curates music-listening experiences that dynamically change based on context, with minimal user effort

  2. It strengthens users' interpersonal connections by sharing music between household members

  3. Our accompanying app builds user trust, encouraging users to enable the sensors the agent needs for its recommendations

My Role

UX Researcher + Conversation Designer

I designed and led a series of novel research methods exploring users’ relationships to music and reactions to our Wizard of Oz (WoZ) prototypes.

I mapped out appropriate conversational flows based on user-triggered and sensor-triggered inputs.

Client

Spotify (Academic Project)

Collaborators

Aziz Ghadiali

Lauren Hung

icon of persona analysis phase

Project Timeline

After initial ethnographic research to learn how households listen to music now, I tested ways to improve that experience with a proactive smart speaker. That led to a Wizard of Oz remote study that we refined into a full voice assistant and mobile experience.

icon of persona analysis phase

Understand Music Listening Today

I conducted 6 household focus group interviews to understand the highs and lows associated with existing group music-listening experiences at home.


Directed Storytelling + Group Interviews + Affinity Diagrams

How enjoyable are group music-listening experiences at home today?

We had two goals before we could improve the enjoyability of music-listening experiences for multi-person households:

1

To build towards an ideal future, we first needed to understand why users listened to music, and where their experiences fell short.

2

We also wanted to examine households’ interpersonal dynamics, and how those played into listening preferences.

With those two goals in mind, we designed a group interview and directed storytelling hybrid protocol, in which we asked households to share their most recent times listening to music individually and together, and mapped our findings on an affinity diagram.

We learned...
icon of understanding context phase

People care about music supporting their group activity and maintaining social environment.

icon of understanding context phase

People found it difficult to curate and play the right music without significant effort.

icon of understanding context phase

The “right music” is closely tied to the context and goals users have while listening.

icon of understanding context phase

People want to listen to music they know they will enjoy without much experimentation.

icon of persona analysis phase

Test Potential Futures

I fleshed out the concept for a proactive music-playing voice assistant, and tested variations of what that could look like with a special script-based form of participatory design.


Literature Reviews

What could the voice assistant of the future look like?

Across all their music-listening pain points, users repeatedly mentioned the frustration of having to prescribe exactly what the assistant should play. I identified a reframing opportunity that became the focus of our project:

What if voice assistants could recommend and adjust music proactively to account for users’ changing contexts, social dynamics, and goals?


To understand if this was possible, I went deep into literature reviews of next-gen voice assistants and context-sensing technologies, and identified four key sensor technologies (with varying levels of privacy concerns) that could together form a complete picture of users’ needs.

icon of persona analysis phase

Always-On Video Cameras: Can detect users’ emotions and identity through facial analysis, along with objects of interest through image recognition

icon of persona analysis phase

Always-On Microphones: Can parse relevant conversations through speech analysis, and identify ambient noises (e.g., stirring pot) through sound recognition

icon of persona analysis phase

Radar: Can identify body language and activity through human pose analysis, along with user location

icon of persona analysis phase

Infrared: Can also detect mood and emotion (at lower resolution) through heat signatures

Participatory Design + Wizard of Oz (WoZ)

How do music listeners feel about a proactive voice assistant?

We needed to understand whether users would find value in a proactive voice assistant, and whether that value was localized to specific situations.

To do so, we employed script-based enactments of specific group-based value propositions at three levels of proactivity to test user interest. Users went through their scripts, then commented on how they'd change the experience.

To improve immersion, we ran the voice assistant’s “dialogue” through a computer-generated voice on our Zoom call. Generally, users enjoyed the proactive assistants, but with some caveats.

1

Low Proactivity: The agent detects that a change to the music is needed (e.g., volume, genre, song) but prompts the user for specific changes

2

Medium Proactivity: The agent suggests an appropriate change to the user after a contextual cue, but waits for user confirmation before acting

3

High Proactivity: The agent confidently identifies the appropriate change to make to the music, and only informs the user of the change made
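To make the three levels concrete, they can be sketched as a simple response policy: given the same contextual cue, the agent either asks, suggests, or acts. This is an illustrative sketch only (the names and phrasings are hypothetical, not from our study scripts):

```python
from enum import Enum

class Proactivity(Enum):
    LOW = 1     # detect that a change is needed, but ask the user what to do
    MEDIUM = 2  # suggest a specific change, then wait for confirmation
    HIGH = 3    # make the change, and only inform the user afterwards

def respond(level: Proactivity, cue: str, suggestion: str) -> str:
    """Return the agent's utterance for a contextual cue (illustrative)."""
    if level is Proactivity.LOW:
        return f"I noticed {cue}. What would you like me to change about the music?"
    if level is Proactivity.MEDIUM:
        return f"I noticed {cue}. Should I {suggestion}?"
    return f"I noticed {cue}, so I {suggestion}."
```

For example, a medium-proactivity agent noticing a cooking cue would ask "Should I switch to an upbeat playlist?" rather than acting silently.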

We learned...
icon of understanding context phase

Users generally felt a proactive voice assistant reduced their effort (in picking, switching, and discovering music) and led to better music recommendations

icon of understanding context phase

Users want to be able to provide more input to the assistant up-front when it’s still learning, then gradually allow it to act with more autonomy

icon of understanding context phase

Users don’t want the assistant to justify its actions, as long as music recommendations are reasonable

icon of understanding context phase

Users are uncomfortable with invasive sensors (primarily video cameras), but become more sensitive when the assistant draws attention to inputs used

icon of persona analysis phase

Usability Test a Voice Assistant

I co-developed a Wizard of Oz prototyping method for a proactive voice assistant and embedded this prototype in a multi-person household for 6 days to identify improvement opportunities.


Wizard of Oz (WoZ) + Longitudinal Testing + Diary Studies

How can we identify a proactive voice assistant’s true value in the real world?

Real life was rarely as clean as a perfectly scripted scenario. So how could we deploy our system in real life and see where it failed and succeeded?

Step 1: Build a “good-enough” context-sensing, intelligent assistant

We couldn’t create a fully-working voice assistant with real sensors given our time and budget constraints. Instead, we developed a Wizard of Oz (WoZ) prototype, with:

icon of persona analysis phase

Two phones

icon of persona analysis phase

A bluetooth speaker

icon of persona analysis phase

A lapel mic

icon of persona analysis phase

A small box and tape



Step 2: Embed this assistant into participants’ lives

Two roommates agreed to participate in a 6-day study where we spoke and played music as Sam the voice assistant. We had three goals, based on feedback from our scripts:

GOAL: Users had only experienced the agent’s value in limited contexts
TACTIC: Stress-test the assistant across a broad range of common-area contexts

GOAL: Users wanted more control at first, but claimed they would trust the agent with time
TACTIC: Start off at a medium level of proactivity, then slowly ramp up to a high level

GOAL: Users weren't comfortable with explicit reminders of data collected
TACTIC: Collect video and audio data without drawing attention, and examine users’ comfort level



We combined the study with interim feedback interviews every 3 days and diary study responses after every session, which helped us refine, iterate, and improve upon our interactions.

Affinity Diagrams

We learned...

icon of understanding context phase

Overall, our participants wanted the assistant to use contextual information to improve its proactive music recommendations

icon of understanding context phase

They wanted the assistant to text them denser pieces of information so they could digest them when and how they wanted

icon of understanding context phase

They liked the assistant creating individualized music-listening experiences, even in multi-person settings

icon of understanding context phase

They were completely comfortable with the assistant’s data usage, but only after they understood the purpose and value of data collection

Synthesizing 6 days' worth of experiences took a bit of time...



icon of persona analysis phase

Refine the Full Experience

I designed a more informative WoZ study and an accompanying onboarding app that led to the most enjoyable participant experiences with Sam yet.


Wizard of Oz (WoZ) + Longitudinal Testing + Diary Studies

How do we test the scalability of our findings?

We had received a lot of validation of our assistant, improved its conversational flow, and added new features like trivia and “just for me” moments.

But to ensure we weren’t overfitting to a single household’s preferences, we elected to run our study again with some deliberate changes:

QUESTION: Was it easier to find agreeable music because our participants were brothers?
IDEA: Recruit a diverse 4-person household with different music tastes and interpersonal dynamics

QUESTION: How can we reduce the frustration of 10-minute setup times for our participants?
IDEA: Streamline setup with phone stands we built from foam core, allowing for easy placement and maneuverability

QUESTION: How do we test viability beyond a limited set of contexts (cooking, working, relaxing)?
IDEA: Move the assistant around the house to capture diverse contexts and situations to assess the agent’s capabilities


ISSUE:

Our initial study had some clear limitations that made it difficult to generalize its findings




SOLUTION:

Conduct a second study with markedly different participant dynamics, contexts to test, and an improved prototype



We learned...

Although feedback was still overwhelmingly positive, we learned about some limitations of our agent, which will be focus areas for future studies.

icon of understanding context phase

Mood is the most informative factor in predicting music listening preferences, but is also the most difficult to consistently interpret.

icon of understanding context phase

People want to use music to strengthen social connections, but want to minimize their own vulnerability.

icon of understanding context phase

Listeners liked additional information to help them connect to the music, but context dictated their preferred method of delivery.

icon of understanding context phase

Adopting a friendly and supportive voice and tone increases the agent’s overall effectiveness.

icon of persona analysis phase

Final Value Proposition

Sam is a context-sensing, proactive voice assistant. Sam curates the perfect music for any moment by dynamically adjusting the music to match people’s activities, tastes, and moods.

Match Listeners' Needs in the Moment

Adapts to listening contexts by identifying listeners’ activities in the moment and mapping them to preferred music based on historical patterns and similar users’ preferences.

Adjusts the prominence of the music by separating high-cognitive-load (e.g., working) and low-cognitive-load tasks (e.g., cooking) and tweaking music volume, BPM, and lyric prominence accordingly.

Sam identifies what users are talking about and finds relevant music
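The cognitive-load adjustment above can be sketched as a simple activity-to-settings mapping. This is a hypothetical illustration with made-up categories and values; the real Sam was a Wizard of Oz prototype, not running code:

```python
# Illustrative only: activity categories and numeric values are assumptions.
HIGH_LOAD = {"working", "studying", "reading"}   # music should recede
LOW_LOAD = {"cooking", "cleaning", "relaxing"}   # music can be prominent

def music_settings(activity: str) -> dict:
    """Map a detected activity to volume / tempo / lyric-prominence settings."""
    if activity in HIGH_LOAD:
        return {"volume": 0.3, "max_bpm": 90, "instrumental_only": True}
    if activity in LOW_LOAD:
        return {"volume": 0.7, "max_bpm": 140, "instrumental_only": False}
    # Unknown activity: fall back to neutral settings
    return {"volume": 0.5, "max_bpm": 120, "instrumental_only": False}
```

The key design choice is that high-cognitive-load tasks suppress lyric prominence and tempo, while low-load tasks let the music take up more of the room.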


Foster Interpersonal Connections

Adjusts for listener dynamics by detecting the identities, music taste overlaps, and inferred relationships of nearby users and optimizing for those dynamics

Creates appreciation for others’ tastes by introducing users’ music to others based on calculated similarity in music taste and the closeness of the users’ relationship.

Sam introduces one housemate to the music of another


Deepen Personal Music Connections

Creates “personalized musical moments” by identifying objects of interest in its field of view and suggesting music based on niche musical associations with those objects.

Integrates text-based communication to relay denser pieces of music-related information introduced during conversation (e.g., concert information, news articles, personal playlists).

Sam detects chorizos with its camera and plays a relevant song

icon of persona analysis phase

Results & Final Thoughts

I consolidated our findings across our two studies into a single conversation framework that can serve as a basis for future research that I'll be advising on.

Consolidating findings into a framework

Our team is staying on in an advisory capacity as part of a broader team that will refine the conversation design we laid the foundation for, with a focus on building deeper social connections through music.

We developed a model conversational flow that mapped out exactly how we acted in successful interactions across our two studies, along with a conversation model to guide designers going forward.
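One core distinction in the flow is between the agent's two entry points: explicit user requests and sensor-triggered cues. A minimal routing sketch, with hypothetical names (the actual framework is a design artifact, not code):

```python
def handle_input(source: str, payload: str) -> str:
    """Route an input to a conversational move based on its trigger type.

    Illustrative sketch: user-triggered inputs are acted on directly,
    while sensor-triggered inputs are proposed, not imposed.
    """
    if source == "user":
        return f"Sure — {payload}."
    if source == "sensor":
        return f"It sounds like {payload}. Want me to adjust the music?"
    return "Sorry, I didn't catch that."
```

For example, a direct request ("play some jazz") gets an immediate confirmation, while an inferred cue ("you're wrapping up work") gets a suggestion the user can decline.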

Sam can drive significant value for both listeners and Spotify, as we validated with our presentation to the Spotify HCI Research Team. This includes:

Listener Benefits:
icon of persona analysis phase

Eliminates the need for users to monitor or adjust their music based on their context, with dynamic and automatic adjustments

icon of persona analysis phase

Helps users strengthen their social connections within their household by sharing music between household members

icon of persona analysis phase

Helps users better connect to the music itself with timely pieces of extra information (e.g., concert info, trivia, new release information)

Spotify Benefits:
icon of persona analysis phase

Differentiated value in the voice assistant market, strengthening ecosystem partnerships with smart home players

icon of persona analysis phase

Increased active users as users can reliably expect their music to not be a distraction in their day-to-day lives

icon of persona analysis phase

Increased user sign-up and retention from a superior music recommendation system