January 2021 - May 2021
How might we design a voice assistant that makes music-listening more enjoyable for multi-person households?
Voice assistants struggle to curate music experiences, especially when trying to account for multiple people’s tastes. Without any contextual awareness, they rely too heavily on user input.
We presented our interim and final work to Spotify's HCI Research team. Our most compelling outcomes include:
It curates music-listening experiences that dynamically change based on context, with minimal user effort
It strengthens users' interpersonal connections by sharing music between household members
Our accompanying app builds user trust so that users feel comfortable enabling the sensors the agent's recommendations rely on
UX Researcher + Conversation Designer
I designed and led a series of novel research studies exploring users’ relationships to music and their reactions to our WoZ prototypes.
I mapped out appropriate conversational flows based on user-triggered and sensor-triggered inputs.
Spotify (Academic Project)
Aziz Ghadiali
Lauren Hung
Directed Storytelling + Group Interviews + Affinity Diagrams
We had two goals before we could improve the enjoyability of music-listening experiences for multi-person households:
To build towards an ideal future, we first needed to understand why users listened to music, and where their experiences fell short.
We also wanted to examine households’ interpersonal dynamics, and how those played into listening preferences.
With those two goals in mind, we designed a hybrid group interview and directed storytelling protocol: we asked households to share their most recent experiences listening to music individually and together, then mapped our findings on an affinity diagram.
People care about music that supports their group activity and maintains the social atmosphere.
People found it difficult to curate and play the right music without significant effort.
The “right music” is closely tied to the context and goals users have while listening.
People want to listen to music they know they will enjoy, without much experimentation.
Literature Reviews
Across all their music-listening pain points, users repeatedly mentioned the frustration of having to prescribe exactly what the assistant should play. I identified a reframing opportunity that became the focus of our project: what if the assistant could sense context on its own and proactively curate the music, instead of waiting for explicit commands?
To understand if this was possible, I went deep into literature reviews of next-generation voice assistants and context-sensing technologies, and identified four key sensor technologies (with varying levels of privacy concerns) that could together form a complete picture of users’ context; a sketch of how their signals might combine follows the sensor list below.
Always-On Video Cameras: Can detect users’ emotions and identity through facial analysis, along with objects of interest through image recognition
Always-On Microphones: Can parse relevant conversations through speech analysis, and identify ambient noises (e.g., stirring pot) through sound recognition
Radar: Can identify body language and activity through human pose analysis, along with user location
Infrared: Can also detect mood and emotion (at lower resolution) through heat signatures
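To make the fusion concrete, here is a minimal sketch of how readings from these four sensors might combine into a single context snapshot for the assistant to reason over. Every name and field here is an illustrative assumption of ours, not part of any real Spotify or sensor API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContextSnapshot:
    """Hypothetical fused view of one moment in a shared living space."""
    # Always-on video camera: identity, emotion, and objects of interest
    people_present: list[str] = field(default_factory=list)        # e.g., ["Alex", "Sam"]
    facial_emotions: dict[str, str] = field(default_factory=dict)  # person -> emotion
    objects_of_interest: list[str] = field(default_factory=list)   # e.g., ["board game"]

    # Always-on microphone: speech analysis and ambient sound recognition
    conversation_topics: list[str] = field(default_factory=list)   # e.g., ["dinner plans"]
    ambient_sounds: list[str] = field(default_factory=list)        # e.g., ["stirring pot"]

    # Radar: human pose analysis and location
    inferred_activity: Optional[str] = None                        # e.g., "cooking"
    locations: dict[str, str] = field(default_factory=dict)        # person -> room

    # Infrared: coarse mood from heat signatures (lower-resolution backup)
    coarse_mood: Optional[str] = None


# Example: the ambient "stirring pot" sound plus radar pose data suggest cooking,
# which the assistant could use to pick an appropriate playlist.
snapshot = ContextSnapshot(
    people_present=["Alex", "Sam"],
    ambient_sounds=["stirring pot"],
    inferred_activity="cooking",
)
print(snapshot.inferred_activity)
```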
Participatory Design + Wizard of Oz (WoZ)
We needed to understand whether users would find value in a proactive voice assistant, and whether that value was localized to specific situations.
To do so, we ran script-based enactments of specific group-based value propositions at three levels of proactivity (defined below, with a brief sketch after the list) to test user interest. Users went through their scripts, then commented on how they would change the experience.
To improve immersion, we played the voice assistant’s “dialogue” through a computer-generated voice on our Zoom calls. Generally, users enjoyed the proactive assistants, but with some caveats.
Low Proactivity: The agent detects that a change to the music is needed (e.g., volume, genre, song) but prompts the user for specific changes
Medium Proactivity: The agent suggests an appropriate change to the user after a contextual cue, but waits for user confirmation before acting
High Proactivity: The agent confidently identifies the appropriate change to make to the music, and only informs the user of the change made
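As a thought experiment, here is a minimal sketch of how those three levels might gate the assistant's response to a contextual cue. The function and message wording are our own illustrative assumptions; in the actual sessions, a human "wizard" produced the replies and a computer-generated voice spoke them.

```python
from enum import Enum, auto

class Proactivity(Enum):
    LOW = auto()     # detect that a change is needed, but ask the user what to do
    MEDIUM = auto()  # suggest a specific change, but wait for confirmation
    HIGH = auto()    # make the change, then simply inform the user

def respond_to_cue(level: Proactivity, suggested_change: str) -> str:
    """Hypothetical policy mapping a proactivity level to the agent's reply."""
    if level is Proactivity.LOW:
        return "It sounds like the music could use a change. What would you like?"
    if level is Proactivity.MEDIUM:
        return f"Should I {suggested_change}?"         # acts only after a "yes"
    return f"I've gone ahead and {suggested_change}."  # HIGH: act first, inform after

# Example: the agent senses a phone call starting while the music is loud.
print(respond_to_cue(Proactivity.MEDIUM, "lower the volume while you're on the phone"))
```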
Users generally felt a proactive voice assistant reduced their effort (in picking, switching, and discovering music) and led to better music recommendations
Users want to be able to provide more input to the assistant up-front when it’s still learning, then gradually allow it to act with more autonomy
Users don’t want the assistant to justify its actions, as long as music recommendations are reasonable
Users are uncomfortable with invasive sensors (primarily video cameras), and become even more uncomfortable when the assistant draws attention to the inputs it used
Wizard of Oz (WoZ) + Longitudinal Testing + Diary Studies
How can we identify a proactive voice assistant’s true value in the real world?
Real life is rarely as clean as a perfectly scripted scenario, so we needed to deploy our system in the wild and see where it succeeded and failed.
Step 1: Build a “good-enough” context-sensing, intelligent assistant
We couldn’t create a fully working voice assistant with real sensors given our time and budget constraints. Instead, we developed a Wizard of Oz (WoZ) prototype using:
Two phones
A Bluetooth speaker
A lapel mic
A small box and tape
Two roommates agreed to participate in a 6-day study in which we spoke and played music as “Sam,” the voice assistant. We had three goals, based on feedback from our scripted sessions:
Goal: Users had only experienced the agent’s value in limited contexts
Tactic: Stress-test the assistant across a broad range of common-area contexts

Goal: Users wanted more control at first, but claimed they would trust the agent with time
Tactic: Start off at a medium level of proactivity, then slowly ramp up to a high level

Goal: Users weren’t comfortable with explicit reminders of the data collected
Tactic: Collect video and audio data without drawing attention, and examine users’ comfort level
We paired the study with interim feedback interviews every three days and diary study responses after every session, which helped us iterate on and refine the assistant’s interactions.
Affinity Diagrams
Overall, our participants wanted the assistant to use contextual information to improve its proactive music recommendations
They wanted the assistant to text them denser information so they could digest it when and how they wanted
They liked the assistant creating individualized music-listening experiences, even in multi-person settings
They became completely comfortable with the assistant’s data usage, but only after they understood the purpose and value of data collection