An example interaction design probe implemented in Unity. The user controls the robot with voice commands (e.g., “go to the blue square”). The robot attends both to what was said (the linguistic command) and to how it was said (affective non-verbal cues in the speech audio). As a result, the robot proactively varies its style of behavior when executing the same command (e.g., running faster to the target or striking poses when the user sounds angry, cautious, or joyful).
Problem Description
Problem Statement:
What kinds of novel, emotion-driven proactive voice interaction concepts for games (or playful interaction contexts in general) can be created by combining Speech Emotion Recognition and Large Language Models, and how do users experience controlling a game through expressive voice acting rather than explicit commands?
Background and Motivation:
Affective computing aims to enable interactive systems to sense, interpret, and respond to human emotions. One promising direction is Speech Emotion Recognition (SER), which infers emotional states from vocal cues such as prosody, pitch, and intensity, as well as from implicit linguistic cues such as keywords. At the same time, Large Language Models (LLMs) enable flexible, context-aware interpretation of user intent and dynamic generation of linguistic system responses. Despite advances in both areas, their combined use in interactive applications remains underexplored, particularly in game design, where emotional expression and role-play are central to user experience. Voice-based game interaction is usually limited to command recognition (“jump”, “attack”), while the expressive qualities of voice acting are largely ignored. Affective computing offers an underexplored form of context understanding that can be used for proactive design (e.g., anticipating what kind of system behavior the user expects or would prefer). This project explores how emotion-aware voice interaction, powered by SER and LLMs, can open up new interaction paradigms for games.
Project Definition
A basic example and framework for realizing speech emotion commands will be provided. The example project connects a Python application to a Unity application so that users can interact with graphical content in Unity through speech emotion commands. The project content and expectations are as follows:
- Design and implement interaction probes that integrate SER and LLMs
- Explore new game interaction mechanics where, e.g.:
  - Player emotions expressed through voice influence game actions, outcomes, or character behavior
  - Emotional expression complements or replaces traditional voice commands
- Investigate user experience, focusing on, e.g.:
  - How users perceive emotional voice control
  - The role of voice acting and emotional performance in gameplay
  - Feelings of immersion, agency, expressiveness, and playfulness
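The Python-to-Unity link mentioned in the project definition could be sketched as follows. This is a minimal illustration under stated assumptions, not the provided framework: the UDP transport, the address `127.0.0.1:5005`, and the JSON fields `command`, `emotion`, and `confidence` are all hypothetical, and the actual example project may connect the two applications differently.

```python
import json
import socket

# Hypothetical address of a Unity-side listener (assumption, not from the framework).
UNITY_ADDR = ("127.0.0.1", 5005)


def build_message(command: str, emotion: str, confidence: float) -> bytes:
    """Encode a recognized command and its detected emotion as UTF-8 JSON."""
    payload = {
        "command": command,        # what was said
        "emotion": emotion,        # how it was said (SER label)
        "confidence": round(confidence, 3),
    }
    return json.dumps(payload).encode("utf-8")


def send_to_unity(message: bytes, addr=UNITY_ADDR) -> None:
    """Send one datagram; UDP is fire-and-forget, so delivery is not confirmed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, addr)


if __name__ == "__main__":
    msg = build_message("go to the blue square", "angry", 0.82)
    send_to_unity(msg)
```

Sending the command and the emotion label together in one message lets the Unity side decide how the same command should be performed, which matches the "what was said" and "how it was said" split of the interaction concept.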
Example Interaction Concepts
- A game character that reacts differently when commands are spoken calmly, angrily, or fearfully
- Emotion-triggered abilities (e.g., shouting in anger increases attack power, calm speech enables stealth)
- NPCs whose behavior adapts based on the player’s emotional tone rather than explicit dialogue choices
- LLM-driven narrative branching influenced by detected vocal emotion
These concepts are exploratory and should be treated as research-through-design (RtD) artifacts.
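The emotion-triggered abilities listed above could be prototyped with a simple lookup from detected emotion to behavior modifiers. The sketch below is only one possible starting point: the `BehaviorStyle` fields, the emotion labels, the numeric values, and the confidence threshold of 0.6 are hypothetical choices for illustration, and tuning them is part of the design exploration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorStyle:
    speed_multiplier: float
    attack_multiplier: float
    stealth: bool


# Illustrative values only (assumptions, not taken from the example project).
NEUTRAL = BehaviorStyle(1.0, 1.0, False)
STYLES = {
    "angry": BehaviorStyle(1.5, 1.5, False),   # shouting in anger increases attack power
    "calm": BehaviorStyle(0.8, 1.0, True),     # calm speech enables stealth
    "fearful": BehaviorStyle(1.2, 0.7, False),
}


def select_style(emotion: str, confidence: float, threshold: float = 0.6) -> BehaviorStyle:
    """Return the style for a detected emotion.

    Falls back to NEUTRAL when the label is unknown or the
    classifier's confidence is below the threshold.
    """
    if confidence < threshold:
        return NEUTRAL
    return STYLES.get(emotion, NEUTRAL)


if __name__ == "__main__":
    print(select_style("angry", 0.9))
    print(select_style("angry", 0.3))
```

The neutral fallback is one way to soften SER misclassifications; whether players prefer such a fallback or a visibly "wrong" reaction is itself a question the probes could explore.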
Research Questions (Examples)
- How do users experience emotion-based voice control compared to traditional command-based interaction?
- Do players find expressive voice acting empowering, awkward, playful, or exhausting?
- How does emotional voice input affect immersion and role-play?
- What breakdowns occur when SER misclassifies emotions, and how do users interpret or adapt to them?
Expected Outcomes
- One or more working interaction probes (low- or high-fidelity)
- Detailed documentation and description of the probes, with images (and videos), in the context of the research-through-design (RtD) process
- A user study or exploratory evaluation documenting player experiences (optional in RtD)
- Design insights and limitations for emotion-driven voice interaction in games
Learning Outcome
Students will:
- learn to apply RtD as a research method, and to document and produce knowledge through the creation of design probes/artifacts
- improve their Unity and Python programming and prototyping skills
- improve their presentation and teamwork skills in a game design project
- develop skills for designing AI-driven interactions and experiences
Participation Requirements
- Practical experience in Python and Unity programming, and a willingness to deepen these skills
- Interest in Game and UX design
- Interest in integrating AI technologies to explore advanced interaction technologies
- Willingness to document (especially with images/video) the design process in detail
- Appreciation of aesthetics in documentation and interaction design