Exploring the Design Space of Emotion-Driven Voice Interaction for Games and Playful Interaction Contexts

Number of participants (min/max): 2-6
Start: TBD
Language: German or English
ILU: Project on ILU
Focus: EXA, DUX

An example interaction design probe implemented in Unity. The user commands a robot by voice (e.g., “go to the blue square”). The robot processes both what was said (the linguistic command) and how it was said (affective non-verbal cues in the speech audio). As a result, the robot proactively executes the same command in different behavioral styles (e.g., running faster to the target or performing poses when the user sounds angry, cautious, joyful, etc.).
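The core idea of the probe, one command executed in different styles depending on how it was voiced, can be sketched as a lookup from a recognized emotion label to behavior parameters. This is a minimal Python sketch, not the probe's actual code: the emotion labels, the `BehaviorStyle` fields, the numeric values, the confidence threshold, and the function name `plan_action` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BehaviorStyle:
    """Hypothetical parameters a Unity robot controller could consume."""
    speed: float           # movement speed multiplier relative to normal
    pose: Optional[str]    # optional expressive pose played on arrival


# Hypothetical emotion-to-style table; labels and values are illustrative only.
STYLE_BY_EMOTION = {
    "angry": BehaviorStyle(speed=1.8, pose="brace"),
    "joyful": BehaviorStyle(speed=1.3, pose="jump"),
    "cautious": BehaviorStyle(speed=0.6, pose=None),
    "neutral": BehaviorStyle(speed=1.0, pose=None),
}


def plan_action(command: str, emotion: str, confidence: float,
                threshold: float = 0.5) -> dict:
    """Combine what was said (command) with how it was said (emotion)."""
    # Fall back to the neutral style if the SER output is uncertain or unknown.
    if confidence < threshold or emotion not in STYLE_BY_EMOTION:
        emotion = "neutral"
    style = STYLE_BY_EMOTION[emotion]
    return {"command": command, "emotion": emotion,
            "speed": style.speed, "pose": style.pose}


# Same command, two different vocal deliveries.
print(plan_action("go to the blue square", "angry", 0.9))
print(plan_action("go to the blue square", "cautious", 0.7))
```

The confidence fallback is one possible design choice: when the emotion estimate is unreliable, the robot still executes what was said, just without an expressive style, so a misrecognized emotion never blocks the command itself.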

Problem Description

Problem Statement:

What kinds of novel, emotion-driven proactive voice interaction concepts for games (or playful interaction contexts in general) can be created by combining Speech Emotion Recognition and Large Language Models, and how do users experience controlling a game through expressive voice acting rather than explicit commands?

Background and Motivation:

Affective computing aims to enable interactive systems to sense, interpret, and respond to human emotions. One promising direction is Speech Emotion Recognition (SER), which infers emotional states from vocal cues such as prosody, pitch, and intensity, as well as from implicit linguistic cues (e.g., keywords). At the same time, Large Language Models (LLMs) enable flexible, context-aware interpretation of user intent and dynamic generation of linguistic system responses. Despite advances in both areas, their combined use in interactive applications remains underexplored, particularly in game design, where emotional expression and role-play are central to the user experience. Voice-based game interaction is usually limited to command recognition (“jump”, “attack”), while the expressive qualities of voice acting are largely ignored. Affective computing also enables an underexplored form of context understanding that can support proactive design (e.g., anticipating what kind of system behavior the user expects or would prefer). This project explores how emotion-aware voice interaction, powered by SER and LLMs, can open up new interaction paradigms for games.
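To make vocal cues such as pitch and intensity concrete, the sketch below computes two classic prosodic features from a mono signal with NumPy: per-frame RMS energy as a proxy for intensity, and a rough fundamental-frequency (pitch) estimate from the strongest autocorrelation peak. It is a simplified illustration that assumes float mono audio and a 60-400 Hz pitch range; it is not the feature set of the provided framework, and practical SER models typically rely on richer or learned representations. All function names are hypothetical.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def rms_energy(frames: np.ndarray) -> np.ndarray:
    """Per-frame intensity as root-mean-square amplitude."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


def pitch_autocorr(frame: np.ndarray, sr: int,
                   fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Rough F0 in Hz: lag of the strongest autocorrelation peak in [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def prosodic_summary(x: np.ndarray, sr: int,
                     frame_ms: float = 32.0, hop_ms: float = 16.0) -> dict:
    """Mean intensity plus mean and spread of pitch over non-silent frames."""
    frames = frame_signal(x, int(sr * frame_ms / 1000), int(sr * hop_ms / 1000))
    energy = rms_energy(frames)
    gate = energy > 0.1 * energy.max()  # crude gate: skip near-silent frames
    f0 = np.array([pitch_autocorr(f, sr) for f in frames[gate]])
    if f0.size == 0:  # no voiced frames: pitch is undefined
        return {"mean_rms": float(energy.mean()),
                "mean_f0": float("nan"), "f0_std": float("nan")}
    return {"mean_rms": float(energy.mean()),
            "mean_f0": float(f0.mean()),
            "f0_std": float(f0.std())}


# Synthetic check: one second of a 200 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
print(prosodic_summary(0.5 * np.sin(2 * np.pi * 200.0 * t), sr))
```

Plain autocorrelation pitch tracking is prone to octave errors on real speech; dedicated pitch trackers or pretrained SER models are more robust in practice.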

Project Definition

A basic example project and framework for realizing the interaction concept of speech emotion commands will be provided. The example project connects a Python application and a Unity application so that users can interact with graphical content in Unity through speech emotion commands. Project content and expectations are detailed below:

  1. Design and implement interaction probes that integrate SER and LLMs
  2. Explore new game interaction mechanics where, e.g.:
  3. Investigate user experience, focusing on, e.g.:
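The description only states that the example project connects a Python and a Unity application; the transport it uses is not specified here. One plausible shape for such a bridge, sketched below under that assumption, is the Python side sending one JSON message per recognized utterance over a local UDP socket. The address and port, the message schema, the command string format, and the function names are hypothetical.

```python
import json
import socket

UNITY_ADDR = ("127.0.0.1", 5005)  # hypothetical address the Unity app listens on


def encode_command(transcript: str, command: str,
                   emotion: str, confidence: float) -> bytes:
    """Serialize one recognized utterance as UTF-8 JSON (hypothetical schema)."""
    message = {
        "type": "speech_emotion_command",
        "transcript": transcript,   # what was said (speech recognizer output)
        "command": command,         # intent extracted, e.g., by an LLM
        "emotion": emotion,         # how it was said (SER label)
        "confidence": round(confidence, 3),
    }
    return json.dumps(message).encode("utf-8")


def send_to_unity(payload: bytes, addr: tuple = UNITY_ADDR) -> None:
    """Send one fire-and-forget UDP datagram; delivery is not guaranteed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)


if __name__ == "__main__":
    # Hypothetical command string produced from the transcript.
    payload = encode_command("go to the blue square!", "go_to(blue_square)",
                             "angry", 0.8734)
    send_to_unity(payload)
```

On the Unity side, a C# script could receive these datagrams with `System.Net.Sockets.UdpClient` and parse them with `JsonUtility.FromJson`. UDP keeps latency low but may drop messages; TCP or WebSockets would trade some latency for guaranteed, ordered delivery.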

Example Interaction Concepts

Research Questions (Examples)

Expected Outcomes

Learning Outcome

Students will experience/learn/improve:

Participation Requirements
