Stefanos Nikolaidis

The goal of my research is to improve human-robot collaboration by finding principled ways for robots to account for their human counterparts in their decision making. I focus on developing game-theoretic algorithms and probabilistic planning techniques that enable robots to act optimally, using compact representations of the human state and its dynamics.

The first part of my work, done in the Interactive Robotics Group at MIT, focuses on enabling robots to integrate seamlessly into human teams by adapting to the preferences of their human teammates. In particular, the robot can effectively learn the individualized preference of its human teammate through cross-training, a widely used team-training practice. Learning exact individualized models requires a large number of samples, however, which can be tedious from a human-robot interaction perspective. Human users can therefore be clustered into types using unsupervised learning. When a new user comes to work with the robot, the robot can infer their type online and execute a policy aligned with that type.

What if the human teammate has a preference that is suboptimal, however? My PhD thesis at the Personal Robotics Lab at CMU focuses on improving team performance by reasoning over the effect of the robot's actions on future human behavior. I propose bounded-memory and game-theoretic models of human adaptation that enable the robot to guide users who are willing to change their preference towards a better way of completing the task, while retaining their trust in the robot. The models are applicable in both human-robot collaboration and shared-autonomy settings.


Cross-training is a training practice widely used in human teams, in which teammates switch roles to learn a shared plan for a collaborative task. Our HRI2013 and IJRR2015 papers present a computational formulation of cross-training for a team of a human and a robot, and evaluate it in a large-scale subject experiment. Compared to standard interactive reinforcement learning techniques, cross-training provides statistically significant improvements in quantitative team performance measures. Additionally, significant differences emerge in the perceived robot performance and human trust. These results support the hypothesis that effective and fluent human-robot teaming may be best achieved by modeling effective practices for human teamwork.

The video above shows the difference in team fluency between two participants during the task execution phase that follows team training, as well as the virtual environment in which the participants trained with the robot.


Learning individualized models in complex tasks requires a large number of samples, which can be tedious from a human-robot interaction perspective. On the other hand, in several collaborative scenarios, a limited number of "dominant" strategies can capture the majority of observed behaviors. Using this insight, in our HRI2015 paper (Best Enabling Technologies Award), we use unsupervised learning to cluster demonstrated sequences of human and robot actions into human types. We then treat the human type as a partially observable variable in a mixed-observability Markov decision process (MOMDP), which enables the robot to infer the type of a new user online and to compute a robot policy that follows the preference of this user.

Hand finishing task      Flowchart
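
The online type inference described above can be illustrated with a small Bayesian belief update. This is only a minimal sketch under stated assumptions, not the MOMDP formulation from the paper: the two types, the action names, and the probabilities in `TYPE_MODELS` are hypothetical and hand-written, whereas in the paper the types come from clustering demonstrations. The sketch also ignores the task state and the computation of the robot policy.

```python
from typing import Dict, List

# Hypothetical per-type action models: P(human action | human type).
# In practice these would be learned from clustered demonstrations.
TYPE_MODELS: Dict[str, Dict[str, float]] = {
    "type_A": {"left": 0.8, "right": 0.2},
    "type_B": {"left": 0.3, "right": 0.7},
}


def update_belief(belief: Dict[str, float], action: str) -> Dict[str, float]:
    """Apply Bayes' rule once: posterior is proportional to prior times likelihood."""
    unnormalized = {t: belief[t] * TYPE_MODELS[t][action] for t in belief}
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}


def infer_type(actions: List[str]) -> Dict[str, float]:
    """Start from a uniform prior over types and fold in each observed action."""
    belief = {t: 1.0 / len(TYPE_MODELS) for t in TYPE_MODELS}
    for action in actions:
        belief = update_belief(belief, action)
    return belief


belief = infer_type(["right", "right", "left"])
most_likely = max(belief, key=belief.get)
```

With these made-up numbers, a user who mostly chooses `right` ends up with most of the belief mass on `type_B`. A robot could then follow the policy associated with that type, while continuing to update the belief as more actions are observed.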


What if the human teammate has a suboptimal preference, and the robot knows a better way of completing the task? Our objective is to improve team performance by having the robot guide the operator towards a new goal, while retaining their trust. In our HRI2016 and IJRR2017 papers, we propose a bounded-memory model of human adaptation (BAM), where the human may switch towards a different strategy demonstrated by the robot, with a probability equal to their adaptability. We then integrate the model into a partially observable stochastic model for the robot, enabling robot adaptation to the human: when the human is adaptable, the robot guides the human towards a better goal, unknown to them in advance. Otherwise, the robot adapts to the human, retaining their trust. In our HRI2017 paper, we generalize the formalism to a shared-autonomy setting, where the human operates the robot using a joystick interface.

Hand finishing task
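
The idea that the human switches strategy "with a probability equal to their adaptability" can be sketched as a belief update over an unknown adaptability value. This is a rough illustration under assumptions, not the BAM formulation itself: the discretization in `ALPHAS` and the function names are hypothetical, and the sketch leaves out the bounded-memory history and the planning step that chooses the robot's actions.

```python
import random
from typing import List

# Hypothetical discretization of the human's adaptability (assumption).
ALPHAS: List[float] = [0.0, 0.25, 0.5, 0.75, 1.0]


def human_switches(adaptability: float, rng: random.Random) -> bool:
    """Simulated human: adopts the robot's strategy with probability `adaptability`."""
    return rng.random() < adaptability


def update_belief(belief: List[float], switched: bool) -> List[float]:
    """Bayes update after the robot disagrees and the human does or does not switch."""
    likelihood = [a if switched else 1.0 - a for a in ALPHAS]
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def expected_adaptability(belief: List[float]) -> float:
    """Mean adaptability under the current belief."""
    return sum(a * b for a, b in zip(ALPHAS, belief))


# Uniform prior, then the human insists on their own strategy twice.
belief = [1.0 / len(ALPHAS)] * len(ALPHAS)
prior_mean = expected_adaptability(belief)
belief = update_belief(belief, switched=False)
belief = update_belief(belief, switched=False)
posterior_mean = expected_adaptability(belief)
```

After the human twice declines to switch, the expected adaptability drops below the prior mean. In the spirit of the text, a robot holding such a belief would become more inclined to follow the human's preference rather than keep pushing its own strategy.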


Whereas in our BAM model the human may switch towards a robot strategy with some probability, in several settings human adaptation can be more subtle: The human teammate may change their actions based on the observed outcomes and the robot's actions, without replicating the robot's policy. In our HRI2017 paper, we present a game-theoretic model of human partial adaptation to the robot, where the human responds to the robot’s actions by maximizing a reward function that changes stochastically over time, capturing the evolution of their expectations of the robot’s capabilities. We prove that the optimal policy can be computed efficiently, and we evaluate our model in an actual human-robot collaboration task.

Table clearing task      Game-theoretic models
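
A toy version of a human who best-responds under a reward function that changes stochastically might look as follows. Everything here is an assumption made for illustration: the reward matrices, the `learn_prob` parameter, and the rule that the human learns the true rewards for the robot action they just observed are hypothetical and are not taken from the paper's model.

```python
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical rewards, indexed as [robot_action][human_action] (assumption).
TRUE_REWARD: Matrix = [[4.0, 0.0], [0.0, 3.0]]
# The human's initial, partly mistaken expectations of the same rewards.
INITIAL_BELIEF: Matrix = [[1.0, 2.0], [0.0, 3.0]]


def best_response(believed: Matrix, robot_action: int) -> int:
    """Human action that maximizes the reward the human currently expects."""
    row = believed[robot_action]
    return max(range(len(row)), key=lambda a: row[a])


def observe(believed: Matrix, robot_action: int,
            learn_prob: float, rng: random.Random) -> Matrix:
    """With probability `learn_prob`, the human learns the true rewards
    for the robot action they just observed; otherwise nothing changes."""
    updated = [row[:] for row in believed]
    if rng.random() < learn_prob:
        updated[robot_action] = TRUE_REWARD[robot_action][:]
    return updated


rng = random.Random(0)
before = best_response(INITIAL_BELIEF, robot_action=0)
believed = observe(INITIAL_BELIEF, robot_action=0, learn_prob=1.0, rng=rng)
after = best_response(believed, robot_action=0)
```

In this toy setting the human's response to robot action 0 changes once their expectations are revised, without the human copying the robot's policy. A robot planning against such a model would weigh the immediate reward of an action against the information that action reveals to the human.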


We propose a robotic scheduling and control capability for human-robot collaborative work. The algorithm computes a flexible, optimal scheduling policy for the robot that accommodates temporal disturbances. It also supports on-the-fly optimization in response to changing human preferences. Our RSS2012 paper shows that execution is computationally fast, robust, and adaptable to changing preferences for workflow.