Egocentric Vision for Detecting Social Relationships

Social interactions are so natural that we rarely stop to wonder who is interacting with whom, or which people are gathering into a group and which are not. Humans make these judgments effortlessly, overlooking how much harder the task becomes when only visual cues are available. Different situations call for different behaviors: while we accept standing in close proximity to strangers when we attend a public event, we would feel uncomfortable having people we do not know close to us while we have a coffee. Moreover, we rarely exchange mutual gaze with people we are not interacting with, which is an important cue when trying to discern different social clusters.

We address the problem of partitioning the people in a video sequence into socially related groups from an egocentric vision (from now on, ego-vision) perspective. Human behavior is by no means random: when interacting with each other, we generally stand in specific positions to avoid occlusions within our group, stay close to those we interact with, and orient ourselves so that we naturally focus on the subjects we are interested in. Distances between individuals and mutual orientations therefore carry clear significance and must be interpreted according to the situation. F-formation theory describes the patterns that humans naturally create while interacting with each other, and it can be used to understand whether a gathering of people forms a group, based on the mutual distances and orientations of the subjects in the scene.
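To make the role of mutual distance and orientation concrete, the following is a minimal illustrative sketch and not the method from the publications listed on this page. It assumes each person is already described by a ground-plane position in metres and a heading in radians, and it uses hypothetical names and parameters (`pair_affinity`, `group_people`, `max_dist`, `threshold`) chosen only for this example. Two people get a high affinity when they are close and roughly facing each other, and groups are formed by linking pairs whose affinity exceeds a threshold.

```python
import math


def pair_affinity(p_i, p_j, max_dist=2.0):
    """Toy affinity in [0, 1] between two people.

    Each person is (x, y, theta): ground-plane position in metres and
    heading in radians. max_dist is an assumed cut-off, not a value
    taken from the source.
    """
    xi, yi, ti = p_i
    xj, yj, tj = p_j
    dx, dy = xj - xi, yj - yi
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_dist:
        return 0.0
    # Cosine between each person's heading and the line of sight to the other.
    facing_ij = math.cos(ti - math.atan2(dy, dx))
    facing_ji = math.cos(tj - math.atan2(-dy, -dx))
    orientation = max(0.0, (facing_ij + facing_ji) / 2.0)
    proximity = 1.0 - dist / max_dist
    return proximity * orientation


def group_people(people, threshold=0.2, max_dist=2.0):
    """Link pairs whose affinity reaches the threshold (union-find)."""
    parent = list(range(len(people)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            if pair_affinity(people[i], people[j], max_dist) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(people)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two people facing each other, one nearby person facing away, one far away.
people = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, math.pi),
    (0.5, 0.8, math.pi / 2),
    (10.0, 10.0, 0.0),
]
print(group_people(people))  # [[0, 1], [2], [3]]
```

In this toy scene the third person stands close to the other two but faces away from them, so orientation alone keeps them out of the group. Fixed thresholds such as these would not adapt to the situation (a crowded public event versus a coffee break), which is why distances and orientations have to be interpreted in context.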

Related publications:

Alletto S, Serra G, Calderara S, Solera F, Cucchiara R. From Ego to Nos-vision: Detecting Social Relationships in First-Person Views. In: Proc. of Workshop on Egocentric (First-person) Vision. Columbus, Ohio; 2014.
Alletto S, Serra G, Calderara S, Cucchiara R. Head Pose Estimation in First-Person Camera Views. In: Proc. of International Conference on Pattern Recognition (ICPR). Stockholm, Sweden; 2014.

Datasets:

EGO-HPE: This dataset provides a set of egocentric videos with different subjects for head pose estimation. Each video is annotated at frame level for five yaw angle orientations (-75, -45, 0, 45, 75) with respect to the subject wearing the camera.
EGO-GROUP: This is a social group detection dataset for egocentric vision. It consists of 18 videos collected in different situations: a laboratory, a coffee break, a conference room, and an outdoor scenario.