publications
2024
- Modeling dynamic social vision highlights gaps between deep learning and humans. PsyArXiv, Jun 2024
Deep learning models trained on computer vision tasks are widely considered the most successful models of human vision to date. The majority of work that supports this idea evaluates how accurately these models predict brain and behavioral responses to static images of objects and natural scenes. Real-world vision, however, is highly dynamic, and far less work has focused on evaluating the accuracy of deep learning models in predicting responses to stimuli that move, and that involve more complicated, higher-order phenomena like social interactions. Here, we present a dataset of natural videos and captions involving complex multi-agent interactions, and we benchmark 350+ image, video, and language models on behavioral and neural responses to the videos. As with prior work, we find that many vision models reach the noise ceiling in predicting visual scene features and responses along the ventral visual stream (often considered the primary neural substrate of object and scene recognition). In contrast, image models poorly predict human action and social interaction ratings and neural responses in the lateral stream (a neural pathway increasingly theorized as specializing in dynamic, social vision). Language models (given human sentence captions of the videos) predict action and social ratings better than either image or video models, but they still perform poorly at predicting neural responses in the lateral stream. Together these results identify a major gap in AI’s ability to match human social vision and highlight the importance of studying vision in dynamic, natural contexts.
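For readers curious how such a benchmark is typically set up, the sketch below shows a generic cross-validated encoding analysis in Python: ridge regression from a model's features to behavioral or neural responses, scored against an assumed noise ceiling. All array names, shapes, and the ceiling value are illustrative placeholders, not the paper's actual data or pipeline.

```python
# A minimal sketch of a model-benchmarking pipeline, assuming one feature
# vector per video from a vision (or language) model and one behavioral or
# neural response per target. All data here are random placeholders.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_videos, n_features, n_targets = 200, 512, 10

model_features = rng.standard_normal((n_videos, n_features))
human_responses = rng.standard_normal((n_videos, n_targets))

def encoding_score(X, Y, n_splits=5):
    """Mean held-out Pearson r between predicted and observed responses."""
    scores = np.zeros((n_splits, Y.shape[1]))
    folds = KFold(n_splits, shuffle=True, random_state=0).split(X)
    for i, (tr, te) in enumerate(folds):
        reg = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X[tr], Y[tr])
        pred = reg.predict(X[te])
        for j in range(Y.shape[1]):
            scores[i, j] = np.corrcoef(pred[:, j], Y[te][:, j])[0, 1]
    return scores.mean(axis=0)

raw = encoding_score(model_features, human_responses)
noise_ceiling = 0.6  # assumed split-half reliability of the responses
print("fraction of explainable variance:", raw / noise_ceiling)
```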
- The neurodevelopmental origins of seeing social interactions. Emalie McMahon and Leyla Isik. Trends in Cognitive Sciences, Mar 2024
In a recent letter, Grossmann argues that, in young children and non-human primates, third-party social interaction recognition is supported by top-down processing in the medial prefrontal cortex (mPFC). He suggests that top-down signals in the developing brain may be used to train neural systems in the superior temporal sulcus (STS), which, in adults, appears to process social interactions in a visual manner. The hypothesis that the visual computations supporting social interaction recognition are trained using top-down signals from the mentalization network is interesting. However, activation of mPFC when viewing social interactions does not preclude visual processing. As we discuss in our original article, when seeing social interactions, viewers can make rich inferences about the goals and mental states of the interacting agents. Young children and non-human primates may spontaneously engage higher-level cognitive processes when viewing social interactions, but we argue that these processes are separate from the recognition of the interaction itself. Finally, we review evidence suggesting that social interactions are processed visually in both young children and non-human primates and that STS selectivity emerges early in life.
- Abstract social interaction representations along the lateral pathway. Emalie McMahon and Leyla Isik. Trends in Cognitive Sciences, May 2024
Recent work in vision science and visual neuroscience has moved away from focusing on single people or objects to understanding the relations between them. Based on converging behavioral, computational, and neuroscience evidence, we recently argued that the visual system contains rich, abstract representations of social interactions between others [1]. We also outlined a framework for how this may be implemented hierarchically in the human brain [1], beginning with detecting agents, processing their physical relations, and finally recognizing their social interactions. As part of this framework, we argue that mid-level visual features about the physical relations between agents, which we refer to as social primitives, are represented in the extrastriate body area (EBA) and nearby regions of lateral occipitotemporal cortex (LOTC), whereas more abstract information about social interactions is represented in more anterior regions along the superior temporal sulcus (STS). However, Papeo questions our claim that social interaction representations in the STS are, in fact, abstract. Here, we review evidence that strengthens our claims of a posterior-to-anterior gradient of increasingly abstract representations of social interaction features along the recently proposed lateral visual stream.
2023
- Hierarchical organization of social action features along the lateral visual pathway. Emalie McMahon, Michael F. Bonner, and Leyla Isik. Current Biology, May 2023
Recent theoretical work has argued that in addition to the classical ventral (what) and dorsal (where/how) visual streams, there is a third visual stream on the lateral surface of the brain specialized for processing social information. Like visual representations in the ventral and dorsal streams, representations in the lateral stream are thought to be hierarchically organized. However, no prior studies have comprehensively investigated the organization of naturalistic, social visual content in the lateral stream. To address this question, we curated a naturalistic stimulus set of 250 3-s videos of two people engaged in everyday actions. Each clip was richly annotated for its low-level visual features, mid-level scene and object properties, visual social primitives (including the distance between people and the extent to which they were facing), and high-level information about social interactions and affective content. Using a condition-rich fMRI experiment and a within-subject encoding model approach, we found that low-level visual features are represented in early visual cortex (EVC) and middle temporal (MT) area, mid-level visual social features in extrastriate body area (EBA) and lateral occipital complex (LOC), and high-level social interaction information along the superior temporal sulcus (STS). Communicative interactions, in particular, explained unique variance in regions of the STS after accounting for variance explained by all other labeled features. Taken together, these results provide support for representation of increasingly abstract social visual content—consistent with hierarchical organization—along the lateral visual stream and suggest that recognizing communicative actions may be a key computational goal of the lateral visual pathway.
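The "unique variance" result above rests on a variance-partitioning comparison between a full encoding model and a reduced model lacking the feature set of interest. The sketch below illustrates that logic with simulated data; the feature matrices, the simulated voxel response, and the plain least-squares regressor are placeholders for the paper's within-subject voxelwise models, not its actual pipeline.

```python
# Variance-partitioning sketch: unique variance of communicative features
# is the drop in cross-validated R^2 when they are removed from the model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n_videos = 250
other_feats = rng.standard_normal((n_videos, 20))  # visual, scene, social primitives
comm_feats = rng.standard_normal((n_videos, 3))    # communicative-interaction labels
# Simulated voxel response that genuinely depends on a communicative feature.
voxel = (other_feats @ rng.standard_normal(20)
         + 2.0 * comm_feats[:, 0]
         + rng.standard_normal(n_videos))

def cv_r2(X, y):
    """Cross-validated R^2 from held-out predictions."""
    pred = cross_val_predict(LinearRegression(), X, y, cv=5)
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

full = cv_r2(np.hstack([other_feats, comm_feats]), voxel)
reduced = cv_r2(other_feats, voxel)
print(f"unique variance of communicative features: {full - reduced:.3f}")
```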
- Seeing social interactions. Emalie McMahon and Leyla Isik. Trends in Cognitive Sciences, May 2023
Seeing the interactions between other people is a critical part of our everyday visual experience, but recognizing the social interactions of others is often considered outside the scope of vision and grouped with higher-level social cognition like theory of mind. Recent work, however, has revealed that recognition of social interactions is efficient and automatic, is well modeled by bottom-up computational algorithms, and occurs in visually selective regions of the brain. We review recent evidence from these three methodologies (behavioral, computational, and neural) that converges to suggest that the core of social interaction perception is visual. We propose a computational framework for how this process is carried out in the brain and offer directions for future interdisciplinary investigations of social perception.
2021
- Understanding Mental Representations of Objects Through Verbs Applied to Them. Ka Chun Lam, Francisco Pereira, Maryam Vaziri-Pashkam, and 2 more authors. CogSci, May 2021
In order to interact with objects in our environment, we rely on an understanding of the actions that can be performed on them, and the extent to which those actions rely on, or have an effect on, the properties of the object. This knowledge is called the object's "affordance." We propose an approach for creating an embedding of objects in an affordance space, in which each dimension corresponds to an aspect of meaning shared by many actions, using text corpora. This embedding makes it possible to predict which verbs will be applicable to a given object, as captured in human judgments of affordance, better than a variety of alternative approaches. Furthermore, we show that the dimensions learned are interpretable and that they correspond to typical patterns of interaction with objects. Finally, we show that the dimensions can be used to predict a state-of-the-art mental representation of objects, derived purely from human judgments of object similarity.
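As one way to picture the affordance-space idea, the sketch below factorizes a toy verb-by-object co-occurrence matrix with non-negative matrix factorization, so that each latent dimension groups verbs that apply to similar objects. The tiny counts matrix, the choice of NMF, and the inner-product applicability score are all stand-ins for illustration; they are not the paper's actual method or corpus.

```python
# Toy "affordance space": factor verb-object co-occurrences so verbs and
# objects share a low-dimensional embedding; verb-object applicability is
# then scored by the inner product of their embeddings.
import numpy as np
from sklearn.decomposition import NMF

verbs = ["eat", "cut", "throw", "read", "open"]
objects = ["apple", "knife", "ball", "book", "door"]
# Hypothetical co-occurrence counts standing in for corpus statistics.
counts = np.array([
    [40,  0,  1,  0,  0],   # eat
    [ 5, 30,  0,  2,  0],   # cut
    [ 8,  1, 50,  2,  0],   # throw
    [ 0,  0,  0, 45,  0],   # read
    [ 1,  2,  0, 10, 60],   # open
], dtype=float)

nmf = NMF(n_components=3, random_state=0, max_iter=500)
verb_dims = nmf.fit_transform(counts)   # verbs in the latent space
object_dims = nmf.components_.T         # objects in the same space

applicability = verb_dims @ object_dims.T  # higher = more plausible affordance
for i, v in enumerate(verbs):
    print(f"{v}: most applicable object = {objects[int(np.argmax(applicability[i]))]}")
```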
- The ability to predict actions of others from distributed cues is still developing in 6- to 8-year-old children. Emalie McMahon, Daniel Kim, Samuel A. Mehr, and 3 more authors. Journal of Vision, May 2021
Adults use distributed cues in the bodies of others to predict and counter their actions. To investigate the development of this ability, we had adults and 6- to 8-year-old children play a competitive game with a confederate who reached toward one of two targets. Child and adult participants, who sat across from the confederate, attempted to beat the confederate to the target by touching it before the confederate did. Adults used cues distributed through the head, shoulders, torso, and arms to predict the reaching actions. Children, in contrast, used cues in the arms and torso, but we did not find any evidence that they could use cues in the head or shoulders to predict the actions. These results provide evidence for a change in the ability to respond rapidly to predictive cues to others’ actions from childhood to adulthood. Despite humans’ sensitivity to action goals even in infancy, the ability to read cues from the body for action prediction in rapid interactive settings is still developing in children as old as 6 to 8 years of age.
2019
- Subtle predictive movements reveal actions regardless of social context. Emalie McMahon, Charles Y. Zheng, Francisco Pereira, and 3 more authors. Journal of Vision, Jul 2019
Humans have a remarkable ability to predict the actions of others. To address what information enables this prediction and how that information is modulated by social context, we used videos collected during an interactive reaching game. Two participants (an "initiator" and a "responder") sat on either side of a plexiglass screen on which two targets were affixed. The initiator was directed to tap one of the two targets, and the responder had to either beat the initiator to the target (competition) or arrive at the same time (cooperation). In a psychophysics experiment, new observers predicted the direction of the initiators' reach from brief clips, segmented relative to when the initiator began reaching. A machine learning classifier performed the same task. Both humans and the classifier were able to determine the direction of movement before finger lift-off in both social conditions. Further, using an information mapping technique, we found that the relevant information was distributed throughout the body of the initiator in both social conditions. Our results indicate that we reveal our intentions during cooperation, in which communicating the future course of action is beneficial, and also during competition, despite the social motivation to reveal less information.
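The classifier analysis can be pictured with the sketch below: decode reach direction (left vs. right target) from body-posture features in short windows time-locked to movement onset, with accuracy expected to rise as the window approaches lift-off. The simulated features, window times, and logistic-regression decoder are assumptions for illustration, not the study's actual features or classifier.

```python
# Decoding sketch: classify reach direction from posture features in
# windows at different times relative to finger lift-off. Features are
# simulated so that predictive information grows as lift-off approaches.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_pose_features = 120, 30
direction = rng.integers(0, 2, n_trials)       # 0 = left, 1 = right target

for window_end_ms in (-200, -100, 0):          # relative to finger lift-off
    signal = (window_end_ms + 300) / 300.0     # assumed information growth
    X = rng.standard_normal((n_trials, n_pose_features))
    X[:, 0] += signal * (2 * direction - 1)    # one informative feature
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X, direction, cv=5).mean()
    print(f"window ending {window_end_ms} ms: accuracy = {acc:.2f}")
```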
2018
- The Embodied Origins of Infant Reaching: Implications for the Emergence of Eye-Hand Coordination. Daniela Corbetta, Rebecca F. Wiener, Sabrina L. Thurman, and 1 more author. Kinesiology Review, Jul 2018
This article reviews the literature on infant reaching, from past to present, to recount how our understanding of the emergence and development of this early goal-directed behavior has changed over the decades. We show that the still widely accepted view, which considers the emergence and development of infant reaching as occurring primarily under the control of vision, is no longer sustainable. Increasing evidence suggests that the developmental origins of infant reaching are embodied. We discuss the implications of this alternative view for the development of eye-hand coordination, and we propose a new scenario stressing the importance of infants' body-centered sensorimotor experiences in the months prior to the emergence of reaching as a possible critical step in the formation of eye-hand coordination.