Readers can have different goals with respect to the text they are reading. Can these goals be decoded from the pattern of their eye movements over the text? In this work, we examine for the first time whether it is possible to decode two types of reading goals that are common in daily life: information seeking and ordinary reading. Using large scale eye-tracking data, we apply to this task a wide range of state-of-the-art models for eye movements and text that cover different architectural and data representation strategies, and further introduce a new model ensemble. We systematically evaluate these models at three levels of generalization: new textual item, new participant, and the combination of both. We find that eye movements contain highly valuable signals for this task. We further perform an error analysis which builds on prior empirical findings on differences between ordinary reading and information seeking and leverages rich textual annotations. This analysis reveals key properties of textual items and participant eye movements that contribute to the difficulty of the task.
CoNLL
The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading
Keren Klein,
Yoav Meiri,
Omer Shubi,
and Yevgeni Berzak
In Proceedings of the 28th Conference on Computational Natural Language Learning
2024
The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eyetracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates, we find that surprisal theory's prediction of a linear effect of surprisal on processing times extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the predictive power for processing times compared to standard surprisals. Further, regime-specific contexts yield near zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.
EMNLP
Fine-Grained Prediction of Reading Comprehension from Eye Movements
Omer Shubi,
Yoav Meiri,
Cfir Hadar,
and Yevgeni Berzak
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
2024
Can human reading comprehension be assessed from eye movements in reading? In this work, we address this longstanding question using large-scale eyetracking data. We focus on a cardinal and largely unaddressed variant of this question: predicting reading comprehension of a single participant for a single question from their eye movements over a single paragraph. We tackle this task using a battery of recent models from the literature, and three new multimodal language models. We evaluate the models in two different reading regimes: ordinary reading and information seeking, and examine their generalization to new textual items, new participants, and the combination of both. The evaluations suggest that the task is highly challenging, and highlight the importance of benchmarking against a strong text-only baseline. While in some cases eye movements provide improvements over such a baseline, they tend to be small. This could be due to limitations of current modelling approaches, limitations of the data, or because eye movement behavior does not sufficiently pertain to fine-grained aspects of reading comprehension processes. Our study provides an infrastructure for making further progress on this question.
From cooking recipes to novels and scientific papers, we often read the same text more than once. How do our eye movements in repeated reading differ from first reading? In this work, we examine this question at scale with L1 English readers via standard eye-movement measures and their sensitivity to linguistic word properties. We analyze consecutive and non-consecutive repeated reading, in ordinary and information-seeking reading regimes. We find sharp and robust reading facilitation effects in repeated reading, and characterize their modulation by the reading regime, the presence of intervening textual material, and the relevance of the information to the task across the two readings. Finally, we examine individual differences in repeated reading effects and find that their magnitude interacts with reading speed, but not with reading proficiency. Our work extends prior findings, providing a detailed empirical picture of repeated reading which could inform future models of eye movements in reading.
CogSci
Eye Movements in Information-Seeking Reading
Omer Shubi,
and Yevgeni Berzak
In Proceedings of the Annual Meeting of the Cognitive Science Society
2023
In this work, we use question answering as a general framework for studying how eye movements in reading reflect the reader’s goals, how they are pursued, and the extent to which they are achieved. We leverage fine-grained annotations of task-critical textual information to perform a detailed comparison of eye movements in information-seeking and ordinary reading regimes. We further examine how eye movements during information seeking relate to question answering behavior. We find that reading times, saccade patterns and sensitivity to the linguistic properties of the text are all strongly and systematically conditioned on the reading task, and further interact with question answering behavior. The observed reading patterns are consistent with a rational account of cognitive resource allocation during task-based reading.
OPMI
Eye Movement Traces of Linguistic Knowledge in Native and Non-Native Reading
Eye movements in reading offer a rich, detailed picture of how language understanding unfolds in real time. Decades of research have demonstrated the sensitivity and quantitative functional form of how readers’ eye movements are influenced by the linguistic characteristics of the words being read and their relationship with context. However, most of this work has examined only reading by native (L1) speakers, even though much of the world’s population is multilingual, and non-native (L2) reading is a ubiquitous everyday activity. Here we present an analysis of eye movements in reading in a dataset containing a large and linguistically diverse sample of English L2 readers, including a quantitative characterization of the shape of the relationship between linguistic word properties and eye movements, and how this relationship relates to the reader’s independently measured L2 proficiency.
Our key result is that while many of the same qualitative effects are found in L2 readers as in L1 readers, we also find a “lexicon-context tradeoff” that is sensitive to a reader’s L2 proficiency. L2 readers’ eye movements are generally less sensitive to a word’s relationship with its context and more sensitive to the word’s intrinsic properties. However, the most proficient L2 readers’ eye movements approach an L1 pattern. This tradeoff supports an experience-dependent account of the speed and efficiency with which context-driven expectations can be deployed in L2 language processing, with a proficiency driven gradual shift away from lexicon-dependent processing and towards contextual processing.
EMNLP
The Aligned Multimodal Movie Treebank: An Audio, Video, Dependency-Parse Treebank
Adam Yaari,
Jan DeWitt,
Henry Hu,
Bennett Stankovits,
Sue Felshin,
Yevgeni Berzak,
Helena Aparicio,
Boris Katz,
Ignacio Cases,
and Andrei Barbu
In Proceedings of the Conference on Empirical Methods in Natural Language Processing
2022
Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT), an English language treebank derived from dialog in Hollywood movies which includes transcriptions of the audiovisual streams with word-level alignment, as well as part of speech tags and dependency parses in the Universal Dependencies (UD) formalism. AMMT consists of 31,264 sentences and 218,090 words, making it the third-largest UD English treebank and the only multimodal treebank in UD. We find that parsers on this dataset often have difficulty with conversational speech, and that they rely heavily on punctuation, which is frequently unavailable from speech recognizers. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA), a companion tool that enables annotators to significantly speed up their annotation processes.
OPMI
CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading
Yevgeni Berzak,
Chie Nakamura,
Amelia Smith,
Emily Weng,
Boris Katz,
Suzanne Flynn,
and Roger Levy
We present CELER (Corpus of Eye Movements in L1 and L2 English Reading), a broad coverage eye-tracking corpus for English. CELER comprises over 320,000 words, and eye-tracking data from 365 participants. Sixty-nine participants are L1 (first language) speakers, and 296 are L2 (second language) speakers from a wide range of English proficiency levels and five different native language backgrounds. As such, CELER has an order of magnitude more L2 participants than any currently available eye movements dataset with L2 readers. Each participant in CELER reads 156 newswire sentences from the Wall Street Journal (WSJ), in a new experimental design where half of the sentences are shared across participants and half are unique to each participant. We provide analyses that compare L1 and L2 participants with respect to standard reading time measures, as well as the effects of frequency, surprisal, and word length on reading times. These analyses validate the corpus and demonstrate some of its strengths. We envision CELER to enable new types of research on language processing and acquisition, and to facilitate interactions between psycholinguistics and natural language processing (NLP).
CogSci
Eye Movement Traces of Linguistic Knowledge
Yevgeni Berzak,
and Roger Levy
In Proceedings of the Annual Meeting of the Cognitive Science Society
2021
This study examines how linguistic knowledge is manifested in eye movements in reading, focusing on the effect of two key word properties, frequency and surprisal, on three progressively longer standard fixation measures: First Fixation, Gaze Duration and Total Fixation. Comparing English L1 speakers to a large and linguistically diverse group of English L2 speakers, we obtain the following results. 1) Word property effects on reading times are larger in L2 than in L1. 2) Differences between L1 and L2 speakers are substantially larger in the response to frequency than to surprisal. 3) The functional form of the relation between fixation times and frequency and surprisal in L2 is superlinear. 4) In L2 speakers, proficiency modulates frequency effects as a U-shaped function. We discuss the implications of these results for theories of language processing and acquisition, as well as for the general interpretation of frequency and surprisal effects in reading.
CoNLL
Predicting Text Readability from Scrolling Interactions
Sian Gooding,
Yevgeni Berzak,
Tony Mak,
and Matt Sharifi
In Proceedings of the 25th Conference on Computational Natural Language Learning
2021
Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners. In this paper, we present a 518 participant study which investigates how scrolling behaviour relates to the readability of a text. We make our dataset publicly available and show that (1) there are statistically significant differences in the way readers interact with text depending on the text level, (2) such measures can be used to predict the readability of text, and (3) the background of a reader impacts their reading interactions and the factors contributing to text difficulty.
CoNLL
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension
Jonathan Malmaud,
Roger Levy,
and Yevgeni Berzak
In Proceedings of the 24th Conference on Computational Natural Language Learning
2020
In this work, we analyze how human gaze during reading comprehension is conditioned on the given reading comprehension question, and whether this signal can be beneficial for machine reading comprehension. To this end, we collect a new eye-tracking dataset with a large number of participants engaging in a multiple choice reading comprehension task. Our analysis of this data reveals increased fixation times over parts of the text that are most relevant for answering the question. Motivated by this finding, we propose making automated reading comprehension more human-like by mimicking human information-seeking reading behavior during reading comprehension. We demonstrate that this approach leads to performance gains on multiple choice question answering in English for a state-of-the-art reading comprehension model.
CoNLL
Classifying Syntactic Errors in Learner Language
Leshem Choshen,
Dmitry Nikolaev,
Yevgeni Berzak,
and Omri Abend
In Proceedings of the 24th Conference on Computational Natural Language Learning
2020
We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence. The methodology builds on the established Universal Dependencies syntactic representation scheme, and provides complementary information to other error-classification systems. Unlike existing error classification methods, our method is applicable across languages, which we showcase by producing a detailed picture of syntactic errors in learner English and learner Russian. We further demonstrate the utility of the methodology for analyzing the outputs of leading Grammatical Error Correction (GEC) systems.
ACL
STARC: Structured Annotations for Reading Comprehension
Yevgeni Berzak,
Jonathan Malmaud,
and Roger Levy
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
2020
We present STARC (Structured Annotations for Reading Comprehension), a new annotation framework for assessing reading comprehension with multiple choice questions. Our framework introduces a principled structure for the answer choices and ties them to textual span annotations. The framework is implemented in OneStopQA, a new high-quality dataset for evaluation and analysis of reading comprehension in English. We use this dataset to demonstrate that STARC can be leveraged for a key new application for the development of SAT-like reading comprehension materials: automatic annotation quality probing via span ablation experiments. We further show that it enables in-depth analyses and comparisons between machine and human reading comprehension behavior, including error distributions and guessing ability. Our experiments also reveal that the standard multiple choice dataset in NLP, RACE, is limited in its ability to measure reading comprehension. 47% of its questions can be guessed by machines without accessing the passage, and 18% are unanimously judged by humans as not having a unique correct answer. OneStopQA provides an alternative test set for reading comprehension which alleviates these shortcomings and has a substantially higher human ceiling performance.
CL
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Edoardo Maria Ponti,
Helen O’Horan,
Yevgeni Berzak,
Ivan Vulić,
Roi Reichart,
Thierry Poibeau,
Ekaterina Shutova,
and Anna Korhonen
Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that lack human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.
NAACL
Assessing Language Proficiency from Eye Movements in Reading
Yevgeni Berzak,
Boris Katz,
and Roger Levy
In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
2018
We present a novel approach for determining learners’ second language proficiency which utilizes behavioral traces of eye movements during reading. Our approach provides stand-alone eyetracking based English proficiency scores which reflect the extent to which the learner’s gaze patterns in reading are similar to those of native English speakers. We show that our scores correlate strongly with standardized English proficiency tests. We also demonstrate that gaze information can be used to accurately predict the outcomes of such tests. Our approach yields the strongest performance when the test taker is presented with a suite of sentences for which we have eyetracking data from other readers. However, it remains effective even using eyetracking with sentences for which eye movement data have not been previously collected. By deriving proficiency as an automatic byproduct of eye movements during ordinary reading, our approach offers a potentially valuable new tool for second language proficiency assessment. More broadly, our results open the door to future methods for inferring reader characteristics from the behavioral traces of reading.
EMNLP
Grounding Language Acquisition by Training Semantic Parsers Using Captioned Videos
Candace Ross,
Andrei Barbu,
Yevgeni Berzak,
Battushig Myanganbayar,
and Boris Katz
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
2018
We develop a semantic parser that is trained in a grounded setting using pairs of videos captioned with sentences. This setting is both data-efficient, requiring little annotation, and similar to the experience of children where they observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision where a sentence may refer to any combination of objects, object properties, relations or actions taken by any agent in a video. For this task, we collected a new dataset for grounded language acquisition. Learning a grounded semantic parser—turning sentences into logical forms using captioned videos—can significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.
MIT
Second Language Learning from a Multilingual Perspective
How do people learn a second language? In this thesis, we study this question through an examination of cross-linguistic transfer: the role of a speaker’s native language in the acquisition, representation, usage and processing of a second language. We present a computational framework that enables studying transfer in a unified fashion across language production and language comprehension. Our framework supports bidirectional inference between linguistic characteristics of speakers’ native languages, and the way they use and process a new language. We leverage this inference ability to demonstrate the systematic nature of cross-linguistic transfer, and to uncover some of its key linguistic and cognitive manifestations. We instantiate our framework in language production by relating syntactic usage patterns and grammatical errors in English as a Second Language (ESL) to typological properties of the native language, showing its utility for automated typology learning and prediction of second language grammatical errors. We then introduce eye tracking during reading as a methodology for studying cross-linguistic transfer in second language comprehension. Using this methodology, we demonstrate that learners’ native language can be predicted from their eye movements while reading free-form second language text. Further, we show that language processing during second language comprehension is intimately related to linguistic characteristics of the reader’s first language. Finally, we introduce the Treebank of Learner English (TLE), the first syntactically annotated corpus of learner English. The TLE is annotated with Universal Dependencies (UD), a framework geared towards multilingual language analysis, and will support linguistic and computational research on learner language.
Taken together, our results highlight the importance of multilingual approaches to the scientific study of second language acquisition, and to Natural Language Processing (NLP) applications for non-native language.
ACL
Predicting Native Language from Gaze
Yevgeni Berzak,
Chie Nakamura,
Suzanne Flynn,
and Boris Katz
In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics
2017
A fundamental question in language learning concerns the role of a speaker’s first language in second language acquisition. We present a novel methodology for studying this question: analysis of eye-movement patterns in second language reading of free-form text. Using this methodology, we demonstrate for the first time that the native language of English learners can be predicted from their gaze fixations when reading English. We provide analysis of classifier uncertainty and learned features, which indicates that differences in English reading are likely to be rooted in linguistic divergences across native languages. The presented framework complements production studies and offers new ground for advancing research on multilingualism.
COLING
Survey on the Use of Typological Information in Natural Language Processing
Helen O’Horan,
Yevgeni Berzak,
Ivan Vulić,
Roi Reichart,
and Anna Korhonen
In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
2016
In recent years linguistic typology, which classifies the world’s languages according to their functional and structural properties, has been widely used to support multilingual NLP. While the growing importance of typological information in supporting multilingual tasks has been recognised, no systematic survey of existing typological resources and their use in NLP has been published. This paper provides such a survey as well as discussion which we hope will both inform and inspire future work in the area.
EMNLP
Anchoring and agreement in syntactic annotations
Yevgeni Berzak,
Yan Huang,
Andrei Barbu,
Anna Korhonen,
and Boris Katz
In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
2016
We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on a standard approach to creation of syntactic resources where syntactic annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower quality of annotations in comparison with human-based annotations. Using sentences from the Penn Treebank WSJ, we also report systematically obtained inter-annotator agreement estimates for English dependency parsing. Our agreement results control for parser bias, and are consequential in that they are on par with state of the art parsing performance for English newswire. We discuss the impact of our findings on strategies for future annotation efforts and parser evaluations.
ACL
Universal Dependencies for Learner English
Yevgeni Berzak,
Jessica Kenney,
Carolyn Spadine,
Jing Xian Wang,
Lucia Lam,
Keiko Sophie Mori,
Sebastian Garza,
and Boris Katz
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics
2016
We introduce the Treebank of Learner English (TLE), the first publicly available syntactic treebank for English as a Second Language (ESL). The TLE provides manually annotated POS tags and Universal Dependency (UD) trees for 5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. The UD annotations are tied to a pre-existing error annotation of the FCE, whereby full syntactic analyses are provided for both the original and error corrected versions of each sentence. Furthermore, we delineate ESL annotation guidelines that allow for consistent syntactic treatment of ungrammatical English. Finally, we benchmark POS tagging and dependency parsing performance on the TLE dataset and measure the effect of grammatical errors on parsing accuracy. We envision the treebank to support a wide range of linguistic and computational research on second language acquisition as well as automatic processing of ungrammatical language. The treebank is available at universaldependencies.org. The annotation manual used in this project and a graphical query engine are available at esltreebank.org.
EMNLP
Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
Yevgeni Berzak,
Andrei Barbu,
Daniel Harari,
Boris Katz,
and Shimon Ullman
In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
2015
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing sentences to be disambiguated in a unified fashion across the different ambiguity types.
CoNLL
Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distributions in ESL
Yevgeni Berzak,
Roi Reichart,
and Boris Katz
In Proceedings of the Nineteenth Conference on Computational Natural Language Learning
2015
This work examines the impact of cross-linguistic transfer on grammatical errors in English as a Second Language (ESL) texts. Using a computational framework that formalizes the theory of Contrastive Analysis (CA), we demonstrate that language specific error distributions in ESL writing can be predicted from the typological properties of the native language and their relation to the typology of English. Our typology-driven model makes it possible to obtain accurate estimates of such distributions without access to any ESL data for the target languages. Furthermore, we present a strategy for adjusting our method to low-resource languages that lack typological documentation, using a bootstrapping approach which approximates native language typology from ESL texts. Finally, we show that our framework is instrumental for linguistic inquiry seeking to identify first language factors that contribute to a wide range of difficulties in second language acquisition.
CoNLL
Reconstructing Native Language Typology from Foreign Language Usage
Yevgeni Berzak,
Roi Reichart,
and Boris Katz
In Proceedings of the Eighteenth Conference on Computational Natural Language Learning
2014
Linguists and psychologists have long been studying cross-linguistic transfer, the influence of native language properties on linguistic performance in a foreign language. In this work we provide empirical evidence for this process in the form of a strong correlation between language similarities derived from structural features in English as a Second Language (ESL) texts and equivalent similarities obtained from the typological features of the native languages. We leverage this finding to recover native language typological similarity structure directly from ESL text, and perform prediction of typological features in an unsupervised fashion with respect to the target languages. Our method achieves 72.2% accuracy on the typology prediction task, a result that is highly competitive with equivalent methods that rely on typological resources.