OneStop: A 360-Participant English Eye Tracking Dataset with Different Reading Regimes#
📄 Paper | 📚 Documentation | 💾 Data (Coming Soon) | 🔬 More from LaCC Lab
Overview#
OneStop Eye Movements (OneStop for short) is a large-scale English corpus of eye movements in reading, with 360 L1 participants and 2.6 million word tokens. The dataset was collected using an EyeLink 1000 Plus eye tracker (SR Research).
OneStop comprises four sub-corpora, one for each of the following reading regimes:
Ordinary reading for comprehension
Information seeking
Repeated reading
Information seeking in repeated reading
We provide the entire corpus, as well as each of the sub-corpora separately. If you are looking for a general purpose eye tracking corpus (like Dundee, GECO, MECO and others), we recommend downloading the ordinary reading sub-corpus.
Key Features#
Texts and Reading Comprehension Materials#
30 articles with 162 paragraphs in English from the Guardian.
Annotations of part-of-speech tags, syntactic dependency trees, word frequency and word surprisal.
Each paragraph has two versions: an Advanced version (original Guardian text) and a simplified Elementary version.
Extensively piloted reading comprehension questions based on the STARC (Structured Annotations for Reading Comprehension) annotation framework.
3 multiple-choice reading comprehension questions per paragraph.
486 reading comprehension questions in total.
Auxiliary text annotations for answer choices.
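As a quick consistency check of the counts above, the following minimal sketch recomputes the totals from the per-paragraph figures. The variable names are illustrative and not part of the dataset or its code.

```python
# Counts taken from the feature list above.
N_ARTICLES = 30
N_PARAGRAPHS = 162
QUESTIONS_PER_PARAGRAPH = 3
VERSIONS_PER_PARAGRAPH = 2  # Advanced and Elementary

# 3 questions for each of the 162 paragraphs.
total_questions = N_PARAGRAPHS * QUESTIONS_PER_PARAGRAPH

# Each paragraph exists in two difficulty versions.
total_paragraph_texts = N_PARAGRAPHS * VERSIONS_PER_PARAGRAPH

# Average number of paragraphs per article.
mean_paragraphs_per_article = N_PARAGRAPHS / N_ARTICLES

print(total_questions)              # 486
print(total_paragraph_texts)        # 324
print(mean_paragraphs_per_article)  # 5.4
```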
Statistics#
Statistics of OneStop and other public broad-coverage eyetracking datasets for English L1.
| Category | Dataset | Subjects | Age | Words | Words Recorded | Questions | Subjects per Question | Questions per Subject |
|---|---|---|---|---|---|---|---|---|
| Reading Comprehension | OneStop | 360 | 22.8±5.6 | 19,425 (Advanced) | 2,632,159 (Paragraphs) | 486 | 20 | 54 |
| | | 66 | NA | 2,539 | 167,574 | 20 | 95 | 20 |
| Passages | Dundee | 10 | NA | 51,502 | 307,214 | NA | 10 | NA |
| | | 14 | 21.8±5.6 | 56,410 | 774,015 | NA | 14 | NA |
| | | 84 | NA | 2,689 | 225,624 | 0 | 0 | 0 |
| | | 46 | 21.0±2.2 | 2,109 | 83,246 | 48 | 46 | 48 |
| Sentences | | 69 | 26.3±6.7 | 61,233 | 122,423 | 78 | 69 | 78 |
| | | 18 | 34.3±8.0 | 15,138 | 272,484 | 42 | 18 | 42 |
| | | 43 | 25.8±7.5 | 1,932 | 81,144 | 110 | 43 | 110 |
‘Reading Comprehension’ datasets are those with a substantial reading comprehension component over piloted reading comprehension materials. The remaining datasets are general purpose datasets over passages or individual sentences. ‘Words’ is the number of words in the textual corpus. ‘Words Recorded’ is the number of word tokens for which tracking data was collected. ‘NA’: data not available.
Controlled experimental manipulations#
Reading goal: ordinary reading for comprehension or information seeking.
Prior exposure to the text: first reading or repeated reading.
Amount of textual material between first and repeated reading: consecutive or non-consecutive article presentation (2-9 articles).
Paragraph difficulty level: original Guardian article (Advanced) or simplified (Elementary).
Question identity: one of three possible questions for each paragraph.
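The first two manipulations, reading goal and prior exposure, together give the four reading regimes listed in the overview. The sketch below enumerates them. The `regime_name` helper is hypothetical, and the label strings are chosen here to match the `--mode` names of the download script purely for illustration.

```python
from itertools import product

# The two manipulated factors that define a reading regime.
READING_GOALS = ("ordinary", "information-seeking")
EXPOSURES = ("first", "repeated")


def regime_name(goal, exposure):
    """Map a (reading goal, prior exposure) pair to a regime label (hypothetical helper)."""
    if exposure == "first":
        # First reading: the regime is named after the reading goal alone.
        return goal
    # Repeated reading: ordinary goal vs. information seeking.
    return "repeated" if goal == "ordinary" else "information-seeking-in-repeated"


# Crossing the two factors yields the four regimes.
regimes = [regime_name(goal, exposure) for goal, exposure in product(READING_GOALS, EXPOSURES)]
print(regimes)
```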
Experiment Structure#
Trial Structure#
Pages presented only in the information seeking regime are depicted in green.
Obtaining the Data#
The data is not yet available. Instructions for downloading the data will be updated here.
There are several ways to obtain the data:
Direct Download from OSF#
The data is hosted on OSF. We provide the possibility to download the entire dataset, or any of four sub-corpora:
Ordinary reading (download this data if you are interested in a general purpose eyetracking dataset)
Python Script#
The data can also be downloaded using the provided Python script. The script will download and extract the data files.
Basic usage to download the entire dataset:
0. Make sure you have Python installed. If you don’t have Python installed, you can download it from here.
Get the Code
Open your terminal/command prompt:
Windows: Press Win + R, type cmd and press Enter
Mac: Press Cmd + Space, type terminal and press Enter
Run this command to download the code:
git clone https://github.com/lacclab/OneStop-Movements.git
Move into the downloaded folder:
cd OneStop-Movements
Run the Download Script
Run this command to download the full dataset:
python onestop/download_data_files.py
The data will be downloaded to a folder called “OneStop”
Available options:
--extract: Extract downloaded zip files (default: True)
--asc: Download ASC files (default: False)
--edf: Download EDF files (default: False)
-o, --output-folder: Specify output folder (default: “OneStop”)
--mode: Choose dataset version to download (default: “full”). Options: “full”, “repeated”, “information-seeking”, “ordinary”, “information-seeking-in-repeated”
Download Specific Parts (Optional)
Example usage to download only the ordinary reading subset:
python onestop/download_data_files.py --mode ordinary
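If you prefer to drive the download from Python, the command line can be assembled programmatically. This is a minimal sketch, not part of the repository: `build_download_command` is a hypothetical helper, it uses only the `--mode` and `--output-folder` options listed above, and it assumes that `--output-folder` takes the folder name as its value.

```python
import shlex
import sys

# Mode names as listed under "Available options".
MODES = (
    "full",
    "repeated",
    "information-seeking",
    "ordinary",
    "information-seeking-in-repeated",
)


def build_download_command(mode="full", output_folder="OneStop"):
    """Return the argument list for the download script (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return [
        sys.executable,                    # the current Python interpreter
        "onestop/download_data_files.py",  # script path relative to the repository root
        "--mode", mode,
        "--output-folder", output_folder,  # assumed to take a folder name
    ]


cmd = build_download_command(mode="ordinary")
# Print the command for inspection. To actually run it from the repository
# root, pass the list to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Validating the mode before building the command catches typos early instead of relying on the script's own error message.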
Documentation Structure#
The documentation is organized into the following sections:
Data Files and Variables: Provides detailed information about the data files and variables used in the project.
Known Issues: Documents any known issues.
Scripts: Contains scripts for data preprocessing and analysis reproduction.
Citation#
Paper: OneStop: A 360-Participant English Eye Tracking Dataset with Different Reading Regimes
@article{berzak2025onestop,
title={Onestop: A 360-participant english eye tracking dataset with different reading regimes},
author={Berzak, Yevgeni and Malmaud, Jonathan and Shubi, Omer and Meiri, Yoav and Lion, Ella and Levy, Roger},
journal={PsyArXiv preprint},
year={2025}
}
License#
The data and code are licensed under a Creative Commons Attribution 4.0 International License.