Brain Treebank

About

We present the Brain Treebank, a large-scale dataset of electrophysiological neural responses recorded from intracranial probes while 10 subjects watched one or more Hollywood movies. Each subject watched on average 2.6 movies, for an average viewing time of 4.3 hours and a total of 43 hours across all subjects. The audio track for each movie was transcribed and manually corrected. Word onsets were manually annotated on spectrograms of the audio track for each movie. Each transcript was automatically parsed into the Universal Dependencies (UD) formalism and manually corrected, assigning a part of speech to every word and a dependency parse to every sentence. In total, subjects heard over 38,000 sentences (223,000 words) while implanted with an average of 168 electrodes each. This is the largest dataset of intracranial recordings featuring grounded naturalistic language, one of the largest English UD treebanks in general, and one of only a few UD treebanks aligned to multimodal features. We hope that this dataset serves as a bridge between linguistic concepts, perception, and their neural representations. To that end, we present an analysis of which electrodes are sensitive to language features while also mapping out a rough time course of language processing across these electrodes.

See our technical paper and the quickstart notebook (linked below) for details.
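
As a first step with the per-trial recordings (the sub_*_trial*.h5.zip files listed below), the following sketch opens one downloaded, unzipped file and enumerates its contents with h5py. This page does not document the internal HDF5 layout, so the sketch assumes nothing about group or dataset names; see the quickstart notebook for the actual structure.

    # Minimal sketch: inspect one trial file with h5py. The internal
    # layout (group/dataset names) is not documented here, so we only
    # enumerate it rather than assuming specific keys.
    import h5py

    path = "sub_1_trial000.h5"  # from sub_1_trial000.h5.zip, listed below

    with h5py.File(path, "r") as f:
        # Walk every group and dataset, printing its name and summary
        # (shape and dtype for datasets), before writing analysis code.
        f.visititems(lambda name, obj: print(name, obj))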

Download links

In total, the combined data is ~130 GB (zipped).
Link                       Description
quickstart.ipynb           Quickstart IPython Notebook
localization.zip           Spatial position of electrodes
subject_timings.zip        Wall-clock time of triggers used for synchronization with the movie
subject_metadata.zip       Movie metadata
electrode_labels.zip       Semantic ID for electrodes
speaker_annotations.zip    Speaker IDs for movie audio
scene_annotations.zip      Scene cut annotations for movies
transcripts.zip            Pre-computed features for movies
trees.zip                  Universal Dependencies parse trees for movie dialogue (reading sketch below)
sub_1_trial000.h5.zip      Subject 1
sub_1_trial001.h5.zip      Subject 1
sub_1_trial002.h5.zip      Subject 1
sub_2_trial000.h5.zip      Subject 2
sub_2_trial001.h5.zip      Subject 2
sub_2_trial002.h5.zip      Subject 2
sub_2_trial003.h5.zip      Subject 2
sub_2_trial004.h5.zip      Subject 2
sub_2_trial005.h5.zip      Subject 2
sub_2_trial006.h5.zip      Subject 2
sub_3_trial000.h5.zip      Subject 3
sub_3_trial001.h5.zip      Subject 3
sub_3_trial002.h5.zip      Subject 3
sub_4_trial000.h5.zip      Subject 4
sub_4_trial001.h5.zip      Subject 4
sub_4_trial002.h5.zip      Subject 4
sub_5_trial000.h5.zip      Subject 5
sub_6_trial000.h5.zip      Subject 6
sub_6_trial001.h5.zip      Subject 6
sub_6_trial004.h5.zip      Subject 6
sub_7_trial000.h5.zip      Subject 7
sub_7_trial001.h5.zip      Subject 7
sub_8_trial000.h5.zip      Subject 8
sub_9_trial000.h5.zip      Subject 9
sub_10_trial000.h5.zip     Subject 10
sub_10_trial001.h5.zip     Subject 10
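
trees.zip holds the UD parses referenced above. UD treebanks are conventionally distributed as CoNLL-U files; assuming that convention holds here (the filenames inside the archive are not listed on this page), the parses can be read with the conllu package:

    # Hedged sketch: read the UD parse trees, assuming trees.zip unpacks
    # to standard CoNLL-U files. "movie_dialogue.conllu" is a placeholder
    # filename, not a documented path.
    import conllu

    with open("movie_dialogue.conllu", encoding="utf-8") as f:
        for sentence in conllu.parse_incr(f):
            for token in sentence:
                # Each token carries the part of speech (UPOS) and the
                # dependency head/relation described in the About section.
                print(token["form"], token["upos"], token["head"], token["deprel"])
            break  # show only the first sentence
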
Data are released under a CC BY 4.0 license.