Scientific data · Humans · Language · Multilingualism · Linguistics · Social Interaction

A curated global dataset of social contact between diverse language communities.

Eri Kashima, Francesca Di Garbo, Oona Raatikainen, Robert Forkel, Rosnátaly Avelino, Sacha Beck, Anna Berge, Ana Blanco Pena, Ross Bowden, Nicolás Brid, Joseph M Brincat, María Belén Carpio, Alexander Cobbinah, Paola Cúneo, Deginet Wotango Doyiso

Published: 2025 · DOI: 10.1038/s41597-025-06192-1 · Open Access

Abstract

The GramAdapt Social Contact Dataset is a curated dataset of 34 language pairs, with qualitative and quantifiable data on social interaction and on aspects of societal multilingualism. The language pairs were sampled globally to represent the world's linguistic diversity. The dataset can be used to interrogate the social dimensions of language contact, either on its own or together with appropriate linguistic data. The data were collected through a questionnaire distributed to experts with experience of one or both language communities in a pair. They are subjective expert assessments, recorded as choices among predetermined answers, which allows them to be quantified. The first three authors manually checked the responses to identify possible misjudgments or misunderstandings. The result is a dataset of 13,493 data points. It is the first dataset of its kind in linguistics and builds on a wide range of findings from sociolinguistics, historical linguistics, psycholinguistics, and linguistic anthropology.
