Research Article: Language Structure Is Partly Determined by Social Structure

Date Published: January 20, 2010

Publisher: Public Library of Science

Author(s): Gary Lupyan, Rick Dale, Dennis O’Rourke.

Abstract: Languages differ greatly both in their syntactic and morphological systems and in the social environments in which they exist. We challenge the view that language grammars are unrelated to the social environments in which they are learned and used.

Partial Text: Although the largest languages are spoken by millions of people spread over vast geographic areas, most languages are spoken by relatively few individuals over comparatively small areas. The median number of speakers for the 6,912 languages catalogued by the Ethnologue is only 7,000, compared to the mean of over 828,000 [1]. Similarly, for the 2,236 languages in our sample (Figure 1), the median area over which a language is spoken is about the size of Luxembourg or San Diego, California (948 km²). The mean area is about the size of Austria or the US state of Maryland (33,795 km²). Languages also differ dramatically in the proportion of individuals who speak the language natively (L1 speakers) to those who learned it later in life (L2 speakers) (Table S1). Although there are numerous counter-examples (Text S1), languages spoken by millions of people have a greater likelihood of coming into contact with other languages and of having numerous nonnative speakers compared to languages spoken by only a few thousand people. This is not surprising: a language spoken by more people is more likely to encompass a larger and more diverse area and include speakers from varying ethnic and linguistic backgrounds. Conversely, languages spoken by a thousand or even fewer individuals tend to be spoken in highly circumscribed locales (Text S2). Overall, languages with smaller speaker populations are more likely to be spoken by more socially cohesive groups [2] than languages that have millions of speakers.
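The gap between the median and the mean reported above is what one would expect from a strongly right-skewed distribution, in which a handful of very large languages pull the mean far above the typical value. The following is a minimal sketch of that effect; the speaker counts are invented for illustration and are not Ethnologue data.

```python
from statistics import mean, median

# Hypothetical speaker counts (NOT Ethnologue figures): four small
# languages and one very large one.
speakers = [500, 2_000, 7_000, 40_000, 4_000_000]

speaker_median = median(speakers)  # middle value of the sorted list
speaker_mean = mean(speakers)      # dominated by the single largest language

print("median:", speaker_median)
print("mean:  ", speaker_mean)
```

In this toy sample the mean exceeds the median by roughly two orders of magnitude, the same qualitative pattern as the 7,000 versus 828,000 figures cited in the text.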

To assess relationships between social and linguistic structure we constructed a dataset that combined social/demographic and typological information for 2,236 languages. Grammatical information was obtained from the World Atlas of Language Structures (WALS) [32], a database of structural properties of language compiled from descriptive materials such as reference grammars. The full dataset was constructed by combining typological data from WALS with the following demographic variables: speaker population, geographic spread, and number of linguistic neighbors, derived from Ethnologue [1] and Global Mapping International [33] (see Text S5, containing analyses that demonstrate representativeness of the sample). Although WALS includes over 2,000 languages, most languages are coded for only a small number of linguistic features.
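As a rough illustration of how such a combined dataset might be assembled, the sketch below joins per-language typological records with per-language demographic records on a shared language code and keeps only languages present in both sources. The language codes, field names, and values are all hypothetical, and the functions `combine` and `languages_with` are illustrative helpers, not part of WALS, Ethnologue, or the authors' pipeline. Missing keys stand in for features on which a language is not coded.

```python
# Hypothetical typological records: each language is coded on only
# some features, so absent keys mean "not coded".
typology = {
    "lang_a": {"case_count": 0, "future_tense": False},
    "lang_b": {"case_count": 6},
    "lang_d": {"future_tense": True},  # has no demographic record
}

# Hypothetical demographic records (invented values).
demographics = {
    "lang_a": {"population": 5_000_000, "area_km2": 250_000, "neighbors": 12},
    "lang_b": {"population": 1_200, "area_km2": 300, "neighbors": 1},
    "lang_c": {"population": 90_000, "area_km2": 4_000, "neighbors": 3},
}


def combine(typology, demographics):
    """Inner-join two dicts of per-language records on their shared codes."""
    merged = {}
    for code in sorted(typology.keys() & demographics.keys()):
        merged[code] = {**demographics[code], **typology[code]}
    return merged


def languages_with(merged, feature):
    """Return the codes of languages that are coded for the given feature."""
    return [code for code, record in merged.items() if feature in record]


merged = combine(typology, demographics)
print(sorted(merged))                          # languages found in both sources
print(languages_with(merged, "future_tense"))  # subset usable for this feature
```

Because coverage differs by feature, any per-feature analysis on data shaped like this would run on a different subset of languages, which is why the helper filters by feature rather than assuming a complete table.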

Languages on the exoteric side of the esoteric-exoteric continuum (as indicated by larger speaker populations, greater geographical coverage, and a greater degree of contact with other languages) had overall simpler morphological systems, more frequently expressed semantic distinctions using lexical means, and were overall less grammatically specified. This was true both for quantitative grammatical measures, such as the number of different grammatical categories encoded by verbal inflections (feature 6) and case markings, and for qualitative grammatical types. For example, languages spoken in the exoteric niche were associated with a lack of conventional strategies for encoding semantic distinctions like situational/epistemic possibility, evidentiality, the optative, indefiniteness, the future tense, and both distance contrasts in demonstratives (consider the rarity of the English “over yonder”) and remoteness distinctions in the past tense.

We used three socio-demographic variables as proxies for esotericity: speaker population, geographic spread, and degree of inter-language contact. Speaker population data for each language was retrieved from the Ethnologue [1] and included the summed total of speakers in all the countries in which the language is spoken. Total area (km²) for each language was calculated from data provided by Global Mapping International [33]. Inter-linguistic contact was calculated based on language boundaries: for each language we counted the number of languages contained in, overlapping with, or contacting the area polygons of other languages. Linguistic data was retrieved from WALS [32]. We selected linguistic features most relevant to inflectional morphology. Details are presented below.
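The contact measure described above can be sketched as a neighbor count over language areas. The example below is a deliberate simplification: it replaces the real area polygons with axis-aligned bounding boxes so that it runs with the standard library alone, and all coordinates and language names are invented. The names `Box`, `touches_or_overlaps`, and `contact_counts` are hypothetical and do not reflect the authors' actual procedure.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a language's area polygon."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def touches_or_overlaps(a, b):
    """True if the boxes overlap, one contains the other, or they share a border."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def contact_counts(areas):
    """For each language, count the other languages whose area is in contact."""
    return {
        name: sum(
            1
            for other, other_box in areas.items()
            if other != name and touches_or_overlaps(box, other_box)
        )
        for name, box in areas.items()
    }


# Hypothetical areas (invented coordinates).
areas = {
    "lang_a": Box(0, 0, 10, 10),
    "lang_b": Box(2, 2, 4, 4),      # contained in lang_a
    "lang_c": Box(10, 0, 15, 5),    # shares a border with lang_a
    "lang_d": Box(20, 20, 25, 25),  # isolated
}

counts = contact_counts(areas)
print(counts)
```

With real boundary data, a geometry library's polygon intersection test would replace the bounding-box check, since bounding boxes overstate contact for irregularly shaped areas.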



