Implications of corpus-based analysis for word representation by heritage speakers of Russian


Thematic Section: Literacy in heritage languages

heritage languages, literacy, reading, writing, pedagogy

Anastasia Vyrenkova, National Research University Higher School of Economics

Word recognition has been widely discussed in psycholinguistics in relation to literacy, for both monolingual and bilingual speakers. Among the great variety of experimental findings, the most commonly accepted one is that word perception proceeds either directly from the phonological form or from the semantic meaning. Studying orthographic errors is relevant for both of these paths. From this perspective, corpus data, which have not yet been extensively employed for word recognition analysis, could prove useful (cf. the few studies that involve corpora for similar tasks). This paper presents a corpus-based study of spelling errors that occur in the written production of heritage speakers of Russian dominant in different languages and shows what implications learner corpora may have for word recognition. The data for the current research come from the Russian Learner Corpus (RLC, www.web-corpora.net/RLC), which contains written texts produced by heritage speakers of Russian dominant in 8 languages that differ in alphabetic system and orthographic depth. We mainly focus on merged and separate spelling errors, which are relatively frequent in the RLC and structurally heterogeneous; cf. the following examples:
(1) *potomučto (corr. potomu čto)
‘because’ (lit. ‘because that’)
(2) *vsemirnoznamenityj (corr. vsemirno znamenityj)
‘world-famous’ (lit. ‘worldwide renowned’)
(3) *mne na dobylo (corr. mne nado bylo)
‘I needed’ (lit. ‘me need was’)
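
As an illustration only, errors of this kind could be picked out of aligned error/correction pairs with a check like the one below. This is a minimal sketch, not the RLC annotation procedure: the function name and the labels "merged", "separate" and "shifted" are assumptions introduced here, and the check covers only pairs whose letters are identical once spaces are removed.

```python
import unicodedata


def classify_boundary_error(error: str, correction: str) -> str:
    """Label a pure word-boundary error by comparing space positions.

    The labels are hypothetical and used here for illustration only.
    """
    # Normalise so that precomposed and combining diacritics compare equal.
    error = unicodedata.normalize("NFC", error)
    correction = unicodedata.normalize("NFC", correction)

    # If the letters differ, this is not a pure boundary error.
    if error.replace(" ", "") != correction.replace(" ", ""):
        return "other"

    def boundaries(text: str) -> set:
        # Offsets in the space-free letter string at which a space occurs.
        positions, letters_seen = set(), 0
        for ch in text:
            if ch == " ":
                positions.add(letters_seen)
            else:
                letters_seen += 1
        return positions

    err_b, cor_b = boundaries(error), boundaries(correction)
    if err_b == cor_b:
        return "none"      # same segmentation
    if err_b < cor_b:
        return "merged"    # the writer omitted one or more boundaries
    if err_b > cor_b:
        return "separate"  # the writer inserted one or more boundaries
    return "shifted"       # boundaries both missing and added, as in (3)


for err, corr in [
    ("potomučto", "potomu čto"),
    ("vsemirnoznamenityj", "vsemirno znamenityj"),
    ("mne na dobylo", "mne nado bylo"),
]:
    print(err, "->", classify_boundary_error(err, corr))
```

Under these assumptions, examples (1) and (2) come out as merged, while (3) falls into the mixed category, since one boundary is misplaced rather than simply dropped or added.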

Such errors, however, reveal trends in word partitioning that may point to within- and between-language factors related to word meanings and phonological representations, and thus prove relevant for word recognition by heritage speakers of Russian. The results are further compared to L2 and monolingual data.