Research Article: Network motifs for translator stylometry identification

Date Published: February 8, 2019

Publisher: Public Library of Science

Author(s): Heba El-Fiqi, Eleni Petraki, Hussein A. Abbass, Diego Raphael Amancio.


Despite the extensive literature investigating stylometry analysis in authorship attribution research, translator stylometry is an understudied research area. The identification of translator stylometry contributes to many fields, including education, intellectual property rights and forensic linguistics. In a two-stage process, this paper first evaluates the use of existing lexical measures for the translator stylometry problem. Similar to previous research, we found that vocabulary richness, in the traditional form in which it has been used in the literature, could not identify translator stylometry. This encouraged us to design an approach that aims to identify the distinctive patterns of a translator by employing network motifs. Network motifs are small sub-graphs that capture the local structure of a complex network. The proposed approach achieved an average accuracy of 83% in three-way classification. These results demonstrate that classic tools based on lexical features can be used to identify translator stylometry if they are augmented with appropriate non-parametric scaling. Moreover, complex network analysis and network motif mining made it possible to design features that can solve translator stylometry analysis problems.
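The lexical measures evaluated in the first stage are vocabulary richness statistics. As a rough illustration, the Python sketch below computes three common ones: type-token ratio, hapax legomena ratio and Yule's K. The excerpt does not list the exact measures the authors used, and the simple tokenizer and the function names `tokenize` and `vocabulary_richness` are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (an assumption): lowercase runs of letters/apostrophes only.
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_richness(text: str) -> dict[str, float]:
    """Compute a few classic vocabulary richness measures for one text."""
    tokens = tokenize(text)
    n = len(tokens)  # N: total number of word tokens
    if n == 0:
        raise ValueError("text contains no word tokens")
    freqs = Counter(tokens)  # word type -> frequency
    v = len(freqs)  # V: number of distinct word types
    spectrum = Counter(freqs.values())  # i -> V_i, number of types seen exactly i times
    hapax = spectrum.get(1, 0)  # V_1: hapax legomena
    m2 = sum(i * i * v_i for i, v_i in spectrum.items())
    return {
        "ttr": v / n,  # type-token ratio V/N
        "hapax_ratio": hapax / v,  # share of types that occur exactly once
        "yule_k": 10_000 * (m2 - n) / (n * n),  # Yule's K
    }


print(vocabulary_richness("the cat saw the dog"))
# → {'ttr': 0.8, 'hapax_ratio': 0.75, 'yule_k': 800.0}
```

Raw values such as the type-token ratio are known to depend on text length, which is one reason these measures are hard to compare directly across texts without some form of rescaling.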

Partial Text

A much-debated question about translation is whether translation is an art, a science, or a combination of both. This question arises from the very specific nature of the translation task. If the same text is given to two translators, how can the correctness, validity, and accuracy of their translations be measured? What causes people to prefer one of these translations over another? Do translators leave their own touch or signature on their translations? Or, if we have a number of valid translations of the same text, are they all indistinguishable?

Stylometry is the study of the unique linguistic styles and writing behaviours of individuals. Kestemont defined it as the quantitative study of (literary) style, nowadays often accomplished by means of computation [13]. Stylometry can be thought of as a measure of a writer's style, which raises the question of what style is.

In this study, we follow Baker’s definition of “Translator styles”: “a study of a translator’s style must focus on the manner of expression that is typical of a translator, rather than simply instances of open intervention. It must attempt to capture the translator’s characteristic use of language, his or her individual profile of linguistic habits, compared to other translators” [25]. Her definition of style as a matter of patterning of linguistic behaviour is what we target in this research.

We target translator style by detecting repeated patterns in each translator's writing, and we employ complex network analysis for that purpose.
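A minimal sketch of this idea, assuming a directed word-adjacency network (one node per word type, with an edge from each word to the word that follows it) and counting connected three-node sub-graphs grouped by isomorphism class. The network construction and all function names here are illustrative assumptions rather than the authors' exact pipeline. The sketch also reports raw frequencies only, without the comparison against randomized networks that dedicated motif-detection tools typically perform.

```python
import itertools
import re
from collections import Counter


def word_network(text):
    """Directed word-adjacency network: an edge w1 -> w2 for each consecutive word pair."""
    tokens = re.findall(r"[a-z']+", text.lower())
    edges = {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}  # drop self-loops
    return sorted(set(tokens)), edges


def is_weakly_connected(triple, edges):
    """Three nodes form a connected sub-graph if at least two of their pairs are linked."""
    def linked(x, y):
        return (x, y) in edges or (y, x) in edges
    a, b, c = triple
    return linked(a, b) + linked(a, c) + linked(b, c) >= 2


def canonical_code(triple, edges):
    """Isomorphism-invariant label: smallest 6-bit edge encoding over all node orderings."""
    best = None
    for perm in itertools.permutations(triple):
        code = 0
        # One bit per ordered pair (i, j), set when the edge perm[i] -> perm[j] exists.
        for i, j in itertools.permutations(range(3), 2):
            code = (code << 1) | int((perm[i], perm[j]) in edges)
        if best is None or code < best:
            best = code
    return best


def count_triad_motifs(text):
    """Count connected 3-node sub-graphs of the word network, grouped by isomorphism class."""
    nodes, edges = word_network(text)
    counts = Counter()
    # Brute force over all node triples: O(V^3), only suitable for short texts.
    for triple in itertools.combinations(nodes, 3):
        if is_weakly_connected(triple, edges):
            counts[canonical_code(triple, edges)] += 1
    return counts
```

Calling `count_triad_motifs` on each translation yields one motif-frequency vector per text, which can then serve as a feature vector for a classifier. Canonicalizing over all six node orderings makes, for example, the chains "a b c" and "c b a" fall into the same motif class.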

In this paper, we addressed the challenging problem of translator stylometry, which has received limited research attention. We demonstrated that vocabulary richness features can be used to detect translator stylometry, contrary to the claims made by Mikhailov and Villikka [10]. Detecting network motifs in a network can mimic detecting the repeated patterns in translators’ writing. Although network motifs used directly as a stylistic feature failed to identify translators, representing the data as rankings, which express how each translator's usage of the same pattern compares with that of the other translators, produced promising results. This data transformation minimized the effect of the original text on the analysis. Some of the generated classifiers achieved an accuracy of 97.97%, while the overall average accuracy reached 79.02% for the two-translator case on the Holy Qur’an corpus. Applying feature selection with the proposed approach achieved an accuracy of 81.08% for the three-class (three-translator) problem on the same dataset. Additionally, network motifs outperformed vocabulary richness as a stylometric feature on the second dataset investigated in this study, a Spanish novel, with an average accuracy of 95.10% that reached 100% in a number of cases.
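The ranking idea can be sketched as follows, under stated assumptions: each source passage (for example, one chapter of the source text) has one translation per translator, each translation has already been reduced to a vector of motif counts, and each count is replaced by its rank among the translators of that same passage. Because all translators render the same passage, ranks keep only the relative ordering between translators and drop the absolute magnitudes that the source text drives. The function names `average_ranks` and `rank_transform`, the tie handling (average ranks) and the data layout are hypothetical; the excerpt does not specify the authors' exact scheme.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_transform(features):
    """features[passage][translator] -> list of raw feature values (e.g. motif counts).

    Returns the same nested shape, with each raw value replaced by its rank among
    all translators of that same passage.
    """
    ranked = {}
    for passage, by_translator in features.items():
        translators = sorted(by_translator)
        n_features = len(by_translator[translators[0]])
        ranked[passage] = {t: [] for t in translators}
        for f in range(n_features):
            column = [by_translator[t][f] for t in translators]
            for t, r in zip(translators, average_ranks(column)):
                ranked[passage][t].append(r)
    return ranked


# Hypothetical motif counts for three translators (A, B, C) of one passage.
raw = {"passage_1": {"A": [10, 5], "B": [30, 5], "C": [20, 1]}}
print(rank_transform(raw))
# → {'passage_1': {'A': [1.0, 2.5], 'B': [3.0, 2.5], 'C': [2.0, 1.0]}}
```

Ranking within a passage is one simple non-parametric scaling; it makes feature values comparable across passages of very different lengths, at the cost of discarding how large the differences between translators are.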



