Computational stylometry is the quantitative study of recurring features in our language – those that we use automatically and that are as unique to each of us as our fingerprints.
Stylistic signals are used in forensic as well as literary contexts to determine authorship, but they have also proven to be a lens through which one can observe literary patterns and trends that reflect large-scale cultural and social phenomena. Computational stylometry is a powerful method for “distant reading,” which, following Franco Moretti, offers a new approach to the analysis of literary texts, one that replaces the selective reading of a canon.
The primary purpose of this proposal is to leverage stylometric methods in order to contribute a new perspective on the study of modern Hebrew prose. Many scholars have offered historiographical perspectives on modern Hebrew literature; however, these studies have necessarily been based upon limited samples and the interconnections therein. The advent of computational stylometry, and the “distant reading” approach which it allows, call for a reevaluation of the historiography of modern Hebrew literature. Instead of focusing on a small set of representative works, we can now evaluate a comprehensive corpus of thousands of texts all at once, generating a specific stylistic profile for each book, as well as for groups of books (grouped by author, by period of writing, by the author’s gender or age, by geographical region of writing, and by linguistic background), and mapping the proximity and distance between these profiles.
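To make the notion of a stylistic profile and of the distance between profiles concrete, the following is a minimal sketch rather than the method of the proposed project. It assumes that a profile is a vector of relative frequencies of the most frequent words in the corpus, that a group profile is the average of its members' profiles, and that proximity is measured by cosine distance. The function names, the toy token lists, and the choice of cosine distance are all illustrative assumptions; real input would be tokenized Hebrew text.

```python
from collections import Counter
import math


def profile(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one text (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for an empty text
    return [counts[word] / total for word in vocabulary]


def group_profile(profiles):
    """Average several book profiles into one profile for a group (assumed definition)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def cosine_distance(p, q):
    """1 - cosine similarity; an assumed choice of distance measure."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm else 1.0


# Toy placeholder texts standing in for tokenized books.
books = {
    "book_a": "the sea and the shore and the wind".split(),
    "book_b": "the sea and the shore and the wind".split(),
    "book_c": "a city of stone and a city of glass".split(),
}

# Vocabulary: the most frequent words across the whole toy corpus.
corpus_counts = Counter(t for tokens in books.values() for t in tokens)
vocabulary = [word for word, _ in corpus_counts.most_common(5)]

profiles = {name: profile(tokens, vocabulary) for name, tokens in books.items()}
for left, right in [("book_a", "book_b"), ("book_a", "book_c")]:
    print(left, right, round(cosine_distance(profiles[left], profiles[right]), 3))
```

In this sketch, identical texts lie at distance zero, and texts that share few frequent words lie further apart. Any other feature set or distance measure could be substituted without changing the overall structure.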
Such a project entails the exploration of new methods for Hebrew stylometry. The field of stylometry is still predominantly concerned with authorship attribution, and it relies mainly on those features which are most easily accessible from a computational point of view, usually the frequencies of individual words. In order to represent a stylistic profile in a meaningful literary fashion, we must take account of more complex features, and refine a typology of the distinctive frequent features. Our approach will include computational analysis of linguistic features such as morphology and syntax; semantic features that represent temporal and spatial dimensions and relations; and soundplay figures such as alliteration and assonance. To accomplish this, we will leverage and develop cutting-edge natural language processing algorithms that automatically extract these features from each text. We will then relate the extracted features to the metadata of the work and the author, and ultimately run clustering and classification algorithms to isolate the features that distinguish any given stylistic profile.
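As an illustration of extracting one feature beyond word frequency and relating it to metadata, the sketch below computes a deliberately crude alliteration proxy: the share of adjacent word pairs that begin with the same character. This is an assumption made for the example, not the project's definition of alliteration; for Hebrew, a realistic measure would likely need morphological segmentation first, since prefixed particles change a word's initial letter. The record fields (`title`, `period`) and the toy texts are hypothetical.

```python
from collections import defaultdict


def alliteration_rate(tokens):
    """Share of adjacent token pairs sharing an initial character (crude proxy)."""
    tokens = [t for t in tokens if t]  # drop empty strings
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    hits = sum(1 for first, second in pairs if first[0] == second[0])
    return hits / len(pairs)


# Hypothetical records pairing each text with its metadata.
records = [
    {"title": "text_1", "period": "early", "tokens": "big bad wolf".split()},
    {"title": "text_2", "period": "early", "tokens": "soft summer sun sets".split()},
    {"title": "text_3", "period": "late", "tokens": "a quiet grey morning".split()},
]

# Relate the extracted feature to metadata: mean rate per period.
rates_by_period = defaultdict(list)
for record in records:
    rates_by_period[record["period"]].append(alliteration_rate(record["tokens"]))

mean_rate = {period: sum(r) / len(r) for period, r in rates_by_period.items()}
print(mean_rate)
```

Feature values computed in this way, one column per feature and one row per text, are the kind of table on which clustering and classification algorithms would subsequently operate.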
All in all, we expect that our project will make a substantial contribution both to Hebrew literary stylometry and to the historiography of Hebrew literature. Furthermore, the algorithms which we plan to develop will form a foundation for the automated analysis of any modern Hebrew text, and will contribute to the ever-growing field of Hebrew natural language processing.