Automatic Dating of Hebrew Manuscripts from the Cairo Genizah

Israel Ministry of Science & Technology research grant, Digital Humanities (320,000 NIS for 3 years)

Dr. Daria Vasyutinsky Shapira, The Open University of Israel (postdoc)
Prof. Ophir Münz-Manor, The Open University of Israel

This project aims to develop deep learning algorithms for the automatic dating of Hebrew liturgical poems (piyyutim) preserved in genizot, the synagogue storerooms where out-of-use manuscripts were kept. We study manuscripts from the Cairo Genizah, using datasets our group built in previous studies. Many libraries and archives are now digitizing significant manuscript collections, and thousands of manuscript images are already available. One of the primary desiderata of digital historical and cultural research is therefore to find efficient new methods for studying these collections. Among these methods, determining the date of copying of unidentified digitized manuscripts is one of the most sought after.
Deep learning for image processing is a cutting-edge technology in digital manuscript research. This project is at the forefront of the digital humanities: we address problems that, to our knowledge, remain unsolved. We are exploring the potential of semi-supervised and unsupervised algorithms on hard- and soft-labeled datasets.
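To illustrate the distinction between hard and soft labels in a dating task, here is a minimal sketch in Python. It is not the project's method: the 50-year bins, the function names, and the overlap-based weighting are all assumptions made for the example. A manuscript with a precise date receives a one-hot (hard) label, while a manuscript dated only to a range receives a probability distribution (soft label) over the bins that the range covers.

```python
import math

# Hypothetical 50-year bins; the project does not state its date granularity.
BIN_EDGES = [900, 950, 1000, 1050, 1100, 1150, 1200, 1250]


def date_range_to_label(start, end, edges=BIN_EDGES):
    """Map an estimated copying-date range [start, end) to a distribution over bins.

    A range that falls inside one bin gives a one-hot (hard) label.
    A range that spans several bins gives a soft label, weighted by
    how many years of the range overlap each bin (an assumed scheme).
    """
    overlaps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        overlaps.append(max(0, min(end, hi) - max(start, lo)))
    total = sum(overlaps)
    if total == 0:
        raise ValueError("date range falls outside the binned period")
    return [o / total for o in overlaps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy loss; works for hard and soft target distributions alike."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hard label: a manuscript whose colophon gives the year 1020.
hard = date_range_to_label(1020, 1021)

# Soft label: a manuscript estimated only as "11th century".
soft = date_range_to_label(1000, 1100)

# Loss of a model that predicts a uniform distribution over all bins.
uniform = [1 / (len(BIN_EDGES) - 1)] * (len(BIN_EDGES) - 1)
loss_on_soft = cross_entropy(soft, uniform)
```

Because the same loss accepts both kinds of target, precisely dated and approximately dated manuscripts can in principle be mixed in one training set under this scheme.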
Our research is supervised by Prof. Ophir Münz-Manor in collaboration with the DHSS Hub at the Open University, the Visual Media Lab at Ben-Gurion University, and the National Library of Israel.
Preliminary results:
Malachi Beit-Arié and his team described the dated Hebrew manuscripts in the Sfardata database, kept at the National Library of Israel. The Visual Media Lab team and I extracted the Sfardata records for research purposes and are currently training deep learning models on this new dataset. We will then add the few existing dated manuscripts from the Cairo Genizah to the training sets. Once the models have been pretrained on these datasets, we will apply them to the much larger corpus of poetic manuscripts from the Genizah.
1. Ahmad Droby, Irina Rabaev, Daria Vasyutinsky Shapira, Berat Kurar Barakat and Jihad El-Sana. “Digital Hebrew Paleography: Script Types and Modes.” Journal of Imaging 2022, 8(5), 143; doi:10.3390/jimaging8050143