Classifying and diacritizing Arabic poems using deep recurrent neural networks



Poetry has a prominent history in Arabic literature. The classical Arabic poetry has 16 m that vary in rhythm and target purpose. Chanting a poem eloquently requires knowing the poem’s meter and obtaining a diacritized version of its verses (letters inscribed with their short vowels); diacritics are often not inscribed in Arabic texts. This work proposes solutions to classify input Arabic text into the 16 poetry meters and prose. It also investigates the automatic diacritization of Arabic poetry. We adopt machine learning approach using a large dataset of 1657 k verses of poems and prose to develop neural networks to classify and diacritize Arabic poetry. We propose deep and narrow recurrent neural networks with bidirectional long short-term memory cells for solving these problems. The proposed model classifies the input text with an average accuracy of 97.27%, which is significantly higher than previous work. We also propose a solution that achieves an accuracy that approaches 100% when multiple verses of the same poem are available through predicting the class from the aggregate probabilities of the multiple verses. Diacritizing poetry is much harder than diacritizing prose due to the poet’s meticulous selection of phrases and relaxation of some diacritization rules.​