Context and Problem Statement

Nahuatl was once a major language of Mesoamerica, but its status declined dramatically after the Spanish conquest. During the colonial era its prestige eroded, and Spanish was imposed as the language of communication. As a result, Nahuatl is now overshadowed in Mexico, and its speakers often face social stigma associated with their indigenous heritage. This stigma is compounded by the absence of Nahuatl from crucial areas such as public education: the language is not taught in schools, which makes communication between its speakers and people who know only Spanish difficult. In addition, economic and social challenges have led many indigenous people to migrate abroad in search of better opportunities, further reducing the number of Nahuatl speakers.

The Totonacs are an indigenous group in Mexico, settled primarily in the state of Veracruz, as well as in Puebla and Hidalgo. Their language, Totonac, belongs to the Totonac-Tepehua language family and has several dialectal variants. Like many other indigenous cultures, the Totonacs were significantly affected by the Spanish conquest and subsequent colonization, which led to profound changes in their social, economic, and cultural structures. One cultural aspect worth highlighting is the ritual ceremony of the flyers (voladores), a complex system of ritual processes and a dance associated with fertility. It contributes to social cohesion and social capital among the Totonacs and has spread to various ethnic groups in Mexico and Central America. UNESCO has inscribed this ritual on its list of the Intangible Cultural Heritage of Humanity.

The government has implemented various initiatives to support the preservation of indigenous languages, including policy measures and educational programs. However, these efforts have not been sufficient to ensure the long-term conservation of these languages. Safeguarding and revitalizing them effectively requires a more integrated approach: a combined effort from the government, universities, and society as a whole. Collaboration across these sectors is essential to create comprehensive strategies, provide resources, and foster the community engagement that will support the sustained preservation and growth of this valuable linguistic heritage.

Nahuatl and Totonaco, two of the most widely spoken indigenous languages in the State of Puebla, each have numerous regional variants. However, there is a significant lack of comprehensive, high-quality datasets that capture this variation. Existing datasets are often fragmented, outdated, or lacking in linguistic diversity, which hampers efforts in natural language processing (NLP), machine learning, and cultural preservation.

 
In order to illustrate the diversity, here are five variants of Nahuatl in Puebla and their autodenominations:

VARIANT                                   AUTODENOMINATION
náhuatl de la Sierra, noreste de Puebla   mexicano tlajtol, nauta
náhuatl de la Sierra oeste de Puebla      masehual tla’tol
náhuatl alto del norte de Puebla          maseual tajtol, nahuat
náhuatl del centro de Puebla              mexicano (del centro de Puebla)
mexicano del oriente de Puebla            mexicano (del oriente de Puebla)

 
Here are four variants of Totonaco in Puebla and their autodenominations:

VARIANT                           AUTODENOMINATION
totonaco central del norte        tachaqawaxti, tutunakuj, tachiwiin
totonaco del cerro Xinolatépetl   kintachiuinkan
totonaco del río Necaxa           totonaco (del río Necaxa)
totonaco central del sur          tutunáku (central del sur), tutunakú, totonaco (central del sur)
