A team of Israeli researchers has developed a machine translation model capable of translating cuneiform tablets from ancient Mesopotamia into English. This is a real challenge, because cuneiform signs can be read in many different ways. Published in the journal PNAS Nexus, the results of several tests carried out to verify the accuracy of the translations produced by the artificial neural network are convincing: they indicate that the machine can translate cuneiform signs directly into English, without an intermediate transliteration into Latin letters. This opens a new path for academics, because this kind of collaboration between humans and machines, in which humans refine the first reading provided by the machine, will make it possible to advance research by making accessible hundreds of thousands of clay tablets that are still waiting to be understood.
Cuneiform tablets will soon be translated by artificial intelligence
Tablets found at archaeological sites in ancient Mesopotamia are rich in information about the civilizations of the region. It is therefore important to understand the texts written in cuneiform script, in Sumerian in the south and in Akkadian in the north (2700 BC to 75 AD). Given the number of Akkadian tablets whose contents remain unknown because of the time needed to decipher and translate them, Shai Gordin's team, made up of computer scientists and specialists in ancient languages from the universities of Ariel and Tel Aviv (Israel), turned to artificial intelligence. After teaching it to restore fragmentary tablets, they entrusted it with a harder task: helping academics with the difficult and time-consuming work of translation.
Reduce the number of steps required for translation
To translate a tablet written in cuneiform, a paleographer must perform three consecutive tasks: copy the signs (glyphs), transliterate them into the Latin alphabet, and then translate the text into the target language, in this case English. The researchers used this workflow to assign two comparable sets of tasks to the AI and determine which is more useful. In the first, the machine translates text already transliterated into Latin characters into English. In the second, which would be particularly useful in combination with optical character recognition (OCR), the machine works from the cuneiform glyphs themselves, encoded as computer characters, and translates them directly into English.
Map of translation tasks. The top line represents the tasks performed by humans; the middle line, translation performed by the AI from the transliteration; and the last line, translation performed directly by the AI from cuneiform. Credits: Gutherz et al., 2023
Machine translations are compared to reference translations
To evaluate the machine translation results, they are compared to reference translations produced by humans. The researchers note with satisfaction that the machine scores well both when translating from transliteration and when translating directly from cuneiform signs, to the point that “high-quality translations can be achieved by translating directly from cuneiform to English”. They conclude that “no transliteration step [is] required”, as is already the case for languages that use ideograms (for example Chinese or Japanese). The main factor behind the variation in results is the length of the text to be translated: very long texts are more prone to errors. The best scores are obtained with sentences of average length, estimated at 118 characters. The AI is also most efficient on texts that rely on formulaic phrasing.
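The comparison against human reference translations can be illustrated with a small scoring sketch. The article does not say which metric the researchers used, so the word-overlap F1 below is only a simplified stand-in, and the function names, the sample format, and the use of 118 characters as a cut-off between "short" and "long" texts are all assumptions made for illustration.

```python
from collections import Counter


def overlap_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between a machine translation and a human reference.

    This is a simplified stand-in metric, not the one used in the study.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Count words shared by both sides, respecting how often each word occurs.
    shared = sum((Counter(cand) & Counter(ref)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(cand)
    recall = shared / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_score_by_length(samples, boundary=118):
    """Average the score separately for short and long source texts.

    samples: iterable of (source, candidate, reference) string triples.
    boundary: hypothetical cut-off, in source characters, between the buckets.
    """
    buckets = {"short": [], "long": []}
    for source, candidate, reference in samples:
        key = "short" if len(source) <= boundary else "long"
        buckets[key].append(overlap_f1(candidate, reference))
    # Only report buckets that actually received samples.
    return {key: sum(vals) / len(vals) for key, vals in buckets.items() if vals}
```

Grouping scores by source length in this way is one simple means of checking whether longer texts really do score lower, as the article reports.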
The AI makes different types of errors
Factual errors produced by machine translation are called “hallucinations”. The machine effectively “invents” sentences when it stumbles, which can happen when its training data is inconsistent (when the reference human translation does not match the source text) or when it skips parts of the text that it cannot translate, either because the passage is too long or because it is too short and offers no context to rely on. It should be noted that Akkadian is very difficult to read: not only does it have no punctuation, but each glyph can correspond to several phonetic and logographic values (so it can be read in different ways) and to different meanings depending on the context. The paleographer marks these elements with symbols added to the Latin transliteration. Shai Gordin’s team has been working on having a machine accomplish this task and has already reached 97% accuracy. Finally, the artificial neural network errs through ignorance, because it does not know all the proper names mentioned in the tablets.
Avenues for improvement
From several experiments involving different configurations, the researchers conclude that “the best way to translate a text is to break it up into small sentences”. This is entirely feasible, because the texts are laid out in lines on the tablets; they therefore decided to “define each line of text written on the tablet as a unit of translation”. Despite the good results of direct translation from cuneiform, they note that “given the versatility of cuneiform signs, hallucinations are more likely”. As for the gaps in proper nouns (names of people or places), they can be filled in as training progresses, especially by diversifying the types of texts.
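The idea of treating each tablet line as a unit of translation can be sketched as follows. This is a minimal illustration, not the researchers' code: the function names are hypothetical, the tablet is assumed to be available as plain text with one tablet line per text line, and `translate_line` is a placeholder for whatever model performs the actual translation.

```python
from typing import Callable, List


def split_into_units(tablet_text: str) -> List[str]:
    """Treat each non-empty line of the tablet text as one translation unit."""
    return [line.strip() for line in tablet_text.splitlines() if line.strip()]


def translate_tablet(tablet_text: str, translate_line: Callable[[str], str]) -> List[str]:
    """Translate a tablet line by line.

    translate_line is a hypothetical stand-in for the translation model;
    any function mapping one source line to one translated line works here.
    """
    return [translate_line(unit) for unit in split_into_units(tablet_text)]


if __name__ == "__main__":
    # str.upper is only a dummy "translator" so the sketch can run end to end.
    sample = "first line\n\n  second line  \nthird line"
    for translated in translate_tablet(sample, str.upper):
        print(translated)
```

Keeping the units short in this way is meant to avoid feeding the model the long passages on which, according to the article, it is most prone to errors.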
During this first battery of tests, the researchers were surprised by the machine’s ability to reproduce the style and genre of the original text. Their conclusion: these first experiments on Akkadian tablets demonstrate that AI-generated translations can be put to full use within a collaboration between humans and machines, in which automated translations are corrected and refined by researchers. This will not only make these resources accessible to academics, but will also help to make them better known to the public and to better preserve this heritage of humanity.