The best language models today can handle (almost) any subject, express themselves well, and do so in several languages. Yet they give better answers in English. To demonstrate this, a team of Spanish researchers from the Language Technology Research Center of the University of the Basque Country developed a new evaluation method and tested it on seven language models: four derived from XGLM, a technology from Meta's artificial intelligence research lab, and three built on the first version of LLaMA, also produced by Meta AI.
A "prompt" in the original language or translated into English
The researchers asked each of these models questions (a "prompt," in the terminology of these technologies) in a given language, using two different approaches. In one case, the prompt was written directly in that language and the response (the "output") came back in that language. In the other, the question was asked in the desired language, but the model itself automatically translated it into English and then answered in English. This is where the team innovated: until now, to assess a tool's multilingualism, prompts written in a given language were translated by tools external to the model before the English output was generated.
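The two approaches can be sketched as follows. This is a minimal illustration, not the researchers' actual code: `call_model` is a hypothetical placeholder for a real language model query.

```python
def call_model(prompt: str) -> str:
    """Placeholder: a real system would query a language model here."""
    return "<model output for: " + prompt + ">"

def direct_answer(question: str) -> str:
    """Approach 1: prompt in the source language, answer in that language."""
    return call_model(question)

def self_translate_answer(question: str) -> str:
    """Approach 2 ('self-translation'): the model itself first translates
    the question into English, then answers the English version."""
    translation = call_model("Translate to English: " + question)
    return call_model(translation)
```

The key difference from earlier evaluations is that the translation step in the second function is performed by the same model, not by an external translation system.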
The models' responses were evaluated using questions from several multilingual benchmarks. Some are common-sense reasoning exercises (for example, identifying the cause stated in a sentence), some are math problems ("John has 12 apples; he eats a quarter of them. How many are left?"), and others are paraphrase-identification tasks (based on PAWS-X). The researchers then compared the results obtained with each of the two methods for the same prompt.
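The comparison itself amounts to scoring the same set of questions under each approach and comparing average accuracy. A minimal sketch, with made-up example data:

```python
# Each record notes whether a model answered one benchmark question
# correctly under each of the two prompting approaches (invented data).
results = [
    {"direct": True,  "self_translate": True},
    {"direct": False, "self_translate": True},
    {"direct": False, "self_translate": False},
    {"direct": True,  "self_translate": True},
]

def accuracy(results, method):
    """Fraction of questions answered correctly under one approach."""
    return sum(r[method] for r in results) / len(results)

print(accuracy(results, "direct"))          # 0.5
print(accuracy(results, "self_translate"))  # 0.75
```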
The results are clear: "The models perform better on average when responding in English, across all types of exercises, and this compensates for the translation errors they may make when switching from the original language to English," explains Julen Etxaniz, a language-processing expert and co-author of the study.
Looking at the reasoning tasks in detail, the translation-free approach sometimes gives better results, but the gap is very small and appears only with the XGLM-based models. "The performance obtained with self-translation shows that an external translation method is not needed to improve on results given directly in the source language," Julen Etxaniz continues. In their article, the authors write that all this "demonstrates that multilingual models cannot reach their full potential when prompted in a language other than English."
The reason is fairly obvious: the preponderance of English content in the training data. "Even multilingual models like XGLM and BLOOM are trained on more than 30% English-language data," Julen Etxaniz notes. Yet the average performance gap between the two evaluation methods is smaller for the XGLM-derived models than for the LLaMA-derived ones, even though the latter are larger: between 0.3 and 2.3 points for the former, versus 2.8 to 3.5 points for the latter. And, as we have seen, only the translation-free method with XGLM can sometimes produce better output. In other words: the more multilingual a language model is, the less it needs translation, regardless of its size and the size of its training corpus.
The researchers have another hypothesis to explain English's advantage: models may produce better output when they process input that they themselves generated, in this case the English translation. But for now, this remains a conjecture.