Occiglot: New initiative for European language models launched

AI language models such as ChatGPT have a wide range of uses and have spread around the world in a very short time. Training these so-called Large Language Models (LLMs) requires huge amounts of data and computing resources. Because of the resulting high cost of computing time and the focus on economic use, values such as linguistic diversity and multilingualism are often neglected.
This is where Occiglot comes in. The research collective, made up largely of researchers from TU Darmstadt, the Hessian Center for Artificial Intelligence (hessian.AI) and the German Research Center for Artificial Intelligence (DFKI), has now launched an academic, non-profit, open-source initiative for European AI language models. With today's announcement, Occiglot is releasing its first ten models, initially focusing on the five largest European languages: English, German, French, Spanish and Italian. Insights and feedback will be exchanged across Europe on a public Discord server.

Participation by users and other researchers is not only expressly welcome but also necessary to create and evaluate the training data the language models require. Other European AI centers have already expressed interest in collaborating.
Occiglot's goal is to create a coherent language modeling system that covers all 24 official languages of the European Union as well as other unofficial and regional languages.

hessian.AI and DFKI are supporting this initiative by providing a significant amount of computing time on their AI supercomputers in 2024.