AI training: striking the right balance between training on data and protecting privacy
About Dr. Ivan Habernal
Dr. Ivan Habernal is a researcher at the Department of Computer Science at Darmstadt University of Technology. He studied and obtained his doctorate at the University of West Bohemia in Pilsen, Czech Republic.
After holding various positions in industry, he has led the junior research group “Trustworthy Human Language Technologies” since 2021.
His AI methods protect personal data
Natural language processing (NLP) with artificial intelligence has made rapid progress in recent years. From voice assistants on smartphones to AI-based text generation with ChatGPT, NLP is now ubiquitous and offers great benefits.
For an AI to learn and process natural language, it needs large amounts of data. Protecting the personal data in that material is often a particular challenge: medication reviews or court rulings, for example, can contain sensitive information that should not end up in AI models.
Habernal wants to optimise these models and is researching how AI can preserve privacy. The simplest approach is anonymising texts. However, even redacted data can still contain sensitive information from which conclusions about people can be drawn: “There are models that can be used to calculate a person’s gender, social class or entire past history.”
That is why Habernal and his research group are developing their own AI models that automatically detect information indirectly linked to personal data and exclude it.
For Habernal, a central challenge is striking a balance between the accuracy of an AI model and the degree of privacy it guarantees. The AI should perform as well as possible while ideally not processing any sensitive data.
Interdisciplinary research for more logic in AI
The computer scientist also navigates this tension in another research project, on legal natural language processing (Legal NLP). Habernal and his team are developing a model to make the work of legal scholars easier.
The AI is intended to automatically recognise argumentation patterns and logic in court decisions and to replace manual annotation as far as possible. For Habernal, this requires close cooperation with lawyers: “Law is not computer science. The approach is different and takes time – but it’s worth it.”
Habernal therefore pursues interdisciplinary research, and this is where he sees the strengths of hessian.AI, which provides support through funding programmes and start-up financing, among other things. In one funded project, Habernal and lawyers are jointly analysing court hearings in order to use AI to learn more about the arguments, their logic and their influence on judgements, always with an eye on protecting privacy.