I have joined hessian.AI and TU Darmstadt as a full Professor on “Multimodal Grounded Learning”, further supported by a €2M LOEWE Start Professorship. Prior to that, I was a Research Scientist at UC Berkeley, working with Prof. Trevor Darrell. I completed my PhD at the Max Planck Institute for Informatics under the supervision of Prof. Bernt Schiele. My research is at the intersection of vision and language. I have worked on a variety of tasks, including image and video description, visual grounding, visual question answering, and text-to-image synthesis. I am interested in building explainable models, diagnosing and addressing bias, and developing new multimodal models that can learn from language advice.