About the Talk
With the rise of pre-trained Large Language Models (LLMs), there is now an effective way to store and use information extracted from massive corpora of text documents. We therefore envision using SQL queries to cover a broad range of data that is not captured by traditional databases (DBs), by tapping the information stored in LLMs. This ability would enable hybrid querying of both LLMs and DBs through the SQL interface, which is more expressive and precise than natural language (NL) prompts. To ground this vision, we present a prototype based on a traditional DB architecture with new physical operators for querying the underlying LLM. For a large class of SQL queries, querying LLMs returns well-structured relations, with encouraging qualitative results. We pinpoint several research challenges that must be addressed to build a DBMS that jointly exploits LLMs and DBs. While some challenges call for new contributions from the NLP field, others offer novel research avenues for the DB community.
About the Speaker
Paolo Papotti has been an Associate Professor at EURECOM, France, since 2017. He received his PhD from Roma Tre University (Italy) in 2007 and has held research positions at the Qatar Computing Research Institute (Qatar) and Arizona State University (USA). His research focuses on data management and information quality. He has authored more than 140 publications, and his work has been recognized with two “Best of the Conference” citations (SIGMOD 2009, VLDB 2016), three best demo awards (SIGMOD 2015, DBA 2020, SIGMOD 2022), and two Google Faculty Research Awards (2016, 2020).