
Workshop Description
Following the increased adoption of machine learning (ML) in various applications and disciplines, a synergy between the database (DB) systems and ML communities has emerged. Steps involved in an ML pipeline, such as data preparation and cleaning, feature engineering, and management of the ML lifecycle, can benefit from research conducted by the data management community. For example, the management of the ML lifecycle requires mechanisms for modeling, storing, and querying ML artifacts. Moreover, in many use cases, pipelines require a mixture of relational and linear algebra operators, raising the question of whether a seamless integration between the two algebras is possible. In the opposite direction, ML techniques are being explored in core components of database systems, e.g., query optimization, indexing, and monitoring. Traditionally hard problems in databases, such as cardinality estimation, or problems requiring heavy human supervision, such as DB administration, might benefit more from learning algorithms than from rule-based or cost-based approaches.
The workshop aims to bring together researchers and practitioners at the intersection of DB and ML research, providing a forum for DB-inspired or ML-inspired approaches that address challenges encountered in each of the two areas. In particular, we welcome new research topics combining the strengths of both fields.
Timetable
Topics of Interest
Topics of particular interest for the workshop include, but are not limited to:
Important Dates
The timeline for the half-day single-track workshop is as follows:
Workshop Format
Notes on Workshop Research Papers
Organizing Committee
Rihan Hai is a postdoctoral researcher at Delft University of Technology, Netherlands. She received her Ph.D. degree from RWTH Aachen University, Germany. Her research focuses on data integration and related dataset discovery in large-scale data lakes. She was the organizer of the 11th International Workshop on Quality in Databases in conjunction with VLDB 2016, and the publicity chair of the International Workshop on Data Science for Industry 4.0 in conjunction with EDBT 2019.
Nantia Makrynioti is a postdoctoral researcher with the Database Architectures group at CWI, Netherlands. Her research interests include the integration of machine learning functionality with relational database systems and declarative programming, query execution in exploratory data analysis, and provenance management in data science pipelines. Since the beginning of 2021, she has also been a co-organizer of the Dutch Seminar on Data Systems Design. She received her PhD from the Athens University of Economics and Business, supervised by Prof. Vasilis Vassalos. In the past, she also worked at LogicBlox (later acquired by Infor) on expressing and optimising machine learning problems using the LogicBlox relational platform.
Ioana Manolescu is a senior researcher at Inria Saclay and a part-time professor at Ecole Polytechnique, France. She is the lead of the CEDAR INRIA team, which focuses on rich data analytics at cloud scale. She is also the scientific director of LabIA, a program run by the French government in which AI problems raised by branches of the local and national French public administration are tackled by French research teams. She is a member of the PVLDB Endowment Board of Trustees, and has been Associate Editor for PVLDB, president of the ACM SIGMOD PhD Award Committee, chair of the IEEE ICDE conference, and a program chair of EDBT, SSDBM, and ICWE, among others. She has co-authored more than 150 articles in international journals and conferences and co-authored books on “Web Data Management” and on “Cloud-based RDF Data Management”. Her main research interests include algebraic and storage optimizations for semistructured data, in particular Semantic Web graphs; novel data models and languages for complex data management; and data models and algorithms for fact-checking and data journalism, a topic on which she is collaborating with journalists from Le Monde. She is also a recipient of the ANR AI Chair titled “SourcesSay: Intelligent Analysis and Interconnexion of Heterogeneous Data in Digital Arenas” (2020-2024).
