Workshop Description

With the increased adoption of machine learning (ML) in various applications and disciplines, a synergy between the database (DB) systems and ML communities has emerged. Steps involved in an ML pipeline, such as data preparation and cleaning, feature engineering and management of the ML lifecycle, can benefit from research conducted by the data management community. For example, the management of the ML lifecycle requires mechanisms for modeling, storing and querying ML artifacts. Moreover, in many use cases, pipelines require a mixture of relational and linear algebra operators, raising the question of whether a seamless integration between the two algebras is possible. In the opposite direction, ML techniques are being explored in core components of database systems, e.g., query optimization, indexing and monitoring. Traditionally hard problems in databases, such as cardinality estimation, or problems that require heavy human supervision, such as DB administration, might benefit more from learning algorithms than from rule-based or cost-based approaches.

The workshop aims to bring together researchers and practitioners at the intersection of DB and ML research, providing a forum for DB-inspired or ML-inspired approaches that address challenges encountered in each of the two areas. In particular, we welcome new research topics combining the strengths of both fields.

Date: May 9, 2022

Location: Virtual Zoom Conference

Discussion Platform: Slack or Whova

Timetable

Topics of Interest

Topics of particular interest for the workshop include, but are not limited to:

  • Data collection and preparation for ML applications

  • Declarative machine learning on databases, data warehouses or data lakes

  • Hybrid optimization techniques for databases and machine learning

  • Model-aware data discovery, cleaning, and transformation

  • Benchmarking ML-oriented data management systems (data augmentation, data cleaning, etc.)

  • Data management during the life cycle of ML models

  • Novel data management systems for accelerating training and inference of ML models

  • DB-inspired techniques for modeling, storage and provenance of ML artifacts

  • Learned database design, configuration and tuning

  • Machine learning for query optimization

  • Applied machine learning/deep learning for data integration

  • ML-enabled data exploration and discovery in data lakes

  • ML functionality inside DBMS

Important Dates

The timeline for the half-day single-track workshop is as follows:

  • Submission deadline: January 14, 2022 (Friday)

  • Notification for authors: February 22, 2022 (Tuesday)

  • Camera-ready copy due: March 8, 2022 (Tuesday)

Workshop Format

Keynote Talks

30 minutes for each industry or academic expert

Technical Presentation

15 – 25 minutes for each accepted paper

Panel Discussion

1-hour discussion

Notes on Workshop Research Papers

  • The workshop will accept both regular papers and short papers (work in progress, vision/outrageous ideas). The page limit is 8 pages for regular papers and 4 pages for short papers.

  • The authors of a selection of accepted papers (recommended by the program committee) and keynote speakers will be invited to submit an extended version of their work to the special issue on databases and ML in a journal.

  • Papers will be uploaded as PDF files to the review system.

  • Each paper will be reviewed by at least three reviewers from the program committee (PC).

  • DBML 2022 will apply the same principles for handling conflicts of interest with PC members or workshop chairs as the ICDE conference.

Organizing Committee

Rihan Hai is a postdoctoral researcher at Delft University of Technology, Netherlands. She received her Ph.D. degree from RWTH Aachen University, Germany. Her research focuses on data integration and related dataset discovery in large-scale data lakes. She was the organizer of the 11th International Workshop on Quality in Databases in conjunction with VLDB 2016, and the publicity chair of the International Workshop on Data Science for Industry 4.0 in conjunction with EDBT 2019.

Nantia Makrynioti is a postdoctoral researcher with the Database Architectures group at CWI, Netherlands. Her research interests include the integration of machine learning functionality with relational database systems and declarative programming, query execution in exploratory data analysis and provenance management in data science pipelines. Since the beginning of 2021, she has also been a co-organizer of the Dutch Seminar on Data Systems Design. She received her PhD from the Athens University of Economics and Business, supervised by Prof. Vasilis Vassalos. In the past, she also worked at LogicBlox (later acquired by Infor) on expressing and optimising machine learning problems using the LogicBlox relational platform.

Ioana Manolescu is a senior researcher at Inria Saclay and a part-time professor at Ecole Polytechnique, France. She is the lead of the CEDAR INRIA team focusing on rich data analytics at cloud scale. She is also the scientific director of LabIA, a program run by the French government in which AI problems raised by branches of the local and national French public administration are tackled by French research teams. She is a member of the PVLDB Endowment Board of Trustees, and has been Associate Editor for PVLDB, president of the ACM SIGMOD PhD Award Committee, chair of the IEEE ICDE conference, and a program chair of EDBT, SSDBM, ICWE among others. She has co-authored more than 150 articles in international journals and conferences and co-authored books on “Web Data Management” and on “Cloud-based RDF Data Management”. Her main research interests include algebraic and storage optimizations for semistructured data, in particular Semantic Web graphs; novel data models and languages for complex data management; and data models and algorithms for fact-checking and data journalism, a topic on which she is collaborating with journalists from Le Monde. She is also a recipient of the ANR AI Chair titled “SourcesSay: Intelligent Analysis and Interconnexion of Heterogeneous Data in Digital Arenas” (2020-2024).