Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, 2nd edition
Co-located with VLDB 2021 (Copenhagen, Denmark)
The SEA Data workshop will provide a forum for researchers and practitioners to exchange ideas, results, and visions on challenges in data management, information extraction, exploration, and analysis of heterogeneous data and multiple data models at once.
Companies, governments, and organizations now produce and collect data from multiple heterogeneous sources, such as transactional data, internet traffic, logs, IoT applications, knowledge bases, and much more. The unprecedented pace at which data is produced and consumed calls for methods that organize, retrieve, and analyze such data appropriately. While data was traditionally organized into homogeneous datastores and formats, today's collection of data from many different sources makes such datastores impractical. Even within the same organization, data dwells in independent silos, each with a distinct data model and serving a specific application, which keeps relevant portions of the data separate from each other.
As a consequence, we have witnessed an increasing interest in systems and methods that try to handle and analyze multiple data sources and formats holistically. Data lakes and polystores are the most prominent examples of such heterogeneous datastores. Moreover, graphs and learned databases have recently attracted the attention of the community for their flexibility in modeling, managing, and organizing heterogeneous data. Due to the fast pace of data collection and evolution, consolidating all the sources into a single data format and loading them into a single store is usually impractical.
Hence, the first challenge that these systems face is to provide flexible storage and retrieval methods that can adapt to multiple models and domains. From the user perspective, in turn, the tasks of data discovery, exploration, and analysis become even more challenging when such diverse data is collected. Solutions to these tasks remain largely uncharted for heterogeneous datastores, owing to a lack of established methods that allow effective multi-model data retrieval and exploration. Data analytics must also cope with the lack of shared dimensions, ambiguous semantics, and the need to ensure the quality and lineage of the analytical result.