Program – Workshop GVDB23

Wednesday, June 7

Session	Time	Program
	12:30 – 13:45	Lunch and check-in
1	13:45 – 15:00	Welcome + invited presentation (Chair: Holger Schwarz). Peter Reimann: Metadata Modeling and Use of Domain Knowledge to Support Industrial Data Analytics
	15:00 – 15:30	Coffee break
2	15:30 – 16:45	Paper presentations (Chair: Kerstin Schneider). Julius Voggesberger: Optimierung von Klassifikator-Ensembles mit AutoML; Sajad Karim: Assessing Non-volatile Memory in Modern Heterogeneous Storage Landscape using a Write-optimized Storage Stack
	17:00 – 18:30	Guided tour
	19:00	Dinner

Thursday, June 8

Session	Time	Program
	8:30 – 9:30	Breakfast
3	9:30 – 10:30	Invited presentation (Chair: Holger Schwarz). Manuel Fritz: Data Management in Semiconductor Manufacturing Equipment: Current State and Challenges
	10:30 – 11:00	Coffee break
4	11:00 – 12:15	Paper presentations (Chair: Günther Specht). David Lengweiler: MMSBench-Net: Scenario-Based Evaluation of Multi-Model Database Systems; Lars Runge: Towards Semantic Identification of Temporal Data in RDF
	12:15 – 13:15	Lunch
5	13:15 – 14:30	Paper presentations (Chair: Peter Reimann). Paul Blockhaus: Towards a Future of Fully Self-Optimizing Query Engines; Jennifer Landes: Influence Factors on Academic Integrity revealed by Machine Learning Methods
	14:30 – 15:00	Coffee break
	15:00 –	TBA
	18:30	Dinner

Friday, June 9

Session	Time	Program
	8:30 – 9:30	Breakfast
6	9:30 – 10:30	Invited presentation (Chair: Holger Schwarz). Jan Schneider: Lakehouses: Transferring Database Concepts to Data Lakes
	10:30 – 11:00	Coffee break
7	11:00 – 12:30	Paper presentations (Chair: Manuel Fritz). Kevin Kramer: Towards Evolution Capabilities in Data Pipelines; Amir Reza Mohammadi: HPT4Rec: AutoML-based Hyperparameter Self-Tuning Framework for Session-based Recommender Systems
	12:30 – 13:30	Lunch
	13:30	End of the workshop

Invited Presentations

Peter Reimann: Metadata Modeling and Use of Domain Knowledge to Support Industrial Data Analytics

Abstract:
Industrial Analytics refers to problems and solution approaches for data management, data provision and data analytics across the different phases of the industrial product life cycle. This presentation gives an overview of the research topics of the junior research group “ICT Platform for Manufacturing” at the Graduate School of Excellence advanced Manufacturing Engineering (GSaME) of the University of Stuttgart. This research group conducts both application-oriented and fundamental research in the area of Industrial Analytics. The talk details two specific research topics and the corresponding project results. First, an approach to metadata modeling is presented that connects heterogeneous data from various previously isolated data sources in virtual product development and the corresponding CAx systems. The work activities of product development projects, and the data these activities consume and produce, are explicitly represented in the metadata. This facilitates democratized data access, so that product development engineers can easily find the data associated with the work activities they are familiar with in development projects. The second major part of the talk deals with an approach that exploits domain knowledge during data preparation in order to address two of the most challenging characteristics of industrial data: a multi-class imbalance and an aggregation bias that is due to the high variety of underlying products. This approach is evaluated with a use case for the data-driven identification of quality issues in assembled truck engines. It is shown that the approach leads to a significant increase in classification accuracy and to a reduction in the number of rework steps needed to repair faulty truck engines.

CV:
Peter Reimann studied computer science at the University of Stuttgart and received his PhD in 2016 at the Institute for Parallel and Distributed Systems (IPVS) and at the Cluster of Excellence Simulation Technology (SimTech) of the University of Stuttgart. His PhD topic was related to data management and data provision for computer-based simulations and simulation workflows. From July to September 2015, he was a visiting scholar at the University of Illinois at Urbana-Champaign in the USA. Since 2017, he has been head of a junior research group at the Graduate School of Excellence advanced Manufacturing Engineering (GSaME) of the University of Stuttgart. His research covers both application-oriented and fundamental topics in the areas of data provision, data management, data analysis, and machine learning for industrial use cases (Industrial Analytics).

Manuel Fritz: Data Management in Semiconductor Manufacturing Equipment: Current State and Challenges

Abstract:
The semiconductor industry is a key enabler of technological advancements such as AI and Industry 4.0, providing ever-shrinking transistor sizes, more storage and more computing power for less energy and lower cost. The production of semiconductor manufacturing equipment relies on machines that achieve high yield and high output, and therefore on ultra-precise manufacturing processes. In this talk, we look at how a lithography system is produced and unveil its interdependency with data management and analytics. We present data management as an enabler for automated, robust, high-end manufacturing processes, and show how data analytics can support the overarching company goals on a technical and organizational level to achieve competitive benefits, such as increased product output, reduced costs, improved processes, and even a longer product lifespan.

CV:
Dr. Manuel Fritz works in the Process Data Systems Engineering Group at Zeiss Semiconductor Manufacturing Technology and is a Product Owner for Data Analytics. He studied Computer Science at the Baden-Wuerttemberg Cooperative State University (B.Sc.) and at the University of Furtwangen (M.Sc.). He holds a PhD in Computer Science from the University of Stuttgart and is currently pursuing an MBA at Quantic School of Business and Technology. His research focuses on Data Analytics, Big Data, Meta Learning, and AutoML.

Jan Schneider: Lakehouses: Transferring Database Concepts to Data Lakes

Abstract:
In times of digital transformation, enterprises need to store, organize and analyze huge amounts of data in order to exploit it for competitive advantage. Analytical data platforms form the technical foundation for these tasks, and two types in particular have gained popularity: while data warehouses have traditionally been used by business analysts for reporting and OLAP, data lakes emerged as an alternative concept that also supports advanced analytics, such as data mining and machine learning. Since the two types of data platforms show rather contrary characteristics and target different types of analytics, enterprises usually have to employ both of them, which leads to complex and costly architectures. Due to these issues, efforts have recently emerged to combine the features of data warehouses and data lakes into integrated data platforms, so-called lakehouses, which can serve all types of analytical workloads. The vision of lakehouses raises the need to enhance data lakes with established features of data warehouses, such as relational data structures, ACID transactions, concurrency control and time travel capabilities. However, this proves to be challenging, because data lakes, which commonly consist of distributed file systems and cloud object stores, tend to differ fundamentally from typical database environments and POSIX-compatible file systems. This presentation provides an overview of the goals of lakehouses and the current state of research, and in particular discusses the challenges that arise when attempting to implement common database features on top of data lakes.