

We are very pleased to announce the EDBT/ICDT 2017 keynote and invited speakers.

Joint EDBT/ICDT keynote speakers

ICDT Invited speaker

EDBT Industrial and Application invited speakers

Rewriting Ontology-Mediated Queries (by Carsten Lutz)


Abstract: Data sets that have been collected from various sources or extracted from the web are often highly incomplete and heterogeneous, which makes it hard to process and query them. Ontologies can provide support by assigning a semantics to the data, enriching it with domain knowledge, and providing a uniform vocabulary for query formulation. Combining a traditional database query with an ontology gives rise to ontology-mediated queries (OMQs). Since most database systems are unaware of ontologies, a popular approach to answering OMQs is to rewrite them into more conventional database query languages. The aim of this talk is to survey recent results about OMQs, their rewritability into first-order queries, into linear datalog, and into datalog, as well as related questions about data complexity.
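As a minimal illustration of the rewriting idea described in the abstract, the following Python sketch handles only the simplest case: an ontology made of class inclusions ("every A is a B") and a query asking for all instances of one class. The query is rewritten into a union of plain class lookups, so the data can then be queried without any knowledge of the ontology. All class names, individuals, and function names are hypothetical and not taken from the talk; the ontology languages surveyed there are considerably richer.

```python
from collections import defaultdict

# Hypothetical ontology: pairs (subclass, superclass).
AXIOMS = [("Professor", "Faculty"), ("Faculty", "Employee")]

# Hypothetical incomplete data: pairs (class, individual).
FACTS = [("Professor", "ann"), ("Employee", "bob"), ("Student", "cy")]


def rewrite_atomic_query(target, axioms):
    """Rewrite the query 'all instances of target' into a set of classes.

    The result contains target and every class that the axioms entail
    to be a subclass of it; the rewritten query is the union of lookups
    over these classes.
    """
    subclasses = defaultdict(set)
    for sub, sup in axioms:
        subclasses[sup].add(sub)

    rewritten, stack = {target}, [target]
    while stack:
        cls = stack.pop()
        for sub in subclasses[cls]:
            if sub not in rewritten:
                rewritten.add(sub)
                stack.append(sub)
    return rewritten


def answer(target, axioms, facts):
    """Evaluate the rewritten query directly over the data."""
    classes = rewrite_atomic_query(target, axioms)
    return {individual for cls, individual in facts if cls in classes}


if __name__ == "__main__":
    print(sorted(answer("Employee", AXIOMS, FACTS)))  # -> ['ann', 'bob']
```

Without the ontology, a lookup for "Employee" would return only bob; the rewriting also retrieves ann, who is recorded only as a professor.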

Carsten Lutz is a full professor in the Department of Computer Science of the University of Bremen, where he is head of the research group on theoretical aspects of artificial intelligence. His research interests include knowledge representation, ontologies, and database theory. Carsten has frequently served on the PC of major international conferences on AI, KR, and DB, including KR, IJCAI, PODS, and ICDT. He served as an editor for the Journal of Artificial Intelligence Research (JAIR) and for the Review of Symbolic Logic (RSL). He is a EurAI fellow and received the "AI ten to watch" award in 2006 and an ERC consolidator grant in 2014.

The Smart Crowd - Learning from the Ones Who Know (by Tova Milo)


Abstract: One of the foremost challenges for information technology over the last few years has been to explore, understand, and extract useful information from large amounts of data. Some particular tasks such as annotating data or matching entities have been outsourced to human workers for many years. But the last few years have seen the rise of a new research field called crowdsourcing that aims at delegating a wide range of tasks to human workers, building formal frameworks, and improving the efficiency of these processes.
What may be achieved with the help of the crowd depends heavily on the properties and knowledge of the given crowd. In this talk we will focus on knowledgeable crowds. We will examine the use of such crowds, and in particular domain experts, for assisting in solving data management problems. Specifically we will consider three dimensions of the problem: (1) how domain experts can help in improving the data itself, e.g., by gathering missing data and improving the quality of existing data, (2) how they can assist in gathering meta-data that facilitate improved data processing, and (3) how we can find and identify the most relevant crowd for a given data management task. Using examples from recent work, I will present several exciting and new directions that are opening up for database research.

Tova Milo received her Ph.D. degree in Computer Science from the Hebrew University, Jerusalem, in 1992. After graduating she worked at the INRIA research institute in Paris and at the University of Toronto, and returned to Israel in 1995, joining the School of Computer Science at Tel Aviv University, where she is now a full Professor. She is the head of the Database research group and holds the Chair of Information Management. She served as the Head of the Computer Science Department from 2011 to 2014. Her research focuses on advanced database applications such as data integration, XML and semi-structured information, data-centered business processes, and crowdsourcing, studying both theoretical and practical aspects.
Tova served as the Program Chair of several international conferences, including PODS, VLDB, ICDT, XSym, and WebDB, and as a member of the VLDB Endowment and the ICDT executive board. She also served as the chair of the PODS Executive Committee and an editor of TODS and the Logical Methods in Computer Science Journal.
Tova has received grants from the Israel Science Foundation, the US-Israel Binational Science Foundation, the Israeli and French Ministries of Science, and the European Union. She is an ACM Fellow, a member of Academia Europaea, and a recipient of the 2010 ACM PODS Alberto O. Mendelzon Test-of-Time Award and of the prestigious EU ERC Advanced Investigators grant.

DeepDive and Snorkel: Dark Data Systems (by Christopher Ré)

Abstract: Building applications that can read and analyze a wide variety of data may change the way we do science, make business decisions, and develop policy. However, building such applications is challenging: real world data is expressed in natural language, images, or other "dark" data formats which are fraught with imprecision and ambiguity and so are difficult for machines to understand. This talk describes DeepDive, a new type of system designed to cope with Dark Data by combining extraction, integration and prediction into one system. For some paleobiology and materials science tasks, DeepDive-based systems have surpassed human volunteers in quantity and quality (recall and precision) of extracted information. DeepDive is in daily use by scientists in areas including genomics and drug repurposing, by a number of companies involved in various forms of search, and by law enforcement in the fight against human trafficking.
This talk will also describe Snorkel, whose goal is to make routine Dark Data tasks dramatically easier. At its core, Snorkel focuses on a key bottleneck in the development of machine learning systems: the lack of large training datasets. In Snorkel, a user implicitly creates large training sets by writing simple programs that label data, instead of performing manual feature engineering or tedious hand-labeling of individual data items. We will describe our preliminary evidence that the Snorkel approach allows a broader set of users to write dark data programs more efficiently than previous approaches. We will also describe the underlying theory, in particular our recent work on new convergence guarantees for Gibbs sampling and large-scale non-convex optimization which play a key role in enabling Snorkel to scale.
DeepDive and Snorkel are open source and available on GitHub.
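The idea of creating training sets "by writing simple programs that label data" can be sketched in a few lines of plain Python. This sketch does not use the Snorkel API: the labeling functions, the example sentences, and the majority-vote combiner are all hypothetical stand-ins chosen for illustration. In particular, Snorkel is described as modeling the quality of the labeling programs rather than simply counting votes, so the combiner below is a deliberate simplification.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


# Hypothetical labeling functions for the task
# "does this sentence assert a causal relation?" (1 = yes, 0 = no).
def lf_mentions_causes(text):
    return 1 if "causes" in text.lower() else ABSTAIN


def lf_no_evidence(text):
    return 0 if "no evidence" in text.lower() else ABSTAIN


def lf_mentions_treats(text):
    return 0 if "treats" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_causes, lf_no_evidence, lf_mentions_treats]


def weak_label(text, labeling_functions=LABELING_FUNCTIONS):
    """Combine noisy votes by simple majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence, no clear majority
    return ranked[0][0]


if __name__ == "__main__":
    for sentence in [
        "Smoking causes cancer.",
        "Aspirin treats headaches.",
        "There is no evidence that coffee causes cancer.",
        "The sky is blue.",
    ]:
        print(sentence, "->", weak_label(sentence))
```

Each function encodes one cheap heuristic; applying all of them to an unlabeled corpus yields a large, noisy training set without labeling items by hand.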

Christopher (Chris) Ré is an associate professor in the Department of Computer Science at Stanford University in the InfoLab who is affiliated with the Statistical Machine Learning Group, Pervasive Parallelism Lab, and Stanford AI Lab. The goal of his work is to enable users and developers to build applications that more deeply understand and exploit data. His contributions span database theory, database systems, and machine learning, and his work has won a best paper award at a premier venue in each area: PODS 2012, SIGMOD 2014, and ICML 2016, respectively. In addition, work from his group has been incorporated into major scientific and humanitarian efforts, including the IceCube neutrino detector, PaleoDeepDive and MEMEX in the fight against human trafficking, and into commercial products from major web and enterprise companies. He received a SIGMOD Dissertation Award in 2010, an NSF CAREER Award in 2011, an Alfred P. Sloan Fellowship in 2013, a Moore Data Driven Investigator Award in 2014, the VLDB Early Career Award in 2015, the MacArthur Foundation Fellowship in 2015, and an Okawa Research Grant in 2016.

Building and Querying Industry-specific Knowledge Bases (by Shivakumar Vaithyanathan)

Abstract: Building and maintaining industry-specific knowledge bases is essential to IBM Watson. The creation of such knowledge bases involves well known building blocks: natural language processing, entity resolution, data transformation, etc. It is critical that the models and algorithms that implement these building blocks be transparent and optimizable for efficient execution. In this talk, I will discuss the design of domain-specific languages (DSL) with specialized constructs that serve as target languages for learning these models and algorithms. I will also describe how we support cross-lingual natural language querying over the underlying knowledge bases.

Shivakumar Vaithyanathan is an IBM Fellow and Chief Architect, Watson for Compliance. Prior to that he was the Director, Watson Content Services, and before that he started and managed the Analytics Department at IBM Almaden. Multiple technologies developed under his direction ship with several IBM products and have also been released as open source. He has co-authored more than 40 papers in major conferences including ACL, EMNLP, SIGMOD, VLDB, ICML, NIPS, and UAI.

Graphs, Hypergraphs, and the Complexity of Conjunctive Database Queries (by Dániel Marx)


Abstract: The complexity of evaluating conjunctive queries can depend significantly on the structure of the query. For example, it is well known that various notions of acyclicity can make the evaluation problem tractable. More generally, it seems that the complexity is connected to the "treelikeness" of the graph or hypergraph describing the query structure. In the lecture, we will review some of the notions of treelikeness that were proposed in the literature and how they are relevant for the complexity of evaluating conjunctive queries and related problems.
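One of the acyclicity notions commonly studied for conjunctive queries is alpha-acyclicity of the query hypergraph, in which each atom contributes one hyperedge over its variables. The sketch below tests it with the classical GYO reduction. It is offered only as an illustration of what "acyclic" can mean here, not as the content of the lecture; the function name and the example queries are hypothetical.

```python
def is_alpha_acyclic(hyperedges):
    """GYO reduction: repeatedly (1) drop variables that occur in only one
    hyperedge and (2) drop hyperedges that are empty or contained in another
    hyperedge. The hypergraph is alpha-acyclic iff nothing is left."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 1: remove variables that appear in exactly one hyperedge.
        for e in edges:
            for v in list(e):
                if sum(v in f for f in edges) == 1:
                    e.remove(v)
                    changed = True
        # Rule 2: remove one empty or subsumed hyperedge, then rescan.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


# Path query R(x,y), S(y,z), T(z,w): acyclic.
PATH = [{"x", "y"}, {"y", "z"}, {"z", "w"}]
# Triangle query R(x,y), S(y,z), T(z,x): cyclic.
TRIANGLE = [{"x", "y"}, {"y", "z"}, {"z", "x"}]
# Triangle plus an atom U(x,y,z) covering it: alpha-acyclic again.
COVERED_TRIANGLE = TRIANGLE + [{"x", "y", "z"}]

if __name__ == "__main__":
    print(is_alpha_acyclic(PATH))              # -> True
    print(is_alpha_acyclic(TRIANGLE))          # -> False
    print(is_alpha_acyclic(COVERED_TRIANGLE))  # -> True
```

The third example shows why hypergraph notions differ from plain graph acyclicity: adding an atom that covers the triangle makes the query alpha-acyclic.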

Dániel Marx is a senior research fellow at the Institute for Computer Science and Control of the Hungarian Academy of Sciences. His main area of research is parameterized algorithms and complexity, but he has also made significant contributions to graph theory, constraint satisfaction, and database theory.
Dániel has served on the Program Committee of most of the leading conferences in Theoretical Computer Science, such as STOC, FOCS, ICALP, and SODA, and he is on the editorial board of the journals ACM Transactions on Computation Theory, SIAM Journal on Computing, Combinatorica, and Journal of Discrete Algorithms. He received the prestigious ERC Starting Grant in 2011 and the ERC Consolidator Grant in 2016.

SAP HANA Vora - Architecture and Applications (by Christian Mathis, SAP)

Abstract: SAP HANA Vora is a distributed compute platform that allows large-scale analytics on a variety of data types: relational data, time series, graphs, and semi-structured data. Vora is designed for scale-out hardware architectures and integrates with the SAP HANA database as well as with Apache Hadoop and Spark. We will give an overview of the Vora architecture and implementation, and we will show how Vora supports various applications and use cases in the context of enterprise software systems.

Dr. Christian Mathis is currently a Development Manager at SAP SE. A brief CV:
Since 2016: Development Manager, SAP SE, Products and Innovation, Big Data Vora.
2014 – 2016: Development Architect, SAP SE, Products and Innovation, Big Data Vora.
2009 – 2014: (Senior) Developer, SAP SE, Innovation Center and HPI.
2004 – 2009: PhD Candidate, University of Kaiserslautern, Databases and Information Systems Group.
1998 – 2004: Studies in Computer Science, University of Kaiserslautern.

Building an Enterprise-Grade Analytics Platform for Smart Manufacturing and Industry 4.0 (by Christoph Gröger, Bosch)

Abstract: The ecosystem of big data technologies and advanced analytics tools has evolved rapidly in recent years, offering companies new possibilities for digitalization and data-driven solutions. Nevertheless, building and operating an enterprise-grade analytics platform involves far more than tooling and technology. In this talk, we discuss challenges and approaches for building a global analytics platform for smart manufacturing and Industry 4.0 at Bosch. The analytics platform is designed for more than 200 factories as part of Bosch’s worldwide manufacturing network, enabling, e.g., data-driven process optimization and predictive maintenance. We present various aspects related to, e.g., governance, architecture, and tool selection in order to leverage advanced manufacturing analytics in a global enterprise.

Dr. Christoph Gröger is the Lead Architect for Industry 4.0 Analytics at Bosch and a technical professional in Bosch’s central Business Intelligence & Analytics department in Stuttgart, Germany. In this role, he is responsible for the design of analytical solutions across various business domains at Bosch, such as manufacturing, purchasing, or sales. Christoph received his doctoral degree in computer science from the University of Stuttgart, and his technical passion focuses on the interdisciplinary field of analytics in manufacturing.

How Information Governance is getting Analytics on Big Data's Best Friend (by Albert Maier, IBM)


Abstract: Expectations about what information governance has to deliver have changed significantly in recent years, and self-service analytics on big data, specifically in the context of big data lakes, is pushing the limits and demanding a new technology approach. In this talk we discuss the challenges big data lake solutions are up against, explain why IBM is pushing and implementing a new Open Governance and Open Metadata infrastructure based on open source in order to address these challenges, and debate some of the technology challenges that need to be solved.

Dr. Albert Maier is a Senior Technical Staff Member in IBM's Analytics development organization. He is the chief architect for IBM's Information Governance and Data Quality portfolio. Prior to this job, Albert held a variety of technical lead positions in various IBM organizations, including Research and Global Services. Albert's interests have ranged broadly across information integration, middleware integration, and databases. Albert is an IBM Master Inventor and has received many patents and awards for his contributions to Information Management products. He is also a member of the IBM Academy of Technology.