Research

III: Small: Native Compilation, Query Processing, and Indexing for In-memory Graph Relational Data Systems

NSF Award Number: IIS-1910216
Duration: 8/1/2019 -- 7/31/2022
Title: Native Compilation, Query Processing, and Indexing for In-memory Graph Relational Data Systems
PI: Walid G. Aref
Co-PI: Tiark Rompf
Project Web Page

Project Focus:

A wide variety of applications spanning various domains have graphs as first-class citizens, e.g., communication networks, road networks, social networks, and biological networks. The nodes and edges of these graphs are often associated with descriptors, e.g., labels and properties, or more generally, attributes. Many of these applications need efficient and real-time processing of the graph data. Because relational data systems are very mature and ubiquitous, extending these systems to support graph data is a natural choice. However, there is an impedance mismatch between the relational model and the graph model at the various levels that makes extending relational systems to efficiently support graph data very challenging. This project will address this impedance mismatch and the hurdles that face graph applications that run over graph-enabled relational systems in order to function properly and efficiently. More specifically, this project will address the following research challenges:

The expressiveness challenge to address the mismatch between the declarative nature in querying relational data and the navigational nature in querying graph data,
The scalability challenge to support large amounts of graph and relational data and queries in real-time, and
The performance challenge to address the complexity in answering graph and relational queries and the real-time processing needs of graph applications. Addressing these challenges in the focus of this project.

This project addresses how to overcome the impedance mismatch between the relational and graph models by addressing the above challenges. Techniques are proposed to seamlessly and natively process large graph databases inside relational systems without negatively affecting the graph query performance. The techniques to be developed include:

Graph query compilation techniques: State-of-art query compilation mechanisms will be developed to mixes of graph and relational query evaluation pipelines to efficiently execute compiled query processing plans that include both graph and relational operators,
Graph-as-an-index: In-memory graph indexing techniques that will facilitate the navigation of the graph relational data using the graph topology. The graph indexes will efficiently support sub-graph selection based on the attribute data of both the graph nodes and edges, and performing graph operations on the selected sub-graphs. The techniques to be developed will support dynamic graphs where both the graph topology as well as the graph attributes can be updated. The introduced techniques will tolerate updates that would otherwise invalidate graph intermediate representations that are typically prepared offline to speedup graph query processing.
Native graph+relational query execution: Introduce graph navigation operators that operate over graph data in native mode, yet seamlessly integrate with relational algebra operators inside query evaluation pipelines. The developed query processing techniques will permit bidirectional navigation over the relational and the graph data within the same query evaluation pipeline to permit further query optimization strategies that are infeasible otherwise, and to efficiently evaluate interleaved graph and relational operators in query evaluation pipelines, and
Costing of the interleaved relational and graph operations for query optimization purposes.

For further information see the project web page.