Paper Review 11 - Object Graph Programming

My thoughts on Object Graph Programming


Today’s paper first appeared in ICSE ’24.

Overview

This paper introduces OGO a tool for inspecting and modifying the object graph of a Java program using openCypher by treating the program state as a graph database. Examples of usecases include verifying program state during tests, or finding objects that satisfy complex criteria. The paper considers two implementations - one that uses Neo4J as the underlying database, and one that is implemented by the authors and can operate over their in-memory graph representation. In the Neo4J version, they incur the additional overhead of needing to serialize their graph to CSV and then importing the graph into Neo4J. The motivation behind having two implementations stems from their in-house implementation not being as feature-complete or mature as Neo4J which has been in active development for over a decade.

To speed up synthesis of the graph, the paper outlines 3 optimizations which all aim to reduce the size of the graph in some way by limiting what nodes/edges are considered. The most expensive step overall seems to be creating the edges of the graph.

Thoughts

  • I’m not sure how useful this is as a tool in general, but for testing, debugging, and tracing I can see how this could be a very powerful paradigm. The examples presented in the paper are not very compelling since the query representation obscures the intent of the code more than the normal version. It also doesn’t seem easier to write unless you were in the small subset of people that know cypher very well and don’t know Java at all. An example that showed something more compelling, like finding an instance of an object satsifying some property (e.g. find all nodes of a tree with only a single child, or find all elements in a UI framework that have some type of child node, etc) would have shown cases where writing Cypher might be easier than writing Java, or writing code against APIs that aren’t easily “queryable”.

  • Explicitly creating the graph seems slow and not generally useful - materializing the entire program as a graph is only needed if one runs queries that frequently iterate over the entire graphs. However, in the examples shown in the paper (and most non-pathological usecases I can think of) start from some known set of nodes (or in most cases, a single node) and explore the neighborhood of the initial set. Lazily materializing the graph would likely avoid some overhead, and eliminate the need for some optimizations in constructing the graph (reducing complexity). I’ve previously worked on building a library that aims to present openCypher as a frontend which can connect to an implementation of a graph interface. A clever implementation of the interface could directly connect to the program’s in-memory state and explore the graph/materialize nodes and edges on-demand.

Overall, it’s a cool idea, but I think I would have liked to see more examples that motivated the need for a graph database in the program. I’ve previously used custom annotations on C/C++ functions to build a graph database that could be used to statically enforce rules that prevent deadlock or other semantic violations of an API (see here) - maybe something similar/better could be achieved without the need for custom annotation by integrating graph databases more closely with the runtime.

Written on May 30, 2024