Paper Review 11 - Object Graph Programming
My thoughts on Object Graph Programming
Today’s paper first appeared in ICSE ’24.
Overview
This paper introduces OGO
a tool for inspecting and modifying the object graph
of a Java program using openCypher
by treating the program state as a graph
database. Examples of usecases include verifying program state during tests, or
finding objects that satisfy complex criteria. The paper considers two
implementations - one that uses Neo4J
as the underlying database, and one that
is implemented by the authors and can operate over their in-memory graph
representation. In the Neo4J
version, they incur the additional overhead of
needing to serialize their graph to CSV
and then importing the graph into
Neo4J
. The motivation behind having two implementations stems from their
in-house implementation not being as feature-complete or mature as Neo4J
which
has been in active development for over a decade.
To speed up synthesis of the graph, the paper outlines 3 optimizations which all aim to reduce the size of the graph in some way by limiting what nodes/edges are considered. The most expensive step overall seems to be creating the edges of the graph.
Thoughts
-
I’m not sure how useful this is as a tool in general, but for testing, debugging, and tracing I can see how this could be a very powerful paradigm. The examples presented in the paper are not very compelling since the query representation obscures the intent of the code more than the normal version. It also doesn’t seem easier to write unless you were in the small subset of people that know cypher very well and don’t know Java at all. An example that showed something more compelling, like finding an instance of an object satsifying some property (e.g. find all nodes of a tree with only a single child, or find all elements in a UI framework that have some type of child node, etc) would have shown cases where writing Cypher might be easier than writing Java, or writing code against APIs that aren’t easily “queryable”.
-
Explicitly creating the graph seems slow and not generally useful - materializing the entire program as a graph is only needed if one runs queries that frequently iterate over the entire graphs. However, in the examples shown in the paper (and most non-pathological usecases I can think of) start from some known set of nodes (or in most cases, a single node) and explore the neighborhood of the initial set. Lazily materializing the graph would likely avoid some overhead, and eliminate the need for some optimizations in constructing the graph (reducing complexity). I’ve previously worked on building a library that aims to present
openCypher
as a frontend which can connect to an implementation of a graph interface. A clever implementation of the interface could directly connect to the program’s in-memory state and explore the graph/materialize nodes and edges on-demand.
Overall, it’s a cool idea, but I think I would have liked to see more examples
that motivated the need for a graph database in the program. I’ve previously
used custom annotations on C/C++
functions to build a graph database that
could be used to statically enforce rules that prevent deadlock or other
semantic violations of an API (see here) -
maybe something similar/better could be achieved without the need for custom
annotation by integrating graph databases more closely with the runtime.