Paper Review 04 - RT-kNNS Unbound

My thoughts on RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search


Today’s paper was published in ICS '23. It reduces the k-nearest neighbors problem to a ray tracing problem. While I was vaguely familiar with ray tracing as a concept, I had never previously looked into the APIs surrounding it, and this paper was a pretty good introduction to how programs can be written to take advantage of ray tracing (RT) cores.

What is k-NNS?

k-NNS, or k Nearest Neighbors Search, is defined by the paper as follows:

For a dataset \(D\) and query point \(q \in D\), find the set of \(k\) nearest points to \(q\)

This problem has a lot of real-world applications, such as in networking or in biology.

What is Ray Tracing?

Ray tracing is a technique for rendering 3D scenes with accurate lighting. It works by simulating a ray of light being emitted from a particular source, and traces the path it would follow, recording things like the colors and reflective properties of materials it may interact with in order to determine the color of a particular pixel in the final render. Some GPUs have dedicated hardware to accelerate ray tracing. Incidentally, I’ve implemented a ray tracer in software before as an elaborate meme.

Implementing k-NNS with RT acceleration

The paper presents an initial known implementation of RT accelerated k-NNS that relies on the user specifying a maximum radius within which they would like to search. One thing that I had to read carefully was that while the algorithm seems to return all points with a distance within the specified radius, the union operator seems to actually imply that that only the k closest points are actually kept. However, this algorithm is insufficient because of the requirement that the radius be supplied. So the authors build on this algorithm by supplying an initial radius, and adding points that already found k neighbors into the result set (along with their neighbors), and keeping points that have not yet found k neighbors for additional iterations where the radius is doubled between each iteration.

This poses an important question - what should the initial radius be? If the radius picked is too large, then there will be many intersections and a lot of compute to be done for each point. If the radius picked is too low, then the process will need to be repeated for many iterations, incurring a lot of overhead due to communication between the GPU and the host. To solve this, the authors employ a random sampling approach, where a random subset of points are chosen and k-NNS is run against this subset with k=4 (want to choose a k small enough for compute to be fast, and big enough to get meaningful data) and the minimum distance from point to neighbor is chosen as the starting radius.

The paper has a lot of additional details around how this works and how the APIs around ray tracing are used, the most important of which seems to be constructing the BVH (bounding volume hierarchy), a datastructure which the RT cores rely on for efficient execution. The APIs around constructing this structure leads to some performance left on the floor. I initially wanted to write a much more details summary post with those details, but I don’t think I could make a more approachable summary than the already well-written explanations in the paper. I considered making some animations to better capture the concept, but I’ll save that for a future blog post.

My Thoughts

  • Expanding support to higher dimensions/optimizing for lower dimensions

From what I can tell, this paper interprets all data as 3D in order to use the RT APIs. For 2D data, we can observe that all points lie in the same plane, and we’re actually just looking at determining if a point is in a circle/intersections between lines and circles. I don’t know if current RT libraries optimize this case, but it seems to me like there might be opportunities there. Being able to expand this idea to work for more than 3 dimensions would be very useful in the context of things like vector DBs and AI, which typically produce feature vectors that are much larger.

  • Customizing radius expansion based on properties of the dataset

If we know that the dataset is very sparse or very dense, we might be able to use a more optimal strategy for resizing the spheres instead of just doubling the radius on every iteration. Very sparse datasets might benefit from faster growth (more likely to find neighbors, fewer iterations overall), and very dense datasets might benefit from smaller increment (less intersections, less compute compared to larger radii).

  • Similarity to circle packing

A couple years ago, I was working on GPU accelerated circle packing in order tio achieve a visual effect. I wrote a bit about the algorithm I came up with here. There’s definitely some conceptual similarities here which I found interesting.

  • Are there APIs that expose ray tracing to WebGPU?

Not exactly related to the contributions of this paper, but something worth looking into IMO. Web browsers are the way most people interact with technology these days, and being able to take advantage of things like ray tracing would allow for distribution of more computational intensive programs without needing to install additional software. I think this is especially useful considering the applications of kNN in things like vector DBs/AI.

Written on February 5, 2024