Paper Review 07 - Pond - CXL-Based Memory Pooling Systems for Cloud Platforms

My thoughts on Pond: CXL-Based Memory Pooling Systems for Cloud Platforms


Today’s paper appeared in ASPLOS'23 and received a distinguished paper award.

In the past, I’ve used VMs on AWS to host some very low-traffic web services. There was barely any utilization of the server, even though I was using the cheapest instance type. Contrast that with VMs I’ve spun up for various use cases at work, where we aimed for 100% utilization and often achieved it. Between these two extremes, there are many workloads on cloud compute platforms that use some, but not all, of their allocated memory - what if we could pool together all the unused memory (the paper calls this stranded memory) and make it available to other VMs? This is the general problem that Pond tackles.

Sharing memory? Sounds slow.

One of the key observations the authors make is that emerging interconnect standards like CXL are fast enough to even be considered for this use case, but the latency still isn’t free. The paper presents an analysis of how much common workloads would regress if all memory accesses went through CXL (using emulated latency). While some programs experience virtually no performance penalty (<5%), almost a quarter see more than a 25% impact.

How can hosts access pooled memory?

Pooled memory is presented to the host as what the authors call a zNUMA node - a NUMA node with no cores, only memory. Hosts naturally prefer to allocate memory on their local node and are biased against zero-core nodes. If we can accurately predict the fraction of memory that will go unused, that unused portion can be placed entirely on the zNUMA node.
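For the curious, here’s a minimal Python sketch (my own, not from the paper) of how one could spot such a memory-only node on Linux, assuming the standard sysfs layout where each NUMA node exposes a `cpulist` and `meminfo` file:

```python
import glob
import os

def znuma_candidates():
    """Return NUMA nodes that expose memory but no CPUs (zNUMA-like).

    A node whose cpulist is empty but whose meminfo reports non-zero
    MemTotal looks like a memory-only node.
    """
    nodes = []
    for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        with open(os.path.join(node_dir, "cpulist")) as f:
            cpulist = f.read().strip()
        if cpulist:  # node has CPUs -> not a zNUMA candidate
            continue
        with open(os.path.join(node_dir, "meminfo")) as f:
            meminfo = f.read()
        # meminfo lines look like "Node 1 MemTotal: 16777216 kB"
        total_kb = next(
            (int(line.split()[3]) for line in meminfo.splitlines()
             if "MemTotal" in line),
            0,
        )
        if total_kb > 0:
            nodes.append(node_dir)
    return nodes

if __name__ == "__main__":
    print("memory-only (zNUMA-like) nodes:", znuma_candidates())
```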

Unlike a lot of other disaggregated memory approaches, Pond doesn’t rely on page faults to allocate or move memory around. Instead, a predictive model estimates how much memory will go unused, local vs. zNUMA memory is allocated upfront, and a mechanism adjusts the split at runtime if necessary (see the sketch below).
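As a rough illustration (my own sketch, not the paper’s exact policy), the upfront split might look like this, where the predicted untouched fraction minus a safety margin decides how much lands on the zNUMA node:

```python
def plan_split(vm_mem_gb, predicted_untouched_frac, margin=0.10):
    """Hypothetical upfront split between local and zNUMA memory.

    Keep everything the VM is predicted to touch (plus a safety margin)
    local; back the rest with pool memory on the zNUMA node.
    """
    untouched = max(0.0, predicted_untouched_frac - margin)
    znuma_gb = int(vm_mem_gb * untouched)
    return vm_mem_gb - znuma_gb, znuma_gb

print(plan_split(64, 0.5))  # -> (39, 25): 39 GB local, 25 GB on zNUMA
```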

How is unused memory estimated?

Pond employs an ML model that predicts how much memory will go unused based on a shockingly small set of features: customer history, VM type, guest OS, and location.
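To make that concrete, here’s a hedged sketch of what such a predictor might look like, assuming a gradient-boosting-style regressor (the feature names and training data below are illustrative, not the paper’s):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: one row per past VM; the target is the
# fraction of its memory that went untouched over its lifetime.
df = pd.DataFrame({
    "customer_untouched_hist": [0.60, 0.20, 0.50, 0.55],  # per-customer history
    "vm_type": ["m5", "m5", "c5", "c5"],
    "guest_os": ["linux", "windows", "linux", "linux"],
    "region": ["us-east", "us-east", "eu-west", "us-east"],
})
y = [0.55, 0.15, 0.45, 0.50]

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"),
      ["vm_type", "guest_os", "region"])],
    remainder="passthrough",  # the numeric history feature passes through
)
model = Pipeline([("pre", pre), ("gbm", GradientBoostingRegressor())])
model.fit(df, y)

new_vm = pd.DataFrame([{"customer_untouched_hist": 0.6, "vm_type": "m5",
                        "guest_os": "linux", "region": "us-east"}])
print("predicted untouched fraction:", model.predict(new_vm)[0])
```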

Predicting the amount of memory that will go unused isn’t infallible, so Pond also continuously monitors applications for performance degradation caused by CXL memory access overhead. If the overhead grows too large, Pond transparently migrates the VM onto a configuration with all of its reserved memory available locally. This makes Pond suitable for a broader set of applications than memory pooling alone would be.
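A toy version of that monitoring pass might look like the following; `estimated_slowdown` stands in for whatever hardware-counter-based estimate the real system uses, and the threshold and fields are my own invention:

```python
SLOWDOWN_BUDGET = 0.05  # hypothetical threshold: tolerate ~5% estimated slowdown

def estimated_slowdown(vm):
    """Stand-in for a counter-based estimate, e.g. the fraction of memory
    accesses served from the zNUMA node weighted by its added latency."""
    return vm["znuma_access_frac"] * vm["cxl_latency_penalty"]

def monitor_once(vms):
    """One monitoring pass: flag VMs whose CXL overhead exceeds the budget."""
    for vm in vms:
        if estimated_slowdown(vm) > SLOWDOWN_BUDGET:
            # Pond's mitigation is to migrate the VM to a configuration
            # with all of its memory local; here we just record the decision.
            vm["migrate_to_local"] = True

vms = [
    {"name": "vm-a", "znuma_access_frac": 0.02, "cxl_latency_penalty": 0.8},
    {"name": "vm-b", "znuma_access_frac": 0.30, "cxl_latency_penalty": 0.8},
]
monitor_once(vms)
for vm in vms:
    print(vm["name"], "migrate:", vm.get("migrate_to_local", False))
```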

Implementation

Pond requires both hardware and software support, which enables lower latency than what currently deployed infrastructure can achieve. The hypervisor implements an allocator that avoids memory fragmentation to keep slice allocation fast. This allocator and the VM migration path seem like the most likely places for problems to crop up. The paper does benchmark against traces from real-world cloud infrastructure, but I’m not sure how well those traces reflect edge cases like large bursts, or even deliberate attacks against the provider. I suppose that in the worst case everything would be allocated in local memory, leaving us no worse off than not deploying Pond at all. The paper addresses this by effectively overprovisioning pool memory, which is probably more than sufficient for the common case, but I still have my doubts that it can robustly handle large bursts - further exploration may be needed.
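To illustrate why coarse, fixed-size allocation helps, here’s a toy pool allocator that hands out whole slices (the sizes and interface are illustrative, not Pond’s actual allocator); because every slice is interchangeable, there’s no external fragmentation to slow allocation down:

```python
class SlicePool:
    """Toy model of a pool carved into fixed-size slices.

    Any free slice satisfies any request, so allocation is a constant-time
    pop from a free list rather than a search for a contiguous region.
    """

    def __init__(self, total_gb, slice_gb=1):
        self.slice_gb = slice_gb
        self.free = list(range(total_gb // slice_gb))  # free slice indices
        self.owner = {}  # slice index -> vm name

    def alloc(self, vm, size_gb):
        n = -(-size_gb // self.slice_gb)  # round up to whole slices
        if n > len(self.free):
            return None  # pool exhausted: fall back to host-local memory
        got = [self.free.pop() for _ in range(n)]
        for s in got:
            self.owner[s] = vm
        return got

    def release(self, vm):
        freed = [s for s, o in self.owner.items() if o == vm]
        for s in freed:
            del self.owner[s]
            self.free.append(s)

pool = SlicePool(total_gb=8)
print(pool.alloc("vm-a", 3))  # e.g. [7, 6, 5]
pool.release("vm-a")
```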

My Thoughts

  • It seems the majority of programs lose between 5% and 25% of their performance. That’s a pretty wide band, and I’d have preferred a clearer presentation of where each of the sampled programs falls within that distribution.

  • I wonder if a VM allocated predominantly with zNUMA memory could one day emerge as a cheaper instance type. There are definitely workloads out there that aren’t performance sensitive and have relatively low memory usage - web pages hosted off of Lambda functions, or serverless in general, come to mind. Bursts of accesses could trigger instantiating a VM with local memory.

  • I’d never given much thought to how research targeting cloud infrastructure is carried out. This paper really helped shed some light on that area for me.

  • I wonder if this could be integrated with Linux’s swapping/paging mechanism - what if cold pages were “swapped” to non-local memory? It could be an interesting future direction, and might illustrate how the OS could make more fine-grained decisions instead of relying on the predictive engine plus monitoring.

  • In general, this approach seems like it could benefit from deeper integration with the OS. Rather than treating VMs as opaque, allowing users or compilers to tag memory requests as fast/slow (incentivized by pricing, perhaps?) could yield more interesting systems, though I suppose that’s also harder to implement and present as a paper.

Written on March 14, 2024