Purity: Building Fast, Highly-Available Enterprise Flash Storage from Commodity Components

Appeared in Proceedings of SIGMOD 2015: Industrial Track. Work done entirely at Pure Storage. Posted here because Ethan Miller is a faculty member at UC Santa Cruz.

Abstract

Although flash storage has largely replaced hard disks in consumer-class devices, enterprise workloads pose unique challenges that have slowed adoption of flash in “performance tier” storage appliances. In this paper, we describe Purity, the foundation of Pure Storage’s Flash Arrays, the first all-flash enterprise storage system to support compression, deduplication, and high availability. Purity borrows techniques from modern database and key-value storage architectures, and introduces novel storage primitives that have wide applicability to data management systems. For instance, all writes in Purity are monotonic, and deletions are handled using an atomic predicate-based tuple elision primitive. Purity’s redundancy mechanisms are optimized for SSD failure modes and performance characteristics, allowing for fast recovery from component failures and lower space overhead than the best hard disk systems. We built deduplication and data compression schemes atop these primitives. Flash changes storage capacity/performance tradeoffs: unlike disk-based systems, flash deployments are rarely performance-bound. A single Purity appliance can provide over 7 GiB/s of throughput on 32 KiB random I/Os, even through multiple device failures, and while providing asynchronous off-site replication. Typical installations have 99.9th-percentile latencies under 1 ms, and production arrays average 5.4× data reduction and 99.999% availability. Purity takes advantage of storage performance increasing more rapidly than computational performance to build a simpler (with respect to engineering, installation, and management) scale-up storage appliance that supports hundreds of terabytes of highly-available, high-performance storage. The resulting performance and capacity support many customer deployments of multiple applications, including scale-out and parallel systems, such as MongoDB and Oracle RAC, on a single Purity appliance.

Publication date:
May 2015

Authors:
John Colgrove
John Davis
John Hayes
Ethan L. Miller
Cary Sandvig
Russell Sears
Ari Tamches
Neil Vachharajani
Feng Wang

Projects:
Storage Class Memories

Bibtex entry

@inproceedings{colgrove-sigmod15,
  author       = {John Colgrove and John Davis and John Hayes and Ethan L. Miller and Cary Sandvig and Russell Sears and Ari Tamches and Neil Vachharajani and Feng Wang},
  title        = {Purity: Building Fast, Highly-Available Enterprise Flash Storage from Commodity Components},
  booktitle    = {Proceedings of SIGMOD 2015: Industrial Track},
  month        = may,
  year         = {2015},
}
Last modified 5 Aug 2020