PERSES: Data Layout for Low Impact Failures

Published as Storage Systems Research Center Technical Report UCSC-SSRC-12-06.

Abstract

Disk failures remain common, and the speed of reconstruction has not kept up with the increasing size of disks. Thus, as drives become larger, systems are spending an increasing amount of time with one or more failed drives, potentially resulting in lower performance. However, if an application does not use data on the failed drives, the failure has negligible direct impact on that application and its users. We formalize this observation with PERSES, a data allocation scheme to reduce the performance impact of reconstruction after disk failure. PERSES reduces the length of degradation from the reference frame of the user by clustering data on disks such that working sets are kept together as much as possible. During a device failure, this co-location reduces the number of impacted working sets. PERSES uses statistical properties of data accesses to automatically determine which data to co-locate, avoiding extra administrative overhead. Trace-driven simulations show that, with PERSES, we can reduce the time lost due to failure during a trace by up to 80%, or more than 4000 project hours over the course of three years.
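
The co-location idea in the abstract can be illustrated with a toy sketch. This is not the PERSES algorithm: the working sets here are written out by hand, whereas PERSES derives them from statistical properties of data accesses. The function names, the block and project names, and the three-disk setup are all hypothetical, and each block is assumed to belong to exactly one working set.

```python
from typing import Dict, List, Set

WorkingSets = Dict[str, List[str]]  # working-set name -> block ids (assumed disjoint)
Layout = Dict[str, int]             # block id -> disk index


def scattered_layout(working_sets: WorkingSets, num_disks: int) -> Layout:
    """Place blocks round-robin across disks, ignoring working sets."""
    layout: Layout = {}
    next_slot = 0
    for blocks in working_sets.values():
        for block in blocks:
            layout[block] = next_slot % num_disks
            next_slot += 1
    return layout


def colocated_layout(working_sets: WorkingSets, num_disks: int) -> Layout:
    """Place each working set entirely on one disk (assumes it fits there)."""
    layout: Layout = {}
    for index, blocks in enumerate(working_sets.values()):
        for block in blocks:
            layout[block] = index % num_disks
    return layout


def impacted_working_sets(
    layout: Layout, working_sets: WorkingSets, failed_disk: int
) -> Set[str]:
    """Return the working sets that have at least one block on the failed disk."""
    return {
        name
        for name, blocks in working_sets.items()
        if any(layout[block] == failed_disk for block in blocks)
    }


# Hypothetical example: three projects, three blocks each, three disks.
working_sets = {
    "projA": ["a1", "a2", "a3"],
    "projB": ["b1", "b2", "b3"],
    "projC": ["c1", "c2", "c3"],
}

scattered = scattered_layout(working_sets, num_disks=3)
colocated = colocated_layout(working_sets, num_disks=3)

hit_scattered = impacted_working_sets(scattered, working_sets, failed_disk=0)
hit_colocated = impacted_working_sets(colocated, working_sets, failed_disk=0)
```

In this toy setup, a failure of disk 0 touches every project under the scattered layout but only one project under the co-located layout, which is the effect the abstract describes.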

Publication date:
September 2012

Authors:
Avani Wildani
Ethan L. Miller
Ian Adams
Darrell D. E. Long

Projects:
Prediction and Grouping

Available media

Full paper text: PDF

Bibtex entry

@techreport{wildani-ssrctr1206,
  author       = {Avani Wildani and Ethan L. Miller and Ian Adams and Darrell D. E. Long},
  title        = {{PERSES}: Data Layout for Low Impact Failures},
  institution  = {University of California, Santa Cruz},
  number       = {UCSC-SSRC-12-06},
  month        = sep,
  year         = {2012},
}
Last modified 5 Aug 2020