Architectural Reliability: Lifetime Reliability Characterization and Management of Many-Core Processors

William Song, Saibal Mukhopadhyay, Sudhakar Yalamanchili

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)


This paper presents a lifetime reliability characterization of many-core processors based on a full-system simulation of integrated microarchitecture, power, thermal, and reliability models. Under normal operating conditions, our model and analysis reveal that the mean time to failure (MTTF) of the cores on a die follows a normal distribution. From the processor-level perspective, the key insight is that reducing the variance of this distribution can improve lifetime reliability by avoiding early failures. Based on this understanding, we present two variance reduction techniques for proactive reliability management: (i) proportional dynamic voltage-frequency scaling (DVFS) and (ii) coordinated thread swapping. A major advantage of variance reduction techniques is that they improve system lifetime reliability without adding design margins or spare components.
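The variance argument in the abstract can be illustrated with a small Monte Carlo sketch. It is not the paper's simulation framework. It assumes, purely for illustration, that per-core MTTF values are independent draws from a normal distribution and that the processor's lifetime ends at the first core failure (no spare cores). The function name, the core count of 64, the mean of 10 years, and the two standard deviations are all hypothetical values chosen for the example.

```python
import random
import statistics


def expected_first_failure(mean_mttf, sigma, n_cores=64, trials=2000, seed=0):
    """Estimate the expected time of the earliest core failure.

    Assumption (not from the paper): core MTTFs are independent and
    normally distributed, and the system fails when its first core fails.
    """
    rng = random.Random(seed)
    first_failures = []
    for _ in range(trials):
        # One simulated die: draw an MTTF for every core.
        core_mttfs = [rng.gauss(mean_mttf, sigma) for _ in range(n_cores)]
        # The earliest failure bounds the lifetime of this die.
        first_failures.append(min(core_mttfs))
    return statistics.mean(first_failures)


# Same mean MTTF, different spread across cores (hypothetical numbers).
wide = expected_first_failure(mean_mttf=10.0, sigma=2.0)
narrow = expected_first_failure(mean_mttf=10.0, sigma=1.0)

print(f"wide spread   (sigma=2.0): first failure at about {wide:.2f} years")
print(f"narrow spread (sigma=1.0): first failure at about {narrow:.2f} years")
```

Under these assumptions the mean MTTF is identical in both runs, yet the narrower distribution pushes the earliest failure later, which is the effect the variance reduction techniques aim for.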

Original language: English
Article number: 6860268
Pages (from-to): 103-106
Number of pages: 4
Journal: IEEE Computer Architecture Letters
Issue number: 2
Publication status: Published - 2015 Jul 1

Bibliographical note

Funding Information:
This research was supported by the Semiconductor Research Corporation under task #2084.001, IBM/SRC Graduate Fellowship, and Sandia National Laboratories.

Publisher Copyright:
© 2015 IEEE.

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

