Regional Soft Error Vulnerability and Error Propagation Analysis for GPGPU Applications
Authors
Öz, I.
Abstract
The wide use of GPUs for general-purpose computations as well as graphics programs makes soft errors a critical concern. Evaluating the soft error vulnerability of GPGPU programs and employing efficient fault tolerance techniques for more reliable execution have therefore become increasingly important. Protecting only the most error-sensitive program regions maintains an acceptable reliability level while avoiding the large performance overheads of redundant operations. Fine-grained regional soft error vulnerability analysis is thus crucial for systems that target both performance and reliability. In this work, we present a regional fault injection framework and perform a detailed error propagation analysis to evaluate the soft error vulnerability of GPGPU applications. We evaluate both intra-kernel and inter-kernel vulnerabilities for a set of programs and quantify the severity of data corruptions by considering metrics other than silent data corruption (SDC) rates. Our experimental study demonstrates that code regions inside GPGPU programs differ in their soft error vulnerability and that soft errors corrupting program variables propagate to the program output in several ways. After compiling the observations from our empirical work, we discuss usage scenarios that illustrate the potential impact of our analysis. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
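To make the idea of regional fault injection concrete, the following is a minimal sketch in plain Python and is not the framework described in the paper. It assumes a single-bit-flip fault model on float32 values and a hypothetical two-region toy program. The names `flip_bit`, `toy_program`, `classify`, and `sdc_rate` are illustrative assumptions. The relative output error stands in for a severity metric beyond the plain SDC rate.

```python
import struct

def flip_bit(value, bit):
    """Return value with one bit of its float32 encoding flipped (bit 0 = LSB)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out

def toy_program(data, fault=None):
    """Hypothetical two-region program; fault is (region, index, bit) or None."""
    # Region 0: scale the inputs.
    scaled = [2.0 * v for v in data]
    if fault and fault[0] == 0:
        scaled[fault[1]] = flip_bit(scaled[fault[1]], fault[2])
    # Region 1: clamp each value to [0, 10].
    clamped = [min(max(v, 0.0), 10.0) for v in scaled]
    if fault and fault[0] == 1:
        clamped[fault[1]] = flip_bit(clamped[fault[1]], fault[2])
    # Program output: sum of the clamped values.
    return sum(clamped)

def classify(golden, observed):
    """Return (outcome, relative_error) for one injection run."""
    if observed != observed:  # NaN output counts as corruption
        return "SDC", float("inf")
    if observed == golden:
        return "masked", 0.0
    return "SDC", abs(observed - golden) / abs(golden)

def sdc_rate(data, region):
    """Fraction of single-bit injections in one region that corrupt the output."""
    golden = toy_program(data)
    runs = [classify(golden, toy_program(data, (region, i, b)))
            for i in range(len(data)) for b in range(32)]
    return sum(1 for outcome, _ in runs if outcome == "SDC") / len(runs)

data = [1.0, 2.0, 8.0]
golden = toy_program(data)
for region in (0, 1):
    print(f"region {region}: SDC rate = {sdc_rate(data, region):.2f}")
```

In this toy setup, a low-order bit flip injected before the clamp can be masked by the clamp, whereas the same flip injected after it reaches the output directly. This illustrates, on a small scale, why regions of the same program can differ in vulnerability and why the magnitude of the corruption is worth recording alongside the outcome.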
Keywords
Fault injection, GPGPU programs, Soft error reliability, Fault tolerance, Program processors, Radiation hardening, Reliability analysis, Error propagation analysis, Fault tolerance techniques, General-purpose computations, Graphics programs, Performance and reliabilities, Reliability level, Reliable execution, Vulnerability analysis, Error correction
Fields of Science
0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology
Volume
78
Issue
3
Start Page
4095
End Page
4130