Compiler-Managed Replication of CUDA Kernels for Reliable Execution of GPGPU Applications
Open Access Color
Green Open Access
Yes
Publicly Funded
No
Abstract
As Graphics Processing Units (GPUs) are increasingly used for general-purpose computations in addition to inherently fault-tolerant graphics programs, soft error reliability becomes a first-class concern in program design. In particular, safety-critical systems that use GPU devices need fault-tolerance techniques to recover from errors in hardware components. Software-level redundancy approaches, based on replicating the application code, offer high reliability for safe program execution; however, the redundant computations must be mapped onto the parallel execution units of the target architecture so that they do not hurt performance. In this work, we propose redundancy approaches that use the parallel GPU cores and implement a compiler-level redundancy framework that enables the programmer to configure the target GPGPU program for redundant execution. We apply our kernel-level redundancy approaches to GPGPU programs from the PolyBench benchmark suite and evaluate the performance of the redundant executions with respect to the parallelism level of each program. Our results reveal that redundancy approaches that exploit the parallelism offered by GPU cores yield higher performance for redundant executions, whereas programs that already make use of the parallel GPU cores in their original form suffer from overhead caused by contention among redundant threads. © World Scientific Publishing Company.
Description
Kaya, Ercument/0000-0001-5073-8159; Oz, Isil/0000-0002-8310-1143
Keywords
compiler support, GPU computing, redundancy, soft errors
Fields of Science
0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology, 01 natural sciences
Volume
33
Issue
14