Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7148

Citation - WoS: 1
Citation - Scopus: 2
Predicting the Soft Error Vulnerability of GPGPU Applications
(Institute of Electrical and Electronics Engineers Inc., 2022) Topçu, Burak; Öz, Işıl
As Graphics Processing Units (GPUs) have evolved to deliver performance increases for general-purpose computations as well as graphics and multimedia applications, soft error reliability has become an important concern. The soft error vulnerability of applications is evaluated via fault injection experiments. Since fault injection takes an impractical amount of time to cover the fault locations in complex GPU hardware structures, prediction-based techniques have been proposed to evaluate the soft error vulnerability of General-Purpose GPU (GPGPU) programs based on hardware performance characteristics. In this work, we propose ML-based prediction models for the soft error vulnerability evaluation of GPGPU programs. We consider both program characteristics and hardware performance metrics collected from either simulation or profiling tools. While we utilize regression models for the prediction of the masked fault rates, we build classification models to specify the vulnerability level of the programs based on their silent data corruption (SDC) and crash rates. Our prediction models achieve maximum prediction accuracy rates of 96.6%, 82.6%, and 87% for masked fault rates, SDCs, and crashes, respectively.
Citation - WoS: 5
Citation - Scopus: 5
Regional Soft Error Vulnerability and Error Propagation Analysis for GPGPU Applications
(Springer, 2021) Öz, Işıl; Karadaş, Ömer Faruk
The wide use of GPUs for general-purpose computations as well as graphics programs makes soft errors a critical concern. Evaluating the soft error vulnerability of GPGPU programs and employing efficient fault tolerance techniques for more reliable execution become more important. Protecting only the most error-sensitive program regions maintains an acceptable reliability level while eliminating the large performance overheads due to redundant operations. Therefore, fine-grained regional soft error vulnerability analysis is crucial for systems targeting both performance and reliability. In this work, we present a regional fault injection framework and perform a detailed error propagation analysis to evaluate the soft error vulnerability of GPGPU applications. We evaluate both intra-kernel and inter-kernel vulnerabilities for a set of programs and quantify the severity of the data corruptions by considering metrics other than SDC rates. Our experimental study demonstrates that the code regions inside GPGPU programs exhibit different characteristics in terms of soft error vulnerability and that soft errors corrupting the variables propagate into the program output in several ways. After compiling the observations from our empirical work, we present the potential impact of our analysis by discussing usage scenarios.
Citation - WoS: 1
Citation - Scopus: 1
A User-Assisted Thread-Level Vulnerability Assessment Tool
(Wiley, 2019) Öz, Işıl; Topçuoğlu, Haluk Rahmi; Tosun, Oğuz
System reliability becomes a critical concern in modern architectures with the scaling down of circuits. To deal with soft errors, the replication of system resources has been used at both hardware and software levels. Since redundancy causes performance degradation, it is necessary to explore partial redundancy techniques that replicate only the most vulnerable parts of the code. The redundancy level of user applications depends on user preferences and may differ for users with different requirements. In this work, we propose a user-assisted reliability assessment tool based on critical thread analysis for redundancy in parallel architectures. Our analysis evaluates the application threads of a parallel program by considering their criticality in the execution and selects the most critical thread or threads to be replicated. Moreover, we extend our analysis by exploring critical regions of individual threads and redundantly executing only those regions to reduce redundancy overhead. Our experimental evaluation indicates that the replication of the most critical thread improves the system reliability more (up to 10% for the blackscholes application) than the replication of any other thread. The partial thread replication based on critical region analysis also reduces the vulnerability of the system through a fine-grained approach.
Citation - WoS: 3
Citation - Scopus: 3
Regression-Based Prediction for Task-Based Program Performance
(World Scientific Publishing, 2019) Öz, Işıl; Bhatti, Muhammad Khurram; Popov, Konstantin; Brorsson, Mats
As multicore systems evolve by increasing the number of parallel execution units, parallel programming models have been released to exploit parallelism in applications. The task-based programming model uses task abstractions to specify parallel tasks and schedules tasks onto processors at runtime. To increase efficiency and obtain the highest performance, it is necessary to identify which runtime configuration is needed and how processor cores must be shared among tasks. Exploring the design space for all possible scheduling and runtime options, especially for large input data, becomes infeasible and requires statistical modeling. Regression-based modeling determines the effects of multiple factors on a response variable and makes predictions based on statistical analysis. In this work, we propose a regression-based modeling approach to predict task-based program performance for different scheduling parameters with variable data size. We execute a set of task-based programs by varying the runtime parameters, and systematically measure the factors that influence execution time. Our approach uses executions with different configurations for a set of input data, and derives different regression models to predict execution time for larger input data. Our results show that regression models provide accurate predictions for validation inputs, with a mean error rate as low as 6.3%, and 14% on average among four task-based programs.
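The idea of fitting a model on small inputs and predicting execution time for a larger input can be sketched as follows. This is only a minimal illustration, assuming a simple linear relationship between input size and execution time; the configuration names and all measurements are hypothetical values invented for the example, and the abstract does not state which regression forms or factors the authors actually use.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate a fitted (intercept, slope) model at input size x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical training data (not from the paper):
# runtime configuration -> list of (input size, execution time in seconds)
measurements = {
    "4 threads": [(1000, 0.52), (2000, 1.01), (4000, 2.03), (8000, 4.02)],
    "8 threads": [(1000, 0.30), (2000, 0.56), (4000, 1.08), (8000, 2.11)],
}

# One regression model per runtime configuration.
models = {
    config: fit_line([size for size, _ in runs], [time for _, time in runs])
    for config, runs in measurements.items()
}

# Hypothetical validation run on a larger input than any training input.
validation_size = 16000
measured = {"4 threads": 8.10, "8 threads": 4.20}

# Relative prediction error per configuration.
errors = {
    config: abs(predict(models[config], validation_size) - measured[config])
    / measured[config]
    for config in models
}

for config, error in errors.items():
    print(f"{config}: relative error {error:.2%}")
```

Fitting one model per configuration keeps each model simple; a single model with the configuration parameters as extra predictors would be another reasonable design.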
Citation - WoS: 7
Citation - Scopus: 11
Locality-Aware Task Scheduling for Homogeneous Parallel Computing Systems
(Springer Verlag, 2018) Bhatti, Muhammad Khurram; Öz, Işıl; Amin, Sarah; Mushtaq, Maria; Farooq, Umer; Popov, Konstantin; Brorsson, Mats
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce execution time and energy consumption of parallel applications. Locality can be exploited at various hardware and software layers. For instance, by implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimised for locality. However, this would all be useless if the software scheduling does not cast the execution in a manner that promotes locality available in the programs themselves. Since programs for parallel systems consist of tasks executed simultaneously, task scheduling becomes crucial for the performance in multi-level cache architectures. This paper presents a heuristic algorithm for homogeneous multi-core systems called locality-aware task scheduling (LeTS). The LeTS heuristic is a work-conserving algorithm that takes into account both locality and load balancing in order to reduce the execution time of target applications. The working principle of LeTS is based on two distinct phases, namely the working task group formation phase (WTG-FP) and the working task group ordering phase (WTG-OP). The WTG-FP forms groups of tasks in order to capture data reuse across tasks, while the WTG-OP determines an optimal order of execution for task groups that minimizes the reuse distance of shared data between tasks. We have performed experiments using randomly generated task graphs by varying three major performance parameters, namely: (1) communication to computation ratio (CCR) between 0.1 and 1.0, (2) application size, i.e., task graphs comprising 50, 100, and 300 tasks per graph, and (3) number of cores, with 2-, 4-, 8-, and 16-core execution scenarios. We have also performed experiments using selected real-world applications. The LeTS heuristic reduces overall execution time of applications by exploiting inter-task data locality. Results show that LeTS outperforms state-of-the-art algorithms in amortizing inter-task communication cost.
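To make the notion of ordering task groups by reuse distance concrete, here is a small sketch. It is not the LeTS heuristic: it assumes a simplified definition of reuse distance (the number of task groups executed between two consecutive groups that touch the same data block) and finds the best order by brute force, which is only feasible for a handful of groups. The group names and data blocks are hypothetical.

```python
from itertools import permutations


def total_reuse_distance(order, data_of):
    """Sum, over all data blocks, of the gaps between consecutive groups using them.

    Assumed simplification: the gap is counted in task groups, not in
    distinct memory accesses.
    """
    last_seen = {}
    total = 0
    for position, group in enumerate(order):
        for block in data_of[group]:
            if block in last_seen:
                total += position - last_seen[block] - 1
            last_seen[block] = position
    return total


def best_order(data_of):
    """Exhaustively search all orders (exponential cost; illustration only)."""
    return min(
        permutations(sorted(data_of)),
        key=lambda order: total_reuse_distance(order, data_of),
    )


# Hypothetical task groups and the shared data blocks each one touches.
data_of = {
    "g0": {"A", "B"},
    "g1": {"C"},
    "g2": {"A"},
    "g3": {"B", "C"},
}

naive = ("g0", "g1", "g2", "g3")
chosen = best_order(data_of)

print("naive order cost: ", total_reuse_distance(naive, data_of))
print("chosen order:     ", chosen)
print("chosen order cost:", total_reuse_distance(chosen, data_of))
```

In this toy input, an order that places groups sharing a block next to each other brings the total down to zero, which is the effect an ordering phase aims for.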
Saydam Artıklı Çalıştırma için Vekil Tasarım Örüntüsü Kullanımı [Using the Proxy Design Pattern for Transparent Redundant Execution]
(CEUR Workshop Proceedings, 2018) Öz, Dündar; Öz, Sinan; Öz, Işıl
In this study, we propose a transparent model for reliable execution of object-oriented software. We design a generic object-oriented programming tool for redundant software execution to provide the desired level of reliability against transient hardware faults. To achieve this, we utilize the Proxy design pattern, one of the well-known GoF design patterns, which were created to make software systems flexible and easy to maintain. The Proxy design pattern provides controlled access and a transparent mechanism for adding new functionality to an existing object when accessing it. Combining dynamic proxies and annotations in the Java programming language, we present Redundant-Caller, a generic, transparent, and configurable tool for redundant execution and majority voting. Our tool takes any object and creates a dynamic proxy for it, which executes the methods of the object multiple times in separate threads and performs majority voting in the background, requiring a minimal amount of change in the original user code. Thanks to annotations, users can configure the redundant execution scheme per method. Our experiments demonstrate that our tool provides a significant level of reliability to any object-oriented software with a reasonable amount of performance degradation through multithreaded execution.
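The combination of a proxy, redundant execution in threads, and majority voting can be sketched as below. This is a Python analogue written for illustration, not the authors' Java tool: it uses attribute interception instead of Java dynamic proxies, omits the annotation-based per-method configuration, and the names `RedundantProxy` and `FlakyAdder` are hypothetical.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class RedundantProxy:
    """Wraps an object; each method call is executed `copies` times in
    separate threads and the majority result is returned."""

    def __init__(self, target, copies=3):
        self._target = target
        self._copies = copies

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attribute = getattr(self._target, name)
        if not callable(attribute):
            return attribute

        def voted_call(*args, **kwargs):
            with ThreadPoolExecutor(max_workers=self._copies) as pool:
                futures = [
                    pool.submit(attribute, *args, **kwargs)
                    for _ in range(self._copies)
                ]
                results = [future.result() for future in futures]
            # Results must be hashable so they can be counted.
            winner, votes = Counter(results).most_common(1)[0]
            if votes <= self._copies // 2:
                raise RuntimeError("no majority among redundant executions")
            return winner

        return voted_call


class FlakyAdder:
    """Hypothetical target whose second execution returns a corrupted value,
    standing in for a transient fault."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = 0

    def add(self, a, b):
        with self._lock:
            self._calls += 1
            faulty = self._calls == 2
        return a + b + (1 if faulty else 0)


adder = RedundantProxy(FlakyAdder(), copies=3)
result = adder.add(2, 3)
print(result)
```

The caller uses `adder.add` exactly as it would on the unwrapped object, which is the transparency the Proxy pattern is meant to provide. Note that redundant execution like this is only meaningful for methods without side effects that must happen once.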
