Computer Engineering / Bilgisayar Mühendisliği

Permanent URI for this collection: https://hdl.handle.net/11147/10

Search Results

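FTGPGPU - Hardware Fault Tolerance Analysis for General-Purpose Graphics Processing Unit Applications [FTGPGPU - Genel amaçlı grafik işlemci birimi uygulamaları için donanım hatası toleransı analizi]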
(2022) Öz, Işıl
The use of graphics processing units for general-purpose computation (GPGPU) raises the criticality of hardware faults, making it more important to evaluate the transient fault vulnerability of programs and to apply suitable fault tolerance techniques. For systems that target both performance and reliability by protecting the most fault-sensitive program regions, detailed regional fault vulnerability analysis is essential. This project aimed to measure and analyze the transient hardware fault vulnerability of GPGPU applications, to relate the results of these analyses to program characteristics, and to use them in developing a selective fault tolerance method. The first contribution of the project is a debugger-based fault injection and error propagation analysis tool that operates at the assembly level, linking software to hardware, in order to determine the transient fault vulnerability of GPGPU applications region by region. Using this tool, experiments injected faults into selected code regions of GPGPU programs with differing structures and characteristics, and examined both the fault vulnerability of those regions and the propagation of the resulting errors to different data structures over the course of program execution. The second contribution is a study of the relationship between the characteristics of GPGPU code fragments and their vulnerability to faults that may occur while they run. The performance and architectural characteristics of the code fragments were obtained through profiling and simulation, and silent data corruption, crash, and correct execution outcomes were identified in experiments that injected faults into selected code fragments with the fault injection tool developed in the first step. By examining the relationship between program characteristics and fault vulnerability, the fault vulnerability values of a GPGPU application with given program characteristics were predicted using machine learning methods. The resulting prediction models achieved accuracy rates of 82% for silent data corruption, 87% for crashes, and 96% for correct execution. The third contribution is a selective fault tolerance method based on replicating the more fault-sensitive code regions. The fault tolerance framework, implemented at the compiler level, replicates code regions marked in the source code by the developer or user; it can replicate the specified kernel functions as a redundant kernel function, as redundant threads within a single kernel function, or by using CUDA streams. Different replication execution modes can therefore be selected according to the application's parallelism and data usage characteristics, and replication is achieved efficiently with coarse-grained output checking.
Citation - WoS: 1
Citation - Scopus: 2
Predicting the Soft Error Vulnerability of GPGPU Applications
(Institute of Electrical and Electronics Engineers Inc., 2022) Topçu, Burak; Öz, Işıl
As Graphics Processing Units (GPUs) have evolved to deliver performance increases for general-purpose computations as well as graphics and multimedia applications, soft error reliability has become an important concern. The soft error vulnerability of applications is evaluated via fault injection experiments. Since fault injection takes an impractical amount of time to cover the fault locations in complex GPU hardware structures, prediction-based techniques have been proposed to evaluate the soft error vulnerability of General-Purpose GPU (GPGPU) programs based on hardware performance characteristics. In this work, we propose ML-based prediction models for the soft error vulnerability evaluation of GPGPU programs. We consider both program characteristics and hardware performance metrics collected from either simulation or profiling tools. While we utilize regression models for the prediction of the masked fault rates, we build classification models to specify the vulnerability level of the programs based on their silent data corruption (SDC) and crash rates. Our prediction models achieve maximum prediction accuracy rates of 96.6%, 82.6%, and 87% for masked fault rates, SDCs, and crashes, respectively.
Citation - WoS: 5
Citation - Scopus: 5
Regional Soft Error Vulnerability and Error Propagation Analysis for GPGPU Applications
(Springer, 2021) Öz, Işıl; Karadaş, Ömer Faruk
The wide use of GPUs for general-purpose computations as well as graphics programs makes soft errors a critical concern. Evaluating the soft error vulnerability of GPGPU programs and employing efficient fault tolerance techniques for more reliable execution become more important. Protecting only the most error-sensitive program regions maintains an acceptable reliability level by eliminating the large performance overheads due to redundant operations. Therefore, fine-grained regional soft error vulnerability analysis is crucial for the systems targeting both performance and reliability. In this work, we present a regional fault injection framework and perform a detailed error propagation analysis to evaluate the soft error vulnerability of GPGPU applications. We evaluate both intra-kernel and inter-kernel vulnerabilities for a set of programs and quantify the severity of the data corruptions by considering metrics other than SDC rates. Our experimental study demonstrates that the code regions inside GPGPU programs exhibit different characteristics in terms of soft error vulnerability and the soft errors corrupting the variables propagate into the program output in several ways. We present the potential impact of our analysis by discussing the usage scenarios after we compile our observations acquired from our empirical work.
Citation - WoS: 1
Citation - Scopus: 1
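
The fault injection idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' framework: it flips one random bit in an input of a toy CPU computation (a hypothetical stand-in for a kernel) and classifies each run as masked, SDC, or crash by comparing against the fault-free output. The function names, the toy computation, and the rule that a non-finite result counts as a crash are all assumptions made for illustration.

```python
import math
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding flipped."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def saxpy_sum(a: float, xs: list, ys: list) -> float:
    """Toy stand-in for a kernel: sum of a*x + y over all elements.

    Assumption: a non-finite result is treated as a crash.
    """
    total = sum(a * x + y for x, y in zip(xs, ys))
    if not math.isfinite(total):
        raise ArithmeticError("non-finite result")
    return total


def classify(golden: float, run) -> str:
    """Compare one faulty run against the fault-free (golden) output."""
    try:
        result = run()
    except ArithmeticError:
        return "crash"
    return "masked" if result == golden else "sdc"


def campaign(trials: int, seed: int = 0) -> dict:
    """Inject one random single-bit fault per trial and count the outcomes."""
    rng = random.Random(seed)
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [0.5, 0.5, 0.5, 0.5]
    golden = saxpy_sum(2.0, xs, ys)
    counts = {"masked": 0, "sdc": 0, "crash": 0}
    for _ in range(trials):
        faulty = list(xs)
        index = rng.randrange(len(faulty))
        faulty[index] = flip_bit(faulty[index], rng.randrange(64))
        outcome = classify(golden, lambda: saxpy_sum(2.0, faulty, ys))
        counts[outcome] += 1
    return counts


if __name__ == "__main__":
    print(campaign(1000))
```

Dividing each count by the number of trials gives the masked, SDC, and crash rates for the injected region. A regional analysis in the spirit of the abstract would repeat the campaign per code region rather than over the whole program.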
A User-Assisted Thread-Level Vulnerability Assessment Tool
(Wiley, 2019) Öz, Işıl; Topçuoğlu, Haluk Rahmi; Tosun, Oğuz
System reliability becomes a critical concern in modern architectures as circuits scale down. To deal with soft errors, the replication of system resources has been used at both hardware and software levels. Since redundancy causes performance degradation, it is necessary to explore partial redundancy techniques that replicate only the most vulnerable parts of the code. The redundancy level of user applications depends on user preferences and may differ for users with different requirements. In this work, we propose a user-assisted reliability assessment tool based on critical thread analysis for redundancy in parallel architectures. Our analysis evaluates the application threads of a parallel program by considering their criticality in the execution and selects the most critical thread or threads to be replicated. Moreover, we extend our analysis by exploring critical regions of individual threads and redundantly execute only those regions to reduce redundancy overhead. Our experimental evaluation indicates that the replication of the most critical thread improves the system reliability more (up to 10% for the blackscholes application) than the replication of any other thread. The partial thread replication based on critical region analysis also reduces the vulnerability of the system by taking a fine-grained approach.
Citation - WoS: 3
Citation - Scopus: 3
Regression-Based Prediction for Task-Based Program Performance
(World Scientific Publishing, 2019) Öz, Işıl; Bhatti, Muhammad Khurram; Popov, Konstantin; Brorsson, Mats
As multicore systems evolve by increasing the number of parallel execution units, parallel programming models have been released to exploit parallelism in applications. The task-based programming model uses task abstractions to specify parallel tasks and schedules tasks onto processors at runtime. To increase efficiency and obtain the highest performance, it is necessary to identify which runtime configuration is needed and how processor cores must be shared among tasks. Exploring the design space for all possible scheduling and runtime options, especially for large input data, becomes infeasible and requires statistical modeling. Regression-based modeling determines the effects of multiple factors on a response variable and makes predictions based on statistical analysis. In this work, we propose a regression-based modeling approach to predict task-based program performance for different scheduling parameters with variable data size. We execute a set of task-based programs by varying the runtime parameters and systematically measure the factors that influence execution time. Our approach uses executions with different configurations for a set of input data and derives different regression models to predict execution time for larger input data. Our results show that regression models provide accurate predictions for validation inputs, with a mean error rate as low as 6.3%, and 14% on average among four task-based programs.
Citation - WoS: 7
Citation - Scopus: 11
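
As a rough illustration of the regression idea in the abstract above, the sketch below fits an ordinary least-squares line to execution times measured on small inputs and extrapolates to a larger input. It is a simplification under stated assumptions: a single predictor (input size) instead of the multiple scheduling factors the abstract mentions, and made-up measurements that are hypothetical and not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


def relative_error(predicted, measured):
    """Absolute prediction error as a fraction of the measured value."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Hypothetical training runs: input size -> execution time in seconds.
    sizes = [1000, 2000, 4000, 8000]
    times = [0.13, 0.21, 0.43, 0.81]
    model = fit_line(sizes, times)
    # Hypothetical validation run on a larger input.
    estimate = predict(model, 16000)
    print(f"predicted {estimate:.3f} s, "
          f"error {relative_error(estimate, 1.60):.1%}")
```

Extending the sketch to several factors (for example core count and scheduling policy) would require a multiple regression, which is closer to what the abstract describes.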
Locality-Aware Task Scheduling for Homogeneous Parallel Computing Systems
(Springer Verlag, 2018) Bhatti, Muhammad Khurram; Öz, Işıl; Amin, Sarah; Mushtaq, Maria; Farooq, Umer; Popov, Konstantin; Brorsson, Mats
In systems with a complex many-core cache hierarchy, exploiting data locality can significantly reduce the execution time and energy consumption of parallel applications. Locality can be exploited at various hardware and software layers. For instance, by implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimised for locality. However, this is of little use if the software scheduling does not arrange execution in a manner that promotes the locality available in the programs themselves. Since programs for parallel systems consist of tasks executed simultaneously, task scheduling becomes crucial for performance in multi-level cache architectures. This paper presents a heuristic algorithm for homogeneous multi-core systems called locality-aware task scheduling (LeTS). The LeTS heuristic is a work-conserving algorithm that takes into account both locality and load balancing in order to reduce the execution time of target applications. LeTS works in two distinct phases: the working task group formation phase (WTG-FP) and the working task group ordering phase (WTG-OP). The WTG-FP forms groups of tasks in order to capture data reuse across tasks, while the WTG-OP determines an optimal order of execution for task groups that minimizes the reuse distance of shared data between tasks. We have performed experiments using randomly generated task graphs by varying three major performance parameters: (1) the communication to computation ratio (CCR), between 0.1 and 1.0; (2) the application size, i.e., task graphs comprising 50, 100, and 300 tasks per graph; and (3) the number of cores, with 2-, 4-, 8-, and 16-core execution scenarios. We have also performed experiments using selected real-world applications. The LeTS heuristic reduces the overall execution time of applications by exploiting inter-task data locality. Results show that LeTS outperforms state-of-the-art algorithms in amortizing inter-task communication cost.
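
To make the notion of reuse distance in the abstract above more concrete, here is a small Python sketch. It is not the LeTS heuristic: it uses a simplified, task-level stand-in for reuse distance (the number of other tasks executed between two consecutive uses of the same data block) and finds the best order by brute force, which is only feasible for a handful of tasks. The task names, data blocks, and the metric itself are assumptions for illustration.

```python
from itertools import permutations


def total_reuse_distance(order, footprint):
    """Sum, over all data blocks, the number of tasks that run between
    consecutive uses of the same block in the given execution order."""
    last_seen = {}
    total = 0
    for position, task in enumerate(order):
        for block in footprint[task]:
            if block in last_seen:
                total += position - last_seen[block] - 1
            last_seen[block] = position
    return total


def best_order(tasks, footprint):
    """Exhaustively search for an order with minimal total reuse distance.

    Brute force over all permutations; a real scheduler would use a heuristic.
    """
    return min(permutations(tasks),
               key=lambda order: total_reuse_distance(order, footprint))


if __name__ == "__main__":
    # Hypothetical tasks and the data blocks each one touches.
    footprint = {"t1": {"A"}, "t2": {"B"}, "t3": {"A"}, "t4": {"B"}}
    tasks = ["t1", "t2", "t3", "t4"]
    print(total_reuse_distance(tasks, footprint))
    print(best_order(tasks, footprint))
```

In this toy case, interleaving the tasks leaves one unrelated task between each pair of uses, while running the tasks that share a block back to back brings the total to zero.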
Using the Proxy Design Pattern for Transparent Redundant Execution [Saydam Artıklı Çalıştırma için Vekil Tasarım Örüntüsü Kullanımı]
(CEUR Workshop Proceedings, 2018) Öz, Dündar; Öz, Sinan; Öz, Işıl
In this study, we propose a transparent model for reliable execution of object-oriented software. We design a generic object-oriented programming tool for redundant software execution to provide the desired level of reliability against transient hardware faults. To achieve this, we utilize the Proxy design pattern, one of the well-known GoF design patterns, which were formulated to make software systems flexible and easy to maintain. The Proxy design pattern provides controlled access and a transparent mechanism for adding new functionality to an existing object when accessing it. Combining dynamic proxies and annotations in the Java programming language, we present RedundantCaller, a generic, transparent, and configurable tool for redundant execution and majority voting. Our tool takes any object and creates a dynamic proxy for it, which executes the methods of the object multiple times in separate threads and performs majority voting in the background, requiring a minimal amount of change in the original user code. Thanks to annotations, users can configure the redundant execution scheme per method. Our experiments demonstrate that our tool provides a significant level of reliability to any object-oriented software with a reasonable amount of performance degradation through multithreaded execution.
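
The proxy-plus-voting mechanism described above can be sketched in a few lines. The example below is a Python analogue, not the Java tool from the paper: it relies on Python's `__getattr__` hook rather than Java dynamic proxies and annotations, and the class names `RedundantProxy` and `Flaky` are hypothetical. It assumes that method results are hashable so they can be counted for the vote.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class RedundantProxy:
    """Wrap an object so every method call runs `copies` times in separate
    threads and the majority result is returned."""

    def __init__(self, target, copies=3):
        self._target = target
        self._copies = copies

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attribute = getattr(self._target, name)
        if not callable(attribute):
            return attribute

        def voted_call(*args, **kwargs):
            with ThreadPoolExecutor(max_workers=self._copies) as pool:
                futures = [pool.submit(attribute, *args, **kwargs)
                           for _ in range(self._copies)]
                results = [future.result() for future in futures]
            # Assumption: results are hashable, so Counter can tally them.
            winner, votes = Counter(results).most_common(1)[0]
            if votes <= self._copies // 2:
                raise RuntimeError("no majority among redundant executions")
            return winner

        return voted_call


class Flaky:
    """Hypothetical target whose second call returns a corrupted value."""

    def __init__(self):
        self.calls = 0
        self._lock = threading.Lock()

    def square(self, x):
        with self._lock:
            self.calls += 1
            call_number = self.calls
        return -1 if call_number == 2 else x * x


if __name__ == "__main__":
    proxy = RedundantProxy(Flaky(), copies=3)
    print(proxy.square(4))  # two correct copies outvote the corrupted one
```

The caller uses the proxy exactly like the wrapped object, which is the transparency property the abstract emphasizes. Per-method configuration, done with annotations in the Java tool, could be approximated in Python with decorators, but that is left out of this sketch.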
