Electrical - Electronic Engineering / Elektrik - Elektronik Mühendisliği

Permanent URI for this collection: https://hdl.handle.net/11147/11

Search Results

Now showing 1 - 10 of 13

Citation - Scopus: 6
An Automatic Transcription System for Classical Turkish Music (Klasik Türk Müziği İçin Otomatik Notaya Dökme Sistemi)
(Institute of Electrical and Electronics Engineers, 2011) Bozkurt, Barış; Gedik, Ali Cenk; Karaosmanoğlu, M. Kemal
This study presents the first automatic transcription system for Turkish music in the literature. We first discuss the characteristics of Turkish music that are taken into consideration in the design of the system. We then briefly describe the system's signal processing components, how they relate to each other, and their function in the system: f0 estimation, automatic tonic detection and makam recognition based on pitch distributions, and frequency and duration quantization. © 2011 IEEE.
Citation - Scopus: 6
Music Information Retrieval for Turkish Music: Problems, Solutions and Tools
(Institute of Electrical and Electronics Engineers Inc., 2009) Bozkurt, Barış; Gedik, Ali Cenk; Karaosmanoğlu, M. Kemal
This study opens a discussion of how Turkish music differs from Western music from the standpoint of information retrieval applications. It proposes the use of frequency histograms for Turkish music information retrieval and introduces Makam Toolbox (Makam Aracı) 1.0, which contains a set of tools developed for applications such as automatic tonic detection, makam classification, tuning system analysis, and measuring the degree of agreement between theory and performance, together with an accompanying large parametric database.
Citation - WoS: 67
Citation - Scopus: 78
Chirp Group Delay Analysis of Speech Signals
(Elsevier, 2007) Bozkurt, Barış; Couvreur, Laurent; Dutoit, Thierry
This study proposes new group delay estimation techniques that can be used for analyzing resonance patterns of short-term discrete-time signals and more specifically speech signals. Phase processing, or equivalently group delay processing, of speech signals is known to be difficult due to large spikes in the phase/group delay functions that mask the formant structure. In this study, we first analyze in detail the z-transform zero patterns of short-term speech signals in the z-plane and discuss the sources of spikes on group delay functions, namely the zeros located close to the unit circle. We show that windowing largely influences these patterns, and therefore short-term phase processing. Through a systematic study, we then show that reliable phase/group delay estimation for speech signals can be achieved by appropriate windowing, and that group delay functions can reveal formant information as well as some of the characteristics of the glottal flow component in speech signals. However, such phase estimation is highly sensitive to noise, and robust extraction of group delay based parameters remains difficult in real acoustic conditions even with appropriate windowing. As an alternative, we propose processing of chirp group delay functions, i.e. group delay functions computed on a circle other than the unit circle in the z-plane, which can be guaranteed to be spike-free. We finally present one application in feature extraction for automatic speech recognition (ASR). We show that chirp group delay representations are potentially useful for improving ASR performance. © 2007 Elsevier B.V. All rights reserved.
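As an illustration of the idea in this abstract, the following is a minimal sketch of a group delay function evaluated on a circle of arbitrary radius in the z-plane. It is not the authors' implementation: the function names, the default radius, and the number of frequency bins are assumptions chosen for the example. It relies on the standard identity that the group delay equals the real part of Y/X, where X is the transform of x[n] and Y is the transform of n·x[n].

```python
import cmath
import math


def z_transform_on_circle(x, radius, n_bins):
    """Evaluate X(z) = sum_n x[n] * z**(-n) at n_bins points on |z| = radius."""
    values = []
    for k in range(n_bins):
        omega = 2.0 * math.pi * k / n_bins
        z = cmath.rect(radius, omega)  # z = radius * exp(j * omega)
        values.append(sum(xn * z ** (-n) for n, xn in enumerate(x)))
    return values


def chirp_group_delay(x, radius=1.0, n_bins=64, eps=1e-12):
    """Group delay of x computed on the circle |z| = radius.

    radius = 1.0 gives the ordinary group delay; any other radius is the
    "chirp" variant. Bins where X is numerically zero are returned as NaN.
    """
    ramp = [n * xn for n, xn in enumerate(x)]  # n * x[n]
    big_x = z_transform_on_circle(x, radius, n_bins)
    big_y = z_transform_on_circle(ramp, radius, n_bins)
    return [
        (y / xk).real if abs(xk) > eps else float("nan")
        for xk, y in zip(big_x, big_y)
    ]


# A signal with a zero at z = 0.98, very close to the unit circle.
signal = [1.0, -0.98]
on_unit_circle = chirp_group_delay(signal, radius=1.0)
on_outer_circle = chirp_group_delay(signal, radius=1.12)  # radius is an assumed value
```

In this toy case the value at zero frequency is much larger in magnitude on the unit circle than on the outer circle, which is the spike-masking effect the abstract describes. Choosing the radius for real speech frames is beyond the scope of this sketch.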
Citation - WoS: 4
Ramcess 2.x Framework-Expressive Voice Analysis for Realtime and Accurate Synthesis of Singing
(Springer Verlag, 2008) d'Alessandro, Nicolas; Babacan, Onur; Bozkurt, Barış; Dubuisson, Thomas; Holzapfel, Andre; Kessous, Loic; Vlieghe, Maxime
In this paper we present the work that has been achieved in the context of the second version of the RAMCESS singing synthesis framework. The main improvement of this study is the integration of new algorithms for expressive voice analysis, especially the separation of the glottal source and the vocal tract. Realtime synthesis modules have also been refined. These elements have been integrated in an existing digital instrument: the HANDSKETCH 1.X, a bimanual controller. Moreover, this digital instrument is compared to existing systems.
Citation - WoS: 43
Citation - Scopus: 59
Causal-Anticausal Decomposition of Speech Using Complex Cepstrum for Glottal Source Estimation
(Elsevier Ltd., 2011) Drugman, Thomas; Bozkurt, Barış; Dutoit, Thierry
Complex cepstrum is known in the literature for linearly separating causal and anticausal components. Relying on advances achieved by the Zeros of the Z-Transform (ZZT) technique, we here investigate the possibility of using complex cepstrum for glottal flow estimation on a large-scale database. Via a systematic study of the windowing effects on the deconvolution quality, we show that the complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation when specific windowing criteria are met. It is also shown that this complex cepstral decomposition gives glottal estimates similar to those obtained with the ZZT method. However, as complex cepstrum uses FFT operations instead of requiring the factoring of high-degree polynomials, the method benefits from a much higher speed. Finally, in our tests on a large corpus of real expressive speech, we show that the proposed method has the potential to be used for voice quality analysis.
Citation - WoS: 86
Citation - Scopus: 101
A Comparative Study of Glottal Source Estimation Techniques
(Elsevier Ltd., 2012) Drugman, Thomas; Bozkurt, Barış; Dutoit, Thierry
Source-tract decomposition (or glottal flow estimation) is one of the basic problems of speech processing. For this, several techniques have been proposed in the literature. However, studies comparing different approaches are almost nonexistent. Besides, experiments have been systematically performed either on synthetic speech or on sustained vowels. In this study we compare three of the main representative state-of-the-art methods of glottal flow estimation: closed-phase inverse filtering, iterative and adaptive inverse filtering, and mixed-phase decomposition. These techniques are first submitted to an objective assessment test on synthetic speech signals. Their sensitivity to various factors affecting the estimation quality, as well as their robustness to noise are studied. In a second experiment, their ability to label voice quality (tensed, modal, soft) is studied on a large corpus of real connected speech. It is shown that changes of voice quality are reflected by significant modifications in glottal feature distributions. Techniques based on the mixed-phase decomposition and on a closed-phase inverse filtering process turn out to give the best results on both clean synthetic and real speech signals. On the other hand, iterative and adaptive inverse filtering is recommended in noisy environments for its high robustness. © 2011 Elsevier Ltd. All rights reserved.
Citation - WoS: 13
Citation - Scopus: 20
Weighing Diverse Theoretical Models on Turkish Maqam Music Against Pitch Measurements: a Comparison of Peaks Automatically Derived From Frequency Histograms With Proposed Scale Tones
(Taylor and Francis Ltd., 2009) Bozkurt, Barış; Yarman, Ozan; Karaosmanoğlu, M. Kemal; Akkoç, Can
Since the early 20th century, various theories have been advanced in order to mathematically explain and notate modes of Traditional Turkish music known as maqams. In this article, maqam scales according to various theoretical models based on different tunings are compared with pitch measurements obtained from select recordings of master Turkish performers in order to study their level of match with analysed data. Chosen recordings are subjected to a fully computerized sequence of signal processing algorithms for the automatic determination of the set of relative pitches for each maqam scale: f0 estimation, histogram computation, tonic detection + histogram alignment, and peak picking. For nine well-recognized maqams, automatically derived relative pitches are compared with scale tones defined by theoretical models using quantitative distance measures. We analyse and interpret histogram peaks based on these measures to find the theoretical models most conforming with all the recordings, and hence, with the quotidian performance trends influenced by them.
Citation - WoS: 36
Citation - Scopus: 64
Pitch-Frequency Histogram-Based Music Information Retrieval for Turkish Music
(Elsevier Ltd., 2010) Gedik, Ali Cenk; Bozkurt, Barış
This study reviews the use of pitch histograms in music information retrieval studies for western and non-western music. The problems in applying the pitch-class histogram-based methods developed for western music to non-western music and specifically to Turkish music are discussed in detail. The main problems are the assumptions used to reduce the dimension of the pitch histogram space, such as mapping to a low and fixed dimensional pitch-class space, the hard-coded use of western music theory, the use of the standard diapason (A4=440 Hz), and analysis based on tonality and tempered tuning. We argue that it is more appropriate to use higher dimensional pitch-frequency histograms without such assumptions for Turkish music. We show in two applications, automatic tonic detection and makam recognition, that high dimensional pitch-frequency histogram representations can be successfully used in Music Information Retrieval (MIR) applications without such pre-assumptions, using data-driven models. © 2009 Elsevier B.V. All rights reserved.
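To make the contrast with 12-bin pitch-class histograms concrete, here is a minimal sketch of a fine-resolution pitch-frequency histogram on a cent scale. It is an illustration only, not the method of the paper: the function name, the 10-cent bin width, and the use of a caller-supplied reference frequency are all assumptions made for the example.

```python
import math
from collections import Counter


def pitch_frequency_histogram(f0_hz, ref_hz, bin_cents=10.0):
    """Count f0 values in bins of bin_cents, measured in cents above ref_hz.

    Unvoiced frames (f0 <= 0) are skipped. Nothing is folded into 12 pitch
    classes, and no A4 = 440 Hz anchor is assumed: ref_hz is whatever
    reference the caller supplies (hypothetical parameter).
    """
    histogram = Counter()
    for f0 in f0_hz:
        if f0 <= 0:
            continue
        cents = 1200.0 * math.log2(f0 / ref_hz)
        histogram[round(cents / bin_cents)] += 1
    return histogram


# Toy f0 track in Hz; the 0.0 entry stands for an unvoiced frame.
track = [220.0, 440.0, 440.0, 0.0]
hist = pitch_frequency_histogram(track, ref_hz=220.0)
```

With a 10-cent bin, an octave spans 120 bins rather than 12, so intervals smaller than a semitone stay distinguishable in the representation.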
Citation - WoS: 1
Citation - Scopus: 2
Glottal Source Estimation Using an Automatic Chirp Decomposition
(Springer, 2010) Drugman, Thomas; Bozkurt, Barış; Dutoit, Thierry
In a previous work, we showed that the glottal source can be estimated from speech signals by computing the Zeros of the Z-Transform (ZZT). Decomposition was achieved by separating the roots inside (causal contribution) and outside (anticausal contribution) the unit circle. In order to guarantee a correct deconvolution, time alignment on the Glottal Closure Instants (GCIs) was shown to be essential. This paper extends the formalism of ZZT by evaluating the Z-transform on a contour possibly different from the unit circle. A method is proposed for determining automatically this contour by inspecting the root distribution. The derived Zeros of the Chirp Z-Transform (ZCZT)-based technique turns out to be much more robust to GCI location errors. © 2010 Springer-Verlag.
Citation - WoS: 32
Citation - Scopus: 61
Three Dimensions of Pitched Instrument Onset Detection
(Institute of Electrical and Electronics Engineers Inc., 2010) Holzapfel, Andre; Bozkurt, Barış; Stylianou, Yannis; Gedik, Ali Cenk
In this paper, we suggest a novel group delay based method for the onset detection of pitched instruments. It is proposed to approach the problem of onset detection by examining three dimensions separately: phase (i.e., group delay), magnitude and pitch. The evaluation of the suggested onset detectors for phase, pitch and magnitude is performed using a new publicly available and fully onset annotated database of monophonic recordings which is balanced in terms of included instruments and onset samples per instrument, while it contains different performance styles. Results show that the accuracy of onset detection depends on the type of instruments as well as on the style of performance. Combining the information contained in the three dimensions by means of a fusion at decision level leads to an improvement of onset detection by about 8% in terms of F-measure, compared to the best single dimension. © 2010 IEEE.
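For readers unfamiliar with the F-measure quoted in this abstract, the sketch below shows one common way to score onset detections against annotations. It is not the evaluation code of the paper: the function name, the 50 ms tolerance window, and the greedy one-to-one matching are assumptions made for the example.

```python
def onset_f_measure(reference, detected, tolerance=0.05):
    """F-measure of detected onset times against reference onset times.

    Times are in seconds. A detection counts as a hit when it lies within
    `tolerance` of a not-yet-matched reference onset (assumed window).
    """
    ref = sorted(reference)
    det = sorted(detected)
    i = j = hits = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1      # matched pair, consume both
            i += 1
            j += 1
        elif diff < 0:
            j += 1         # detection too early: false positive
        else:
            i += 1         # reference onset missed: false negative
    precision = hits / len(det) if det else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy data: two of four detections fall within 50 ms of a reference onset.
score = onset_f_measure([1.0, 2.0, 3.0], [1.02, 2.5, 3.01, 4.0])
```

Because the F-measure is the harmonic mean of precision and recall, a detector cannot raise its score simply by reporting more onsets.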
