AES Milan 2018
Engineering Brief EB05
EB05 - e-Brief Posters—2
Friday, May 25, 16:30 — 18:00 (Arena 2)
EB05-1 Musical Polyphony Estimation—Saarish Kareer, University of Rochester - Rochester, NY, USA; Sattwik Basu, University of Rochester - Rochester, NY, USA
Knowing the number of sources present in a mixture is useful for many computer audition problems such as polyphonic music transcription, source separation, and speech enhancement. Most existing algorithms for these applications require the user to provide this number, which limits the possibility of complete automation. In this paper we explore a few probabilistic and machine learning approaches to autonomous source number estimation. We then propose an implementation of a multi-class classification method using convolutional neural networks for musical polyphony estimation. In addition, we use these results to improve the performance of an instrument classifier based on the same dataset. Our final classification results for both networks show that this method is a promising starting point for further advancements in unsupervised source counting and separation algorithms for music and speech.
Engineering Brief 434
EB05-2 Variability of Speech to Reverberation Modulation Energy Ratio—Przemek Maziewski, Intel Technology Poland - Gdansk, Poland; Adam Kupryjanow, Intel Technology Poland - Gdansk, Poland
This paper illustrates the variability of the speech-to-reverberation modulation energy ratio (SRMR). The presented experiments indicate SRMR inconsistencies per user, per utterance, and per microphone position. Additionally, the results show that the normalization available in the reference SRMR implementation does not limit the variability to an acceptable range. Further, the paper presents a study of the correlation between SRMR and reverberation time (RT). Experiments suggest that a precise relation between SRMR and RT can only be obtained for a specific utterance coming from a known user.
Engineering Brief 435
EB05-3 Introducing a Dataset of Guitar Sounds for Electric Guitars Model Recognition—Renato Profeta, Ilmenau University of Technology - Ilmenau, Germany; Gerald Schuller, Ilmenau University of Technology - Ilmenau, Germany; Fraunhofer Institute for Digital Media Technology (IDMT) - Ilmenau, Germany
This engineering brief introduces a dataset of electric guitar sounds. The main goal of the dataset is to provide a set of electric guitar recordings that can be used for research in identification and/or classification of different electric guitar types. The dataset, at its current stage, consists of recordings from 30 guitars of different manufacturers and types, with around 3500 music events. All audio files are acquired in one-channel, 16-bit waveform audio file format with a sampling rate of 44100 Hz and are accompanied by parameter annotations in XML format. The dataset is planned to include recordings of over 50 guitars and will be released in uncompressed WAV file format under a Creative Commons license.
Engineering Brief 436
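The file format stated in the EB05-3 abstract (one channel, 16-bit samples, 44100 Hz) can be checked mechanically. The following is a minimal sketch using Python's standard `wave` module; the function names and the in-memory test file are hypothetical illustrations and are not part of the dataset or its tooling.

```python
import io
import wave

# Format stated in the EB05-3 abstract: one channel, 16-bit samples, 44100 Hz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 44100}


def wav_format(source):
    """Return channel count, sample width, and sample rate of a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_dataset_format(source):
    """True if the file has the format described in the abstract."""
    return wav_format(source) == EXPECTED


def make_test_wav(channels=1, sample_width=2, rate=44100, frames=441):
    """Build a silent in-memory WAV file (a stand-in for a real recording)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (frames * channels * sample_width))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(matches_dataset_format(make_test_wav()))            # mono file
    print(matches_dataset_format(make_test_wav(channels=2)))  # stereo file
```

With real recordings, `matches_dataset_format` would be called with a file path instead of the in-memory buffer.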
EB05-4 Development of the 4-pi Sampling Reverberator, VSVerb—Preliminary Experiments—Masataka Nakahara, ONFUTURE Ltd. - Tokyo, Japan; SONA Corp. - Tokyo, Japan; Akira Omoto, Kyushu University - Fukuoka, Japan; Onfuture Ltd. - Tokyo, Japan; Yasuhiko Nagatomo, Evixar Inc. - Tokyo, Japan
The strategy for capturing and restoring acoustic properties of a 4-pi sound field is one of the important issues for immersive-sound productions. Especially for post-production work, such strategies are required to offer a lot of flexibility. While some strategies have already been proposed, such as Ambisonics, WFS, and BoSC, they require much effort and offer less flexibility. Therefore, the authors developed a 4-pi sampling reverberator, VSVerb, which restores 4-pi reverberation by using virtual sound sources captured on site. Moreover, it allows various acoustic parameters to be adjusted at the post-production stage. To verify the feasibility of the proposed method, some preliminary experiments were conducted. The results confirm the effectiveness of VSVerb and identify some topics for future work.
Engineering Brief 437
EB05-5 Practical Evaluation of Sweet Spot in Current Noise Reproduction Systems—Piotr Klinke, Intel Technology Poland - Gdansk, Poland; Roksana Kostyk, Intel Technology Poland - Gdansk, Poland; Jan Banas, Intel Technology Poland - Gdansk, Poland; Przemek Maziewski, Intel Technology Poland - Gdansk, Poland; Dominik Stanczak, Intel Technology Poland - Gdansk, Poland
Quality assessment for speech recognition systems is a complex problem involving many factors, but it all starts and ends with a proper synthesis of user scenarios in a controlled acoustic environment. In this paper, industry-leading background noise reproduction methods are presented. Implementations of two ETSI standards for noise reproduction, ES 202 396-1 and TS 103 224, are measured and compared to determine their vulnerability to deviations in sound pressure level and frequency response around the center of the setup.
Engineering Brief 438
EB05-6 Open Hardware Mobile Wireless Serial Audio Transmission Unit for Acoustical Ecological Momentary Assessment Using Bluetooth RFCOMM—Sven Franz, Jade University of Applied Sciences - Oldenburg, Germany; Holger Groenewold, Jade University of Applied Sciences - Oldenburg, Germany; Inga Holube, Jade University of Applied Sciences - Oldenburg, Germany; Jörg Bitzer, Jade Hochschule Oldenburg - Oldenburg, Germany
Acoustical Ecological Momentary Assessment (EMA) is a necessary step towards understanding noise exposure in everyday life and can also help identify obstacles in the speech understanding of hearing-impaired people. Smartphones would represent a desirable tool for this task, but no simple solution has been proposed yet. Stereo audio transmission between mobile devices via Bluetooth using the Advanced Audio Distribution Profile (A2DP) is a common technique. However, due to software restrictions in Android, this profile is limited to acting as a source and is prohibited from receiving a stereo audio stream. In this contribution we present a solution for transmitting uncompressed stereo audio data via Radio Frequency Communication (RFCOMM), enabling an Android smartphone to act as a receiver. Although, in contrast to A2DP, this solution is limited to stereo, 16 kHz, and 16 bit, the resulting audio quality is sufficient for speech signals and acoustical measurements.
Engineering Brief 439
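The stream parameters given in the EB05-6 abstract (stereo, 16 kHz, 16 bit, uncompressed) fix the raw payload rate the link has to carry. The short calculation below is only an illustration of that arithmetic; the function name is hypothetical, and any RFCOMM or lower-layer protocol overhead is deliberately left out because the abstract does not quantify it.

```python
def pcm_bitrate_bps(channels, sample_rate_hz, bits_per_sample):
    """Raw bit rate of uncompressed PCM audio, in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


# Parameters from the abstract: stereo, 16 kHz, 16 bit.
bitrate_bps = pcm_bitrate_bps(channels=2, sample_rate_hz=16_000, bits_per_sample=16)
bytes_per_second = bitrate_bps // 8

if __name__ == "__main__":
    print(f"{bitrate_bps} bit/s = {bitrate_bps / 1000:.0f} kbit/s")
    print(f"{bytes_per_second} bytes/s")
```

This gives 512 kbit/s, or 64000 bytes per second, of audio payload before protocol overhead.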
EB05-7 Sound Masking on iOS Devices. Masking Everyday Noises and Tinnitus—Lorenzo Rizzi, Suono e Vita - Acoustic Engineering - Lecco, Italy; Nadir Bertolasi, Suonoevita Ingegneria - Lecco, Italy; Gabriele Ghelfi, Suono e Vita - Acoustic Engineering - Lecco, Italy
The research began by studying the abilities and limits of mobile devices in performing environmental noise analysis using recordings from the internal microphone. Perceptual masking algorithms have been implemented to modify natural sounds specifically for the analyzed noise. Sound pleasantness algorithms have then been applied to obtain better masking sounds. This set-up is being extended to the masking of tinnitus: the user selects the tinnitus frequency by listening to a sine sweep.
Engineering Brief 440
EB05-8 Wearable Mobile Bluetooth Device for Stereo Audio Transmission to a Modified Android Smartphone—Holger Groenewold, Jade University of Applied Sciences - Oldenburg, Germany; Sven Franz, Jade University of Applied Sciences - Oldenburg, Germany; Inga Holube, Jade University of Applied Sciences - Oldenburg, Germany; Jörg Bitzer, Jade Hochschule Oldenburg - Oldenburg, Germany
For acoustical long-term measurements based on ecological momentary assessment (EMA), we needed a solution for sending A2DP audio data to an Android smartphone. Due to Android’s software limitations, common smartphones cannot act as the necessary sink for stereo audio. Our innovation is a wearable Bluetooth device containing a wireless transmission unit with two microphones, connected to a Nexus 5 smartphone. The Nexus 5 platform was chosen because Google performed tests featuring an Android-powered car audio system, which enables a stereo Bluetooth sink. The proposed open hardware and software solution can be used in several audio application areas where non-intrusive long-term observations are necessary, e.g., noise exposure dosimeters.
Engineering Brief 441
EB05-9 Theoretical and Experimental Study of an Acoustic Monopole Source—Pierluigi A. Argenta, Politecnico di Bari - Bari, Italy; Francesco Martellotta, Politecnico di Bari - Bari, Italy; Leonardo Soria, Politecnico di Bari - Bari, Italy
As is well known, the acoustic monopole source plays a fundamental role in the fields of theoretical and experimental acoustics. In this work a sound source that closely approximates the dynamic response of a theoretical monopole is designed, and the parameters affecting its operational behavior are highlighted and optimized in terms of sound power and range of reproducible frequencies. We develop a hybrid lumped-parameter/finite-element model for describing the source operation. The model is first experimentally tested to validate its predictive effectiveness. Then, we perform a non-dimensional parametric optimization of the source directionality, obtaining relationships from which general design guidelines are identified to minimize the far-field directionality and thus achieve nearly omnidirectional behavior.
Engineering Brief 442
EB05-10 Development and Validation of a Full Range Acoustic Impedance Tube—Roman Schlieper, Leibniz Universität Hannover - Hannover, Germany; Song Li, Leibniz Universität Hannover - Hannover, Germany; Jürgen Peissig, Leibniz Universität Hannover - Hannover, Germany
Knowledge of the physical properties of materials is of high importance for research and development in acoustics. A standardized method for the determination of acoustic impedances is the impedance measuring tube based on the transfer-function method according to ISO 10534-2. This engineering brief presents the development of an impedance measuring tube with an internal diameter of 8 mm for acoustical impedance measurements in the range of 60 Hz to 20 kHz. The impedance tube was validated by comparing the measurement results with analytical results for the rigid termination, the open-ended tube, and the empty sample holder.
Engineering Brief 443
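The 8 mm diameter in EB05-10 can be related to the 20 kHz upper limit through the plane-wave condition for a circular duct: transfer-function measurements assume that only plane waves propagate, which holds below the cut-on frequency of the first higher-order mode, f_c = 1.8412 c / (pi d). The sketch below evaluates that textbook formula. The speed of sound of 343 m/s is an assumption (air at roughly 20 °C), and the calculation is not taken from the brief itself.

```python
import math

# First zero of the derivative of the Bessel function J1; it sets the
# cut-on of the first non-planar mode in a rigid-walled circular duct.
FIRST_MODE_CONSTANT = 1.8412


def first_mode_cut_on_hz(diameter_m, speed_of_sound_m_s=343.0):
    """Cut-on frequency of the first higher-order mode in a circular tube.

    The default speed of sound is an assumed value for air at about 20 degC.
    """
    return FIRST_MODE_CONSTANT * speed_of_sound_m_s / (math.pi * diameter_m)


cut_on_hz = first_mode_cut_on_hz(0.008)  # 8 mm internal diameter

if __name__ == "__main__":
    print(f"Cut-on frequency for an 8 mm tube: {cut_on_hz / 1000:.1f} kHz")
```

Under these assumptions the cut-on lies at roughly 25 kHz, above the stated 20 kHz upper limit, so a plane-wave assumption is plausible over the whole measurement range.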
EB05-11 Directivity and Electro-Acoustic Measurements of the IKO—Frank Schultz, University of Music and Performing Arts Graz - Graz, Austria; Markus Zaunschirm, University of Music and Performing Arts - Graz, Austria; IEM; Franz Zotter, IEM, University of Music and Performing Arts - Graz, Austria
The icosahedral loudspeaker (IKO), a compact spherical array, is capable of third-order Ambisonics (TOA) beamforming and is used as a musical and technical instrument. To develop and verify beamforming with its 20 loudspeakers flush-mounted into the faces of the regular icosahedron, electroacoustic properties must be measured. We offer a collection of measurement data of IEM’s IKO1, IKO2, and IKO3 along with analysis tools to inspect these properties. The multiple-input multiple-output (MIMO) data comprise: (i) laser vibrometry measurements of the 20x20 transfer functions from driving voltages to loudspeaker velocities, (ii) 20x16 finite impulse responses (FIR) of the TOA decoding filters, and (iii) 648x20 directional impulse responses from driving voltages to radiated sound pressure. With the open data sets, open source code, and resulting directivity patterns, we intend to support reproducible research about beamforming with spherical loudspeaker arrays.
Engineering Brief 444
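The dimensions quoted in the EB05-11 abstract fit together as follows: an Ambisonics signal set of order N has (N + 1)^2 channels, so third order gives 16, and the 20x16 decoding filters map those 16 channels to the 20 loudspeaker voltages, which the 648x20 directional responses then map to 648 directions. The sketch below only checks this bookkeeping. Treating each data set as one matrix per frequency bin or filter tap is an interpretation rather than something the abstract states, and the function names are hypothetical.

```python
def ambisonic_channel_count(order):
    """Number of spherical-harmonic channels for a given Ambisonics order."""
    return (order + 1) ** 2


def chain_shape(*shapes):
    """Shape of the matrix product A1 @ A2 @ ... given each (rows, cols)."""
    rows, cols = shapes[0]
    for next_rows, next_cols in shapes[1:]:
        if cols != next_rows:
            raise ValueError(f"inner dimensions differ: {cols} vs {next_rows}")
        cols = next_cols
    return rows, cols


TOA_CHANNELS = ambisonic_channel_count(3)   # third-order Ambisonics
DECODER_SHAPE = (20, TOA_CHANNELS)          # loudspeakers x Ambisonics channels
DIRECTIVITY_SHAPE = (648, 20)               # directions x loudspeakers

# Ambisonics channels -> loudspeaker voltages -> pressure in 648 directions.
BEAM_SHAPE = chain_shape(DIRECTIVITY_SHAPE, DECODER_SHAPE)

if __name__ == "__main__":
    print(TOA_CHANNELS)
    print(BEAM_SHAPE)
```

The combined mapping has shape 648x16: one directional response per Ambisonics channel and measurement direction.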