Session Y: SIGNAL PROCESSING FORUM - PART 2
Monday, May 13, 09:00 - 13:00 h

Professional Audio systems target a wide variety of applications, including recording, mixing, musical instruments, studios, and broadcasting. The challenge for Pro Audio system designers is not only to satisfy the more common requirements (high sample rates, high performance and high precision) but also to meet system-specific design goals such as minimizing latency in effects processors and tracking bit errors in digital mixing consoles.

We describe a new method for estimating multiple-pitch information in recorded piano music. The method works in the time domain and makes use of a self-generating database of all possible notes. First, we show how accurate polyphonic pitch detection can be achieved given an adequate database. Then an algorithm is proposed that generates the database from the music, using estimation of predominant pitches in the frequency domain and pitch-shifting techniques. Both systems generate a MIDI representation of the original signal. This method, which can be generalized to any solo instrument, overcomes the usual constraints of the traditional frequency-domain approach regarding intervals and quantity of notes.

This paper describes an experimental prototype system developed for vocal modification. The system aims to modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample, simulating a non-parametric transfer of singing techniques from the target vocalist to the source vocalist. The system comprises a time-varying time/pitch/amplitude modification engine, a pitch detector and a subsystem for generating the modification parameters. Although the system has yet to attain a desirable level of robustness, it has successfully generated interesting synthesized samples that demonstrate the idea of hybridizing vocal performances.
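
The vocal modification abstract above lists a pitch detector among its components but does not say which method it uses. As a minimal sketch, assuming a plain autocorrelation estimator (an illustrative choice, not the authors' method), the pitch of a single voiced frame could be estimated as follows. The function name `estimate_pitch` and the parameters `fmin` and `fmax` are hypothetical.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Assumption: the frame is voiced and holds at least two periods of fmin.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)          # remove any DC offset
    # Keep only the non-negative lags of the autocorrelation.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)       # shortest period considered
    lag_max = int(sample_rate / fmin)       # longest period considered
    # The strongest peak inside the search range marks the period in samples.
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag
```

Because the lag is an integer number of samples, the estimate is quantized: a 200 Hz sine sampled at 8 kHz should come out within a few hertz of 200 Hz. Running the function frame by frame would give a pitch contour of the kind the abstract refers to.
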
This paper focuses on the design and implementation of a phoneme recognition algorithm that extracts the parameters needed to drive a 3D graphics facial expression and animation procedure. The procedure is used to emulate speech generation for 3D-modeled digital characters. In the first development step, LPC, STFT analysis, wavelets, cepstrum and pattern recognition techniques were tested for phoneme recognition and speaker classification. Then, 3D graphics facial expressions and phonemes were related in a library. Finally, a client-server application was designed that processes speech, combines library data via morphing techniques and generates a digital character that appears to speak the given speech. Possible applications include cartoon dubbing and web-based virtual teleconferencing.

The use of perceptually based (lossy) audio codecs, like MPEG-1 Layer 3 ('mp3'), has become very popular in the last few years. However, at very high compression rates the perceptual quality of the signal is degraded, which is mainly exhibited as a loss of high frequencies. We propose an efficient algorithm for extending the bandwidth of an audio signal, with the goal of creating a more natural sound. This is done by adding an extra octave at the high-frequency end of the spectrum. The algorithm uses a non-linearity to generate the extended octave, and can be applied to music as well as speech. This also enables application to fixed or mobile communication systems.

This paper introduces two signaling techniques that may be useful in broadcast applications for combining supplemental data with program material. Such data may be used to monitor audience participation and advertising placement, or to convey additional program information (including lyrics and URLs).
The first technique involves replacing unused bits, or fill bits, in fixed data-rate compression systems with information-carrying data in a way that does not affect the quality of the program material and does not necessitate modification of encoders or decoders. The second technique involves modulating the bandwidth of a signal and adding perceptually shaped noise to create an inaudible, constant-rate supplemental signal. Both techniques may be integrated with the Dolby Digital perceptual coding system.

The overall frequency response of a graphic or parametric equalizer bank, consisting of a number of serially connected cut or boost equalizers, may show serious deviations from the user-defined gain setting. The deviations are caused by mutual interference between the equalizers, and depend on the gains, quality factors and center frequencies of the individual equalizers. This paper discusses the known interference compensation strategies and then introduces a new approach to efficiently counteract the undesired interference effects. It is based on the "Opposite Filter Concept": based on the user-defined parameter setting of an equalizer bank, a set of simple filters counteracting the interference is suitably parameterized and serially inserted into the equalizer bank. This substantially reduces the interference effects and consequently yields an overall frequency response close to the desired one.

This paper proposes improving artificial reverberation by decomposing the input signal into subbands and using a different feedback delay network (FDN) for each subband. By this means, non-uniform modal density is achieved and the orders of the FDNs are lowered. The evaluation of filter banks and the corresponding subband FDNs is described. Comparison results, obtained through objective (simulation) and subjective (listening test) analysis, are presented.
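
For readers unfamiliar with the feedback delay network named in the reverberation abstract above, the following is a minimal single-band sketch. It assumes four delay lines and a scaled Hadamard feedback matrix, both common textbook choices rather than details taken from the paper. The function name `fdn_reverb`, the delay lengths and the gain value are hypothetical.

```python
import numpy as np

def fdn_reverb(x, delays=(149, 211, 263, 293), gain=0.7):
    """Pass a mono signal through a four-line feedback delay network.

    Assumptions: exactly four delay lines, delay lengths in samples,
    and 0 <= gain < 1 so that the feedback loop decays.
    """
    assert len(delays) == 4, "this sketch is fixed to four delay lines"
    # Orthogonal 4x4 Hadamard matrix; scaling by gain < 1 makes the loop lossy.
    hadamard = np.array([[1, 1, 1, 1],
                         [1, -1, 1, -1],
                         [1, 1, -1, -1],
                         [1, -1, -1, 1]], dtype=float) / 2.0
    feedback = gain * hadamard
    buffers = [np.zeros(d) for d in delays]   # one circular buffer per line
    pos = [0, 0, 0, 0]                        # read/write position per line
    y = np.zeros(len(x))
    for t, sample in enumerate(x):
        # Read the oldest sample of every delay line.
        outs = np.array([buffers[i][pos[i]] for i in range(4)])
        y[t] = outs.sum()
        # Mix the line outputs through the feedback matrix and add the input.
        ins = sample + feedback @ outs
        for i in range(4):
            buffers[i][pos[i]] = ins[i]
            pos[i] = (pos[i] + 1) % delays[i]
    return y
```

Feeding in a unit impulse returns the network's impulse response, whose first echo arrives after the shortest delay. In a subband scheme of the kind the abstract describes, one such network would run per band, each with its own delays and order; the filter bank itself is not shown here.
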