qualitybyrich.com

An Improvement to Anthropometry-Based Head and Torso HRTF Synthesis Models for Locations Near the Frontal Median Plane: A Thesis paper (.pdf) presentation (.pdf)
05.05.2007 - Due to the recent proliferation of portable media devices, headphones (and earbuds) are becoming the primary means through which people experience recorded phenomena. In the absence of processing, headphone listeners typically perceive sounds as coming from inside their heads rather than from the surrounding space. Head-related transfer function (HRTF) based algorithms attempt to rectify this issue; however, they must be personalized for every individual through expensive, and often impractical, methods, which prevents these implementations from being effective. This need has produced an extensive body of research focused on linking the perceptually significant features of HRTFs to anthropometry. The ultimate goal of such work is to synthesize a complete set of personalized HRTFs strictly from morphological measurements. Recent research has produced an anthropometry-based head and torso (HAT) model that accurately approximates the effects that those body parts have on an incident sound. These HAT-based synthesis models produce very convincing lateral localization effects, and a weak sense of elevation far away from the median plane, but they lack the primary elevation cues that are caused by the external ear (pinna). The work presented herein adds pinna-based elevation cues to an existing HAT model; these cues are most effective near the median plane, an area where the HAT’s torso-based elevation cues are particularly poor. The cues are created by modeling the known resonances and the primary reflections of the external ear using digital filters whose parameters are determined from an individual’s anthropometry. Cascading an existing HAT model with the introduced pinna model yields customized HRTFs. Objective results indicate that the proposed synthesis method approximates the frequency response of measured HRTFs better than a simple HAT model.
Psychoacoustic validation reveals that the model is effective at creating an accurate sense of elevation near the median plane for 67% of the subjects tested. This proves the hypothesis for certain cases and leaves room for future improvements.

Image Distortion Correction link
10.12.2006 - Images captured by low-end digital cameras are often geometrically inaccurate because of the low quality of their lenses. This distortion is either barrel distortion or pincushion distortion. Barrel distortion gives the image an inflated appearance: the image looks as if it is on the edge of a blown-up balloon, and vertical lines that are supposed to be straight curve outward. An image suffers from pincushion distortion if the opposite is true: vertical lines curve inward. Both types of distortion affect the entire image to some extent, but they are more apparent at the extreme edges of the image. This paper examines barrel distortion in particular by devising and implementing an algorithm in MATLAB to rectify it.
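
As an illustration of how such a correction can work, the sketch below remaps pixels with a one-parameter radial model and nearest-neighbor sampling. It is not the paper's MATLAB algorithm: the function name `undistort`, the coefficient `k`, and the model `r_distorted = r_corrected * (1 + k * r_corrected^2)` are assumptions chosen for brevity, and it is written in Python rather than MATLAB.

```python
def undistort(image, k):
    """Correct radial distortion in a 2-D list of pixel values.

    Assumed model: a point at normalized radius r in the corrected image
    was recorded at radius r * (1 + k * r^2) in the distorted image.
    Under this model, k < 0 describes barrel distortion.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    norm = max(cx, cy) or 1.0  # scale so the radius is about 1 at the edge
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Normalized coordinates of the corrected (output) pixel.
            xu, yu = (x - cx) / norm, (y - cy) / norm
            r2 = xu * xu + yu * yu
            # Location of that point in the distorted (input) image.
            scale = 1.0 + k * r2
            xs = int(round(cx + xu * scale * norm))
            ys = int(round(cy + yu * scale * norm))
            # Pixels that map outside the input stay at 0.
            if 0 <= xs < w and 0 <= ys < h:
                out[y][x] = image[ys][xs]
    return out


image = [[row * 5 + col for col in range(5)] for row in range(5)]
corrected = undistort(image, -0.2)
```

With `k = 0` the image is returned unchanged. A negative `k` pulls samples from closer to the center, which stretches the compressed edges of a barrel-distorted image back outward. A practical implementation would estimate `k` from the image and interpolate between pixels instead of rounding.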

An Analysis of Interpolated Finite Impulse Response Filters and Their Improvements link
10.12.2006 - This paper offers a brief overview of Interpolated Finite Impulse Response (IFIR) filters, followed by a comprehensive, analytical literature survey of the improvements made to the original design to reduce its computational complexity. The enhancements that are theoretically examined and compared are: stretching factor (L) optimization, arithmetic operation reduction, mipizing, alternate interpolator designs and coefficient re-quantization. Design examples accompany the explanations and offer comparisons between the various cases where applicable.
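
The basic IFIR structure can be sketched as follows: a short model filter is stretched by inserting L-1 zeros between its taps, and a second short filter (the interpolator) removes the spectral images that the stretching creates. The tap counts, cutoffs, and helper names (`lowpass`, `stretch`, `convolve`) are illustrative assumptions rather than values from the paper, and the Hamming-windowed sinc design stands in for whatever design method a real implementation would use.

```python
import math


def lowpass(num_taps, cutoff):
    """Hamming-windowed sinc lowpass; cutoff in cycles/sample (0 to 0.5)."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [c / total for c in taps]  # normalize to unity gain at DC


def stretch(taps, L):
    """Insert L-1 zeros between taps, turning G(z) into G(z^L)."""
    out = []
    for c in taps[:-1]:
        out.append(c)
        out.extend([0.0] * (L - 1))
    out.append(taps[-1])
    return out


def convolve(a, b):
    """Direct-form convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


L = 4                                # stretching factor
target_cutoff = 0.05                 # desired cutoff, cycles/sample
model = lowpass(21, target_cutoff * L)   # designed at L times the target cutoff
interp = lowpass(15, 0.125)              # suppresses images near multiples of 1/L
ifir = convolve(stretch(model, L), interp)   # overall impulse response
multipliers = len(model) + len(interp)       # coefficients actually multiplied
```

Here the overall impulse response spans 95 taps, but only the 21 model taps and 15 interpolator taps require multiplications, because the zeros inserted by `stretch` cost nothing. The choice of L trades model-filter length against interpolator length, which is why stretching factor optimization appears among the enhancements surveyed.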

An Audio Units Plug-in to Simulate Spatial Audio Synthesis link
10.12.2006 - Using the fundamentals of head-related transfer functions (HRTFs), this paper presents an Audio Units plug-in that binauralizes a mono sound for accurate spatial perception through headphones. The binauralization is performed in real time at a location that the user specifies. To provide the background necessary for understanding the plug-in's implementation, the first part of the paper summarizes the relevant research conducted by the authors. The next section explains the motivation for creating such a plug-in. The subsequent section details the steps of the plug-in’s algorithm, and the concluding section discusses possibilities for future improvements and additions to the presented work.
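
The core of binauralization can be sketched as convolving the mono signal with a pair of head-related impulse responses (HRIRs), one per ear, selected for the requested location. The `HRIRS` table below holds made-up toy values rather than measured data, and the names `binauralize` and `convolve` are hypothetical; the sketch is offline Python, not the plug-in's real-time Audio Units code.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical toy HRIR pairs (left, right) keyed by azimuth in degrees.
# These are placeholders for illustration, not measured responses.
HRIRS = {
    0: ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    # Source on the right: the left ear hears it later and quieter.
    90: ([0.0, 0.0, 0.4], [1.0, 0.0, 0.0]),
}


def binauralize(mono, azimuth):
    """Return (left, right) signals for the nearest tabulated azimuth."""
    nearest = min(HRIRS, key=lambda a: abs(a - azimuth))
    left_ir, right_ir = HRIRS[nearest]
    return convolve(mono, left_ir), convolve(mono, right_ir)


left, right = binauralize([1.0, 0.5], 80)
```

A real-time version would process audio in blocks and would typically interpolate between neighboring HRIRs rather than snapping to the nearest tabulated direction.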

Speech Production Using Concatenated Tubes link
09.01.2006 - Voiced speech sounds are synthesized by approximating the vocal tract with a finite number of concatenated tubes and an all-pole model. The second part of this project uses linear prediction to re-synthesize speech as it would sound after a low-bandwidth transmission (e.g., a cellular phone).
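
One common way to connect a tube model to an all-pole filter is sketched below: reflection coefficients are computed from the cross-sectional areas of adjacent tubes, then converted to filter coefficients with the step-up recursion. The sign convention for the reflection coefficients, the function names, and the example areas are assumptions made for illustration; the project itself may use a different formulation.

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tubes.

    Assumed convention: r = (A_next - A_current) / (A_next + A_current).
    """
    return [(b - a) / (b + a) for a, b in zip(areas, areas[1:])]


def step_up(refl):
    """Convert reflection coefficients to all-pole denominator coefficients."""
    a = [1.0]
    for r in refl:
        prev = a + [0.0]
        last = len(prev) - 1
        # a_new[i] = a_old[i] + r * a_old[order - i]
        a = [prev[i] + r * prev[last - i] for i in range(len(prev))]
    return a


def all_pole_filter(a, excitation):
    """y[n] = x[n] - sum over k >= 1 of a[k] * y[n - k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


areas = [2.6, 8.0, 10.5, 6.0, 1.0]          # hypothetical tube areas
coeffs = step_up(reflection_coefficients(areas))
pulse_train = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]  # voiced excitation
speech = all_pole_filter(coeffs, pulse_train)
```

Because all areas are positive, every reflection coefficient has magnitude below one, which keeps the resulting all-pole filter stable. The periodic pulse train stands in for glottal excitation of a voiced sound.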

Isolated Word Recognition link
09.01.2006 - A speaker-dependent word recognition system is developed and implemented using Dynamic Time Warping, a popular yet rudimentary technique in speech processing. Results are quantified, and improvements are then made to the original algorithm. The results of the enhanced algorithm are also provided.
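
A minimal sketch of the basic technique follows: Dynamic Time Warping aligns two sequences of different lengths and returns the accumulated distance along the best alignment, and recognition picks the stored template with the smallest distance. For brevity, each frame here is a single number and the templates are made up; a real recognizer would compare feature vectors extracted from speech frames. The names `dtw_distance` and `recognize` are hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated distance along the best alignment of sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Made-up one-dimensional "feature" sequences for two words.
templates = {"yes": [1, 2, 3], "no": [3, 2, 1]}
best = recognize([1, 1, 2, 3], templates)
```

The utterance `[1, 1, 2, 3]` is a time-stretched copy of the "yes" template, so warping aligns the two with zero distance even though their lengths differ.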

The Acoustic Features of Speech Sounds link
09.01.2006 - A study of the phonemes of the English language. Audio samples of words that demonstrate each phoneme are available, along with time-domain plots and spectrograms (narrowband and wideband) for both the phoneme and the word. Brief discussions of each category of phonemes are also included.