New M.E. Thesis Submitted by a CSE Student


This thesis presents the design and evaluation of an approach to concatenative speech synthesis using the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) signal processing algorithm. Concatenative synthesis systems use pre-recorded speech segments stored in a speech corpus. At synthesis time, a process known as unit selection chooses the best available segments from the corpus to synthesize the new utterance. The pitch and duration of these segments may then be modified to generate the desired prosody. TD-PSOLA provides an efficient and largely successful way to perform these modifications, but it is also known to introduce perceptible distortion, and despite the algorithm's popularity, research has been undertaken to address this problem. The approach developed in this thesis aims to reduce the perceived distortion introduced when TD-PSOLA is applied to speech. To investigate where this distortion occurs, a psychoacoustic evaluation of the effect of TD-PSOLA pitch modification is presented: subjective listening tests were conducted with word-level stimuli that had been manipulated using TD-PSOLA, and the collected data were analyzed for co-occurrence patterns and correlations indicating where the distortion arises. The aim is to apply pitch modifications to concatenated speech while producing synthetic speech with less perceptible distortion. The framework was then evaluated on how well it generates synthetic speech with reduced distortion. A listening test showed that a TD-PSOLA-balanced speech corpus may be capable of generating pitch-modified synthetic sentences with significantly less distortion than those generated from a typical phonetically balanced corpus.
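
The TD-PSOLA pitch modification described above can be illustrated with a minimal Python sketch. It assumes the pitch marks (one per glottal period) are already known, uses a two-period Hann window centred on each analysis mark, and places synthesis marks spaced by the local period divided by a pitch factor, copying the nearest analysis segment to each one and overlap-adding. The names `hann` and `td_psola_pitch_shift` are hypothetical, and this is a textbook-style illustration, not the thesis's implementation; it does not perform pitch marking, unit selection, or duration modification.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (an assumed window choice)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola_pitch_shift(x, marks, factor):
    """Hypothetical TD-PSOLA pitch shift keeping the output length unchanged.

    x      : list of samples
    marks  : increasing sample indices of pitch marks (assumed given,
             e.g. from a separate pitch-marking step not shown here)
    factor : > 1 raises the pitch, < 1 lowers it
    """
    y = [0.0] * len(x)
    t = float(marks[0])  # synthesis time starts at the first analysis mark
    while t < marks[-1]:
        # Map the synthesis mark to the nearest analysis mark.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t))
        # Local pitch period (the last mark reuses the previous period).
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        # Two-period windowed segment centred on analysis mark k,
        # overlap-added around the synthesis mark.
        win = hann(2 * period + 1)
        dest = int(round(t))
        for i, w in enumerate(win):
            src = marks[k] - period + i
            dst = dest - period + i
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += w * x[src]
        # Synthesis marks are spaced by the period scaled by 1 / factor.
        t += period / factor
    return y
```

With `factor=1.0` and evenly spaced marks, the overlapping Hann windows sum to one, so the signal is reproduced between the first and second-to-last marks; with `factor=2.0` the synthesis marks are placed twice as densely, roughly doubling F0 while the output length stays the same.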
