
Phonetic events from the labeling of the European Portuguese database for speech synthesis, FEUP/IPB-DB

Teixeira, João Paulo; Freitas, D.; Braga, Daniela; Barros, Maria João; Latsch, Vagner
Fonte: ISCA Publicador: ISCA
Tipo: Artigo de Revista Científica
Português
Relevância na Pesquisa
57.466836%
In this paper a new labeled speech signal database (FEUP/IPB-DB) in Standard European Portuguese (hereafter SEP) is presented. The objective of this work is, on the one hand, to provide phonetic material for the construction of Text-to-Speech (TTS) systems, either from scratch or to improve the quality of existing ones, and, on the other hand, to place at the service of the SEP scientific community a phonetically and prosodically valuable speech corpus, essential for speech synthesis or phonetics research. Our purpose is to make it available to the scientific community, since there is no other DB of its kind for EP. The main features of the database are described, as well as some basic statistical aspects. A discussion of some methodological problems, and of some phenomena in experimental phonetics observed during the labeling of the speech signal, is also included. Our approach is to produce a resource that can be further improved in subsequent steps with minimal re-work. Phonetic, linguistic and technical consistency is guaranteed through the involvement of a multidisciplinary team.

Didactic speech synthesizer – acoustic module, formants model

Teixeira, João Paulo; Fernandes, Anildo
Fonte: Instituto Politécnico de Bragança Publicador: Instituto Politécnico de Bragança
Tipo: Conferência ou Objeto de Conferência
Português
Relevância na Pesquisa
57.466836%
Text-to-speech synthesis is the main subject of this work. The constitution of a generic text-to-speech conversion system is presented, the functions of its various modules are explained, and the development techniques based on the formant model are described. The development of a didactic formant synthesiser in the Matlab environment is also described. This synthesiser is intended to support a didactic understanding of the formant model of speech production.
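The formant model described here can be caricatured as a cascade of second-order resonators driven by an impulse train. The sketch below is a minimal numpy stand-in, not the paper's Matlab tool; the formant frequencies and bandwidths are illustrative values, not taken from the paper.

```python
import numpy as np

def resonator(x, freq, bw, fs):
    """Filter x through a second-order Klatt-style resonator (one formant)."""
    r = np.exp(-np.pi * bw / fs)
    c = 2 * r * np.cos(2 * np.pi * freq / fs)
    b0 = 1 - c + r * r  # unity gain at DC
    y = np.zeros_like(x)
    for t in range(len(x)):
        y[t] = (b0 * x[t]
                + c * (y[t - 1] if t > 0 else 0.0)
                - r * r * (y[t - 2] if t > 1 else 0.0))
    return y

def synthesize_vowel(formants, bandwidths, f0=120, dur=0.25, fs=16000):
    n = int(dur * fs)
    src = np.zeros(n)
    src[::fs // f0] = 1.0  # impulse-train excitation at f0
    out = src
    for f, bw in zip(formants, bandwidths):
        out = resonator(out, f, bw, fs)  # cascade of formant resonators
    return out / (np.max(np.abs(out)) + 1e-12)

# rough /a/-like formant values (illustrative only)
wave = synthesize_vowel([700, 1200, 2600], [80, 90, 120])
```

Playing `wave` at 16 kHz yields a buzzy, vowel-like tone; a full synthesiser adds glottal-pulse shaping, anti-resonators and time-varying parameters on top of this skeleton.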

Estudo de um sistema de conversão texto-fala baseado em HMM; Study of a HMM-based text-to-speech system

Sarah Negreiros de Carvalho
Fonte: Biblioteca Digital da Unicamp Publicador: Biblioteca Digital da Unicamp
Tipo: Dissertação de Mestrado Formato: application/pdf
Publicado em 18/02/2013 Português
Relevância na Pesquisa
47.676797%
With the continuous development of technology, there is a growing demand for speech synthesis systems capable of speaking like humans, for integration into the most diverse applications, whether in robotic automation, in accessibility for people with disabilities, or in applications for culture and leisure. Speech synthesis based on hidden Markov models (HMMs) shows promise in meeting this technological need. Its statistical, parametric nature makes it a flexible system, capable of adapting artificial voices, inserting emotion into speech, and producing good-quality synthetic speech from a limited training corpus. This dissertation presents a study of the HMM-based speech synthesis system (HTS), describing the stages involved in training the HMMs and generating the speech signal. The spectral, pitch and duration models that make up these context-dependent phoneme HMMs are presented, along with the various techniques for structuring them. Some of the problems found in HTS, such as the muffled and monotonous character of the artificial speech, are analysed together with techniques proposed to improve the final quality of the synthesized speech signal.
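The generation step in such a system can be reduced to its zeroth-order idea: each context-dependent state contributes a mean and a predicted duration, from which a parameter trajectory is produced. Real HTS smooths the trajectory using dynamic (delta) features; the sketch below omits that and uses made-up numbers.

```python
import numpy as np

def generate_trajectory(states):
    """Repeat each state's mean for its predicted duration (in frames).
    Real HTS smooths across states with delta features; this is the bare idea."""
    return np.concatenate([np.full(dur, mean) for mean, dur in states])

# hypothetical (mean log-F0, duration-in-frames) pairs for three states
traj = generate_trajectory([(4.8, 3), (5.2, 5), (4.5, 2)])
```

The step-wise trajectory this produces is exactly the source of the "monotonous" artifacts the dissertation discusses; the maximum-likelihood parameter generation algorithm replaces the flat segments with a smooth curve consistent with the delta statistics.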

Models of speech synthesis.

Carlson, R
Fonte: PubMed Publicador: PubMed
Tipo: Artigo de Revista Científica
Publicado em 24/10/1995 Português
Relevância na Pesquisa
47.73638%
The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community.

A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis

Gallardo-Antolín, Ascensión; Montero, Juan Manuel; King, Simon
Fonte: International Speech Communication Association Publicador: International Speech Communication Association
Tipo: info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bookPart; info:eu-repo/semantics/conferenceObject
Publicado em //2014 Português
Relevância na Pesquisa
57.2424%
Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization with music and noise detection which improve the precision and overall quality of the segmentation.; This work has been carried out during the research stay of A. Gallardo-Antolín and J. M. Montero at the Centre for Speech Technology Research (CSTR), University of Edinburgh, supported by the Spanish Ministry of Education, Culture and Sports under the National Program of Human Resources Mobility from the I+D+i 2008-2011 National Program, extended by agreement of the Council of Ministers on October 7th, 2011. The work leading to these results has received funding from the European Union under grant agreement No 287678. It has also been supported by EPSRC Programme Grant grant...
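The paper's segmentation architectures are not specified here, but the simplest building block of low-quality-segment filtering, a frame-energy gate, can be sketched as follows (thresholds and frame sizes are my illustrative choices):

```python
import numpy as np

def frame_energy(x, n=400, hop=160):
    """Short-time energy per frame (25 ms / 10 ms at 16 kHz)."""
    return np.array([float(np.sum(x[i:i + n] ** 2))
                     for i in range(0, len(x) - n + 1, hop)])

def keep_mask(x, thresh_db=-30.0):
    """Boolean mask of frames within thresh_db of the peak energy;
    a crude stand-in for a low-quality-segment filtering stage."""
    e = frame_energy(x)
    db = 10 * np.log10(e / (e.max() + 1e-12) + 1e-12)
    return db > thresh_db
```

A real pipeline would replace this gate with trained music/noise detectors and feed the surviving frames into speaker diarization to form the mono-speaker clusters described above.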

Síntese e reconhecimento da fala humana; Synthesis and recognition of human speech

Rumiko Oishi Stolfi
Fonte: Biblioteca Digital da Unicamp Publicador: Biblioteca Digital da Unicamp
Tipo: Dissertação de Mestrado Formato: application/pdf
Publicado em 31/10/2006 Português
Relevância na Pesquisa
47.761006%
The objective of this work is to present a review of the main concepts and methods involved in the synthesis, processing and recognition of human speech by computer. These technologies have numerous applications, which have grown substantially in recent years with the popularization of portable communication devices (cell phones, laptops, palmtops) and the universalization of the Internet. The first part of this work reviews the basic concepts of signal processing, including the Fourier transform, the power spectrum and spectrogram, filters, signal digitization and the Nyquist theorem. The second part describes the main characteristics of human speech, the mechanisms involved in its production and perception, and the concept of the phone (the linguistic unit of sound). This part also briefly describes the main techniques for orthographic-phonetic conversion, for speech synthesis from a phonetic description, and for the recognition of natural speech. The third part describes a practical project developed to consolidate the knowledge acquired in this master's programme: a program that generates popular Japanese songs from a textual description of the lyrics...
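The Nyquist theorem reviewed in the first part can be demonstrated in a few lines: a sinusoid above half the sampling rate folds back to a lower frequency. The numbers below are my own illustrative choices.

```python
import numpy as np

fs = 8000                          # sampling rate; Nyquist frequency is fs/2 = 4000 Hz
t = np.arange(fs) / fs             # one second of samples
x = np.sin(2 * np.pi * 5000 * t)   # a 5000 Hz tone, above Nyquist

# the spectral peak appears at the alias frequency fs - 5000 = 3000 Hz
peak_hz = np.argmax(np.abs(np.fft.rfft(x))) * fs / len(t)
```

With one second of data each FFT bin is exactly 1 Hz wide, so `peak_hz` lands on the alias frequency precisely; this folding is why anti-aliasing filters precede digitization.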

Auto Spell Suggestion for High Quality Speech Synthesis in Hindi

Kabra, Shikha; Agarwal, Ritika
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 15/02/2014 Português
Relevância na Pesquisa
47.617603%
The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text into intelligible, natural-sounding speech. For a language like Hindi, however, where many words have very close spellings and are easily confused, identifying errors in the input text is not an easy task, and incorrect text degrades the quality of the output speech. This paper therefore contributes to the development of high-quality speech synthesis by incorporating a spellchecker that automatically generates spelling suggestions for misspelled words. The spellchecker increases the efficiency of speech synthesis by providing suggestions for incorrect input text. Furthermore, we provide a comparative study evaluating the resulting effect on the phonetic text of adding the spellchecker to the input text.; Comment: 4 pages, 5 figures. International Journal of Computer Applications, 2014
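The paper's actual Hindi spellchecker is not described in this abstract; a generic spell-suggestion step of the kind it implies can be approximated with a plain edit-distance ranking over a lexicon (toy English lexicon below, purely for illustration):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def suggest(word, lexicon, k=3):
    """Rank lexicon entries by distance to the (possibly misspelled) word."""
    return sorted(lexicon, key=lambda w: levenshtein(word, w))[:k]
```

For close-spelling languages like Hindi, a production system would additionally weight substitutions between confusable characters rather than treating all edits equally.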

F0 Modeling In Hmm-Based Speech Synthesis System Using Deep Belief Network

Mukherjee, Sankar; Mandal, Shyamal Kumar Das
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 18/02/2015 Português
Relevância na Pesquisa
47.656743%
In recent years, multilayer perceptrons (MLPs) with many hidden layers, i.e. Deep Neural Networks (DNNs), have performed surprisingly well in many speech tasks such as speech recognition, speaker verification and speech synthesis. In the context of F0 modeling, however, these techniques have not been properly exploited. In this paper, the Deep Belief Network (DBN), a class of the DNN family, is employed to model the F0 contour of synthesized speech generated by an HMM-based speech synthesis system. The experiments were done on the Bengali language. Several DBN-DNN architectures ranging from four to seven hidden layers and up to 200 hidden units per hidden layer were presented and evaluated. The results were compared against the clustering-tree techniques popularly found in statistical parametric speech synthesis. We show that from textual inputs the DBN-DNN learns a high-level structure which in turn improves the F0 contour in terms of objective and subjective tests.; Comment: OCOCOSDA 2014
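At inference time the DBN-DNN described above is a deep MLP mapping context features to an F0 value. A forward-pass sketch with random weights (the feature dimension is hypothetical; 5 layers of 200 units is just one point in the 4-7 layer range the paper sweeps):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dnn(n_in, n_hidden, depth, n_out=1):
    """Deep MLP with `depth` hidden layers of `n_hidden` units each."""
    sizes = [n_in] + [n_hidden] * depth + [n_out]
    return [(0.1 * rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for w, b in layers[:-1]:
        x = np.tanh(x @ w + b)  # hidden nonlinearity
    w, b = layers[-1]
    return x @ w + b            # linear output: predicted F0 value

net = make_dnn(n_in=30, n_hidden=200, depth=5)
f0 = forward(net, rng.standard_normal((4, 30)))  # 4 frames of context features
```

DBN pretraining only changes how these weights are initialized before supervised fine-tuning; the runtime computation is exactly this stack of affine maps and nonlinearities.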

A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis

Fan, Bo; Lee, Siu Wa; Tian, Xiaohai; Xie, Lei; Dong, Minghui
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 06/10/2015 Português
Relevância na Pesquisa
47.70575%
State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling. The magnitude spectrum has been the dominant feature over the years. Although perceptual studies have shown that the phase spectrum is essential to the quality of synthesized speech, it is often discarded by using a minimum-phase filter during synthesis, and the speech quality suffers. To bypass this bottleneck in vocoded speech, this paper proposes a phase-embedded waveform representation framework and establishes a magnitude-phase joint modeling platform for high-quality SPSS. Our experiments on waveform reconstruction show that the performance is better than that of the widely-used STRAIGHT. Furthermore, the proposed modeling and synthesis platform outperforms a leading-edge, vocoded, deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN)-based baseline system in various objective evaluation metrics.; Comment: accepted and will appear in APSIPA2015; keywords: speech synthesis, LSTM-RNN, vocoder, phase, waveform, modeling
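The core observation, that magnitude alone does not determine the waveform, can be verified in a few lines with a plain FFT (this is a generic demonstration, not the paper's framework):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)  # arbitrary test signal

spec = np.fft.rfft(x)
mag, phase = np.abs(spec), np.angle(spec)

# keeping the phase reconstructs the signal exactly
x_full = np.fft.irfft(mag * np.exp(1j * phase), n=len(x))
# discarding the phase (zero-phase stand-in for the minimum-phase shortcut)
# yields a different waveform with the identical magnitude spectrum
x_zero = np.fft.irfft(mag, n=len(x))
```

Both `x_full` and `x_zero` share the same magnitude spectrum, yet only the former matches the original waveform, which is why jointly modeling magnitude and phase matters for reconstruction quality.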

A Deep Learning Approach to Data-driven Parameterizations for Statistical Parametric Speech Synthesis

Muthukumar, Prasanna Kumar; Black, Alan W.
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 30/09/2014 Português
Relevância na Pesquisa
47.614844%
Nearly all Statistical Parametric Speech Synthesizers today use Mel Cepstral coefficients as the vocal tract parameterization of the speech signal. Mel Cepstral coefficients were never intended to work in a parametric speech synthesis framework, but as yet, there has been little success in creating a better parameterization that is more suited to synthesis. In this paper, we use deep learning algorithms to investigate a data-driven parameterization technique that is designed for the specific requirements of synthesis. We create an invertible, low-dimensional, noise-robust encoding of the Mel Log Spectrum by training a tapered Stacked Denoising Autoencoder (SDA). This SDA is then unwrapped and used as the initialization for a Multi-Layer Perceptron (MLP). The MLP is fine-tuned by training it to reconstruct the input at the output layer. This MLP is then split down the middle to form encoding and decoding networks. These networks produce a parameterization of the Mel Log Spectrum that is intended to better fulfill the requirements of synthesis. Results are reported for experiments conducted using this resulting parameterization with the ClusterGen speech synthesizer.
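The "split down the middle" step can be illustrated with a toy tapered autoencoder. The weights below are untrained and the layer sizes invented; the paper's SDA pretraining and reconstruction fine-tuning are omitted, only the structural split is shown.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [60, 40, 20, 40, 60]  # tapered bottleneck: 60 -> 20 -> 60 (hypothetical)
layers = [(0.1 * rng.standard_normal((a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]

def run(ls, x):
    for w, b in ls:
        x = np.tanh(x @ w + b)
    return x

mid = len(layers) // 2
encoder, decoder = layers[:mid], layers[mid:]      # split down the middle
code = run(encoder, rng.standard_normal((1, 60)))  # low-dimensional parameterization
recon = run(decoder, code)                         # back to the spectrum domain
```

After training, `run(encoder, …)` plays the role of the analysis step of a vocoder parameterization and `run(decoder, …)` the synthesis step, with `code` as the learned replacement for Mel Cepstral coefficients.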

Speaker and Expression Factorization for Audiobook Data: Expressiveness and Transplantation

Chen, Langzhou; Braunschweiler, Norbert; Gales, Mark J. F.
Fonte: IEEE Publicador: IEEE
Tipo: Article; accepted version
Português
Relevância na Pesquisa
47.788853%
This is the accepted manuscript. The final version is available from IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6995936&filter%3DAND%28p_IS_Number%3A7055953%29.; Expressive synthesis from text is a challenging problem. There are two issues. First, read text is often highly expressive, to convey the emotion and scenario in the text. Second, since expressive training speech is not always available for different speakers, it is necessary to develop methods to share expressive information across speakers. This paper investigates the approach of using very expressive, highly diverse audiobook data from multiple speakers to build an expressive speech synthesis system. Both problems are addressed by considering a factorized framework in which speaker and emotion are modelled in separate sub-spaces of a cluster adaptive training (CAT) parametric speech synthesis system. The sub-spaces for the expressive state of a speaker and the characteristics of the speaker are jointly trained using a set of audiobooks. In this work, the expressive speech synthesis system operates in two distinct modes. In the first mode, the expressive information is given by audio data, and an adaptation method is used to extract the expressive information from the audio data. In the second mode...
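The factorization idea can be sketched as two independent weight vectors over cluster-mean sub-spaces: an adapted model mean is a bias plus a speaker-subspace term plus an expression-subspace term. Dimensions and weights below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
D, S, E = 10, 3, 2  # feature dim, speaker clusters, expression clusters (made up)
base = rng.standard_normal(D)           # bias cluster mean
spk_clusters = rng.standard_normal((S, D))
exp_clusters = rng.standard_normal((E, D))

def cat_mean(spk_w, exp_w):
    """Adapted mean = bias + speaker-subspace term + expression-subspace term."""
    return base + spk_w @ spk_clusters + exp_w @ exp_clusters

m = cat_mean(np.array([0.2, 0.5, 0.3]), np.array([1.0, 0.0]))
neutral = cat_mean(np.zeros(S), np.zeros(E))  # zero weights recover the bias
```

Because the two weight vectors are separate, an expression weight estimated from one speaker's audio can be transplanted to another speaker's weight, which is exactly what makes the cross-speaker sharing described above possible.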

Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction

Gibson, Matthew; Byrne, William
Fonte: IEEE Transactions on Audio, Speech and Language Processing Publicador: IEEE Transactions on Audio, Speech and Language Processing
Tipo: Article; accepted version
Português
Relevância na Pesquisa
77.81948%
Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to estimate the transcription of the adaptation data. This paper firstly presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models, via a two-pass decision tree construction process. Secondly, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data. Thirdly, this paper demonstrates how this technique lends itself to the task of unsupervised cross-lingual adaptation of HMM-based speech synthesis models, and explains the advantages of such an approach. Finally...

Unsupervised cross-lingual speaker adaptation for hmm-based speech synthesis using two-pass decision tree construction

Gibson, Matthew
Fonte: Universidade de Cambridge Publicador: Universidade de Cambridge
Tipo: Conferência ou Objeto de Conferência
Português
Relevância na Pesquisa
67.316006%
This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. A two-pass decision tree construction technique is deployed for this purpose. Using parallel translated datasets, cross-lingual and intralingual adaptation are compared in a controlled manner. Listener evaluations reveal that the proposed method delivers performance approaching that of unsupervised intralingual adaptation.

Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models

Gibson, Matthew
Fonte: Universidade de Cambridge Publicador: Universidade de Cambridge
Tipo: Article; accepted version
Português
Relevância na Pesquisa
57.677%
Hidden Markov model (HMM)-based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervised speaker adaptation, previous work has used a supplementary set of acoustic models to first estimate the transcription of the adaptation data. By defining a mapping between HMM-based synthesis models and ASR-style models, this paper introduces an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for supplementary acoustic models. Further, this enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data.
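The basic move in decision-tree construction, greedily picking the context question with the best split, can be sketched as follows. This is a generic one-level split on toy data, not the paper's two-pass procedure itself.

```python
import numpy as np

def split_gain(values, mask):
    """Reduction in total sum-of-squares from splitting the pool in two."""
    ss = lambda v: len(v) * np.var(v) if len(v) else 0.0
    return ss(values) - ss(values[mask]) - ss(values[~mask])

def best_question(contexts, values, questions):
    """Greedy one-level split over named yes/no context questions."""
    masks = {q: np.array([questions[q](c) for c in contexts]) for q in questions}
    return max(questions, key=lambda q: split_gain(values, masks[q]))

contexts = ["a", "a", "i", "i"]          # toy phone contexts
values = np.array([1.0, 1.1, 5.0, 5.2])  # per-model statistics (invented)
questions = {"is_open_vowel": lambda c: c == "a",
             "is_vowel": lambda c: c in ("a", "i")}  # uninformative split
q = best_question(contexts, values, questions)
```

In the two-pass scheme, a first tree built this way defines the synthesis-model tying, and a second pass re-clusters the same states into coarser ASR-style models, giving the mapping between the two model sets.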

Autoregressive HMMs for speech synthesis

Shannon, Matt; Byrne, William
Fonte: ISCA (International Speech Communication Association) Publicador: ISCA (International Speech Communication Association)
Tipo: Article; accepted version
Português
Relevância na Pesquisa
77.62627%
We propose the autoregressive HMM for speech synthesis. We show that the autoregressive HMM supports efficient EM parameter estimation and that we can use established effective synthesis techniques such as synthesis considering global variance with minimal modification. The autoregressive HMM uses the same model for parameter estimation and synthesis in a consistent way, in contrast to the standard HMM synthesis framework, and supports easy and efficient parameter estimation, in contrast to the trajectory HMM. We find that the autoregressive HMM gives performance comparable to the standard HMM synthesis framework on a Blizzard Challenge-style naturalness evaluation.; This research was funded by the European Community's Seventh Framework Programme (FP7/2007-2013), grant agreement 213845 (EMIME).
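The distinguishing feature of the autoregressive HMM is that each state's output distribution is conditioned on the previous few observations. A toy sampling sketch of such an emission model (the per-state coefficients and the noise level are hypothetical, and transition dynamics are omitted):

```python
import numpy as np

rng = np.random.default_rng(4)

def ar_emit(states, params, order=2, noise=0.1):
    """Sample observations whose per-state mean is an autoregressive
    function of the previous `order` outputs (illustrative only)."""
    y = []
    for s in states:
        coefs, bias = params[s]
        hist = ([0.0] * order + y)[-order:]  # zero-padded observation history
        y.append(bias + float(np.dot(coefs, hist)) + noise * rng.standard_normal())
    return np.array(y)

# hypothetical per-state AR coefficients and biases for two states
params = {0: (np.array([0.5, 0.3]), 0.0),
          1: (np.array([0.9, -0.2]), 1.0)}
seq = ar_emit([0, 0, 1, 1, 1], params)
```

Because consecutive observations are linked through `hist`, smooth trajectories emerge from the model itself, rather than from a separate post-hoc parameter generation step as in the standard HMM framework.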

Autoregressive clustering for HMM speech synthesis

Shannon, Matt; Byrne, William
Fonte: ISCA (International Speech Communication Association) Publicador: ISCA (International Speech Communication Association)
Tipo: Article; accepted version
Português
Relevância na Pesquisa
57.42181%
The autoregressive HMM has been shown to provide efficient parameter estimation and high-quality synthesis, but in previous experiments decision trees derived from a non-autoregressive system were used. In this paper we investigate the use of autoregressive clustering for autoregressive HMM-based speech synthesis. We describe decision tree clustering for the autoregressive HMM and highlight differences to the standard clustering procedure. Subjective listening evaluation results suggest that autoregressive clustering improves the naturalness of the resulting speech. We find that the standard minimum description length (MDL) criterion for selecting model complexity is inappropriate for the autoregressive HMM. Investigating the effect of model complexity on naturalness, we find that a large degree of overfitting is tolerated without a substantial decrease in naturalness.; This research was funded by the European Community's Seventh Framework Programme (FP7/2007-2013), grant agreement 213845 (EMIME).
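The MDL criterion the paper finds inappropriate trades likelihood against a description-length penalty that grows with the number of tree leaves. A minimal form of the score (constants and exact penalty form vary across implementations; this is the generic shape, not the paper's exact formula):

```python
import numpy as np

def mdl_score(log_likelihood, n_leaves, params_per_leaf, n_frames):
    """Minimum description length: data cost plus model-description cost.
    Clustering stops when no further split lowers this score."""
    return -log_likelihood + 0.5 * params_per_leaf * n_leaves * np.log(n_frames)
```

A split is accepted only if its likelihood gain exceeds the penalty increment; the paper's finding is that for the autoregressive HMM this stopping point is badly placed, since much larger trees overfit without hurting naturalness.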

Short time spectrum analysis-synthesis of speech

Vujcic, Dusan
Fonte: Rochester Instituto de Tecnologia Publicador: Rochester Instituto de Tecnologia
Tipo: Tese de Doutorado
Português
Relevância na Pesquisa
47.741147%
This thesis presents an extensive study of the short-time spectrum analysis-synthesis of speech. This type of speech processing attempts to synthesize speech without the short-time Fourier transform phase information. Several existing analysis-synthesis methods are presented. In addition, five other methods are investigated experimentally. The results of these experiments are explained, identifying some very important properties of short-time spectrum analysis-synthesis. One of the methods was shown to synthesize high-fidelity speech. Finally, a comparison is made between this method and other known methods based on the short-time Fourier transform approach.
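Reconstruction from the short-time magnitude spectrum alone, the problem this thesis studies, is commonly attacked by alternating projections (the approach standardized as the Griffin-Lim algorithm, which may or may not coincide with the thesis's best method). A compact numpy sketch with my own window and hop choices:

```python
import numpy as np

N, HOP = 256, 128
WIN = np.hanning(N)

def stft(x):
    return np.array([np.fft.rfft(x[i:i + N] * WIN)
                     for i in range(0, len(x) - N + 1, HOP)])

def istft(S):
    out = np.zeros(HOP * (len(S) - 1) + N)
    norm = np.zeros_like(out)
    for k, f in enumerate(S):  # overlap-add with window-squared normalization
        out[k * HOP:k * HOP + N] += np.fft.irfft(f, n=N) * WIN
        norm[k * HOP:k * HOP + N] += WIN ** 2
    return out / np.maximum(norm, 1e-8)

def from_magnitude(mag, iters=50):
    """Recover a waveform from |STFT| alone by alternating projections:
    impose the target magnitude, then project onto consistent STFTs."""
    rng = np.random.default_rng(0)
    S = mag * np.exp(1j * rng.uniform(0, 2 * np.pi, mag.shape))
    for _ in range(iters):
        S = mag * np.exp(1j * np.angle(stft(istft(S))))
    return istft(S)
```

Each iteration keeps the prescribed magnitudes but replaces the phases with those of the nearest consistent signal, which is why the recovered phase converges toward something perceptually usable even though it was never transmitted.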

DESIGN AND DEVELOPMENT OF A SPEECH SYNTHESIS SOFTWARE FOR COLOMBIAN SPANISH APPLIED TO COMMUNICATION THROUGH MOBILE DEVICES

Rueda Ch., Hoover F.; Correa P., Claudia V.; Arguello Fuentes, Henry
Fonte: DYNA Publicador: DYNA
Tipo: Artigo de Revista Científica Formato: text/html
Publicado em 01/06/2012 Português
Relevância na Pesquisa
57.304253%
In several scenarios of everyday life, there is a need to communicate orally with other people. However, technological solutions such as mobile phones cannot be used in places such as meetings, classrooms, or conference rooms without disrupting the activities of people around the speaker. This research develops a tool that enables people to hold a conversation in a public place without disrupting the surrounding environment. To this end, a speech synthesizer is implemented on a personal computer connected to a cell phone, which allows one to place a mobile call without using the human voice. The speech synthesizer uses the diphone concatenation technique and is developed specifically for Colombian Spanish. A mathematical description of the synthesizer shows its decomposition into various mutually independent processes. Several user-acceptance and quality tests of the obtained signal were performed to evaluate the performance of the tool. The results show a high signal-to-noise ratio of the generated signals and high intelligibility of the tool.
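The joining step of diphone concatenation can be sketched as overlap-and-crossfade at each unit boundary (the crossfade length and the toy "units" below are illustrative; a real system also matches pitch and energy at the joins):

```python
import numpy as np

def concatenate_units(units, fs=16000, xfade_ms=5):
    """Join recorded units with a short linear crossfade at each boundary,
    the basic smoothing step in concatenative synthesis."""
    nx = int(fs * xfade_ms / 1000)
    out = units[0].astype(float)
    ramp = np.linspace(0.0, 1.0, nx)
    for u in units[1:]:
        u = u.astype(float)
        out[-nx:] = out[-nx:] * (1 - ramp) + u[:nx] * ramp  # crossfade region
        out = np.concatenate([out, u[nx:]])
    return out

# two hypothetical diphone waveforms
joined = concatenate_units([np.ones(400), np.zeros(300)])
```

Because diphones are cut mid-phone, where the spectrum is most stable, even this simple linear crossfade produces far fewer audible discontinuities than joining at phone boundaries would.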

Optical Character Recognition Based Speech Synthesis System Using LabVIEW

Singla, S. K.; Yadav, R. K.
Fonte: UNAM, Centro de Ciencias Aplicadas y Desarrollo Tecnológico Publicador: UNAM, Centro de Ciencias Aplicadas y Desarrollo Tecnológico
Tipo: Artigo de Revista Científica Formato: text/html
Publicado em 01/01/2014 Português
Relevância na Pesquisa
67.565835%
Knowledge extraction by just listening to sounds is a distinctive human ability. A speech signal is a more effective means of communication than text, because blind and visually impaired persons can also respond to sounds. This paper aims to develop a cost-effective and user-friendly optical character recognition (OCR) based speech synthesis system. The OCR-based speech synthesis system has been developed using Laboratory Virtual Instrument Engineering Workbench (LabVIEW) 7.1.