
Redução de dimensionalidade aplicada à diarização de locutor; Dimensionality reduction applied to speaker diarization

Silva, Sérgio Montazzolli
Source: Universidade Federal do Rio Grande do Sul Publisher: Universidade Federal do Rio Grande do Sul
Type: Dissertation Format: application/pdf
Portuguese
Search relevance
70.73302%
A large amount of multimedia data is currently generated every day. These data come from many sources, such as radio or television broadcasts, recordings of lectures, meetings, telephone conversations, and videos and photos captured with mobile phones, among others. As a result, interest in transcribing multimedia data has grown in recent years; within speech processing, the areas of Speaker Recognition, Speech Recognition, Speaker Diarization and Speaker Tracking stand out. The development of these areas has been driven and guided by NIST, which periodically runs evaluations of the state of the art. Since 2000, the Speaker Diarization task has stood out as one of the main research fronts in the transcription of speech data, having been evaluated by NIST several times over the last decade. The goal of this task is to find the number of speakers present in an audio recording and to label their respective speech segments, without any information being provided beforehand. In other words, the goal is commonly described as answering the question "Who spoke, and when?". One of the major problems in this area is obtaining a good model for each speaker present in the audio...

A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis

Gallardo-Antolín, Ascensión; Montero, Juan Manuel; King, Simon
Source: International Speech Communication Association Publisher: International Speech Communication Association
Type: info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/bookPart; info:eu-repo/semantics/conferenceObject
Published in 2014 Portuguese
Search relevance
39.411655%
Traditional Text-To-Speech (TTS) systems have been developed using specially designed, non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, to process this more spontaneous material, the new systems must be able to deal with imperfect data (multi-speaker recordings, background and foreground music and noise), filtering out low-quality audio segments and creating mono-speaker clusters. In this paper we compare several architectures for combining speaker diarization and music and noise detection which improve the precision and overall quality of the segmentation.; This work has been carried out during the research stay of A. Gallardo-Antolín and J. M. Montero at the Centre for Speech Technology Research (CSTR), University of Edinburgh, supported by the Spanish Ministry of Education, Culture and Sports under the National Program of Human Resources Mobility from the I+D+i 2008-2011 National Program, extended by agreement of the Council of Ministers on October 7th, 2011. The work leading to these results has received funding from the European Union under grant agreement No 287678. It has also been supported by EPSRC Programme Grant grant...

A Novel Method For Speech Segmentation Based On Speakers' Characteristics

Abdolali, Behrouz; Sameti, Hossein
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 08/05/2012 Portuguese
Search relevance
29.71004%
Speech Segmentation is the process of change point detection for partitioning an input audio stream into regions, each of which corresponds to only one audio source or one speaker. One application of this process is in Speaker Diarization systems. There are several methods for speaker segmentation; however, most Speaker Diarization systems use BIC-based segmentation methods. The main goal of this paper is to propose a new method for speaker segmentation that is faster than current methods (e.g. BIC) while keeping acceptable accuracy. Our proposed method is based on the pitch frequency of the speech. The accuracy of this method is similar to that of common speaker segmentation methods, but its computation cost is much lower. We show that our method is about 2.4 times faster than the BIC-based method, while the average accuracy of the pitch-based method is slightly higher than that of the BIC-based method.; Comment: 14 pages, 8 figures
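Illustrative note: the BIC-based baseline mentioned in this abstract can be sketched as follows. This is a minimal, generic ΔBIC change-point test on a one-dimensional feature track with univariate Gaussians; it is not the authors' pitch-based method. The function names, the `lam` penalty weight, the `margin` and variance `floor` parameters, and the toy data are all assumptions made for illustration.

```python
import math
from statistics import pvariance


def delta_bic(x, t, lam=1.0, floor=1e-6):
    """Delta-BIC for splitting the 1-D sequence x at index t.

    Compares one Gaussian over the whole window against two Gaussians
    (x[:t] and x[t:]). A positive value favours a change point at t.
    """
    n = len(x)

    def n_log_var(seg):
        # Variance is floored (assumption) to avoid log(0) on flat segments.
        return len(seg) * math.log(max(pvariance(seg), floor))

    gain = 0.5 * (n_log_var(x) - n_log_var(x[:t]) - n_log_var(x[t:]))
    # The extra 1-D Gaussian adds 2 free parameters (mean and variance).
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty


def find_change_point(x, margin=2, lam=1.0):
    """Return the best split index, or None if no split has positive Delta-BIC.

    Requires len(x) >= 2 * margin so that both sides have enough samples.
    """
    candidates = range(margin, len(x) - margin + 1)
    best = max(candidates, key=lambda t: delta_bic(x, t, lam))
    return best if delta_bic(x, best, lam) > 0 else None


# Toy track (hypothetical values): level changes after 20 frames.
track = [100.0, 102.0] * 10 + [200.0, 202.0] * 10
change = find_change_point(track)
```

In practice the test is usually run over a sliding window of multivariate features, which is where most of the cost that the abstract refers to comes from.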

Unsupervised Adaptation of SPLDA

Villalba, Jesús
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 20/11/2015 Portuguese
Search relevance
28.696738%
State-of-the-art speaker recognition relies on models that need a large amount of training data. These models are successful in tasks like NIST SRE because there is sufficient data available. However, in real applications, we usually do not have so much data and, in many cases, the speaker labels are unknown. We present a method to adapt a PLDA model from a domain with a large amount of labeled data to another with unlabeled data. We describe a generative model that produces both sets of data, where the unknown labels are modeled as latent variables. We use variational Bayes to estimate the hidden variables. Here, we derive the equations for this model. This model has been used in the papers: "UNSUPERVISED ADAPTATION OF PLDA BY USING VARIATIONAL BAYES METHODS" published at ICASSP 2014, "Unsupervised Training of PLDA with Variational Bayes" published at Iberspeech 2014, and "VARIATIONAL BAYESIAN PLDA FOR SPEAKER DIARIZATION IN THE MGB CHALLENGE" published at ASRU 2015.; Comment: Technical Report, ViVolab, I3A, University of Zaragoza, Spain. arXiv admin note: text overlap with arXiv:1511.07318

The Hierarchical Dirichlet Process Hidden Semi-Markov Model

Johnson, Matthew J.; Willsky, Alan
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 15/03/2012 Portuguese
Search relevance
28.045723%
There is much interest in the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) as a natural Bayesian nonparametric extension of the traditional HMM. However, in many settings the HDP-HMM's strict Markovian constraints are undesirable, particularly if we wish to learn or encode non-geometric state durations. We can extend the HDP-HMM to capture such structure by drawing upon explicit-duration semi-Markovianity, which has been developed in the parametric setting to allow construction of highly interpretable models that admit natural prior information on state durations. In this paper we introduce the explicit-duration HDP-HSMM and develop posterior sampling algorithms for efficient inference in both the direct-assignment and weak-limit approximation settings. We demonstrate the utility of the model and our inference methods on synthetic data as well as experiments on a speaker diarization problem and an example of learning the patterns in Morse code.; Comment: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)
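Illustrative note: the "non-geometric state durations" point can be made concrete with a small sketch. In a plain HMM, a state with self-transition probability p lasts exactly d steps with probability p^(d-1)(1-p), which is geometric and always peaks at d = 1. An explicit-duration model can instead place its mode at a typical segment length. The shifted Poisson duration below, the rate 9.5, and all function names are assumptions chosen for illustration, not details taken from the paper.

```python
import math


def hmm_duration_pmf(d, p_stay):
    """P(a state lasts exactly d steps) in an HMM with self-transition p_stay."""
    return p_stay ** (d - 1) * (1.0 - p_stay)


def poisson_duration_pmf(d, rate):
    """Hypothetical explicit duration model: (d - 1) ~ Poisson(rate)."""
    k = d - 1
    return math.exp(-rate) * rate ** k / math.factorial(k)


durations = range(1, 51)

# Geometric durations: the most likely duration is always a single step,
# even though the mean duration here is 1 / (1 - 0.9) = 10 steps.
hmm_mode = max(durations, key=lambda d: hmm_duration_pmf(d, 0.9))

# Explicit durations: the mode can sit at a realistic segment length.
hsmm_mode = max(durations, key=lambda d: poisson_duration_pmf(d, 9.5))
```

Both models here have a mean duration close to 10 steps, but only the explicit-duration one treats very short segments as unlikely.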

A sticky HDP-HMM with application to speaker diarization

Fox, Emily B.; Sudderth, Erik B.; Jordan, Michael I.; Willsky, Alan S.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Portuguese
Search relevance
50.31041%
We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566-1581]. Although the basic HDP-HMM tends to over-segment the audio data (creating redundant states and rapidly switching among them), we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.; Comment: Published at http://dx.doi.org/10.1214/10-AOAS395 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
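Illustrative note: the "control over the switching rate" can be sketched under the finite (truncated) Dirichlet form usually written for the sticky HDP-HMM, in which the transition row of state j is drawn from Dirichlet(alpha * beta + kappa * e_j). The symbols alpha, beta and kappa do not appear in the abstract and are assumptions here, as are the function name and the numeric values. The sketch only computes the expected row, not a posterior sample.

```python
def expected_transition_row(beta, j, alpha, kappa):
    """Mean of Dirichlet(alpha * beta + kappa * e_j).

    beta  : global state weights (should sum to 1)
    j     : index of the current state
    alpha : concentration around beta
    kappa : extra mass on the self-transition (0 gives the non-sticky case)
    """
    total = alpha * sum(beta) + kappa
    return [
        (alpha * b + (kappa if k == j else 0.0)) / total
        for k, b in enumerate(beta)
    ]


beta = [0.25, 0.25, 0.25, 0.25]

# Without the bias, state 0 is expected to persist only 25% of the time.
plain = expected_transition_row(beta, j=0, alpha=1.0, kappa=0.0)

# With kappa = 9, the expected self-transition is (0.25 + 9) / 10 = 0.925.
sticky = expected_transition_row(beta, j=0, alpha=1.0, kappa=9.0)
```

A larger kappa makes each speaker state persist longer in expectation, which is the behaviour that counteracts the rapid switching described in the abstract.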

Speaker diarization and speech recognition in the semi-automatization of audio description : an exploratory study on future possibilities?

Delgado Flores, Héctor; Matamala, Anna; Serrano, Javier
Source: Universitat Autònoma de Barcelona Publisher: Universitat Autònoma de Barcelona
Type: Journal Article Format: application/pdf
Published in 2015 Portuguese
Search relevance
69.786133%
This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

The e-Sentencias prototype: a procedural ontology for legal multimedia applications in the Spanish Civil Courts

Casanovas, Pompeu; Binefa i Valls, Xavier; Gracia, Ciro; Teodoro, Emma; Galera, Núria; Blázquez, Mercedes; Poblet, Marta; Carrabina, Jordi; Montón i Macián, Màrius; Montero, Carlos; Serrano, Javier; López-Cobo, José Manuel
Source: Amsterdam IOS Press Publisher: Amsterdam IOS Press
Type: Book Part Format: application/pdf
Published in 2009 Portuguese
Search relevance
28.045723%
Search, retrieval, and management of multimedia contents are challenging tasks for users and researchers alike. We introduce a software-hardware system for the global management of the multimedia contents produced by Spanish Civil Courts. The ultimate goal is to obtain an automatic classification of images and segments of the audiovisual records that, coupled with textual semantics, allows efficient navigation and retrieval of judicial documents and additional legal sources. This paper describes our knowledge acquisition process, sets out a typology of Spanish Civil hearings as performed in practice, and presents a preliminary procedural ontology at its current stage of development (e-Sentencias ontology). A discussion on procedural, contextual and multimedia ontologies is also provided.

Developing ontologies for legal multimedia applications

Binefa i Valls, Xavier; Gracia, Ciro; Montón i Macián, Màrius; Carrabina, Jordi; Montero, Carlos; Serrano, Javier; Blázquez, Mercedes; Benjamins, Richard; Teodoro, Emma; Poblet, Marta; Casanovas, Pompeu
Source: Universitat Autònoma de Barcelona Publisher: Universitat Autònoma de Barcelona
Type: Conference or Conference Object Format: application/pdf
Published in 2007 Portuguese
Search relevance
28.045723%

Fast cross-session speaker diarization

Delgado Flores, Héctor
Source: [Barcelona] : Universitat Autònoma de Barcelona Publisher: [Barcelona] : Universitat Autònoma de Barcelona
Type: Electronic theses and dissertations; info:eu-repo/semantics/doctoralThesis; info:eu-repo/semantics/publishedVersion Format: application/pdf
Published in 2015 Portuguese
Search relevance
61.133643%
Large amounts of audiovisual content are currently created, stored, edited and distributed, partly owing to practically unlimited storage capacity, to access to the necessary media by everyone and everywhere, and to the ubiquitous connectivity provided by the Internet. In this context, adequate and sustainable management is required to enable searching for and retrieving the information of interest. This is where speech processing techniques play a crucial role in the automatic labeling and annotation of audiovisual content. Speaker diarization is a key supporting process for other speech processing systems, such as automatic speech recognition and automatic speaker recognition, which are frequently used for the automatic extraction of metadata from spoken documents. Across content collections, there may be recurring speakers who take part in different sessions within a given collection (for example, in television and radio content). Given the local nature of speaker diarization technology, an arbitrary recurring speaker will probably receive different local identifiers across the sessions in which that speaker takes part. In this situation it would make more sense for recurring speakers to receive the same abstract identifier...

Modeling Temporal and Spatial Data Dependence with Bayesian Nonparametrics

Ren, Lu
Source: Duke University Publisher: Duke University
Type: Dissertation Format: 3738297 bytes; application/pdf
Published in 2010 Portuguese
Search relevance
28.045723%

In this thesis, temporal and spatial dependence are considered within nonparametric priors to help infer patterns, clusters or segments in data. In traditional nonparametric mixture models, observations are usually assumed exchangeable, even though dependence often exists associated with the space or time at which data are generated.

Focused on model-based clustering and segmentation, this thesis addresses the issue in different ways, for temporal and spatial dependence.

For sequential data analysis, the dynamic hierarchical Dirichlet process is proposed to capture the temporal dependence across different groups. The data collected at any time point are represented via a mixture associated with an appropriate underlying model; the statistical properties of data collected at consecutive time points are linked via a random parameter that controls their probabilistic similarity. The new model favors a smooth evolutionary clustering while allowing innovative patterns to be inferred. Experimental analysis is performed on music, and may also be employed on text data for learning topics.

Spatially dependent data is more challenging to model because of its spatial grid structure and the often large computational cost of analysis. As a non-parametric clustering prior...

Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?; Diarización y reconocimiento de habla en la semiautomatización de la audiodescripción: un estudio exploratorio sobre posibilidades futuras

Delgado, Héctor; Matamala, Anna; Serrano, Javier (all Universitat Autònoma de Barcelona)
Source: Universidade Federal de Santa Catarina Publisher: Universidade Federal de Santa Catarina
Type: info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; experimental and technological research Format: application/pdf
Published on 17/06/2015 Portuguese
Search relevance
69.786133%
http://dx.doi.org/10.5007/2175-7968.2015v35n2p308
This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.