Bio: Hervé Bourlard is Director of the Idiap Research Institute, Full Professor at the Swiss Federal Institute of Technology Lausanne (EPFL), and Founding Director of the Swiss NSF National Centre of Competence in Research on “Interactive Multimodal Information Management (IM2)” (2001-2013). He is also an External Fellow of the International Computer Science Institute (ICSI), Berkeley, CA.
His research interests include statistical pattern classification, signal processing, multi-channel processing, artificial neural networks, and applied mathematics, with applications to a wide range of Information and Communication Technologies, including spoken language processing, speech and speaker recognition, language modeling, multimodal interaction, and augmented multi-party interaction.
H. Bourlard is the author/coauthor/editor of 8 books and over 330 reviewed papers (including one IEEE paper award). He is a Fellow of IEEE and ISCA, a Senior Member of ACM, and a member of its European Council. He is the recipient of several scientific and entrepreneurship awards.
Abstract: Over the last few years, artificial neural networks, now often referred to as deep learning or Deep Neural Networks (DNNs), have significantly reshaped research and development across a variety of signal and information processing tasks, while further pushing the state of the art in Automatic Speech Recognition (ASR).
In this talk, starting with a historical account of DNNs, we will provide an overview of deep learning methodology applied to ASR and recall/revisit key links with statistical inference, linear algebra, and more recent trends towards novel approaches such as sparse recovery modeling.
This overview will discuss the main properties of feed-forward, convolutional, and recurrent DNNs when used as very efficient discriminant classifiers, as strong posterior probability estimators (of the output classes conditioned on the input vectors in temporal context), or as feature extractors. We will then discuss the impact of those properties on current and future ASR technology.
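The "posterior estimator" view can be sketched in a few lines. The following is a toy illustration with random weights and invented dimensions, not any actual ASR model: a softmax output layer yields class posteriors for an input frame with temporal context, and the classic hybrid DNN/HMM recipe divides these posteriors by the class priors to obtain scaled likelihoods for HMM decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy feed-forward net: input = one acoustic frame with temporal context
# (e.g. 9 stacked 13-dim feature frames = 117 dims), output = 3 phone classes.
# Weights are random here, purely for illustration.
W1, b1 = rng.standard_normal((117, 32)), np.zeros(32)
W2, b2 = rng.standard_normal((32, 3)), np.zeros(3)

x = rng.standard_normal(117)          # one context window
h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden layer
posteriors = softmax(h @ W2 + b2)     # P(class | input), sums to 1

# Hybrid DNN/HMM trick: scaled likelihoods p(x | class) ∝ P(class | x) / P(class)
priors = np.array([0.5, 0.3, 0.2])    # assumed class priors (invented)
scaled_likelihoods = posteriors / priors

print(posteriors, posteriors.sum())
```

The same posterior outputs can equally be taken one layer earlier and used as features, which is the "feature extractor" role mentioned above.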
Bio: Niko Brummer received B.Eng (1986), M.Eng (1988) and Ph.D. (2010) degrees, all in electronic engineering, from Stellenbosch University. He worked as a researcher at DataFusion (later called Spescom DataVoice) and at AGNITIO, and is currently with Nuance Communications.
Most of his research over the last 25 years has been applied to automatic speaker and language recognition, and he has participated in most of the NIST SRE and LRE evaluations of these technologies, from the year 2000 to the present. He has contributed to the Odyssey Workshop series since 2001 and was the organizer of Odyssey 2008 in Stellenbosch. His FoCal and Bosaris Toolkits are widely used for fusion and calibration in speaker and language recognition research.
His research interests include the development of new algorithms for speaker and language recognition, as well as evaluation methodologies for these technologies. In both cases, his emphasis is on probabilistic modelling. He has worked with both generative (eigenchannel, JFA, i-vector PLDA) and discriminative (system fusion, discriminative JFA and PLDA) recognizers. In evaluation, his focus is on judging the goodness of classifiers that produce probabilistic outputs in the form of well-calibrated class likelihoods.
Abstract: Embeddings in machine learning are low-dimensional representations of complex input patterns with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks. In speaker recognition, the i-vector, extracted with the help of a Gaussian mixture model, is a good example of an embedding. Recently, more general embeddings extracted with deep neural nets, known as x-vectors, have been disrupting the long reign of the i-vector.
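The "simple geometric operations" point can be made concrete with a minimal sketch. Random vectors stand in for real i-vectors or x-vectors here, and the dimension and noise level are invented:

```python
import numpy as np

def cosine_score(a, b):
    """Dot product of length-normalized embeddings, a standard trial score."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

rng = np.random.default_rng(1)
speaker = rng.standard_normal(400)                # hypothetical 400-dim embedding
same = speaker + 0.1 * rng.standard_normal(400)   # another recording, same speaker
other = rng.standard_normal(400)                  # an unrelated speaker

# The same-speaker pair scores much higher than the different-speaker pair.
print(cosine_score(speaker, same), cosine_score(speaker, other))
```

In practice the comparison backend is usually PLDA rather than a raw cosine, but the point stands: a fixed-length vector per recording makes trials cheap geometric operations.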
Although i-vectors and x-vectors are powerful representations of speaker information, they form a bottleneck that fails to quantify the uncertainty about the speaker that is inherent in low-quality inputs, such as short or noisy recordings. We propose meta-embeddings, a more powerful representation designed to allow this uncertainty to be propagated. This ultimately allows more accurate speaker recognition, especially in cases where the quality of the input is highly variable. Meta-embeddings live in Euclidean space, can be compared using dot products, and can be interpreted as distributed embeddings; but they are also points in a Hilbert space of functions, such that inner products in this space can be used for comparisons in the form of likelihood-ratio scores. This talk introduces the general theory, a first practical implementation and some encouraging experimental results.
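The uncertainty-propagation idea can be illustrated with a toy one-dimensional sketch (made-up numbers, not the implementation presented in the talk): each recording is represented not by a point but by a likelihood function over a latent speaker variable, trials are scored by a likelihood ratio, and a low-precision (noisy) recording then automatically yields a less extreme score.

```python
import numpy as np

# Discretized latent speaker variable z with a standard-normal prior.
z = np.linspace(-10.0, 10.0, 20001)
prior = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def integrate(f):
    """Riemann-sum integral over the z grid."""
    return float(np.sum(f) * (z[1] - z[0]))

def meta_embedding(mu, precision):
    """Likelihood function f(z) for one recording: higher precision = more
    confident about the speaker (e.g. a long, clean recording)."""
    return np.exp(-0.5 * precision * (z - mu) ** 2)

def llr(f1, f2):
    """Log likelihood-ratio: same speaker vs. independent speakers."""
    num = integrate(f1 * f2 * prior)
    den = integrate(f1 * prior) * integrate(f2 * prior)
    return float(np.log(num / den))

clean = meta_embedding(1.0, precision=10.0)   # long, clean recording: confident
noisy = meta_embedding(1.0, precision=0.5)    # short, noisy recording: uncertain
other = meta_embedding(-1.5, precision=10.0)  # a different speaker

test = meta_embedding(1.2, precision=10.0)    # enrollment vs. test trial
print(llr(clean, test), llr(noisy, test), llr(clean, other))
```

Both trials against `test` involve the same underlying speaker, but the noisy recording produces a weaker positive score: the representation itself carries the uncertainty, which is exactly what a point embedding's bottleneck discards.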