History of Voice Recognition

André Bastié
Posted in Media
2 min read


Voice recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.

Designing a machine that mimics human behaviour, especially the capability of speaking and responding to speech, has intrigued engineers and scientists for centuries. Speech technologies have witnessed a dramatic transformation: from early speaking machines built with resonance tubes, to Alexander Graham Bell’s first recording devices, to the Dictaphone and the first voice synthesizer, the Voice Operating Demonstrator (VODER), through to today’s smart virtual assistants like Apple’s Siri and Amazon’s Alexa. Thanks to advancements in AI, voice recognition technology is gaining popularity. According to a recent U.S. Cellular survey, 36% of smartphone owners use a virtual assistant daily and 30% use smart home technology daily. This connectivity is expected to grow, with the number of connected devices and sensors predicted to rise 200% to 46 billion by 2021.

The idea is to transform recorded audio into a sequence of words, as an alternative to typing on a keyboard. From helping people with physical disabilities to transcribing interviews, learning a new language, or accessing files via voice commands, speech recognition finds use in a number of applications. Voice recognition systems facilitate interaction with technology by enabling hands-free requests.

Key Milestones

From 1952 to today.

The earliest voice recognition technologies could only comprehend digits. The Audrey system, built by Bell Labs in 1952 and considered to be the first speech recognition device, recognised only ten digits spoken by a single voice. This was followed by the Shoebox machine, developed by IBM in 1962, which could recognise 16 English words, 10 digits and 6 arithmetic commands.

The U.S. Department of Defence made great contributions towards the development of speech recognition systems. From 1971 to 1976, it funded the DARPA SUR (Speech Understanding Research) program, which led to the development of Harpy by Carnegie Mellon, a system that could comprehend 1,011 words. At around the same time, the first commercial speech recognition company, Threshold Technology, was founded, and Bell Labs introduced a system that could interpret multiple people’s voices. In 1978, Texas Instruments introduced Speak & Spell, a milestone in speech development because its use of a speech chip led to more human-like synthesized speech. The development of the hidden Markov model, which used statistics to estimate the probability of unknown sounds, proved to be a major breakthrough; the technology even entered the home, in the form of Worlds of Wonder’s Julie doll.

Faster microprocessors

Thanks to the introduction of faster microprocessors, the world’s first speech recognition software for consumers, Dragon Dictate, was released in 1990. Its 1997 successor, Dragon NaturallySpeaking, became the first continuous dictation software, meaning one did not have to pause between words. In 1992, Apple also produced a real-time continuous speech recognition system that could recognise as many as 20,000 words.

Smart Assistant

By 2001, speech recognition development had hit a plateau, until Google emerged in 2008 with its Google Voice Search application for the iPhone. In 2010, Google introduced personalized recognition on Android devices, which recorded different users’ voice queries to build an enhanced speech model drawing on 230 billion English words. Eventually, Apple’s Siri, which likewise relied on cloud computing, arrived with the iPhone 4S in 2011.

The Breakthrough

A Stanford study revealed that speech recognition is now about three times as fast as typing on a cell phone. The error rate, once 8.5%, has dropped to 4.9%. These technological advances have given rise to multiple applications, such as transcription assistant tools like Happy Scribe.

Little Known Facts About Speech Recognition Technology

  1. Technically speaking, speech recognition goes way back to 1877 when Thomas Edison invented the phonograph, the first device to record and reproduce sound.

  2. When it comes to speech recognition, accuracy is measured by a Word Error Rate calculation, which tracks how often a word is transcribed incorrectly.
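To make the Word Error Rate concrete, here is a minimal sketch of how it is typically computed: the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the recogniser’s output, divided by the number of reference words. The function name and example sentences are illustrative; production toolkits add text normalisation steps before scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for the Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of 1/6.
print(round(wer("the cat sat on the mat", "the cat sat on mat"), 3))
```

A perfect transcript scores 0.0; note that heavy insertions can push the WER above 1.0, which is why it is an error rate rather than an accuracy percentage.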

Authors:

- Akanksha Tiwari (akanksha.tiwari2@mail.dcu.ie)
- Saikruti Kesipeddi (saikruti.kesipeddi2@mail.dcu.ie)
- Sumer Jagda (sumer.jagda2@mail.dcu.ie)

Get 1 hour of Transcription for Free with Happy Scribe!

Happy Scribe is a transcription platform that converts all formats of audio and video into text in more than 119 languages.

