Voice recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.
Designing a machine that mimics human behavior, especially the ability to speak and to respond to speech, has intrigued engineers and scientists for centuries. Speech technologies have undergone a dramatic transformation: from early speaking machines built with resonance tubes, to Alexander Graham Bell's first recording devices, to the Dictaphone, to the first voice synthesizer, the Voder (Voice Operating Demonstrator), and on to today's smart virtual assistants like Apple's Siri and Amazon's Alexa. Thanks to advances in AI, voice recognition technology is gaining popularity. According to a recent U.S. Cellular survey, 36% of smartphone owners use a virtual assistant daily and 30% use smart home technology daily. This connectivity is expected to grow, with the number of connected devices and sensors predicted to rise 200% to 46 billion by 2021.
The idea is to transform recorded audio into a sequence of words, as an alternative to typing on a keyboard. From helping people with physical disabilities to transcribing interviews, learning a new language, or opening a file via voice commands, speech recognition finds use in a wide range of applications. Voice recognition systems make interacting with technology easier by enabling hands-free requests.
From 1952 to today
The earliest voice recognition technologies could only comprehend digits. The Audrey system, built by Bell Labs in 1952 and considered the first speech recognition device, recognised only the ten digits, spoken by a single voice. It was followed by the Shoebox machine, developed by IBM in 1962, which could recognise 16 English words: the 10 digits and 6 arithmetic commands.
The U.S. Department of Defense made great contributions to the development of speech recognition systems. From 1971 to 1976, it funded the DARPA SUR (Speech Understanding Research) program, which led to the development of Harpy at Carnegie Mellon, a system that could comprehend 1,011 words. At around the same time, the first commercial speech recognition company, Threshold Technology, was founded, and Bell Labs introduced a system that could interpret multiple people's voices. In 1978, Texas Instruments introduced the Speak & Spell, a milestone in speech development because its speech chip produced more human-like digital synthesis. The development of the hidden Markov model, which used statistics to estimate the probability of unknown sounds, proved to be a major breakthrough; the technology even entered the home, in the form of Worlds of Wonder's Julie doll.
Faster microprocessors
Thanks to the introduction of faster microprocessors, the world's first speech recognition software for consumers, Dragon Dictate, was released in 1990. In 1992, Apple also demonstrated a real-time continuous speech recognition system that could recognise as many as 20,000 words, and in 1997 Dragon NaturallySpeaking became the first continuous dictation software for consumers, meaning one no longer had to pause between words.
Smart assistants
By 2001, speech recognition development had hit a plateau; then, in 2008, Google emerged with its Google Voice Search application for the iPhone. In 2010, Google introduced personalized recognition on Android devices, recording different users' voice queries to build an enhanced speech model trained on some 230 billion English words. Apple's Siri, which also relied on cloud computing, followed on the iPhone 4S in 2011.
The Breakthrough
A Stanford study found that speech recognition is now about three times as fast as typing on a mobile phone, and the word error rate, once 8.5%, has dropped to 4.9%. These technological advances have given rise to multiple applications, including transcription assistant tools such as Happy Scribe.
Little Known Facts About Speech Recognition Technology
Technically speaking, speech recognition goes way back to 1877 when Thomas Edison invented the phonograph, the first device to record and reproduce sound.
When it comes to speech recognition, accuracy is measured by a Word Error Rate calculation, which tracks how often a word is transcribed incorrectly.
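In practice, Word Error Rate is the word-level edit distance between a reference transcript and the recogniser's output (substitutions, deletions, and insertions), divided by the number of reference words. A minimal sketch of that calculation (the `wer` helper below is illustrative, not any particular product's implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = min edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
print(wer("the cat sat on the mat", "the cat sit on mat"))  # ≈ 0.333
```

A 4.9% error rate, in these terms, means roughly one word in twenty is transcribed incorrectly.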