Web Speech API: Voice Input for JavaScript

The Web Speech API gives browsers a native interface for voice input without any third-party plugin. You create a SpeechRecognition object, call .start() from a button click or other user action, and the browser prompts the user for microphone access, transcribes the audio, and fires a result event (handled through onresult) with the text.

Support is widest in Chromium-based browsers. Firefox and Safari have partial or behind-flag implementations. If your target users are on Chrome or Edge, this API works without any polyfill.
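
Because support varies, it helps to detect the API before using it. The following is a minimal sketch, assuming a hypothetical helper named findRecognitionClass that receives the global object as a parameter so the check can also run outside a browser:

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor exposed by
// the given global object, or null when neither name is present.
function findRecognitionClass(root) {
  if (!root) {
    return null;
  }
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// In a browser, pass window. Where window does not exist (for example in
// Node), the helper receives null and reports the API as unavailable.
var RecognitionClass = findRecognitionClass(
  typeof window !== 'undefined' ? window : null
);
console.log(
  RecognitionClass ? 'Speech recognition available' : 'Speech recognition unavailable'
);
```

When the helper returns null, the page can leave a plain text input in place instead of showing a voice control.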

How the API came together

The proposal originated at Google and was incubated by the HTML Speech Incubator Group, formed in August 2010. Members included Microsoft, Google, Voxeo, AT&T, Mozilla, and OpenReach. The group published a final report in December 2011 covering 17 voice-interaction use cases, including:

Voice Search
Voice command interfaces
Domain-specific contingent grammars
Continuous open-dialog recognition
Domain-specific grammars
Voice interfaces where no GUI is needed
Voice Activity Detection
Hello world demonstrations
Speech translation
Voicemail clients
Dialog systems
In-car voice handling
Multimodal interaction
Multimodal video

Two use cases were deferred: re-recognition and temporal synthesis structure for visual feedback.

Google published a JavaScript Voice API proposal in January 2012 that covered 15 of those 17 cases. The Speech API W3C Community Group, led by Glen Shires and Hans Wennborg of Google, then carried the spec forward. The API is not a formal W3C standard, but Chrome and Edge implement it, which makes it the de facto standard for speech recognition in the browser.

Enabling voice on a form field

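Early Chrome releases let you enable voice input declaratively by adding the speech attribute (plus the vendor-prefixed x-webkit-speech) to any text input. Chrome has since removed support for this attribute, so treat the markup below as legacy and use the SpeechRecognition object covered later in this article for new work: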

<input type="text" name="recognition" speech x-webkit-speech />

Where the attribute was supported, Chrome rendered a small microphone icon inside the field, and on Android the keyboard surfaced a microphone key instead. Browsers that do not support the attribute display a standard text field.

Reading the result with JavaScript

In browsers that support the attribute, the onspeechchange and onwebkitspeechchange events fire when the user finishes speaking into a form field. The example below asks the user to confirm the transcribed text before accepting it:

// Ask the user to confirm the transcription; clear the field if it is wrong.
function texto(input) {
  if (!confirm('Did you say "' + input.value + '"?')) {
    input.value = '';
  }
}

<input
  type="text"
  name="number"
  speech
  x-webkit-speech
  onspeechchange="texto(this)"
  onwebkitspeechchange="texto(this)"
/>

Using the SpeechRecognition object directly

For more control, use window.SpeechRecognition (or the vendor-prefixed window.webkitSpeechRecognition in Chrome). This approach lets you trigger recognition from any user action and process results in JavaScript:

function obtenerTexto() {
  // Prefer the standard name and fall back to the vendor-prefixed one.
  var RecognitionClass = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (RecognitionClass) {
    var reconocimientoTexto = new RecognitionClass();
    reconocimientoTexto.onresult = function (event) {
      // textContent keeps the transcript from being parsed as HTML.
      document.getElementById('test').textContent = event.results[0][0].transcript;
    };
    reconocimientoTexto.onerror = function (event) {
      console.error('Speech recognition error:', event.error);
    };
    // Starts listening; the browser asks for microphone permission if needed.
    reconocimientoTexto.start();
  } else {
    alert('Browser not supported');
  }
}

<input type="button" value="Listen" onclick="obtenerTexto()" />
<div id="test"></div>

Call .start() inside a user-gesture handler (click or key event). Some browsers block or ignore recognition that starts without a direct user action.
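
A button that starts recognition can be clicked again while a session is still running, and calling start() on a session that has already started throws an error. The sketch below shows one way to guard against that. The helper name createListenToggle and the element id in the usage comment are hypothetical, and the helper assumes it owns the object's onend handler:

```javascript
// Hypothetical helper: wraps a recognition object so repeated clicks toggle
// the session instead of calling start() twice.
function createListenToggle(recognition) {
  var active = false;

  // The session is over once the end event fires, with or without results.
  // Note: this overwrites any onend handler already set on the object.
  recognition.onend = function () {
    active = false;
  };

  return {
    isActive: function () {
      return active;
    },
    // Intended to be called from a click or keydown handler.
    toggle: function () {
      if (active) {
        recognition.stop();
        return;
      }
      active = true; // set before start() so a fast second click is ignored
      try {
        recognition.start();
      } catch (err) {
        active = false;
        throw err;
      }
    }
  };
}

// Browser usage (assumes a button with the hypothetical id "listen"):
// var toggle = createListenToggle(new RecognitionClass());
// document.getElementById('listen').addEventListener('click', toggle.toggle);
```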

Core events

| Event | Fires when |
| --- | --- |
| `onstart` | Recognition begins and microphone is active |
| `onresult` | One or more results are available |
| `onerror` | Recognition fails (permission denied, no speech, network) |
| `onend` | Recognition session closes, with or without results |
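
To see the order in which these events arrive, you can attach a handler to each one and log it. This is a small sketch, assuming a hypothetical helper named traceLifecycle and a log callback that you supply (console.log works in a browser):

```javascript
// Hypothetical helper: attaches one handler per core event and reports each
// event to the supplied log function as it fires.
function traceLifecycle(recognition, log) {
  recognition.onstart = function () {
    log('start');
  };
  recognition.onresult = function (event) {
    log('result: ' + event.results.length + ' result(s)');
  };
  recognition.onerror = function (event) {
    log('error: ' + event.error);
  };
  recognition.onend = function () {
    log('end');
  };
  return recognition;
}

// Browser usage, from inside a click handler:
// traceLifecycle(new RecognitionClass(), console.log).start();
```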

Setting the recognition language

var recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.lang = 'en-US';
recognition.start();

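Pass any BCP 47 language tag. 'es-MX' targets Mexican Spanish; 'en-GB' targets British English. If lang is not set, the specification says recognition defaults to the language of the HTML document root element, and otherwise to the browser's default language.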
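
If you would rather set the language explicitly than rely on the default, one option is a small fallback chain. The sketch below assumes a hypothetical helper named pickRecognitionLang; the final 'en-US' fallback is an arbitrary choice for illustration:

```javascript
// Hypothetical helper: choose a BCP 47 tag for recognition.lang.
// Order: explicit preference, then the page's <html lang> value, then the
// browser language, then 'en-US' (an assumption made for this sketch).
function pickRecognitionLang(preferred, doc, nav) {
  var pageLang = doc && doc.documentElement ? doc.documentElement.lang : '';
  var browserLang = nav ? nav.language : '';
  return preferred || pageLang || browserLang || 'en-US';
}

// Browser usage:
// recognition.lang = pickRecognitionLang(null, document, navigator);
```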

Reading the transcript

The result sits at event.results[0][0].transcript. The outer index selects the recognition phrase; the inner index selects the alternative hypothesis, with the most likely alternative first. With the default settings (a single, non-continuous phrase and one alternative per result), [0][0] is the top result.
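
The two indexes matter more once you enable the continuous and interimResults properties, which this article does not otherwise cover. In that mode each result event can carry several phrases, some final and some still changing. The sketch below assumes a hypothetical helper named collectTranscripts and relies on the event's resultIndex property and each result's isFinal flag:

```javascript
// Hypothetical helper: split a result event into finalized and interim text.
// event.resultIndex is the lowest index that changed in this event; each
// result carries an isFinal flag and one or more ranked alternatives.
function collectTranscripts(event) {
  var finalText = '';
  var interimText = '';
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var best = result[0]; // top-ranked alternative for this phrase
    if (result.isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Browser usage:
// recognition.continuous = true;
// recognition.interimResults = true;
// recognition.onresult = function (event) {
//   var text = collectTranscripts(event);
//   console.log(text.finalText, text.interimText);
// };
```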

Text-to-speech

The API specification also defines a SpeechSynthesis interface for converting text back to audio. As of 2026, window.speechSynthesis has broad support across Chrome, Edge, Firefox, and Safari:

var utterance = new SpeechSynthesisUtterance('Hello from the browser');
utterance.lang = 'en-US';
window.speechSynthesis.speak(utterance);

This completes the loop: capture voice with SpeechRecognition, process it in JavaScript, and respond with SpeechSynthesis.
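
As a rough illustration of that loop, the sketch below repeats back whatever was recognized. The helper names speakText and echoResults are hypothetical, and the synthesizer and utterance constructor are passed in as parameters rather than read from globals:

```javascript
// Hypothetical helper: speak a piece of text in the given language.
function speakText(text, lang, synth, UtteranceClass) {
  var utterance = new UtteranceClass(text);
  utterance.lang = lang;
  synth.cancel(); // drop anything still queued from an earlier call
  synth.speak(utterance);
  return utterance;
}

// Hypothetical wiring: speak the top transcript of each result event.
function echoResults(recognition, lang, synth, UtteranceClass) {
  recognition.lang = lang;
  recognition.onresult = function (event) {
    speakText(event.results[0][0].transcript, lang, synth, UtteranceClass);
  };
  return recognition;
}

// Browser usage, from inside a click handler:
// echoResults(
//   new RecognitionClass(), 'en-US',
//   window.speechSynthesis, SpeechSynthesisUtterance
// ).start();
```

A real application would usually transform the transcript before responding; echoing it unchanged just keeps the example short.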

What to check before shipping

HTTPS only. Microphone access requires a secure origin. Browsers treat localhost as secure, so local development works over plain HTTP.
User gesture required. Call .start() inside a click or keydown handler. Autostarting on page load is unreliable and may be blocked.
Graceful fallback. Detect support with if (window.SpeechRecognition || window.webkitSpeechRecognition) before creating the object, and surface a plain text input when the API is absent.
Error handling. The onerror event covers not-allowed (permission denied), no-speech, network, and aborted. Log these or surface a useful message to the user.
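
For the error-handling item, one simple approach is a lookup from error code to user-facing message. The sketch below covers only the four codes named above; the helper name describeSpeechError and the message wording are illustrative assumptions:

```javascript
// Messages for the error codes listed in the checklist above.
var ERROR_MESSAGES = {
  'not-allowed': 'Microphone permission was denied.',
  'no-speech': 'No speech was detected. Please try again.',
  'network': 'The recognition service could not be reached.',
  'aborted': 'Listening was cancelled.'
};

// Hypothetical helper: turn an error code into a message, with a generic
// fallback for any code not in the table.
function describeSpeechError(code) {
  if (Object.prototype.hasOwnProperty.call(ERROR_MESSAGES, code)) {
    return ERROR_MESSAGES[code];
  }
  return 'Speech recognition failed (' + code + ').';
}

// Browser usage:
// recognition.onerror = function (event) {
//   alert(describeSpeechError(event.error));
// };
```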
