How to Build a Text-to-Voice Application With JavaScript

This tutorial covers how to convert text into speech in JavaScript using the Web Speech API. We’ll build a simple interface where the user enters the text to be spoken, then clicks a button to hear the corresponding speech.

Our Text-to-Speech Demo

Here’s what we’re going to build. Type anything you want in the textarea, select the language you’ve written it in, and click the button to hear the result!

A Quick Introduction to the Web Speech API

Before we build anything, let’s quickly get acquainted with the API we’ll be using.

The Web Speech API is a JavaScript API that allows developers to add voice functionality to web applications. It has two parts: speech recognition and speech synthesis. This tutorial uses the synthesis part, exposed through the speechSynthesis object, which enables text-to-voice conversion. You give it the text content along with optional parameters such as language, pitch, rate, and voice.

It also has methods for starting, pausing, resuming, and canceling speech. To generate voice from text, first create an instance of SpeechSynthesisUtterance like this:

const utterance = new SpeechSynthesisUtterance();

SpeechSynthesisUtterance represents the text you want the system to say. The next step is to specify the text content using the text property like this:

utterance.text = "Hello world";

Here we want the computer to say the words "Hello world".

Finally, call the speak() method, which will speak the given utterance (the SpeechSynthesisUtterance object) defined above.

speechSynthesis.speak(utterance);

We can also set a language, for example US English:

utterance.lang = "en-US";

If you don’t set utterance.lang, the browser falls back to the document’s language (the lang attribute on <html>) or, failing that, its own default language.
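The other optional parameters mentioned earlier work the same way as lang: they are plain properties on the utterance. Here is a minimal sketch, assuming a hypothetical helper called applySpeechOptions() that is not part of the demo. It clamps each value to the range the Web Speech API defines (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1).

```javascript
// Hypothetical helper, not part of the original demo.
// Keeps a number inside the [min, max] range.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Copies optional speech settings onto an utterance-like object.
// Defaults of 1 match the API's own defaults for rate, pitch, and volume.
function applySpeechOptions(utterance, options = {}) {
  const { lang, rate = 1, pitch = 1, volume = 1 } = options;

  if (lang) {
    utterance.lang = lang;
  }
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.volume = clamp(volume, 0, 1);

  return utterance;
}

// Browser usage (assumes speech synthesis is available):
// const utterance = applySpeechOptions(
//   new SpeechSynthesisUtterance("Hello world"),
//   { lang: "en-GB", rate: 0.9, pitch: 1.2 }
// );
// speechSynthesis.speak(utterance);
```

Because the helper only assigns properties, it can be tried out on a plain object before wiring it to a real utterance.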

The SpeechSynthesis controller also provides a getVoices() method, which returns a list of system-supported voices for the current device, allowing users to choose a custom voice.
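To actually use one of those voices, you assign it to the utterance’s voice property. The sketch below assumes a hypothetical pickVoice() helper (not part of the demo) that prefers an exact name match and otherwise falls back to the first voice with the requested language.

```javascript
// Hypothetical helper, not part of the original demo.
// `voices` is any array of objects with `name` and `lang`, such as the
// list returned by speechSynthesis.getVoices().
function pickVoice(voices, { name, lang }) {
  return (
    voices.find((voice) => voice.name === name) ||
    voices.find((voice) => voice.lang === lang) ||
    null // nothing suitable is installed on this device
  );
}

// Browser usage (assumes the voice list has already loaded):
// const voice = pickVoice(speechSynthesis.getVoices(), {
//   name: "Google UK English Male",
//   lang: "en-GB",
// });
// if (voice) {
//   utterance.voice = voice;
// }
```

When no voice matches, leaving utterance.voice unset lets the browser choose one based on utterance.lang.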

HTML Structure

Okay, let’s start building. The HTML Structure will consist of the following elements:

A <textarea> for the text to be converted.
A <select> element, which we will populate with language options.
A <button> which, when clicked, speaks the text provided.

To keep us focused on functionality, we’ll use Bootstrap to build the interface. Ensure you add the Bootstrap CDN link in your header like this:

 <link
  href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"
  rel="stylesheet"
  integrity="sha384-QWTKZyjpPEjISv5WaRU9OFeRpok6YctnYmDr5pNlyT2bRjXh0JMhjY6hW+ALEwIH"
  crossorigin="anonymous"
/>

Add the HTML Structure.

<div class="container">
  <div class="message alert alert-warning" role="alert">
  </div>
  <h1>Text to Voice Converter</h1>
  <form>
  <div class="form-group">
    <label for="text">Enter your text:</label>
    <textarea id="text" name="text" class="content form-control form-control-lg" rows="6"></textarea>
  </div>
  <div class="form-group">
    <label for="voices">Choose your language:</label>
    <select id="voices" class="select-voices form-control form-control-lg" name="voices">
    </select>
  </div>
  <button type="button" class="convert btn btn-primary">🔊 Convert Text to Voice</button>
  </form>
</div>

Additional Styling with CSS

Bootstrap handles pretty much all the styling for us. But let’s add some custom CSS properties to our design. These will give us a custom font, a container, some extra spacing for the elements in the form, and a rule to hide our alert message.

@import url("https://fonts.googleapis.com/css2?family=DM+Mono:ital,wght@0,300;0,400;0,500;1,300;1,400;1,500&display=swap");
body {
  font-family: "DM Mono", monospace;
}
.container {
  width: 100%;
  max-width: 600px;
  padding: 2rem 0;
}
.form-group {
  margin: 2rem 0;
}
label {
  margin-bottom: 1rem;
}
.message{
    display: none;
}

We have set display: none on the alert component so that it only appears when there is a message to display.

JavaScript Functionality

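As I explained in the introduction, the browser exposes its voices through the speechSynthesis.getVoices() method. To keep the demo predictable, we’ll store the voices we want to offer in an array ourselves. The names below match the Google voices available in Chrome.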

const voices = [
  { name: "Google Deutsch", lang: "de-DE" },
  { name: "Google US English", lang: "en-US" },
  { name: "Google UK English Female", lang: "en-GB" },
  { name: "Google UK English Male", lang: "en-GB" },
  { name: "Google español", lang: "es-ES" },
  { name: "Google español de Estados Unidos", lang: "es-US" },
  { name: "Google français", lang: "fr-FR" },
  { name: "Google हिन्दी", lang: "hi-IN" },
  { name: "Google Bahasa Indonesia", lang: "id-ID" },
  { name: "Google italiano", lang: "it-IT" },
  { name: "Google 日本語", lang: "ja-JP" },
  { name: "Google 한국의", lang: "ko-KR" },
  { name: "Google Nederlands", lang: "nl-NL" },
  { name: "Google polski", lang: "pl-PL" },
  { name: "Google português do Brasil", lang: "pt-BR" },
  { name: "Google русский", lang: "ru-RU" },
  { name: "Google 普通话（中国大陆）", lang: "zh-CN" },
  { name: "Google 粤語（香港）", lang: "zh-HK" },
  { name: "Google 國語（臺灣）", lang: "zh-TW" }
];
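If you would rather build this list at runtime than hard-code it, the same shape can be derived from whatever getVoices() returns. This is a minimal sketch under that assumption. The toVoiceList() name is hypothetical, and sorting by language tag is just one reasonable choice.

```javascript
// Hypothetical helper, not part of the original demo.
// Reduces voice objects to the { name, lang } shape used by the array above,
// sorted by language tag and then by name.
function toVoiceList(rawVoices) {
  return rawVoices
    .map(({ name, lang }) => ({ name, lang }))
    .sort(
      (a, b) => a.lang.localeCompare(b.lang) || a.name.localeCompare(b.name)
    );
}

// Browser usage (assumes the voice list has already loaded):
// const voices = toVoiceList(speechSynthesis.getVoices());
```

The trade-off is that the available voices then vary by browser and operating system, so the drop-down will not look the same for every user.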

Identify the Required Elements

Next, use the Document Object Model (DOM) to obtain the select, button, and alert elements.

const optionsContainer = document.querySelector(".select-voices");
const convertBtn = document.querySelector(".convert");
const messageContainer = document.querySelector(".message");

Create Voices Selection

The optionsContainer represents the <select> element for the drop-down list of voices from which the user will select an option.

We want to populate it with the voices from the voices array. Create a function called addVoices().

function addVoices() {
  // populate options with the voices from array
}

Inside the function, use the forEach() method to loop through the voices array. For each voice object, create an <option>, set option.value = voice.lang and option.textContent = voice.name, then append the option to the select element.

function addVoices() {
  voices.forEach((voice) => {
    // One <option> per voice: the value carries the language tag
    const option = document.createElement("option");
    option.value = voice.lang;
    option.textContent = voice.name;
    optionsContainer.appendChild(option);

    // Preselect US English as the default
    if (voice.lang === "en-US") {
      option.selected = true;
    }
  });
}

We need to invoke the addVoices() function to populate the list. In Chrome, however, voices load asynchronously, so there we listen for the voiceschanged event and call addVoices() when it fires. So we’ll add a condition:

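if (navigator.userAgent.indexOf("Chrome") !== -1) {
  // voiceschanged can fire more than once; run addVoices() only the first
  // time so the options are not duplicated
  speechSynthesis.addEventListener("voiceschanged", addVoices, { once: true });
} else {
  addVoices();
}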

The voiceschanged event is fired when the list of available speech synthesis voices changes, which includes the moment the list first becomes ready to use.
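Checking the user agent string can be fragile, since several other browsers also include "Chrome" in theirs. A feature-based alternative is sketched below. The whenVoicesReady() name is hypothetical, and the sketch assumes the synthesis object supports addEventListener().

```javascript
// Hypothetical helper, not part of the original demo.
// Calls `callback` with the voice list as soon as it is available:
// immediately if getVoices() already returns voices, otherwise after the
// first voiceschanged event.
function whenVoicesReady(synth, callback) {
  const voices = synth.getVoices();

  if (voices.length > 0) {
    callback(voices);
    return;
  }

  synth.addEventListener(
    "voiceschanged",
    () => callback(synth.getVoices()),
    { once: true } // ignore later changes to the list
  );
}

// Browser usage, replacing the user agent check:
// whenVoicesReady(speechSynthesis, () => addVoices());
```

With this approach the same code path runs in every browser, whether the voice list is ready immediately or arrives later.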

Button Event Listener

Add a click event listener to the generate button.

convertBtn.addEventListener("click", function () {
  // display an alert message if content is empty
  // pass the arguments to convertToSpeech()
});

Inside the event listener function, we want to display an alert if the content is not provided, get the text from the textarea, get the selected language, and pass the values to the convertToSpeech() function.

Update the event listener as follows.

convertBtn.addEventListener("click", function () {
  const convertText = document.querySelector(".content").value;

  // Show a temporary alert if no text was provided
  if (convertText === "") {
    messageContainer.textContent = "Please provide some text";
    messageContainer.style.display = "block";

    // Hide the alert again after two seconds
    setTimeout(() => {
      messageContainer.textContent = "";
      messageContainer.style.display = "none";
    }, 2000);

    return;
  }

  // The value of the selected <option> is the language tag
  const selectedLang =
    optionsContainer.options[optionsContainer.selectedIndex].value;

  convertToSpeech(convertText, selectedLang);
});
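The show-then-hide logic for the alert could also be pulled into a small reusable function. The sketch below assumes a hypothetical flashMessage() helper that is not part of the demo. It takes any element-like object, so it works with messageContainer.

```javascript
// Hypothetical helper, not part of the original demo.
// Shows `text` in `element`, then clears and hides it after `duration` ms.
// Returns the timer id so the caller can cancel the hide with clearTimeout().
function flashMessage(element, text, duration = 2000) {
  element.textContent = text;
  element.style.display = "block";

  return setTimeout(() => {
    element.textContent = "";
    element.style.display = "none";
  }, duration);
}

// Browser usage inside the click handler:
// if (convertText === "") {
//   flashMessage(messageContainer, "Please provide some text");
//   return;
// }
```

Returning the timer id is a design choice: it lets you keep a message on screen longer if a second one arrives before the first has been hidden.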

Create the convertToSpeech() function and add the code below.

function convertToSpeech(text, lang) {
  // Bail out with a visible message if speech synthesis is unavailable
  if (!("speechSynthesis" in window)) {
    messageContainer.textContent =
      "Your browser is not supported, try another browser";
    messageContainer.style.display = "block";
    return;
  }

  // Describe what to say and in which language, then queue it for speaking
  const utterance = new SpeechSynthesisUtterance();
  utterance.lang = lang;
  utterance.text = text;

  speechSynthesis.speak(utterance);
}

The convertToSpeech() function takes two parameters: the text to be converted and the language the text should be spoken in.

Let’s break it down:

First, we will check if the browser supports speech synthesis; if it doesn't, we will display the message “Your browser is not supported, try another browser”.
If speech synthesis is supported, we will create a new SpeechSynthesisUtterance instance and assign it to the variable utterance.
Then we apply the text to the speech request with utterance.text and the language with utterance.lang.
Finally, the browser will speak the text using speechSynthesis.speak(utterance).
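One thing to keep in mind is that speak() adds the utterance to a queue, so clicking the button several times lines up several readings one after another. If you would prefer each click to replace whatever is currently being spoken, the cancel() method mentioned in the introduction can clear the queue first. Here is a minimal sketch, assuming a hypothetical speakNow() helper that is not part of the demo.

```javascript
// Hypothetical helper, not part of the original demo.
// Stops any current or queued speech, then speaks the new utterance.
// `synth` is expected to behave like window.speechSynthesis.
function speakNow(synth, utterance) {
  if (synth.speaking || synth.pending) {
    synth.cancel(); // removes every utterance from the queue
  }
  synth.speak(utterance);
}

// Browser usage inside convertToSpeech():
// speakNow(speechSynthesis, utterance);
```

Passing the synthesis object in as a parameter keeps the helper easy to try out with a stand-in object.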

Conclusion

I hope you enjoyed this tutorial and learned something useful! We covered the essentials of creating text-to-voice apps with the Web Speech API. Incorporating text-to-voice functionality in your application can cater to diverse user needs and improve its overall accessibility.


Written by: Elis Wanyama
Posted on: April 17, 2024