The Azure Guide/Speech services

In this section, we take a look at another of Azure's services: speech and text recognition.

Introduction

Speech services is a family of tools that lets developers quickly and easily add speech-to-text and text-to-speech functionality to their applications, which can run on various platforms. The service works online, so there is no need to bundle anything with the program.

Unlike Computer Vision, this service does have a free tier[1], but it is significantly restricted in terms of the volume that can be processed. It is usually better to use the paid tier for all but the simplest applications. Additionally, there is a dedicated 30-day trial, which might be a better option[2].

Usage

The speech-to-text feature of Azure Speech Services enables real-time transcription of audio streams into text that your applications, tools, or devices can consume, display, and act on as command input. The service is powered by the same recognition technology that Microsoft uses for Cortana and Office products, and it works together with the translation and text-to-speech features.
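To make the transcription flow more concrete, here is a minimal Python sketch that only builds a speech-to-text REST request and does not send it. The endpoint path, the header names and the 16 kHz WAV content type reflect the short-audio REST interface as we understand it; treat them as assumptions and check them against the current documentation. The function name `build_stt_request` is hypothetical.

```python
import urllib.parse
import urllib.request


def build_stt_request(audio_bytes, key, region, language="en-US"):
    """Build (but do not send) a speech-to-text REST request.

    Assumptions: the regional endpoint pattern and header names below
    match the short-audio REST interface; verify them before relying on this.
    """
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # The key created in the Azure portal authenticates the call.
        "Ocp-Apim-Subscription-Key": key,
        # Assumed input format: 16 kHz PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would perform a real network call that counts against your tier's quota. The recognized text is then expected in the JSON response body.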

How you integrate the service depends on the programming language used. For example, Java projects need to use Maven[3]. It is recommended that you study the sample code and use it as a starting point for integrating the service into your final application. Note that there is no Java sample code available for text-to-speech[4], so you'll have to work it out from the C# or Python code provided.
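Since text-to-speech is the part with the least sample coverage, the following Python sketch shows roughly what such a request looks like at the REST level, which may help when porting to another language. It only builds the request. The endpoint pattern, the output format string, the default voice name and the `User-Agent` value are assumptions to verify against the current documentation, and `build_tts_request` is a hypothetical helper name.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_tts_request(text, key, region, voice="en-US-JennyNeural", lang="en-US"):
    """Build (but do not send) a text-to-speech REST request with an SSML body.

    Assumptions: the endpoint, voice name and output format are illustrative
    and should be checked against the service documentation.
    """
    # SSML wraps the text to speak; escape() protects characters such as & and <.
    ssml = (
        f'<speak version="1.0" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # Assumed output format: 24 kHz, 16-bit mono PCM in a RIFF/WAV container.
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",
        "User-Agent": "azure-guide-sample",  # hypothetical application name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )
```

If the request is sent and succeeds, the response body should contain the audio bytes, which can be written straight to a `.wav` file.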

Remember that you'll first need to create the Speech API keys in the Azure portal to use in your application.
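Once the key exists, it is generally safer to keep it out of the source code. A minimal Python sketch, assuming the key and region are stored in environment variables; the names `SPEECH_KEY` and `SPEECH_REGION` and the helper `load_speech_credentials` are choices made for this example, not something the service requires:

```python
import os


def load_speech_credentials(env=os.environ):
    """Return (key, region) read from environment variables.

    The variable names are an assumption of this sketch; use whatever
    names your deployment already defines.
    """
    key = env.get("SPEECH_KEY", "").strip()
    region = env.get("SPEECH_REGION", "").strip()
    # Collect every missing name so the error reports them all at once.
    missing = [
        name
        for name, value in (("SPEECH_KEY", key), ("SPEECH_REGION", region))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return key, region
```

Failing early with a clear message tends to be easier to diagnose than an authentication error returned by the service later on.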

If you were previously using Bing Speech or Custom Speech Service, you'll need to migrate to Azure's Speech services[5][6].

References