June 28th, 2021

Introducing the Ingestion Client for Azure Speech Services

[Speech](https://azure.microsoft.com/en-us/services/cognitive-services/speech-services/) is an Azure Cognitive Service that enables you to build scalable solutions for a variety of speech-related tasks, such as transcribing audio, producing natural-sounding voices, recognising who is speaking and translating speech. Today, we are introducing the **Ingestion Client**, an Azure [solution](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/ingestion) that monitors your dedicated [Azure Storage](https://azure.microsoft.com/en-us/product-categories/storage/) container so that audio files landing in that container are automatically transcribed. We created this [tool](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/ingestion) to help you set up a full-blown, scalable and secure transcription pipeline through simple configuration and without any development effort. The **Ingestion Client** incorporates best practices for submitting transcription requests at scale (up to hundreds of thousands of files), including error management, retry logic and various other optimisations. Setup is carried out by deploying an Azure Resource Manager (ARM) template. The figure below shows the architecture of the solution that this template deploys.
![Architecture diagram of the Ingestion Client solution](https://azurecomcdn.azureedge.net/mediahandler/acomblog/updates/UpdatesV2/blog/8fc4de7b-25bf-4a91-9eea-0985a1f5ee42.png)

When a user uploads an audio file to the dedicated [Azure Storage](https://azure.microsoft.com/en-us/product-categories/storage/) container, a timer-triggered [Azure Function](https://azure.microsoft.com/en-us/services/functions/) picks the file up and creates a transcription request using either the [Speech-to-text REST API v3.0](https://docs.microsoft.com/azure/cognitive-services/speech-service/rest-speech-to-text#speech-to-text-rest-api-v30) or the [Speech SDK](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-sdk) (user's choice). When the transcription completes successfully, the solution writes the transcript to the container from which the audio file was obtained. Additionally, users can choose to apply analytics to the transcript, produce reports or redact it; each of these options relies on additional resources deployed through the ARM template.

Explore our [guide](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/ingestion/ingestion-client/Setup/guide.md) for more information about the tool and for installation notes, and download the code from this [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/ingestion) repo.
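
To make the request-creation step more concrete, here is a minimal Python sketch of what submitting one audio file to the Speech-to-text REST API v3.0 could look like. It is an illustration only, not the Ingestion Client's own code: the function name `build_transcription_request`, the region, the key and the audio URL are hypothetical placeholders. The sketch only builds the request, so it runs without an Azure subscription.

```python
import json
import urllib.request


def build_transcription_request(region, subscription_key, audio_url,
                                locale="en-US", display_name="ingestion-sketch"):
    """Build (but do not send) a POST request that creates a batch transcription.

    All arguments are illustrative placeholders; audio_url is assumed to be a
    URL (for example a SAS URL) that the Speech service can read.
    """
    # Batch transcription endpoint of the Speech-to-text REST API v3.0.
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.0/transcriptions")
    body = {
        "contentUrls": [audio_url],   # one or more audio files to transcribe
        "locale": locale,             # language of the audio
        "displayName": display_name,  # free-text label for the job
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_transcription_request(
        region="westeurope",                 # hypothetical region
        subscription_key="<your-speech-key>",
        audio_url="https://example.blob.core.windows.net/audio/call.wav",
    )
    print(request.get_method(), request.full_url)
    # To actually submit the job (requires a valid key and a reachable URL):
    # urllib.request.urlopen(request)
```

In the deployed solution this step is handled for you, along with polling for completion, retries and writing the transcript back to storage, so a sketch like this is useful mainly for understanding what happens behind the configuration.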