Transcribe - AWS


Transcribe - AWS Service



Amazon Transcribe is an automatic speech recognition (ASR) service that uses deep learning to convert audio files into text. It is useful for transcribing customer service calls, creating captions and subtitles, and generating metadata that turns media files into searchable archives.


This makes it easy to add speech-to-text capability to an application. By calling the Amazon Transcribe API, we can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.

1. Initialize the service. Before this, store the audio file in an S3 bucket.

import boto3
transcribe = boto3.client('transcribe')

2. Create a Lambda function with the Python runtime and call the Transcribe API.


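import uuid

# bucket, key, outfile_bucket and logger are assumed to be defined earlier in the handler

# Job names must be unique within an AWS account, so generate one per request
jobname = uuid.uuid1()

# HTTPS URL of the audio object stored in S3
job_uri = "https://" + bucket + ".s3.amazonaws.com/" + key

logger.info(job_uri)
transcribe.start_transcription_job(
    TranscriptionJobName=str(jobname),
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp3',
    LanguageCode='en-US',
    OutputBucketName=outfile_bucket)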
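
The snippet above uses `bucket` and `key` without showing where they come from. One common setup, assumed here rather than stated in the post, is to trigger the Lambda function from an S3 upload notification and read both values from the event. The helper name `extract_s3_object` and the sample event values are hypothetical.

```python
from urllib.parse import unquote_plus


def extract_s3_object(event):
    """Return (bucket, key) for the first record of an S3 notification event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 events (for example, spaces become '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key


# Hypothetical event, trimmed to the fields the helper reads.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-audio-bucket"},
                "object": {"key": "calls/hello+world.mp3"}}}
    ]
}
bucket, key = extract_s3_object(sample_event)
print(bucket, key)
```

Inside the Lambda handler, the same call would run on the `event` argument before `job_uri` is built.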

3. Check the job status and verify the JSON output.

transcribe.get_transcription_job(TranscriptionJobName=str(jobname))
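
A transcription job runs asynchronously, so a single status call usually has to be repeated until the job finishes. The following is a minimal polling sketch, not the post's own implementation. The function name `wait_for_job` and its `poll_seconds` and `max_polls` parameters are hypothetical. It relies on the job status being returned under `TranscriptionJob` → `TranscriptionJobStatus`.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5, max_polls=60):
    """Poll until the job is COMPLETED or FAILED and return the final status."""
    for _ in range(max_polls):
        response = client.get_transcription_job(TranscriptionJobName=job_name)
        status = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        # Still queued or in progress; wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError("job %s did not finish in time" % job_name)
```

With the client from step 1, this would be called as `wait_for_job(transcribe, str(jobname))`. Inside Lambda, the total wait should stay below the function timeout.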

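When the job completes, the transcript is written as a JSON file to the output bucket. Each recognized word appears in it as an item such as:

{
    "start_time":"0.710",
    "end_time":"1.080",
    "alternatives":[
        {
            "confidence":0.91,
            "word":"Hello"
        }
    ]
}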
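
To verify the output programmatically, the items can be reduced to a list of words with their confidence scores. This is a minimal sketch that assumes the item shape shown above. The function name `words_with_confidence` is hypothetical, and the fallback to a `content` key is an assumption for transcript files that name the field differently.

```python
import json


def words_with_confidence(items):
    """Return (word, confidence, start_time) tuples from transcript items."""
    results = []
    for item in items:
        # Items without timing (assumed to be punctuation) are skipped.
        if "start_time" not in item:
            continue
        for alt in item.get("alternatives", []):
            # The excerpt uses "word"; fall back to "content" (assumption).
            text = alt.get("word", alt.get("content"))
            results.append(
                (text, float(alt["confidence"]), float(item["start_time"]))
            )
    return results


sample = json.loads("""
[{"start_time": "0.710", "end_time": "1.080",
  "alternatives": [{"confidence": 0.91, "word": "Hello"}]}]
""")
print(words_with_confidence(sample))  # [('Hello', 0.91, 0.71)]
```

Filtering the result on a confidence threshold is a simple way to flag words that may need manual review.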

