Voice Data Analytics using AWS Services


This post walks through a use case: creating data insights from audio conversations between a patient and a medical transcriptionist.

Essentially, voice analytics data is a valuable asset to organizations in just about every industry. The insights it yields support better business intelligence and can consistently drive growth and positive change.

Many voice technologies automate or simplify communication between patients and medical agents. Intelligent AI services can save clinical staff valuable time by completing tasks such as collecting the patient's history, generating a report before the patient meets the doctor, and suggesting data insights to doctors.


This use case takes sample audio files of medical transcriptions as source input and processes them: it converts the audio files to JSON data, extracts the needed information, performs further ETL, and finally creates the data insights.

  • Convert the MP3 files to text data
  • Analyze and extract the needed information
  • Build the visualization

1. Transcribe Module

Convert the Audio File to JSON File




A Lambda function written in Python with Boto3 invokes the Transcribe API and converts the audio file to JSON data.

The snippet below shows the API invocation:

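import logging
import os
import uuid

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3_resource = boto3.resource('s3')
transcribe = boto3.client('transcribe')
# Assumption: the output bucket name is supplied through an environment variable.
outfile_bucket = os.environ['OUTPUT_BUCKET']

def lambda_handler(event, context):
    try:
        # Bucket named in the S3 event that triggered this function
        bucket = event['Records'][0]['s3']['bucket']['name']
        get_bucket = s3_resource.Bucket(bucket)
        # Note: this starts a job for every MP3 in the bucket, not only the new upload
        for obj in get_bucket.objects.all():
            key = obj.key
            # MediaFormat below is fixed to mp3, so skip any other file type
            if key.lower().endswith('.mp3'):
                jobname = uuid.uuid1()
                job_uri = "s3://" + bucket + "/" + key
                logger.info(job_uri)
                transcribe.start_transcription_job(
                    TranscriptionJobName=str(jobname),
                    Media={'MediaFileUri': job_uri},
                    MediaFormat='mp3',
                    LanguageCode='en-US',
                    OutputBucketName=outfile_bucket)
    except Exception as e:
        logger.exception(e)
        raise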
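
Transcription jobs run asynchronously, so the JSON output is not available the moment the job is started. As a rough sketch of one way to wait for a job, the hypothetical helper below polls the job status through the Transcribe client's get_transcription_job call. The helper name `wait_for_job` and its parameters are assumptions for illustration and are not part of the original function.

```python
import time


def wait_for_job(transcribe_client, job_name, poll_seconds=5.0, max_polls=60):
    """Poll a transcription job until it finishes; return the final status.

    transcribe_client is expected to be a boto3 Transcribe client
    (or any object exposing get_transcription_job in the same shape).
    """
    for _ in range(max_polls):
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name)
        status = response["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_name} still running after {max_polls} polls")
```

Inside a Lambda, polling counts against the function timeout, so triggering the next step from an S3 event on the output bucket may be the better fit.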

2. Comprehend Module






Comprehend Medical further processes the JSON file produced by Transcribe and stores the needed information in a DynamoDB table.
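
Before the text can be sent to Comprehend, it has to be pulled out of the Transcribe output, which keeps the full transcript under results.transcripts. A minimal sketch of that step follows; the function name `extract_transcript` and the sample document are made up for illustration.

```python
import json


def extract_transcript(transcribe_json: str) -> str:
    """Return the full transcript text from a Transcribe output document."""
    doc = json.loads(transcribe_json)
    # Transcribe stores the text under results -> transcripts.
    transcripts = doc.get("results", {}).get("transcripts", [])
    return " ".join(t.get("transcript", "") for t in transcripts).strip()


# Made-up sample shaped like a Transcribe result, for illustration only.
sample = json.dumps({
    "jobName": "example-job",
    "results": {"transcripts": [{"transcript": "Patient reports a mild headache."}]},
    "status": "COMPLETED",
})
print(extract_transcript(sample))
```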




def __init__(self, ID, raw_text):
    comprehend = boto3.client(service_name='comprehendmedical')
    self.ID = ID
    self.resume_text_input = raw_text
    self.entity_list = comprehend.detect_entities(Text=self.resume_text_input)['Entities']

    # Initializing name; stays None if no NAME entity is found
    self.name = None
    for entity in self.entity_list:
        # Comprehend Medical reports entity types in upper case, e.g. 'NAME'
        if entity['Type'] == 'NAME':
            self.name = entity['Text']
            break
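
The same entity list can also supply the medication fields used later for analysis. The sketch below works on a plain list of entity dictionaries, so it can be tried without calling AWS. The helper names and the sample entities are hypothetical, and the labels (NAME, MEDICATION, GENERIC_NAME, STRENGTH, FORM) reflect my understanding of the Comprehend Medical response and should be checked against a real response.

```python
from typing import Any, Dict, List, Optional


def first_entity_text(entities: List[Dict[str, Any]], entity_type: str) -> Optional[str]:
    """Return the text of the first entity with the given type, or None."""
    for entity in entities:
        if entity.get("Type") == entity_type:
            return entity.get("Text")
    return None


def medication_rows(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten MEDICATION entities into one dictionary per medicine."""
    rows = []
    for entity in entities:
        if entity.get("Category") != "MEDICATION":
            continue
        row = {"medicine": entity.get("Text"), "kind": entity.get("Type")}
        # Each attribute (strength, form, ...) becomes its own column.
        for attr in entity.get("Attributes", []):
            row[attr["Type"].lower()] = attr["Text"]
        rows.append(row)
    return rows


# Made-up entities, for illustration only.
sample_entities = [
    {"Category": "PROTECTED_HEALTH_INFORMATION", "Type": "NAME", "Text": "John Doe"},
    {"Category": "MEDICATION", "Type": "GENERIC_NAME", "Text": "ibuprofen",
     "Attributes": [{"Type": "STRENGTH", "Text": "200 mg"},
                    {"Type": "FORM", "Text": "tablet"}]},
]
print(first_entity_text(sample_entities, "NAME"))
print(medication_rows(sample_entities))
```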




The Comprehend Medical API is invoked on the given text and extracts the required data fields (here, the name is extracted as an example). The data is then stored in the DynamoDB table as shown below:

db = boto3.resource('dynamodb', region_name='us-east-2')
table = db.Table(table_name)
response = table.put_item(
    Item={
        'ID': self.ID,
        'DATE': self.current_date,
        'NAME': self.name})



3. Data Insights




Load the DynamoDB data into Amazon QuickSight and perform the data analytics as required. (Here I analyzed, for example, the strength of each medicine against its form, the frequency and dosage of all the medicines, and the count of forms by generic name.)
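
QuickSight builds these aggregations in its own interface, but the same numbers can be checked in plain Python. The sketch below reproduces the "count of forms by generic name" analysis on a few made-up rows; the field names `generic_name` and `form` are assumptions about how the table might be laid out.

```python
from collections import Counter, defaultdict


def count_forms_by_generic_name(rows):
    """Count how often each form appears for each generic medicine name."""
    counts = defaultdict(Counter)
    for row in rows:
        name = row.get("generic_name")
        form = row.get("form")
        # Skip rows that lack either field.
        if name and form:
            counts[name][form] += 1
    return {name: dict(forms) for name, forms in counts.items()}


# Made-up rows, for illustration only.
sample_rows = [
    {"generic_name": "ibuprofen", "form": "tablet"},
    {"generic_name": "ibuprofen", "form": "tablet"},
    {"generic_name": "ibuprofen", "form": "gel"},
    {"generic_name": "amoxicillin", "form": "capsule"},
]
print(count_forms_by_generic_name(sample_rows))
```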









