DataCloudies: Voice Data Analytics using AWS Services

This post walks through a use case: deriving data insights from recorded audio conversations between a patient and a medical transcriptionist.

Voice analytics data is a valuable asset to organizations in almost every industry. The insights drawn from it support better business intelligence and, used consistently, can drive growth and positive change.

Many voice technologies automate or simplify communication between patients and medical agents. AI services can save clinical staff valuable time by completing tasks such as collecting the patient's history, generating a report before the patient meets the doctor, and suggesting data insights to doctors.

This use case takes sample medical audio recordings as source input and processes them in stages: it converts the audio files to JSON data, extracts the needed information, performs further ETL, and finally creates the data insights.

Convert the MP3 files to text data
Analyze and extract the needed information
Create the visualizations

1. Transcribe Module

A Lambda function written in Python with Boto3 invokes the Amazon Transcribe API and converts each audio file to JSON data.

The snippet below shows the API invocation:

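import logging
import os
import uuid

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3_resource = boto3.resource('s3')
transcribe = boto3.client('transcribe')

# Assumption: the output bucket name comes from an environment variable;
# the original snippet does not show where outfile_bucket is defined.
outfile_bucket = os.environ['OUTPUT_BUCKET']


def lambda_handler(event, context):
    try:
        # Bucket named in the S3 event that triggered this function
        bucket = event['Records'][0]['s3']['bucket']['name']
        get_bucket = s3_resource.Bucket(bucket)

        # Start one transcription job per MP3 object in the bucket
        for obj in get_bucket.objects.all():
            key = obj.key
            if key.lower().endswith('.mp3'):
                jobname = uuid.uuid1()
                job_uri = "https://" + bucket + ".s3.amazonaws.com/" + key
                logger.info(job_uri)
                transcribe.start_transcription_job(
                    TranscriptionJobName=str(jobname),
                    Media={'MediaFileUri': job_uri},
                    MediaFormat='mp3',
                    LanguageCode='en-US',
                    OutputBucketName=outfile_bucket)
    except Exception:
        # Log the full traceback, then re-raise so Lambda marks the run as failed
        logger.exception('Failed to start transcription jobs')
        raise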
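
The handler above lists every object in the bucket on each invocation. An S3 event notification already names the object that triggered it, so one variation is to read the key from the event instead. The sketch below assumes the standard S3 notification layout; `parse_s3_event` and the sample bucket and key are hypothetical names used only for illustration. Object keys arrive URL-encoded in the event, hence the `unquote_plus` call.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Return (bucket, key) pairs for each record in an S3 event notification."""
    pairs = []
    for record in event.get('Records', []):
        s3_info = record['s3']
        bucket = s3_info['bucket']['name']
        # Keys are URL-encoded in the event (a space arrives as '+')
        key = unquote_plus(s3_info['object']['key'])
        pairs.append((bucket, key))
    return pairs


# Hypothetical event with a single uploaded MP3 file
sample_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'my-audio-bucket'},
                'object': {'key': 'visits/patient+01.mp3'}}}
    ]
}
print(parse_s3_event(sample_event))
```

With this approach each upload would start only one transcription job, rather than re-submitting every file already in the bucket.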

2. Comprehend Module

Amazon Comprehend Medical further processes the JSON file produced by Transcribe, and the needed information is stored in a DynamoDB table.
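
Before the text can be analyzed, it has to be pulled out of the Transcribe output JSON. The sketch below assumes the usual output layout, where the text sits under `results.transcripts`; `extract_transcript` and the sample document are hypothetical and only illustrate the step.

```python
import json


def extract_transcript(transcribe_json):
    """Return the full transcript text from a Transcribe output document."""
    document = json.loads(transcribe_json)
    transcripts = document['results']['transcripts']
    # Usually there is a single entry; join in case there are several
    return ' '.join(part['transcript'] for part in transcripts)


# Hypothetical, trimmed-down output document
sample_output = json.dumps({
    'jobName': 'example-job',
    'results': {
        'transcripts': [{'transcript': 'Patient reports mild headache.'}],
        'items': [],
    },
    'status': 'COMPLETED',
})
print(extract_transcript(sample_output))
```

The resulting string is what the constructor below receives as `raw_text`.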

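import boto3


# Constructor of the class that holds one transcript (class definition not shown)
def __init__(self, ID, raw_text):
    comprehend = boto3.client(service_name='comprehendmedical')
    self.ID = ID
    self.resume_text_input = raw_text
    self.entity_list = comprehend.detect_entities(Text=self.resume_text_input)['Entities']

    # Initializing name: take the first NAME entity, if any
    self.name = None
    for entity in self.entity_list:
        # Comprehend Medical reports entity types in upper case
        if entity['Type'] == 'NAME':
            self.name = entity['Text']
            break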

The Comprehend Medical API is invoked on the given text and extracts the required data fields (here, the name is extracted as an example). The data is then stored in the DynamoDB table as shown below:

# Runs inside the same class; table_name and self.current_date are assumed to be defined elsewhere
db = boto3.resource('dynamodb', region_name='us-east-2')
table = db.Table(table_name)
response = table.put_item(
    Item={
        'ID': self.ID,
        'DATE': self.current_date,
        'NAME': self.name,
    }
)
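
The analysis in the next section relies on medication details such as form, frequency, dosage, and strength. As far as I understand the Comprehend Medical response, these arrive as `Attributes` attached to entities in the `MEDICATION` category. The sketch below flattens them into rows that could be stored alongside the name; `flatten_medications`, the column names, and the sample entities are hypothetical, so check the attribute types against the API reference before relying on them.

```python
def flatten_medications(entities):
    """Turn generic-name MEDICATION entities into flat rows, one per mention."""
    wanted = ('DOSAGE', 'FORM', 'FREQUENCY', 'STRENGTH')
    rows = []
    for entity in entities:
        if entity.get('Category') != 'MEDICATION' or entity.get('Type') != 'GENERIC_NAME':
            continue
        row = {'GENERIC_NAME': entity['Text']}
        for name in wanted:
            row[name] = None
        # Keep the first attribute of each wanted type
        for attribute in entity.get('Attributes', []):
            if attribute['Type'] in wanted and row[attribute['Type']] is None:
                row[attribute['Type']] = attribute['Text']
        rows.append(row)
    return rows


# Hypothetical entities in the shape of a detect_entities response
sample_entities = [
    {'Category': 'PROTECTED_HEALTH_INFORMATION', 'Type': 'NAME', 'Text': 'John'},
    {'Category': 'MEDICATION', 'Type': 'GENERIC_NAME', 'Text': 'ibuprofen',
     'Attributes': [
         {'Type': 'STRENGTH', 'Text': '200 mg'},
         {'Type': 'FORM', 'Text': 'tablet'},
         {'Type': 'FREQUENCY', 'Text': 'twice daily'},
     ]},
]
print(flatten_medications(sample_entities))
```

Attributes that are not mentioned in the transcript stay `None`, so every row has the same columns.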

3. Data Insights

Load the DynamoDB data into Amazon QuickSight and perform the data analytics as required. (Here I have performed analyses such as the strength of each medicine against its form, the frequency and dosage of all the medicines, and the count of forms by generic name.)
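
To make one of these analyses concrete, the count of forms by generic name amounts to a simple group-and-count. The sketch below does it in plain Python rather than in the dashboard; the row layout, field names, and sample data are hypothetical.

```python
from collections import Counter


def count_forms_by_generic_name(rows):
    """Count how often each (generic name, form) pair occurs."""
    counts = Counter()
    for row in rows:
        name = row.get('GENERIC_NAME')
        form = row.get('FORM')
        # Skip rows where either field is missing
        if name and form:
            counts[(name.lower(), form.lower())] += 1
    return counts


# Hypothetical rows, as they might be exported from the table
sample_rows = [
    {'GENERIC_NAME': 'ibuprofen', 'FORM': 'tablet'},
    {'GENERIC_NAME': 'Ibuprofen', 'FORM': 'Tablet'},
    {'GENERIC_NAME': 'ibuprofen', 'FORM': 'gel'},
    {'GENERIC_NAME': 'amoxicillin', 'FORM': None},
]
for (name, form), total in sorted(count_forms_by_generic_name(sample_rows).items()):
    print(name, form, total)
```

Lower-casing the values keeps spelling variants such as "Tablet" and "tablet" in the same bucket.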
