Harnessing AI for Visual Understanding: A Deep Dive into Azure Computer Vision

Empower Your Image Analysis: Exploring Azure Computer Vision API with Python for Advanced Visual Insights

Unlock the Boundless Potential of Azure Computer Vision API for Image Analysis, Recognition, and Beyond Using Python

In this technical exposé, we take a deep dive into image analysis and understanding with the Azure Computer Vision API and Python. Through hands-on code examples, we cover image tagging and description generation, optical character recognition (OCR), object detection, and domain-specific analysis such as landmark and celebrity recognition, as well as image-type detection and thumbnail generation. Let's get started and see what the Azure Computer Vision API can do.

What is Azure Computer Vision?

The cloud-based Azure AI Vision API (the current name of Azure Computer Vision) gives developers access to advanced algorithms that process images and return information about them. You can upload an image or specify an image URL, and Azure AI Vision analyzes the visual content in different ways depending on the inputs and features you choose. For more details, see the Azure AI Vision documentation.

Let's Begin

Step 1 - Setting Up Azure Computer Vision

  1. Head to the Azure Portal and log in with your credentials.

  2. Create a new Computer Vision resource, found under AI + Machine Learning.

  3. Continue to resource creation and choose a resource group and a pricing tier based on your expected usage.

  4. Click Review + create, then Create, and wait for the deployment to finish.

  5. Open the resource and review its Overview page. Save KEY 1 and the Endpoint for later use.

  6. Under Networking, enable access from All networks.

  7. Optionally, explore Vision Studio, although it is not used in this article.

Custom datasets and models can be added in Vision Studio.

Step 2 - Installing Libraries

  1. python3 -m venv venv

  2. source venv/bin/activate

  3. pip install azure-cognitiveservices-vision-computervision requests opencv-python numpy matplotlib

On your local machine, ensure that you are using OpenSSL 1.1.0 or later. Check the installed version and, on macOS, upgrade it with Homebrew if needed:

  1. openssl version

  2. brew install openssl
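
Note that `openssl version` reports the command-line tool, which is not necessarily the library your Python interpreter was built against. The following is a minimal sketch using only the standard `ssl` module to check the OpenSSL that Python itself uses. The 1.1.0 threshold comes from the requirement above, and the helper name `python_openssl_ok` is hypothetical. As a design choice, it treats LibreSSL (common in older macOS system Pythons) as not meeting the requirement, because LibreSSL's version numbers are not comparable to OpenSSL's.

```python
import ssl

# Minimum OpenSSL version stated in the setup notes above
MIN_OPENSSL = (1, 1, 0)


def python_openssl_ok(minimum=MIN_OPENSSL):
    """Return True if this Python is linked against OpenSSL >= `minimum` (hypothetical helper)."""
    # LibreSSL reports its own numbering scheme, so don't compare it against OpenSSL versions
    if not ssl.OPENSSL_VERSION.startswith("OpenSSL"):
        return False
    # OPENSSL_VERSION_INFO is a tuple such as (3, 0, 2, 0, 15); compare the leading fields
    return tuple(ssl.OPENSSL_VERSION_INFO[: len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python is linked against:", ssl.OPENSSL_VERSION)
    print("Meets minimum", MIN_OPENSSL, ":", python_openssl_ok())
```

If this prints `False`, recreating the virtual environment with a Python built against a newer OpenSSL (for example, one installed via Homebrew) is usually needed. Upgrading the command-line tool alone is not enough.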

Step 3 - Code

Libraries & Client

import os
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials
import requests
import cv2
import numpy as np
import matplotlib.pyplot as plt
import io


key = "<KEY 1>"  # paste KEY 1 saved in Step 1
endpoint = "https://<server_name>.cognitiveservices.azure.com/"  # Endpoint saved in Step 1

# Create a Computer Vision client
credentials = CognitiveServicesCredentials(key)
client = ComputerVisionClient(endpoint, credentials)
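
Hardcoding KEY 1 in a script makes it easy to leak, for example by committing it to version control. The following is a minimal sketch that reads the key and endpoint from environment variables instead. The variable names `VISION_KEY` and `VISION_ENDPOINT` and the helper `load_vision_settings` are hypothetical choices for this example. The SDK does not look for them on its own.

```python
import os
from urllib.parse import urlparse


def load_vision_settings(env=None):
    """Return (key, endpoint) read from environment variables.

    VISION_KEY and VISION_ENDPOINT are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("VISION_KEY", "").strip()
    endpoint = env.get("VISION_ENDPOINT", "").strip()
    if not key or not endpoint:
        raise RuntimeError("Set VISION_KEY and VISION_ENDPOINT before running the examples")

    # The endpoint copied from the portal is expected to be an https URL
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Unexpected endpoint format: {endpoint!r}")

    # Normalise to exactly one trailing slash, matching the endpoint format used above
    return key, endpoint.rstrip("/") + "/"
```

With the two variables exported in your shell, `key, endpoint = load_vision_settings()` can replace the hardcoded assignments, and the values are passed to `CognitiveServicesCredentials` and `ComputerVisionClient` exactly as before.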

We use a Creative Commons-licensed image from Wikimedia Commons:

# Image URL
image_link = "https://upload.wikimedia.org/wikipedia/commons/5/51/Crossing_the_Hudson_River_on_the_George_Washington_Bridge_from_Fort_Lee%2C_New_Jersey_to_Manhattan%2C_New_York_%287237796950%29.jpg"

Functions

  1. Analyse Image

     # Analyse Image
     features = [VisualFeatureTypes.tags, VisualFeatureTypes.description]
     analysis = client.analyze_image(image_link, visual_features=features)
     # Print out the results
     print("Tags:")
     for tag in analysis.tags:
         print(tag.name, tag.confidence)
    
     print("\nDescription:")
     print(analysis.description.captions[0].text)
     print("============================================")
    

    Output

     Tags:
     text 0.999910831451416
     outdoor 0.9962673783302307
     road 0.9954007863998413
     sky 0.9767497181892395
     vehicle 0.970876932144165
     land vehicle 0.9642893075942993
     infrastructure 0.9435274600982666
     freeway 0.9393666982650757
     vehicle registration plate 0.9338929653167725
     highway 0.9248238801956177
     street 0.9214962720870972
     lane 0.9157841205596924
     billboard 0.8976778984069824
     thoroughfare 0.8912621736526489
     traffic 0.8668385744094849
     asphalt 0.8635005950927734
     wheel 0.8605480194091797
     sign 0.8491865992546082
     traffic sign 0.8463172912597656
     overpass 0.8442752361297607
     car 0.8412759304046631
     expressway 0.7439208626747131
     driving 0.649591863155365
     bridge 0.6195515394210815
    
     Description:
     a group of cars on a highway
     ============================================
    
    • features = [VisualFeatureTypes.tags, VisualFeatureTypes.description]: In this line, a list named features is created containing two elements: VisualFeatureTypes.tags and VisualFeatureTypes.description. These represent the specific visual features you want to analyze in the image, which include identifying tags and generating a description.

    • analysis = client.analyze_image(image_link, visual_features=features): This line of code invokes the analyze_image method of the client object, which is an instance of the ComputerVisionClient class. It sends a request to the Azure Computer Vision API to analyze the image at the specified image_link URL using the selected visual features from the features list. The result of this analysis is stored in the analysis variable.

  2. Perform OCR on an image

     ocr_results = client.recognize_printed_text(image_link)
    
     # Print out the recognized text
     for region in ocr_results.regions:
         for line in region.lines:
             for word in line.words:
                 print(word.text, end=' ')
             print()
    
     print("============================================")
    

    Output

     10 
     ii$JIJ' 
     NORTH 
     INTERSTATE 
     INTERSTATE 
     TO 87 
     I CLEARANCE 13-6" 
     EXIT 1 
     W 178 St 
     H Hudson 
     Pkwy 
     3/4 MILE 
     ============================================
    

    ocr_results = client.recognize_printed_text(image_link): This line of code invokes the recognize_printed_text method of the client object. It sends a request to the Azure Computer Vision API to perform Optical Character Recognition (OCR) on the image located at the image_link URL. The ocr_results variable will store the results of the OCR analysis.

  3. Detect objects in an image and show them on the image

     detect_objects_results = client.detect_objects(image_link)
     for obj in detect_objects_results.objects:
         print(obj.object_property)
         print(obj.rectangle)
         print(obj.confidence)
    
     # SHOW ON IMAGE
     try:
         # Download the image with a browser-like User-Agent, since some image hosts reject the default one
         headers = {
             'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
         response = requests.get(image_link, headers=headers, timeout=30)
         if response.status_code == 200:
             image_data = response.content
    
             # Convert image data to a NumPy array
             image_np = np.frombuffer(image_data, np.uint8)
    
             # Decode the image using OpenCV (returns None if the bytes are not a valid image)
             image_cv2 = cv2.imdecode(image_np, cv2.IMREAD_COLOR)
             if image_cv2 is None:
                 raise ValueError("Downloaded data could not be decoded as an image")
    
             # Draw rectangles and labels; OpenCV uses BGR channel order, so (0, 0, 255) is red
             for obj in detect_objects_results.objects:
                 rect = obj.rectangle
                 cv2.rectangle(image_cv2, (rect.x, rect.y), (rect.x + rect.w, rect.y + rect.h), color=(0, 0, 255),
                               thickness=10)
                 cv2.putText(image_cv2, obj.object_property, (rect.x, rect.y - 20), cv2.FONT_HERSHEY_PLAIN, 5,
                             (0, 0, 255), 5)
    
             # Show the image; Matplotlib expects RGB, so convert from OpenCV's BGR order first
             print("Image data successfully processed.")
             print("Image data type:", image_cv2.dtype)
             plt.imshow(cv2.cvtColor(image_cv2, cv2.COLOR_BGR2RGB))
             plt.show()
         else:
             print("Failed to retrieve image data. HTTP status code:", response.status_code)
     except Exception as e:
         print("An error occurred:", e)
    

    Output

     car
     {'additional_properties': {}, 'x': 483, 'y': 1970, 'w': 448, 'h': 208}
     0.573
     car
     {'additional_properties': {}, 'x': 1191, 'y': 1990, 'w': 417, 'h': 359}
     0.571
     car
     {'additional_properties': {}, 'x': 1762, 'y': 1979, 'w': 330, 'h': 310}
     0.61
     car
     {'additional_properties': {}, 'x': 3288, 'y': 2082, 'w': 420, 'h': 292}
     0.573
     car
     {'additional_properties': {}, 'x': 2653, 'y': 1969, 'w': 641, 'h': 488}
     0.62
     Image data successfully processed.
     Image data type: uint8
    

    detect_objects_results = client.detect_objects(image_link): This line of code invokes the detect_objects method of the client object. It sends a request to the Azure Computer Vision API to detect objects in the image located at the image_link URL. The detect_objects_results variable will store the results of the object detection analysis.
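
Both the tag list and the detected objects carry a `confidence` score. The object detection output above scores its five cars only between roughly 0.57 and 0.62, so some post-processing is often useful. The following is a minimal sketch that keeps only results at or above a threshold and counts them by label. It relies on the `name` attribute of tags, the `object_property` attribute of detected objects, and `confidence`, all of which are used in the code above. The helpers `label_of`, `confident_items`, and `count_labels` and the 0.6 threshold are hypothetical illustrations, not SDK features. `SimpleNamespace` objects stand in for the SDK's result models so the sketch runs offline.

```python
from collections import Counter
from types import SimpleNamespace


# Hypothetical helpers for this sketch; not part of the Azure SDK
def label_of(item):
    """Return the label of a detected object (`object_property`) or a tag (`name`)."""
    return getattr(item, "object_property", None) or getattr(item, "name", None)


def confident_items(items, min_confidence=0.6):
    """Keep only results whose confidence is at least `min_confidence`."""
    return [item for item in items if item.confidence >= min_confidence]


def count_labels(items, min_confidence=0.6):
    """Count confident results per label, e.g. how many cars were detected."""
    return Counter(label_of(item) for item in confident_items(items, min_confidence))


# Stand-ins mirroring the five "car" detections printed above
detections = [
    SimpleNamespace(object_property="car", confidence=c)
    for c in (0.573, 0.571, 0.61, 0.573, 0.62)
]
print(count_labels(detections))       # Counter({'car': 2})
print(count_labels(detections, 0.5))  # Counter({'car': 5})
```

With real results, `count_labels(detect_objects_results.objects)` or `confident_items(analysis.tags, 0.9)` applies the same filtering. The right threshold depends on whether missing an object or reporting a false one is more costly for your application.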

Additional Analysis Functions

  1. Detect Domain-specific Content

     # Detect domain-specific content; the model name is "landmarks" or "celebrities"
     detect_domain_results = client.analyze_image_by_domain("landmarks", image_link)
     for landmark in detect_domain_results.result["landmarks"]:
         print(landmark["name"], landmark["confidence"])
    
  2. Detect the type of image

     # Detect the type of image
     detect_type_results = client.analyze_image(image_link, visual_features=[VisualFeatureTypes.image_type])
     # clip_art_type ranges from 0 (not clip art) to 3 (good clip art); line_drawing_type is 0 or 1
     print("Clip art type:", detect_type_results.image_type.clip_art_type)
     print("Line drawing type:", detect_type_results.image_type.line_drawing_type)
    
  3. Detect Landmarks

     # Detect landmarks in an image
     detect_landmarks_results = client.analyze_image_by_domain("landmarks", image_link)
     for landmark in detect_landmarks_results.result["landmarks"]:
         print(landmark["name"], landmark["confidence"])
    
  4. Detect Celebrities

     detect_celebrities_results = client.analyze_image_by_domain("celebrities", image_link)
     for celebrity in detect_celebrities_results.result["celebrities"]:
         print(celebrity["name"], celebrity["confidence"])
    
  5. Generate Thumbnails

     # Generate a thumbnail image
     generate_thumbnail_results = client.generate_thumbnail(100, 100, image_link)
     # Save the thumbnail
     with open("thumbnail.png", "wb") as thumbnail_file:
         for chunk in generate_thumbnail_results:
             thumbnail_file.write(chunk)
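
The thumbnail snippet saves the returned bytes as `thumbnail.png` without checking their format. The following is a minimal sketch that assumes only that the chunks are raw image bytes. It joins the chunks and picks the file extension from the image's magic bytes (the standard PNG and JPEG file signatures) instead of assuming PNG. `save_thumbnail` and `sniff_image_extension` are hypothetical helpers, not part of the SDK.

```python
from pathlib import Path

# Standard file signatures ("magic bytes") for the two most likely formats
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


# Hypothetical helpers for this sketch; the service's thumbnail format is not assumed
def sniff_image_extension(data: bytes) -> str:
    """Guess a file extension from the first bytes of an image."""
    if data.startswith(PNG_SIGNATURE):
        return ".png"
    if data.startswith(JPEG_SIGNATURE):
        return ".jpg"
    return ".bin"  # unknown format: keep the bytes without claiming an image type


def save_thumbnail(chunks, stem="thumbnail", directory="."):
    """Join streamed chunks, write them with a sniffed extension, and return the path."""
    data = b"".join(chunks)
    path = Path(directory) / (stem + sniff_image_extension(data))
    path.write_bytes(data)
    return path
```

Used with the client from Step 3, `save_thumbnail(client.generate_thumbnail(100, 100, image_link))` writes the file and returns its path, so the extension always matches the bytes actually received.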
    

Conclusion

In conclusion, this article explored the capabilities of the Azure Computer Vision API through Python's Azure SDK. We walked through image tagging and description generation, optical character recognition (OCR), and object detection with bounding boxes drawn on the image, followed by domain-specific content detection (landmarks and celebrities), image-type detection, and thumbnail generation. Each of these calls turns an image into structured data, such as tags with confidence scores, recognized text, or object coordinates, that an application can act on. Whether the goal is automating content tagging, enhancing search functionality, or building smart image-based applications, the Azure Computer Vision API provides a practical starting point for working with visual data.

Support Ankur by becoming a sponsor. Any amount is appreciated!
