Smart OCR solution using Xilinx UltraScale+ and Vitis AI

The rich, precise, high-level semantics embodied in text help us understand the world around us and build autonomous solutions that can be deployed in live environments. As a result, automatic text reading in natural environments, also known as scene text detection/recognition or PhotoOCR, has become an increasingly popular and important research topic in computer vision.

As the written forms of human languages evolved, we developed thousands of unique font families. Once we account for case (capitals/lower case/unicase/small caps), skew (italic/roman), proportion (horizontal scale), weight, size-specific designs (display/text), swash, and serifization (serif/sans in super-families), the number of variations grows into the millions, which makes text identification an exciting discipline for machine learning.

Xilinx as a choice for OCR solutions

Today, Xilinx powers 7 out of 10 new developments through its wide variety of powerful platforms and leads the trends in FPGA-based system design. Softnautics chose Xilinx for this solution because of its integrated Vitis™ AI stack and strong hardware capabilities.

Xilinx Vitis™ is a free and open-source development platform that packages hardware modules as software-callable functions and is compatible with standard development environments, tools, and open-source libraries. It automatically adapts software and algorithms to Xilinx hardware without the need for VHDL or Verilog expertise.

Selecting the right Xilinx Platform

The comprehensive and rich Xilinx toolset and ecosystem make prototyping a very predictable process and expedite solution development, reducing overall development time by up to 70%.
Softnautics chose the Xilinx UltraScale+ platform because it offers the best of application processing and FPGA acceleration capabilities. It also provides impressive high-level synthesis capability, resulting in 5x system-level performance per watt compared to earlier variants. It supports Xilinx Vitis AI, which offers a wide range of capabilities for building AI inference using acceleration libraries.

Softnautics used the Xilinx Vitis AI stack and software acceleration to create a hybrid application. LSTM functionality for effective sequence prediction was implemented by porting TensorFlow Lite to ARM; it runs on the Processing System (PS) using the N2Cube software. Image pre- and post-processing was implemented with HLS through Vivado, and Vitis was used for inference with CTPN (Connectionist Text Proposal Network). We eventually graduated the solution to real-time scene text detection with a video pipeline and improved the model with a robust dataset.
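To make the post-processing step more concrete, here is a minimal sketch of how the narrow, fixed-width proposals produced by a CTPN-style detector might be chained into text-line boxes. It is an illustration only, not the Softnautics implementation: the names Proposal and merge_proposals are hypothetical, the thresholds are assumed values in the spirit of the published CTPN method, and the greedy left-to-right chaining is a simplification of the original pairing rule.

```python
from dataclasses import dataclass
from typing import List, Tuple

PROPOSAL_WIDTH = 16   # CTPN proposals share a fixed 16-pixel width
MAX_GAP = 50          # assumed max horizontal distance between neighbours (pixels)
MIN_V_OVERLAP = 0.7   # assumed min vertical overlap between neighbours
MIN_SCORE = 0.7       # assumed text/non-text score threshold


@dataclass
class Proposal:
    """One narrow text proposal (hypothetical structure for this sketch)."""
    x: int            # left edge in pixels
    y_top: float
    y_bottom: float
    score: float      # text/non-text confidence


def vertical_overlap(a: Proposal, b: Proposal) -> float:
    """Intersection over union of the two vertical extents."""
    inter = min(a.y_bottom, b.y_bottom) - max(a.y_top, b.y_top)
    union = max(a.y_bottom, b.y_bottom) - min(a.y_top, b.y_top)
    return max(0.0, inter) / union if union > 0 else 0.0


def merge_proposals(proposals: List[Proposal]) -> List[Tuple[float, float, float, float]]:
    """Greedily chain neighbouring proposals into (x1, y1, x2, y2) text lines."""
    # Drop low-confidence proposals, then scan from left to right.
    kept = sorted((p for p in proposals if p.score >= MIN_SCORE), key=lambda p: p.x)
    lines: List[List[Proposal]] = []
    for p in kept:
        for line in lines:
            last = line[-1]
            close_enough = p.x - last.x <= MAX_GAP
            if close_enough and vertical_overlap(last, p) >= MIN_V_OVERLAP:
                line.append(p)   # extend an existing text line
                break
        else:
            lines.append([p])    # no compatible line found: start a new one
    # The bounding box of each chain is the detected text line.
    return [
        (
            line[0].x,
            min(q.y_top for q in line),
            line[-1].x + PROPOSAL_WIDTH,
            max(q.y_bottom for q in line),
        )
        for line in lines
    ]
```

In a pipeline like the one described, each resulting box would be cropped from the frame and passed to the LSTM-based recognizer. A production implementation would typically also refine the left and right edges of each line.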

Scene Text Detection

Many implementations are available, and new ones are being researched. Even so, a series of grand challenges still arises when detecting and recognizing text in the wild. The difficulties in natural scenes stem mainly from three differences compared with text in documents:

  • Diversity and variability arising from languages, colors, fonts, sizes, orientations, etc.
  • Vibrant backgrounds on which the text is written
  • The aspect ratios and layouts of scene text may vary significantly

This type of solution is widely applicable in fields that require real-time text detection on a video stream with high accuracy and quick recognition. A few of these application areas are:

  • Parking validation: Cities and towns are using mobile OCR to automatically validate whether cars are parked according to city regulations. Parking inspectors can use a mobile device with OCR to scan the license plates of vehicles and check an online database to see whether they are permitted to park.
  • Mobile document scanning: A variety of mobile applications allow users to take a photo of a document and convert it to text. This OCR task is more challenging than traditional document scanning because photos have unpredictable image angles, lighting conditions, and text quality.
  • Digital asset management: Digital asset management (DAM) software helps organize rich media assets such as images, videos, and animations. A key aspect of DAM systems is the searchability of rich media. By running OCR on uploaded images and video frames, a DAM system can make rich media searchable and enrich it with meaningful tags.
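As an illustration of the digital asset management case, the sketch below shows one way OCR output could make media searchable: a small inverted index that maps each recognized word to the assets it appears in. The class name OcrSearchIndex, the asset identifiers, and the tokenization rule are all assumptions made for this example; a real DAM system would more likely rely on a dedicated search engine.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the OCR text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class OcrSearchIndex:
    """Inverted index: token -> ids of assets whose OCR text contains it."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def add(self, asset_id: str, ocr_text: str) -> None:
        """Register the text recognized in one image or video frame."""
        for token in tokenize(ocr_text):
            self._index[token].add(asset_id)

    def search(self, query: str) -> Set[str]:
        """Return the assets that contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        matches = [self._index.get(token, set()) for token in tokens]
        return set.intersection(*matches)
```

With this in place, adding the text recognized in an uploaded image or video frame through add() makes that asset retrievable by any word it contains, and the same tokens could double as automatically generated tags.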

The Softnautics team has been working on Xilinx FPGA-based solutions that require design and software framework implementation. Our vast experience with Xilinx and our understanding of its intricacies ensured that we took this solution from conceptualization to proof of concept within 4 weeks. Using our end-to-end solution-building expertise, you can visualize your ideas with the fastest concept realization service on Xilinx platforms and achieve a greatly reduced time-to-market.

Read our success stories related to machine learning to learn more about our services for accelerated AI solutions.

Contact us for any queries related to your solution or for consultancy.

Source: Xilinx
