Multi Voltage SoC Power Design Technique

Minimizing power consumption is a major factor in modern-day IC design, especially in the consumer electronics segment. Device heating, the time it takes to switch features of handheld devices on and off, battery life, and similar concerns continue to drive improvements, so it is important to adopt chip-design best practices that reduce power consumption in SoCs (System on Chip) and other ICs (Integrated Circuits). According to Market Research Future, the global System-on-Chip market was valued at USD 131.83 billion in 2021, and it is predicted to reach USD 214.8 billion by the end of 2030, with a CAGR of 8.30% from 2021 to 2030. Silicon performance is greatly influenced by how power is managed in SoCs and RTL designs, and the industry relies on power-aware design to meet its power targets. This blog focuses on the multi-voltage design techniques and terminology used alongside HDL coding to improve a silicon's power performance, and it should help in understanding the design parameters involved when putting power-conscious designs into practice.

Multiple Voltage Design (Multi Voltage Power Domain) Method

Supply voltage has a direct relationship with dynamic power, which consists of switching and short-circuit power, so reducing the supply voltage naturally improves power performance. The trade-off is that a reduced supply voltage (closer to the threshold voltage) increases gate delay. Lowering the voltage of SoC blocks is therefore usually the first design technique applied to meet power performance goals. Figure 1 shows a system with different voltage levels.
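To see why voltage scaling is the first lever designers reach for, recall the standard first-order model for dynamic (switching) power, stated here as general background rather than taken from the figures:

$$P_{dyn} = \alpha \, C \, V_{DD}^{2} \, f$$

where $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. Power falls quadratically with $V_{DD}$, which is why even a modest voltage reduction pays off, while the frequency term $f$ is what clock gating (discussed later) attacks.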

Figure 1

Lowering the voltage lowers the current flowing and increases gate delay, which means the design may no longer run at the desired clock frequency. Lowering the voltage therefore costs some raw performance, but the overall performance target can still be met, as seen in Figure 1: here the VLSI chip's performance is achieved by lowering the individual voltages of different modules.
The scheme in Figure 1 is also referred to as a multi-VDD design. The logic is partitioned into regions called power domains, and the structural model or gate-level netlist derived from the behavioural Verilog uses a separate supply rail for each domain. Each domain can then be run at the voltage its performance objectives require. Figure 2 shows an elaboration of the same.

Figure 2

Power intent written in IEEE Std 1801-2018, the Unified Power Format (UPF 3.1), is used by many companies to define the power parameters of a chip. The power architect uses this standard to create files that describe the power distribution and power control intent of an electronic design. The annotation covers supply sets, power switches, level shifters, and memory retention techniques. Definable descriptions of the power applied to the electronic system include power states and their transitions, collections of simstates, the power/ground (pg) type and function attributes of nets, and the -update argument, which supports gradual refinement of the power intent.

Requirements to create a multi-voltage design

Level shifters
As shown in Figure 3, level shifters (LS) translate signal voltages so that modules operating at different voltage levels interact correctly once the LS circuits are attached. The circuits are implemented in HDL, and they can also be sized to provide the required drive strength. The figure shows a low-to-high level shifter (A) and a high-to-low level shifter (B); Vi and Vo are the source and destination voltage levels of the modules being connected.

Figure 3

Power gating
The method in Figure 4 disconnects the power supply of gates that are not in use; the figure shows one implementation of such a scheme. Power gating is used to reduce leakage power. The decision is made at the architecture level while working out the performance factors of the design: for a low-power module, for a module kept in a sleep state while higher-priority modules stay on, for a module whose power is to be disconnected by software, or while shutting down the chip entirely.
Power gating is what underlies the familiar SLEEP/WAKE events of modern devices. The wake-up and sleep sequences follow architectural decisions that enable or disable a defined sequence of operations controlling the power logic of the chip.

Figure 4

Special care must be taken when implementing power gating, as the output signals of a power-gated block pose particular challenges. Isolation and retention strategies must be considered at the micro-architecture level when performing the wake-up or sleep sequences, and the placement of the retention and isolation circuits should not itself degrade the power performance. Retention cells save the state of a module so it can be restored during its wake-up sequence. Figure 5 shows a state being saved when the save signal is asserted: Vdd_sw (the switched supply voltage) is controlled by the power switch, while Vdd is the always-on supply that keeps the retention circuit powered. When save (the saving sequence) is asserted, the output of the module is latched and held available as feedback.

Figure 5

Figure 6 illustrates where isolation cells are introduced so that a block in shutdown or a sleep phase is isolated from the receiving end. Isolation cells clamp the outputs of the powered-down block to a predefined value, preventing floating signals from reaching the receivers. Attached in this way, isolation cells prevent crowbar currents in the receiving gates, thereby reducing leakage power.

Figure 6

Clock Gating
This method turns off clock transitions to a circuit whenever it has no activity to perform and its internal signals do not need to switch. It directly controls the transition frequency term of the power equation. Almost all EDA tools can identify clock-gating opportunities and support this technique.

The complexity of SoCs has expanded, introducing new demands for power management. The supplies of the various SoC power domains must be flexible enough for developers to control, so that power dissipation can be managed and battery autonomy improved. Careful power analysis and knowledge of the capabilities of the tools at hand are prerequisites for selecting the best solutions. Power-related crises can be prevented by analysing power demand as early as possible in the design flow, and early analysis also makes power goals simpler to achieve, because higher-level techniques save the most power.

At Softnautics, we provide comprehensive semiconductor design and verification services, including end-to-end ASIC/FPGA/SoC design from idea to realization and deployment. Our RTL design team can create power intent at the module, subsystem, and chip levels to meet the power targets of a predefined specification, and our VLSI design & verification teams validate that power intent using static and dynamic verification.

Read our success stories related to VLSI Design Services to know more about our high-performance silicon services.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.

Author: Sarth Rana

Sarth Rana is a Senior VLSI Engineer at Softnautics and a microelectronics enthusiast with experience in RTL design and in verifying complex ASICs such as USB and PCIe. He has worked on FPGAs, high-speed memory verification, functional verification, low-power verification, and assertion-based verification. His hobbies include sketching and reading books.

 

Model Compression Techniques for Edge AI

Deep learning is growing at a tremendous pace in terms of models and their datasets. In terms of applications, the deep learning market is dominated by image recognition, followed by optical character recognition and facial and object recognition. According to Allied Market Research, the global deep learning market was valued at $6.85 billion in 2020, and it is predicted to reach $179.96 billion by 2030, with a CAGR of 39.2% from 2021 to 2030. At one point it was believed that large and complex models perform better, but now that is almost a myth. With the evolution of Edge AI, more and more techniques have emerged to convert a large, complex model into a simpler one that can run on the edge; together, these techniques perform model compression.

What is Model Compression?

Model compression is the process of deploying SOTA (state-of-the-art) deep learning models on edge devices that have low computing power and memory, without compromising the model's performance in terms of accuracy, precision, recall, etc. Model compression broadly reduces two things in a model: size and latency. Size reduction focuses on making the model simpler by reducing its parameters, thereby reducing the RAM required during execution and the storage required in memory. Latency reduction refers to decreasing the time a model takes to make a prediction or infer a result. Model size and latency often go together, and most techniques reduce both; both can be measured directly, as the sketch below shows.
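As a concrete illustration of the two metrics, this minimal sketch measures the parameter count, storage size, and single-inference latency of a small PyTorch model; the model and its shapes are hypothetical stand-ins, not from any particular application.

```python
import time
import torch
import torch.nn as nn

# Hypothetical model standing in for a network we want to compress
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Size: parameter count and storage footprint in megabytes
n_params = sum(p.numel() for p in model.parameters())
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6

# Latency: wall-clock time of a single forward pass
x = torch.randn(1, 784)
with torch.no_grad():
    start = time.perf_counter()
    model(x)
    latency_ms = (time.perf_counter() - start) * 1e3

print(f"{n_params} parameters, {size_mb:.2f} MB, {latency_ms:.2f} ms/inference")
```

Every compression technique below can be judged by how much it moves these two numbers without hurting accuracy.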

Popular Model Compression Techniques

Pruning
Pruning is the most popular technique for model compression. It works by removing redundant and inconsequential parameters, which in a neural network can be connectors, neurons, channels, or even layers. It is popular because it simultaneously decreases model size and improves latency.

Pruning

Pruning can be done while the model is being trained or even post-training. The main types of pruning are weight/connection pruning, neuron pruning, filter pruning, and layer pruning.
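As a minimal sketch of magnitude-based weight pruning using PyTorch's built-in torch.nn.utils.prune utilities (the model and the 30% pruning amount are hypothetical choices):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Small hypothetical network standing in for any trained model
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Magnitude-based weight pruning: zero out the 30% of weights
# with the smallest L1 magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Fraction of weights that are now exactly zero (sparsity)
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")
```

Note that zeroed weights only shrink the stored model once combined with sparse storage or compression, and pruning is typically followed by fine-tuning to recover any lost accuracy.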

Quantization
While pruning removes neurons, connections, filters, layers, and so on to lower the number of weight parameters, quantization decreases the size of the weights themselves. Values from a large set are mapped to values in a smaller set, so the output network has a narrower range of values than the input network but retains most of the information. For further details on this method, you may read our in-depth article regarding model quantization here.
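As a minimal sketch, PyTorch's post-training dynamic quantization maps float32 weights to 8-bit integers; the model below is a hypothetical stand-in for a trained network:

```python
import torch
import torch.nn as nn

# Hypothetical float32 model standing in for a trained network
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Post-training dynamic quantization: weights of the listed layer
# types are stored as 8-bit integers and dequantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # the Linear layers are now dynamically quantized
```

Since each weight shrinks from 32 bits to 8, this roughly quarters the weight storage of the quantized layers.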

Knowledge Distillation
In knowledge distillation, we first train a large, complex model on a very large dataset and fine-tune it until it performs well on unseen data. That knowledge is then transferred to a smaller model: both the teacher network (the larger model) and the student network (the smaller model) are used, with the teacher guiding the student's training. Note the distinction from transfer learning: in knowledge distillation the teacher model is not modified, whereas in transfer learning we take the exact model and weights, alter the model to some extent, and adapt it to a related task.

knowledge distillation system

As shown in the diagram above, a typical knowledge distillation system has three main parts: the knowledge, the distillation algorithm, and the teacher-student architecture.
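The distillation algorithm is often implemented as a blended loss in which the student matches the teacher's temperature-softened output distribution as well as the ground-truth labels. This sketch shows that widely used soft-target formulation in PyTorch; the temperature and alpha values are hypothetical hyperparameter choices:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Soft-target loss (teacher guidance) blended with hard-label loss."""
    # Soften both output distributions with the temperature, then
    # penalize the student for diverging from the teacher.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale gradients after softening
    # Ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

During training, the teacher runs in inference mode only; its logits act as soft targets and its own weights are never updated.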

Low-Rank Matrix Factorization
Matrices form the bulk of most deep neural architectures. This technique identifies redundant parameters by applying matrix or tensor decomposition, replacing large matrices with products of smaller ones. Applied to the dense layers of a DNN (Deep Neural Network) it decreases storage requirements, and applied to CNN (Convolutional Neural Network) layers it improves inference time. A two-dimensional weight matrix A of rank r can be decomposed into smaller matrices as below.

Low Matrix Factorization

Model accuracy and performance depend heavily on proper factorization and rank selection. The main challenges of the low-rank factorization process are that it is harder to implement and computationally intensive. Overall, factorizing the dense-layer matrices results in a smaller model and faster performance compared to the full-rank matrix representation.
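As a minimal sketch of the idea using truncated SVD in NumPy (the matrix shape and rank are hypothetical), keeping only the top r singular values replaces the m x n weight matrix A, holding m*n parameters, with two factors holding r*(m+n) parameters:

```python
import numpy as np

# Hypothetical dense-layer weight matrix (m x n)
rng = np.random.default_rng(0)
A = rng.standard_normal((512, 256))

# Truncated SVD: keep the top-r singular values so A ~ B @ C
r = 32
U, S, Vt = np.linalg.svd(A, full_matrices=False)
B = U[:, :r] * S[:r]   # shape (m, r): left factor scaled by singular values
C = Vt[:r, :]          # shape (r, n): right factor

print("original params:", A.size)             # 131072
print("factorized params:", B.size + C.size)  # 24576
print("approx error:", np.linalg.norm(A - B @ C) / np.linalg.norm(A))
```

Choosing r trades reconstruction error against compression, which is exactly the rank-selection problem mentioned above.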

With the rise of Edge AI, model compression strategies have become incredibly important. These methods are complementary to one another and can be applied across stages of the entire AI pipeline. Popular frameworks like TensorFlow and PyTorch now include techniques like pruning and quantization out of the box, and the number of techniques used in this area will only grow.

At Softnautics, we provide AI engineering and machine learning services with expertise in cloud platforms and accelerators like Azure and AMD, edge platforms (FPGA, TPU, controllers), NN compilers for the edge, and tools like Docker, GIT, AWS DeepLens, Jetpack SDK, TensorFlow, TensorFlow Lite, and many more, targeted at domains like multimedia, industrial IoT, automotive, healthcare, consumer, and security-surveillance. We collaborate with organizations to develop high-performance cloud-to-edge machine learning solutions like face/gesture recognition, people counting, object/lane detection, weapon detection, food classification, and more across a variety of platforms.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.


Author: Rakesh Nakod

Rakesh is an Associate Principal Engineer at Softnautics and an AI expert with experience in developing and deploying AI solutions across computer vision, NLP, audio intelligence, and document mining. He also has vast experience in developing AI-based enterprise solutions and strives to solve real-world problems with AI. He is an avid food lover, passionate about sharing knowledge, and enjoys gaming and playing cricket in his free time.

 

An overview of Embedded Machine Learning techniques and their associated benefits

Owing to revolutionary developments in computer architecture and ground-breaking advances in AI & machine learning applications, embedded systems technology is going through a transformational period. By design, machine learning models use a lot of resources and demand a powerful computing infrastructure, so they typically run on devices with ample resources, like PCs or cloud servers, where data processing is efficient. Thanks to recent developments in machine learning and advanced algorithms, machine learning applications and ML frameworks can now be deployed directly on embedded devices, within their processors' computing capacity. This is referred to as Embedded Machine Learning (E-ML).

Embedded machine learning techniques move the processing closer to the edge, where the sensors collect the data. This helps remove obstacles such as bandwidth and connectivity problems, security breaches from transferring data over the internet, and the power consumed by data transmission. It also supports the use of neural networks and other machine learning frameworks, as well as signal processing services, model construction, gesture recognition, and more. Between 2021 and 2026, the global market for embedded AI is anticipated to expand at a 5.4% CAGR and reach about USD 38.87 billion, as per Maximize Market Research reports.

The Underlying Concept of Embedded Machine Learning

Today, embedded computing systems are quickly spreading into every sphere of human endeavour, finding practical use in everything from wearable health-monitoring systems, wireless surveillance systems, and networked systems on the Internet of Things (IoT) to smart home-automation appliances and antilock braking systems in automobiles. Common ML techniques used on embedded platforms include SVMs (Support Vector Machines), CNNs (Convolutional Neural Networks), DNNs (Deep Neural Networks), k-NN (k-Nearest Neighbours), and Naive Bayes. Efficient training and inference with these techniques require large processing and memory resources. Even with deep cache memory structures, multicore improvements, and so on, general-purpose CPUs are unable to handle the high computational demands of deep learning models. These constraints can be overcome with resources such as GPU and TPU processors, mainly because non-trivial deep learning applications are built on sophisticated linear-algebra computations such as matrix and vector operations. GPUs and TPUs run deep learning algorithms very effectively and quickly, which makes them ideal computing platforms.

Running machine learning models on embedded hardware is referred to as embedded machine learning. It works according to the following fundamental precept: the training of ML models such as neural networks takes place on computing clusters or in the cloud, while model execution and inference take place on the embedded device. Contrary to popular belief, deep learning matrix operations can be carried out effectively on hardware with constrained CPU capabilities, or even on tiny 16-bit/32-bit microcontrollers.

The type of embedded machine learning that uses extremely small pieces of hardware, such as ultra-low-power microcontrollers, to run ML models is called TinyML. Machine learning approaches can be divided into three main categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning learns from labelled data; unsupervised learning finds hidden patterns in unlabelled data; and reinforcement learning lets a system learn from its immediate environment by trial and error. The learning process is known as the model's "training phase" and is frequently carried out on computer architectures with plenty of processing power, such as multiple GPUs. The trained model is then applied to new data to make intelligent decisions; this part of the implementation is referred to as the "inference phase". Inference is frequently intended to run on IoT and mobile computing devices, and on other user devices with limited processing resources.
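As an illustration of this cloud-trained, device-inferred split, the sketch below converts a Keras model (a hypothetical stand-in, left untrained here for brevity) into a compact TensorFlow Lite flatbuffer of the kind that TinyML runtimes such as TensorFlow Lite for Microcontrollers execute on-device:

```python
import tensorflow as tf

# Hypothetical model standing in for one trained on a cluster or in the cloud
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Convert to TensorFlow Lite, letting the converter quantize weights
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The flatbuffer is what gets bundled with or flashed to the device
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print(f"model size: {len(tflite_model) / 1024:.1f} KiB")
```

The training-side framework never ships to the device; only the small serialized model and a lightweight interpreter do.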

Machine Learning Techniques

Application Areas of Embedded Machine Learning

Intelligent Sensor Systems
The effective application of machine learning techniques within embedded sensor network systems is generating considerable interest. Numerous machine learning algorithms, including GMMs (Gaussian Mixture Models), SVMs, and DNNs, are finding practical use in important fields such as mobile ad hoc networks, intelligent wearable systems, and intelligent sensor networks.

Heterogeneous Computing Systems
Computer systems containing multiple types of processing cores are referred to as heterogeneous computing systems. Most heterogeneous computing systems are employed as acceleration units that shift computationally demanding tasks away from the CPU and speed up the system. Heterogeneous multicore architectures are one area of application: to speed up computationally expensive machine learning techniques, a middleware platform integrates a GPU accelerator into an existing CPU-based architecture, enhancing the processing efficiency for ML model workloads.

Embedded FPGAs
Due to their low cost, great performance, energy efficiency, and flexibility, FPGAs are becoming increasingly popular in the computing industry. They are frequently used to pre-implement ASIC architectures and to design acceleration units. CNN optimization using FPGAs and OpenCL-based FPGA hardware acceleration are application areas where FPGA architectures are used to speed up the execution of machine learning models.

Benefits

Efficient Network Bandwidth and Power Consumption
Machine learning models running on embedded hardware make it possible to extract features and insights directly at the data source. As a result, there is no longer any need to transport the data to edge or cloud servers, which saves bandwidth and system resources. Microcontrollers are among the many power-efficient embedded systems that can function for long durations without charging. In contrast to machine learning applications on mobile computing systems, which consume a substantial amount of power, TinyML can greatly increase the power autonomy of machine learning applications on embedded platforms.

Comprehensive Privacy
Embedded machine learning eliminates the need to transfer data to, and store data on, cloud servers. This lessens the likelihood of data breaches and privacy leaks, which is crucial for applications that handle sensitive data such as personal information about individuals, medical data, intellectual property (IP), and classified information.

Low Latency
Embedded ML supports low-latency operation because it eliminates the need for extensive data transfers to the cloud. As a result, it is a great option for enabling real-time use cases such as field actuation and control in various industrial scenarios.

Embedded machine learning applications are built using methods and tools that make it possible to create and deploy machine learning models on nodes with limited resources. These methods offer a wealth of new opportunities for businesses looking to maximize the value of their data, and they also help optimize the bandwidth, memory footprint, and latency of machine learning applications.

Softnautics AI/ML experts have extensive expertise in creating efficient ML solutions for a variety of edge platforms, including CPUs, GPUs, TPUs, and neural network compilers. We also offer secure embedded systems development and FPGA design services by combining the best design methodologies with the appropriate technology stacks. We help businesses in building high-performance cloud and edge-based ML solutions like object/lane detection, face/gesture recognition, human counting, key-phrase/voice command detection, and more across various platforms.

Read our success stories related to Machine Learning expertise to know more about our services for accelerated AI solutions.

Contact us at business@softnautics.com for any queries related to your solution or for consultancy.


Author: V Srinivas Durga Prasad

Srinivas is a Marketing professional at Softnautics working on techno-commercial write-ups, marketing research, and trend analysis. He is a marketing enthusiast with 6+ years of experience across diverse industries. He loves to travel and is fond of adventures.