For the past 10 years, **Computer Vision models have been growing exponentially in size** every year... for a very simple reason. If enough data is fed into the model, **increasing its size and complexity usually increases its performance**.

However, bigger models need **more memory and computing power at inference**. In almost **any computer vision use case**, **inference time is very important** and can be decisive in improving (or worsening) user experience.

Fortunately, there is a **cheat code** to address this seemingly inflexible tradeoff between performance and resource intensity: **pruning**.

## What is Pruning?

Pruning builds on the observation that **most of a trained model’s parameters are useless**. Yes, you read that right. In your usual Computer Vision CNN, **a very small fraction of its weights does all the heavy lifting** while the rest are very close to zero.

Pruning algorithms allow you to **keep the useful connections in your model and kill the useless ones**. In our experiment with LeNet, we achieved almost the same level of performance with only 10% of the weights. **10x smaller and almost as accurate!**

### Only the fittest will survive: Computer Vision Darwinism

Pruning works in **cycles**. Each cycle consists of:

- Removing the **least important weights** in your Computer Vision model.
- **Fine-tuning** the rest of the network.

As you may have guessed, the way you select these “least important” connections and the schedule of these cycles constitute **the two main degrees of freedom** of pruning techniques.

**First**, you have to choose a **criterion** to **determine which weights to set to zero**. There are **two main families** of criteria:

- **Structured** pruning, where **each layer** of the model is pruned independently.
- **Unstructured** pruning, which treats **all the weights of the model the same**.

For each of these families, the algorithm determines which weights to prune using the **L1 norm, L2 norm, or any other norm**.
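As a minimal sketch, here is how both families of criteria look with PyTorch's built-in pruning utilities (the layer sizes and pruning amounts below are illustrative, not from any particular experiment):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv_u = nn.Conv2d(3, 16, kernel_size=3)
conv_s = nn.Conv2d(3, 16, kernel_size=3)

# Unstructured: zero out the 30% of individual weights
# with the smallest absolute value (L1 criterion).
prune.l1_unstructured(conv_u, name="weight", amount=0.3)

# Structured: zero out whole output channels (dim=0),
# ranked by their L2 norm (n=2).
prune.ln_structured(conv_s, name="weight", amount=0.25, n=2, dim=0)
```

In both cases the `amount` argument controls the sparsity, while the function name (`l1_…`, `ln_…`) selects the norm used as the importance criterion.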

**Second**, you need to determine a **pruning schedule**. This describes how the algorithm alternates between pruning and fine-tuning to reach your **target pruning rate (or sparsity)**.

The most common pruning schedules are:

- **One-shot pruning**, where there is only one pruning step and one fine-tuning step.
- **Linear pruning**, where every step prunes the same number of weights until the target sparsity is reached.
- **Automated Gradual Pruning (AGP)**, a more sophisticated schedule that prunes **more weights at the beginning** of the pruning process, then **less and less as the network approaches the target sparsity value**.

Such a gradual schedule gives **the different layers of the Computer Vision model** **more time to adapt** to the rising sparsity through fine-tuning.
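The AGP schedule can be written down as a simple cubic ramp (this is a sketch of the formula from Zhu & Gupta's 2017 AGP paper; the step counts and sparsity values below are illustrative):

```python
def agp_sparsity(step, begin_step, end_step, initial=0.0, final=0.9):
    """Cubic sparsity ramp from `initial` to `final` between the two steps."""
    if step <= begin_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - begin_step) / (end_step - begin_step)
    return final + (initial - final) * (1.0 - progress) ** 3

# The increments shrink over time: large prunes early, small prunes late.
schedule = [round(agp_sparsity(t, 0, 100), 3) for t in (0, 25, 50, 75, 100)]
print(schedule)  # starts at 0.0, ends at 0.9, with decreasing jumps
```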

## How easy is it to implement?

Pruning is **very easy to implement**.

Both PyTorch and TensorFlow, the two major deep learning frameworks, have **ready-to-use pruning implementations**. Here is an example of how to prune a network with **one line of code** in PyTorch.
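A minimal version of that one-liner, applied to a small illustrative layer (the layer and the 90% sparsity are stand-ins, not a specific model from the article):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(100, 10)

# The one line: zero out the 90% of weights with the smallest magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.9)
```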

Pruning in **TensorFlow** is a bit less straightforward, as it is **necessary to recompile** the pruned network. However, Google's framework offers **implementations of a few pruning schedules**, whereas these would have to be explicitly implemented in PyTorch.

### What’s the catch?

As you may have guessed, there had to be a catch: these out-of-the-box implementations of pruning sadly **achieve no real gain in memory usage or inference time**.

In fact, under the hood, the two Python frameworks **simply apply a mask on the parameters** of the models. The values of the mask are either ones or zeros, depending on whether or not the parameter has been pruned.

This allows you to prune and fine-tune your model, and to experiment with various pruning techniques. But it **is not meant to actually remove the pruned parameters**.

As the pruned parameters are **not removed from the computational graph**, the model will still use the **same amount of resources**, even if it is 100% pruned.
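You can see this masking behaviour directly in PyTorch: even a "100% pruned" layer keeps its full dense tensor (renamed `weight_orig`) plus a 0/1 mask buffer, so the stored parameter count does not shrink at all. A small sketch with an illustrative layer:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(100, 10)
n_before = sum(p.numel() for p in layer.parameters())

prune.l1_unstructured(layer, name="weight", amount=1.0)  # "100% pruned"

n_after = sum(p.numel() for p in layer.parameters())
print(n_before, n_after)  # same number of stored parameters
print([name for name, _ in layer.named_buffers()])  # ['weight_mask']
```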

### Same performance, 2x faster

There is **no free lunch in Computer Vision**, it seems! Fortunately, **there are a few solutions** to redeem the benefits of pruning, which mainly rely on the idea of *sparse matrix encoding*. In simple terms, this means storing and manipulating the weight matrices as lists of indices and corresponding values.

These sparse matrices are **a lot trickier to manipulate**, especially with common libraries like NumPy and PyTorch, whose arrays and tensors are designed for dense ones. In fact, SciPy's sparse matrix multiplication is **50 to 200 times slower** than its dense counterpart!
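To make the encoding concrete, here is a small sketch using SciPy's CSR format on a roughly 90%-sparse matrix (the matrix size and sparsity level are illustrative): only the nonzero values and their indices are stored, which takes far less memory than the dense array.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.standard_normal((1000, 1000)).astype(np.float32)
dense[rng.random((1000, 1000)) < 0.9] = 0.0  # zero out ~90% of entries

csr = sparse.csr_matrix(dense)
dense_bytes = dense.nbytes
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(dense_bytes, csr_bytes)  # the CSR form is several times smaller
```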

Luckily, **researchers are working on optimizing the inference** of sparse models (such as pruned ones). By exploiting some properties of sparse networks, this paper from Google and Stanford achieves **“1.2–2.1× speedups and up to 12.8× memory savings without sacrificing accuracy”**.

This 2x speedup roughly corresponds to **90% sparsity** on the paper’s figure. This is also the **level of sparsity we found in our earlier experiment with LeNet** to retain the model’s accuracy. It all comes together quite nicely!

## Computer Vision and pruning: Final Words

Pruning is a **very powerful tool**, as it allows you to dramatically speed up and lighten computer vision models: up to **10x smaller, 2x faster, with the same performance!**

Today, **Computer Vision models are getting orders of magnitude bigger**, as shown by Google's latest Vision Transformer, which has more than **2 billion parameters!**

This means that in most practical use cases, these huge models **will have to be scaled down**. Pruning allows you to train a sizeable and highly complex model, then **prune it down to an acceptable size**, while retaining most of the original performance, which is **one heck of a cheat code!** If you’re looking for another one, you should check out how to speed up your Python code with JAX.

After reading this, you must be feeling an urge to **experiment with pruning** in your projects. What better way to do it than using DVC and Makefile to streamline your experiments and make them reproducible?

If you’re looking for Computer Vision experts who will know how to deliver tailor-made and efficient models at scale, you’re in luck! Contact us here.