Building and Running Darknet with NNPACK on Raspberry Pi

Running darknet (YOLO) on a Raspberry Pi is not difficult in itself; getting decent performance out of it is. Neural networks are generally trained and run on high-end GPUs, often NVIDIA cards with CUDA support, which execute computations in parallel at very high speed. A Raspberry Pi offers nothing comparable because of its hardware limitations.

I wrote this article to introduce running YOLO models on embedded devices and to guide those who struggle with building from source, which is likely because different versions of the repositories tend to conflict. I spent many hours gathering information online and getting things to work, so I hope to make the process as seamless as possible for those who follow this guide.

Raspberry Pi Models

Before we move on, I’d like to present some essential specifications of the Raspberry Pi models in the following table (find details here):

| Model | Release date | Price | SoC type | Core type | No. of cores | GPU | CPU clock | RAM |
|---|---|---|---|---|---|---|---|---|
| Raspberry Pi 4 B 8GB | 2020 May 28 | US$75.00 | Broadcom BCM2711 | Cortex-A72 (ARM v8) 64-bit | 4 | VideoCore VI | 1.5 GHz | 8 GB LPDDR4 |
| Raspberry Pi 4 B | 2019 Jun 24 | US$35.00 | Broadcom BCM2711 | Cortex-A72 (ARM v8) 64-bit | 4 | VideoCore VI | 1.5 GHz | 1 GB, 2 GB, 4 GB LPDDR4 |
| Raspberry Pi 3 Model A+ | 2018 Nov 15 | US$25.00 | Broadcom BCM2837B0 | Cortex-A53 64-bit | 4 | VideoCore IV | 1.4 GHz | 512 MB DDR2 |
| Raspberry Pi 3 B+ | 2018 Mar 14 | US$35.00 | Broadcom BCM2837B0 | Cortex-A53 64-bit | 4 | VideoCore IV | 1.4 GHz | 1 GB DDR2 |
| Raspberry Pi Zero WH | 2018 Jan 12 | US$15.00 | Broadcom BCM2835 | ARM1176JZF-S | 1 | VideoCore IV | 1 GHz | 512 MB |
| Raspberry Pi Zero W | 2017 Feb 28 | US$10.00 | Broadcom BCM2835 | ARM1176JZF-S | 1 | VideoCore IV | 1 GHz | 512 MB |
| Raspberry Pi A+ | 2014 Nov 10 | US$35.00 | Broadcom BCM2835 | ARM1176JZF-S | 1 | VideoCore IV | 700 MHz | 256 MB |
| Raspberry Pi 3 | 2016 Feb 29 | US$35.00 | Broadcom BCM2837 | Cortex-A53 64-bit | 4 | VideoCore IV 1080p@30 | 1.2 GHz | 1 GB DDR2 |
| Raspberry Pi Zero | 2015 Nov 30 | US$5.00 | Broadcom BCM2835 | ARM1176JZF-S | 1 | VideoCore IV | 1 GHz | 512 MB |
| Raspberry Pi 2 | 2015 Feb 1 | US$35.00 | Broadcom BCM2836 | Cortex-A7 | 4 | VideoCore IV | 900 MHz | 1 GB |
| Raspberry Pi B | 2012 Feb 15 | US$35.00 | Broadcom BCM2835 | ARM1176JZF-S | 1 | VideoCore IV 1080p@30 | 700 MHz | 512 MB |

NNPACK

When such hardware limitations exist, NNPACK comes to the rescue. NNPACK is an acceleration package that aims to provide high-performance neural network computations by making use of multi-core CPUs.

The key takeaway here is that the more CPU cores a board has, the faster the model runs with NNPACK. A higher CPU clock and more RAM matter too and should be taken into consideration, but in general parallelization across cores has more impact on overall performance. We would therefore expect a substantial speedup when switching from a Raspberry Pi Zero with 1 core to a Raspberry Pi 3 or later model with 4 cores, although the gain from the extra cores alone is bounded by the core count rather than exponential.
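To put rough numbers on the effect of core count, here is a minimal sketch based on Amdahl's law. It is not taken from NNPACK: the function name and the 0.9 parallel fraction are assumptions chosen only for illustration, and real results also depend on clock speed and SIMD support.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Upper bound on speedup when only part of the work runs in parallel."""
    if cores < 1 or not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("cores must be >= 1 and parallel_fraction within [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # 0.9 is a hypothetical share of parallelizable work, not a measured value.
    for cores in (1, 4):
        print(f"{cores} core(s): at most {amdahl_speedup(cores, 0.9):.2f}x")
```

Even with perfectly parallel work, four cores give at most a fourfold gain from parallelism alone.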

Requirements

NNPACK has its own requirements too. NNPACK currently supports the following OS environments and CPU architectures:

| Environment | Architecture | CPU requirements |
|---|---|---|
| Linux | x86-64 | AVX2 and 3-level cache hierarchy |
| Linux | ARM | NEON |
| Linux | ARM64 | |
| macOS | x86-64 | AVX2 and 3-level cache hierarchy |
| Android | ARM | NEON |
| Android | ARM64 | |
| Android | x86 | |
| Android | x86-64 | |
| iOS | ARM | |
| iOS | ARM64 | |
| Emscripten | Asm.js | |
| Emscripten | WebAssembly | |

This means you should check your machine's specifications beforehand. You can type:

lscpu

or any of the following:

dmesg | grep 'CPU\|cpu'
hwinfo --cpu
lshw -class processor

to learn more about your CPU.
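The same check can be scripted. The sketch below reads the feature flags that Linux exposes in /proc/cpuinfo and looks for the SIMD extensions named in the requirements table. The function names are my own, and treating the asimd flag (how Linux reports Advanced SIMD on ARM64) as equivalent to NEON is an assumption for this check.

```python
from pathlib import Path


def cpu_features(cpuinfo_text):
    """Collect the flags listed on 'Features' (ARM) or 'flags' (x86) lines."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            features.update(value.split())
    return features


def has_listed_simd(features):
    """True if NEON (32-bit ARM), asimd (ARM64) or AVX2 (x86-64) is reported."""
    return bool(features & {"neon", "asimd", "avx2"})


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        found = cpu_features(cpuinfo.read_text())
        print("SIMD extension found" if has_listed_simd(found)
              else "No NEON/AVX2 reported; the scalar backend may be needed")
```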

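To the best of my knowledge, Windows is not officially supported. If you intend to deploy on Raspberry Pis only, you can safely ignore these requirements, as they are equipped with ARMv6-, ARMv7-, or ARMv8-compatible processors. Note that ARMv6 boards such as the Pi Zero lack NEON, which is why they need the scalar backend described under Installation.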

Darknet-NNPACK

Darknet-NNPACK is a repository that uses NNPACK to optimize AlexeyAB/darknet. The repository shows inference results on a Raspberry Pi 4 for yolov3-tiny models trained on two datasets, COCO and Pascal VOC. NB: It is vital to use tiny models on embedded devices.

COCO

| cfg | Build Options | mAP | Prediction Time (seconds) |
|---|---|---|---|
| yolov3-tiny.cfg | NNPACK=1 | 33.1 | 1.1 |
| yolov3-tiny.cfg | NNPACK=0 | | 14.5 |
| yolov3-tiny-prn.cfg | NNPACK=1 | 33.1 | 0.86 |
| yolov3-tiny-prn.cfg | NNPACK=0 | | 9.3 |

Pascal VOC

| cfg | Build Options | mAP | Prediction Time (seconds) |
|---|---|---|---|
| yolov3-tiny-voc.cfg | NNPACK=1 | 65.9 | 1.0 |
| yolov3-tiny-voc.cfg | NNPACK=0 | | 14.0 |
| yolov3-tiny-prn-voc.cfg | NNPACK=1 | 65.2 | 0.77 |
| yolov3-tiny-prn-voc.cfg | NNPACK=0 | | 8.9 |
| Gaussian_yolov3-tiny-voc.cfg | NNPACK=1 | 65.7 | 1.0 |

With NNPACK, prediction is approximately 10-14 times faster. Notice that prediction still runs at only around 1 FPS, even on one of the most recent Raspberry Pi models. I've tested on a Raspberry Pi Zero W, where a single inference with NNPACK took around 100 seconds (0.01 FPS), not counting the time to load the weights. So it is possible, but impractical for real-time inference.
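The speedup and FPS figures follow directly from the prediction times in the tables above. As a quick check, this small sketch recomputes them; the times are copied from the tables, while the helper names are mine.

```python
# Prediction times in seconds, copied from the tables above: (NNPACK=1, NNPACK=0)
TIMES = {
    "yolov3-tiny.cfg": (1.1, 14.5),
    "yolov3-tiny-prn.cfg": (0.86, 9.3),
    "yolov3-tiny-voc.cfg": (1.0, 14.0),
    "yolov3-tiny-prn-voc.cfg": (0.77, 8.9),
}


def speedup(with_nnpack, without_nnpack):
    """How many times faster the NNPACK build is."""
    return without_nnpack / with_nnpack


def fps(seconds_per_image):
    """Frames per second for a given per-image prediction time."""
    return 1.0 / seconds_per_image


if __name__ == "__main__":
    for cfg, (fast, slow) in TIMES.items():
        print(f"{cfg}: {speedup(fast, slow):.1f}x faster, {fps(fast):.2f} FPS")
```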

[Figure: tiny-yolo prediction on Raspberry Pi Zero]

Installation

SSH into the Raspberry Pi, then update and upgrade its packages:

sudo apt-get update && sudo apt-get upgrade

Previous guides would tell you to install PeachPy and confu here. As far as I know, they are no longer used when compiling NNPACK.

Go ahead and install the following packages.

sudo apt-get install cmake
sudo apt-get install ninja-build || brew install ninja

Not sure why, but the following package is also said to be required.

sudo apt-get install clang

Get NNPACK.

git clone https://github.com/shizukachan/NNPACK.git
cd NNPACK
mkdir build
cd build

The next step depends on your board. On a Raspberry Pi Zero, configure the build with the scalar backend:

cmake -G Ninja -DBUILD_SHARED_LIBS=on -DNNPACK_BACKEND=scalar ..

On any other model:

cmake -G Ninja -DBUILD_SHARED_LIBS=on ..

Finally, build and install:

sudo ninja
sudo ninja install

The last line above resolves linker errors such as cannot find -lnnpack or -lpthreadpool.

In my case, on a Raspberry Pi Zero, I received an error saying: Unrecognized CMAKE_SYSTEM_PROCESSOR = armv6l. If you get such an error, take note of the processor name, open CMakeLists.txt in the NNPACK directory, find the first line that lists several system processors, and append that name to the list, as in the following:

...
ELSEIF(NOT CMAKE_SYSTEM_PROCESSOR MATCHES "^(i686|x86_64|armv5te|armv7-a|armv7l|aarch64|armv6l)$")
...
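If you script the build, the choice between the two cmake invocations can be automated. The sketch below assumes that a machine name of armv6l (the value from the error message above) identifies a Pi Zero class board; the function name is hypothetical, and the flags are exactly the ones used in this guide.

```python
import platform


def cmake_command(machine):
    """Return the NNPACK cmake invocation for the two cases in this guide."""
    command = ["cmake", "-G", "Ninja", "-DBUILD_SHARED_LIBS=on"]
    if machine == "armv6l":  # assumption: ARMv6 boards such as the Pi Zero
        command.append("-DNNPACK_BACKEND=scalar")
    command.append("..")
    return command


if __name__ == "__main__":
    # platform.machine() reports the same name as `uname -m`.
    print(" ".join(cmake_command(platform.machine())))
```

Run it from the build directory and paste (or execute) the printed command.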

Okay, we are done with NNPACK. Moving on to darknet-NNPACK.

We should get development packages of OpenCV.

sudo apt-get install libopencv-core-dev

Get darknet-NNPACK.

cd
git clone https://github.com/shizukachan/darknet-nnpack.git
cd darknet-nnpack

Enable OpenCV and NNPACK in the Makefile of darknet-nnpack. Open it with:

sudo nano Makefile

and set the options at the top of the file as follows:

GPU=0
CUDNN=0
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=0
ZED_CAMERA=0
NNPACK=1
...
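If you prefer not to edit the Makefile by hand, the same change can be scripted. This is a minimal sketch that rewrites NAME=value assignments in the Makefile text; the function name is hypothetical, and it assumes each option sits at the start of its own line, as in the listing above.

```python
import re


def set_make_flags(makefile_text, settings):
    """Rewrite 'NAME=value' lines for each name in settings; keep the rest."""
    for name, value in settings.items():
        pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
        makefile_text, count = pattern.subn(f"{name}={value}", makefile_text)
        if count == 0:
            raise KeyError(f"{name} not found in Makefile")
    return makefile_text


if __name__ == "__main__":
    sample = "GPU=0\nOPENCV=0\nOPENMP=0\nNNPACK=0\n"
    print(set_make_flags(sample, {"OPENCV": 1, "NNPACK": 1}))
```

To apply it to the real file, read Makefile into a string, pass it through set_make_flags, and write the result back.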

Finally, build!

make

If running ./darknet then fails with an error such as: ./darknet: error while loading shared libraries: libnnpack.so: cannot open shared object file: No such file or directory, add /usr/local/lib to the dynamic linker's search path and refresh its cache:

sudo /bin/bash -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig

[source]
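To confirm that the linker can now find the library, one option is Python's ctypes.util.find_library, which on Linux consults the ldconfig cache among other sources. A small sketch, with a helper name of my own choosing:

```python
from ctypes.util import find_library


def library_visible(name):
    """True if the system's library lookup finds lib<name>, e.g. 'nnpack'."""
    return find_library(name) is not None


if __name__ == "__main__":
    if library_visible("nnpack"):
        print("libnnpack found")
    else:
        print("libnnpack NOT found; check /etc/ld.so.conf.d and run ldconfig")
```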

If the build fails because it cannot find opencv.pc, locate the file with:

sudo find / -iname opencv.pc

or

sudo grep -Ril opencv.pc /

then point PKG_CONFIG_PATH at the directory that contains it:

export PKG_CONFIG_PATH=/path/to/pkgconfig/directory
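The search and the export line can be combined in a short script. In this sketch the function name and the two search roots are assumptions; widen the roots if OpenCV was installed elsewhere.

```python
import os


def find_pc_dirs(root, filename="opencv.pc"):
    """Return every directory under root that contains filename (any case)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name.lower() == filename.lower() for name in filenames):
            matches.append(dirpath)
    return matches


if __name__ == "__main__":
    # Assumed search roots; os.walk silently skips roots that do not exist.
    for root in ("/usr/lib", "/usr/local/lib"):
        for directory in find_pc_dirs(root):
            print(f"export PKG_CONFIG_PATH={directory}")
```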

Testing

From the darknet-nnpack directory, you can test whether darknet works properly. The general usage is:

#./darknet detector [train/test/valid/demo/map] [data] [cfg] [weights (optional)]

Get yolov3-tiny.weights.

wget https://pjreddie.com/media/files/yolov3-tiny.weights

Test on single image.

./darknet detector test cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
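To compare builds (for example NNPACK=1 against NNPACK=0), it helps to time the command. The sketch below wraps the invocation above and measures wall-clock time, which includes loading the weights and is therefore longer than the prediction time darknet reports. The helper names are hypothetical; the arguments are the ones used in this guide.

```python
import os
import subprocess
import time


def detector_test_command(image, data="cfg/coco.data", cfg="cfg/yolov3-tiny.cfg",
                          weights="yolov3-tiny.weights", binary="./darknet"):
    """Build the 'darknet detector test' command line used in this guide."""
    return [binary, "detector", "test", data, cfg, weights, image]


def timed_run(command):
    """Run command and return (exit_code, wall_clock_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    if os.path.exists("./darknet"):
        code, seconds = timed_run(detector_test_command("data/dog.jpg"))
        print(f"exit code {code}, {seconds:.1f} s including weight loading")
```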

If you have a custom .cfg file, make sure to change its [net] settings to Testing, such as the following:

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=2
...
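For several custom .cfg files, the switch to the testing settings can be scripted. This is a sketch under the assumption that batch and subdivisions appear as uncommented key=value lines inside the [net] section; commented lines and other sections are left alone. The function name is my own.

```python
def use_testing_settings(cfg_text):
    """Set batch=1 and subdivisions=1 inside the [net] section only."""
    output, in_net = [], False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_net = stripped == "[net]"  # track which section we are in
        elif in_net and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
            if key in ("batch", "subdivisions"):
                line = f"{key}=1"
        output.append(line)
    return "\n".join(output) + "\n"


if __name__ == "__main__":
    sample = "[net]\nbatch=64\nsubdivisions=2\nwidth=416\n"
    print(use_testing_settings(sample))
```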

If it fails to predict anything when built with NNPACK=1 but predicts as expected when built with NNPACK=0, then chances are that your CPU architecture is incompatible with NNPACK, which it reports as NNPACK error 51 (unsupported hardware). For the full list of NNPACK errors, visit here.

To run on a live stream, such as a webcam:

./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights

Troubleshooting

If you still get errors when building, check the Issues sections of Darknet-NNPACK and NNPACK.

Also, check out this helpful guide made by HaroldSP. Using different repos might be the solution to your problem. HaroldSP used shizukachan’s repos: NNPACK and darknet-NNPACK.

Written on September 5, 2020