Building and Running Darknet with NNPACK on Raspberry Pi
Running darknet (YOLO) on a Raspberry Pi is not a difficult task in itself; attaining decent performance is. The general assumption is that neural networks are trained and tested on high-end GPUs, typically NVIDIA GPUs with CUDA support, which execute computations in parallel at mind-blowing speed. Raspberry Pis cannot do this, due to their obvious hardware limitations.
I decided to write this article to introduce the topic of running YOLO models on embedded devices and to help those who struggle to build from source, which is quite likely, since conflicts between different versions of repositories tend to emerge. I spent many hours getting everything to work while gathering information online, so I hope to make the process as seamless as possible for those who follow this guide.
Raspberry Pi Models
Before we move on, I’d like to present some essential specifications of the Raspberry Pi models in the following table (find details here):
Model | Raspberry Pi 4 B 8GB | Raspberry Pi 4 B | Raspberry Pi 3 Model A+ | Raspberry Pi 3 B+ | Raspberry Pi Zero WH | Raspberry Pi Zero W | Raspberry Pi A+ | Raspberry Pi 3 | Raspberry Pi Zero | Raspberry Pi 2 | Raspberry Pi B |
---|---|---|---|---|---|---|---|---|---|---|---|
Release date | 2020 May 28 | 2019 Jun 24 | 2018 Nov 15 | 2018 Mar 14 | 2018 Jan 12 | 2017 Feb 28 | 2014 Nov 10 | 2016 Feb 29 | 2015 Nov 30 | 2015 Feb 1 | 2012 Feb 15 |
Price | US$75.00 | US$35.00 | US$25.00 | US$35.00 | US$15.00 | US$10.00 | US$35.00 | US$35.00 | US$5.00 | US$35.00 | US$35.00 |
SoC type | Broadcom BCM2711 | Broadcom BCM2711 | Broadcom BCM2837B0 | Broadcom BCM2837B0 | Broadcom BCM2835 | Broadcom BCM2835 | Broadcom BCM2835 | Broadcom BCM2837 | Broadcom BCM2835 | Broadcom BCM2836 | Broadcom BCM2835 |
Core type | Cortex-A72 (ARM v8) 64-bit | Cortex-A72 (ARM v8) 64-bit | Cortex-A53 64-bit | Cortex-A53 64-bit | ARM1176JZF-S | ARM1176JZF-S | ARM1176JZF-S | Cortex-A53 64-bit | ARM1176JZF-S | Cortex-A7 | ARM1176JZF-S |
No. of cores | 4 | 4 | 4 | 4 | 1 | 1 | 1 | 4 | 1 | 4 | 1 |
GPU | VideoCore VI | VideoCore VI | VideoCore IV | VideoCore IV | VideoCore IV | VideoCore IV | VideoCore IV | VideoCore IV 1080p@30 | VideoCore IV | VideoCore IV | VideoCore IV 1080p@30 |
CPU clock | 1.5 GHz | 1.5 GHz | 1.4 GHz | 1.4 GHz | 1 GHz | 1 GHz | 700 MHz | 1.2 GHz | 1 GHz | 900 MHz | 700 MHz |
RAM | 8 GB LPDDR4 | 1 GB, 2 GB, 4 GB LPDDR4 | 512 MB DDR2 | 1 GB DDR2 | 512 MB | 512 MB | 256 MB | 1 GB DDR2 | 512 MB | 1 GB | 512 MB |
NNPACK
When such hardware limitations exist, NNPACK comes to the rescue. NNPACK is an acceleration package that aims to provide high-performance neural network computations on multi-core CPUs.
The key takeaway here is that the more CPU cores there are, the faster the model runs with NNPACK. Higher CPU clock and RAM capacity certainly matter too, but in general, parallelization across multiple cores has the bigger impact on overall performance. We would therefore expect a large performance boost when switching from a single-core Raspberry Pi Zero to a four-core Raspberry Pi 3 or later model. The boost is not exponential, though: four cores can at most make the parallel work about four times faster, and any further gain comes from the newer Cortex-A cores themselves (higher clock speeds and NEON SIMD support, which the Pi Zero's ARM1176JZF-S core lacks).
Requirements
NNPACK has its own requirements too. NNPACK currently supports the following OS environments and CPU architectures:
Environment | Architecture | CPU requirements |
---|---|---|
Linux | x86-64 | AVX2 and 3-level cache hierarchy |
Linux | ARM | NEON |
Linux | ARM64 | |
macOS | x86-64 | AVX2 and 3-level cache hierarchy |
Android | ARM | NEON |
Android | ARM64 | |
Android | x86 | |
Android | x86-64 | |
iOS | ARM | |
iOS | ARM64 | |
Emscripten | Asm.js | |
Emscripten | WebAssembly | |
This means that you need to know your machine's CPU specifications beforehand. You can run:
lscpu
or any of the following (hwinfo and lshw may need to be installed first):
dmesg | grep 'CPU\|cpu'
hwinfo --cpu
lshw -class processor
to learn more about your CPU.
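To the best of my knowledge, Windows is not officially supported. If you intend to deploy on Raspberry Pis only, the OS and architecture requirements are met when running Linux, since every model uses an ARM processor (ARMv6, ARMv7 or ARMv8). The NEON requirement is another matter: the ARM1176JZF-S (ARMv6) core of the Raspberry Pi Zero, A+ and B lacks NEON, which is why the Pi Zero build below uses the scalar backend.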
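The NEON requirement in the table can also be checked directly. The following is a minimal sketch, assuming a Linux system whose /proc/cpuinfo has a Features line (as on Raspberry Pi OS): 32-bit ARM kernels list neon there, while 64-bit (aarch64) kernels list asimd instead. The function names has_neon and cpu_summary are hypothetical helpers, not part of NNPACK or darknet.

```shell
#!/bin/sh
# Hypothetical helpers (not part of NNPACK or darknet).

# Succeeds if the Features line of a cpuinfo file (default /proc/cpuinfo)
# lists NEON: "neon" on 32-bit ARM kernels, "asimd" on aarch64 kernels.
has_neon() {
  grep -iE '^features' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qwE 'neon|asimd'
}

# Prints the architecture, the number of cores, and the NEON check result.
# An optional argument selects a different cpuinfo file.
cpu_summary() {
  echo "Architecture: $(uname -m)"
  echo "CPU cores: $(nproc)"
  if has_neon "$1"; then
    echo "NEON: yes"
  else
    echo "NEON: no"
  fi
}
```

After sourcing the file, run cpu_summary. On a Raspberry Pi Zero, whose ARMv6 core has no NEON, the last line should read NEON: no.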
Darknet-NNPACK
Darknet-NNPACK is a repository that uses NNPACK to optimize AlexeyAB/darknet. The repository reports inference results on a Raspberry Pi 4 for yolov3-tiny models trained on two datasets (COCO and Pascal VOC), shown below. NB: on embedded devices, it is vital to use tiny models.
COCO
cfg | Build Options | mAP | Prediction Time (seconds) |
---|---|---|---|
yolov3-tiny.cfg | NNPACK=1 | 33.1 | 1.1 |
yolov3-tiny.cfg | NNPACK=0 | | 14.5 |
yolov3-tiny-prn.cfg | NNPACK=1 | 33.1 | 0.86 |
yolov3-tiny-prn.cfg | NNPACK=0 | | 9.3 |
Pascal VOC
cfg | Build Options | mAP | Prediction Time (seconds) |
---|---|---|---|
yolov3-tiny-voc.cfg | NNPACK=1 | 65.9 | 1.0 |
yolov3-tiny-voc.cfg | NNPACK=0 | | 14.0 |
yolov3-tiny-prn-voc.cfg | NNPACK=1 | 65.2 | 0.77 |
yolov3-tiny-prn-voc.cfg | NNPACK=0 | | 8.9 |
Gaussian_yolov3-tiny-voc.cfg | NNPACK=1 | 65.7 | 1.0 |
The performance boost from NNPACK is approximately 10-14 times. Notice that, even on one of the most recent Raspberry Pi models, the prediction rate is only around 1 FPS. I tested on a Raspberry Pi Zero W, where a single inference with NNPACK took around 100 seconds (0.01 FPS), not counting the time to load the weights. So it is possible, but generally infeasible for real-time inference.
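The speedup and frame-rate figures are simple ratios: speedup = (time without NNPACK) / (time with NNPACK), and FPS = 1 / (time with NNPACK). Below is a minimal awk-based sketch, taking the NNPACK=0 figures in the tables as prediction times in seconds (which is what yields the 10-14 times range); nnpack_gain is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: speedup and frame rate from two prediction times.
#   $1 = prediction time without NNPACK (seconds)
#   $2 = prediction time with NNPACK (seconds)
nnpack_gain() {
  awk -v t0="$1" -v t1="$2" 'BEGIN { printf "speedup %.1fx, %.2f FPS\n", t0 / t1, 1 / t1 }'
}
```

For example, nnpack_gain 14.5 1.1 (yolov3-tiny.cfg on COCO) prints speedup 13.2x, 0.91 FPS.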
Installation
SSH into the Raspberry Pi, then update and upgrade its packages:
sudo apt-get update && sudo apt-get upgrade
Older guides tell you to install PeachPy and confu at this point. As far as I know, they are no longer needed when compiling NNPACK.
Go ahead and install the following packages.
sudo apt-get install cmake
sudo apt-get install ninja-build
Some guides also list clang as a requirement (I am not sure why), so install it as well:
sudo apt-get install clang
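Before configuring NNPACK, it can help to confirm that the build tools are actually on the PATH. A minimal sketch, assuming a POSIX shell; missing_tools is a hypothetical helper, and the tool names it is meant to be called with (cmake, ninja from the ninja-build package, clang, git) are the ones used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not on PATH,
# one per line. Prints nothing when every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Running missing_tools cmake ninja clang git should print nothing once all packages above are installed.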
Get NNPACK.
git clone https://github.com/shizukachan/NNPACK.git
cd NNPACK
mkdir build
cd build
From here, the path depends on your model. On a Raspberry Pi Zero, whose ARMv6 CPU lacks NEON, use the scalar backend:
cmake -G Ninja -DBUILD_SHARED_LIBS=on -DNNPACK_BACKEND=scalar ..
On any other model:
cmake -G Ninja -DBUILD_SHARED_LIBS=on ..
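The choice between these two cmake commands can be automated. The sketch below assumes, as this guide does, that a CPU without NEON (such as the Pi Zero's ARMv6 core) needs the scalar backend; it inspects the Features line of /proc/cpuinfo, so it is only meaningful on ARM Linux. nnpack_backend_flag is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the extra CMake flag for NNPACK's backend.
# Assumption: no NEON/ASIMD in the Features line means "use the scalar backend".
nnpack_backend_flag() {
  if grep -iE '^features' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qwE 'neon|asimd'; then
    return 0  # NEON present: print nothing and keep NNPACK's default backend
  fi
  printf '%s\n' "-DNNPACK_BACKEND=scalar"
}

# Usage, from NNPACK/build (left unquoted on purpose so an empty result vanishes):
#   cmake -G Ninja -DBUILD_SHARED_LIBS=on $(nnpack_backend_flag) ..
```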
Finally, build and install:
sudo ninja
sudo ninja install
The last command (sudo ninja install) fixes linker errors such as cannot find -lnnpack or cannot find -lpthreadpool.
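To confirm that sudo ninja install actually placed the libraries where the linker looks, check the library directory. A minimal sketch, assuming the default install prefix /usr/local (so the libraries land in /usr/local/lib, the same directory used in the ld.so.conf fix later on) and accepting either a shared (.so) or a static (.a) file; missing_nnpack_libs is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print each NNPACK library missing from a directory
# (default /usr/local/lib, assumed to be where "sudo ninja install" puts them).
missing_nnpack_libs() {
  dir="${1:-/usr/local/lib}"
  for lib in nnpack pthreadpool; do
    found=no
    # Accept a shared library (possibly versioned, e.g. .so.1) or a static one.
    for f in "$dir/lib$lib".so* "$dir/lib$lib".a; do
      if [ -e "$f" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "lib$lib"
    fi
  done
}
```

Empty output from missing_nnpack_libs means both libraries were found.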
In my case, on a Raspberry Pi Zero, I received an error saying:
Unrecognized CMAKE_SYSTEM_PROCESSOR = armv6l
If you receive such an error, take note of the processor name, edit CMakeLists.txt in the NNPACK directory, go to the first line that lists several system processors, and append that name at the end, as follows:
... ELSEIF(NOT CMAKE_SYSTEM_PROCESSOR MATCHES "^(i686|x86_64|armv5te|armv7-a|armv7l|aarch64|armv6l)$") ...
Then re-run the cmake command from the build directory.
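The same edit can be scripted. This is a minimal sketch, assuming GNU sed (the default on Raspberry Pi OS) and that the processor list in CMakeLists.txt ends in |aarch64)$ as in the line quoted above; add_processor is a hypothetical helper. On a native (non-cross) build, CMake normally takes CMAKE_SYSTEM_PROCESSOR from uname -m, which makes that command a convenient way to get the name.

```shell
#!/bin/sh
# Hypothetical helper: append a processor name to the first
# CMAKE_SYSTEM_PROCESSOR list ending in "|aarch64)$" in NNPACK's CMakeLists.txt.
# Requires GNU sed for "-i" and the "0,/re/" address.
add_processor() {
  proc="$1"
  file="${2:-CMakeLists.txt}"
  # Skip the edit if the name is already listed.
  if grep -q "|$proc)" "$file"; then
    return 0
  fi
  cp "$file" "$file.bak"  # keep a backup of the unmodified file
  # Rewrite only the first matching line: "|aarch64)$" -> "|aarch64|PROC)$".
  sed -i "0,/|aarch64)\\\$/s/|aarch64)\\\$/|aarch64|$proc)\$/" "$file"
}

# Usage, from the NNPACK directory:
#   add_processor "$(uname -m)"
```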
Okay, we are done with NNPACK. Moving on to darknet-NNPACK.
Next, install the OpenCV development package:
sudo apt-get install libopencv-core-dev
Get darknet-NNPACK.
cd
git clone https://github.com/shizukachan/darknet-nnpack.git
cd darknet-nnpack
Enable OpenCV and NNPACK in the darknet-nnpack Makefile:
sudo nano Makefile
GPU=0
CUDNN=0
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=0
ZED_CAMERA=0
NNPACK=1
...
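If you prefer not to edit the Makefile interactively, the flags can be set with sed. A minimal sketch, assuming GNU sed and that each flag sits at the start of its own line as NAME=value; set_flag is a hypothetical helper. It also makes it easy to switch NNPACK between 1 and 0 when comparing builds (run make clean and then make after each change).

```shell
#!/bin/sh
# Hypothetical helper: set a top-level NAME=value flag in a Makefile
# (default ./Makefile) without opening an editor. Requires GNU sed for "-i".
set_flag() {
  name="$1"
  value="$2"
  file="${3:-Makefile}"
  # Only lines that start with NAME= are rewritten, so commented-out lines
  # and longer names that merely begin with NAME are left untouched.
  sed -i "s/^$name=.*/$name=$value/" "$file"
}

# Usage, from the darknet-nnpack directory:
#   set_flag OPENCV 1 && set_flag NNPACK 1 && make
```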
Finally, build!
make
If the build succeeds but running ./darknet fails with an error such as:
./darknet: error while loading shared libraries: libnnpack.so: cannot open shared object file: No such file or directory
then add /usr/local/lib to the dynamic linker's search path and refresh its cache:
sudo /bin/bash -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig
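To see which shared libraries the dynamic linker still cannot resolve for the darknet binary, before and after running ldconfig, the output of ldd can be filtered for its not found lines. A minimal sketch, assuming glibc's ldd output format (libname => not found); missing_shared_libs is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list the shared libraries that ldd reports as
# "not found" for a binary (default ./darknet). Empty output means all resolved.
missing_shared_libs() {
  ldd "${1:-./darknet}" 2>/dev/null | awk '/not found/ { print $1 }'
}
```

Run missing_shared_libs ./darknet from the darknet-nnpack directory; before the fix it should list libnnpack.so, and afterwards it should print nothing.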
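If the build fails because pkg-config cannot find opencv.pc, locate the file with:
sudo find / -iname opencv.pc
then point PKG_CONFIG_PATH at the directory that contains it (replace the placeholder path, then run make again; the setting lasts for the current shell session):
export PKG_CONFIG_PATH=/path/to/directory/containing/opencv.pc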
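Finding the file and exporting its directory can be combined. A minimal sketch, assuming the darknet build looks up OpenCV through pkg-config under the name opencv.pc; opencv_pc_dir is a hypothetical helper. Searching from / is slow, so pass a narrower root such as /usr when you can.

```shell
#!/bin/sh
# Hypothetical helper: print the directory of the first opencv.pc found under
# a search root (default /). Returns non-zero if none is found.
opencv_pc_dir() {
  pc=$(find "${1:-/}" -name opencv.pc 2>/dev/null | head -n 1)
  if [ -n "$pc" ]; then
    dirname "$pc"
  else
    return 1
  fi
}

# Usage: prepend that directory to PKG_CONFIG_PATH, keeping any existing value.
#   dir=$(opencv_pc_dir /usr) && export PKG_CONFIG_PATH="$dir${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
```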
Testing
From the darknet-nnpack directory, you can test whether darknet works properly. The general syntax is:
#./darknet detector [train/test/valid/demo/map] [data] [cfg] [weights (optional)]
Download the pretrained yolov3-tiny.weights:
wget https://pjreddie.com/media/files/yolov3-tiny.weights
Test on a single image:
./darknet detector test cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
If you have a custom .cfg file, make sure its [net] section is set for testing, as in the following:
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=2
...
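For a custom .cfg, these two values can also be switched with sed instead of by hand. A minimal sketch, assuming the settings are written as batch=N and subdivisions=N at the start of a line with no spaces around the equals sign; cfg_for_testing is a hypothetical helper that writes the modified copy to standard output and leaves the original file untouched.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a darknet .cfg with batch and
# subdivisions set to 1 (testing mode). Commented lines such as
# "# batch=64" are left alone because both patterns are anchored at ^.
cfg_for_testing() {
  sed -e 's/^batch=.*/batch=1/' -e 's/^subdivisions=.*/subdivisions=1/' "$1"
}

# Usage: cfg_for_testing cfg/my-custom.cfg > cfg/my-custom-test.cfg
```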
If it fails to predict anything when built with NNPACK=1, but predicts as expected when built with NNPACK=0, then chances are that your CPU architecture is incompatible with NNPACK, e.g. NNPACK error (51), which corresponds to nnp_status_unsupported_hardware. For the full list of NNPACK errors, see the nnp_status enumeration in NNPACK's include/nnpack.h.
To run on a live stream, such as a webcam:
./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights
Troubleshooting
If you still get errors when building, check the Issues sections of the Darknet-NNPACK and NNPACK repositories.
Also, check out this helpful guide by HaroldSP, which, like this one, uses shizukachan's NNPACK and darknet-NNPACK repositories. If a build keeps failing, trying a different fork of these repositories might solve the problem.