Building a Multiple Object Detection Model with TensorFlow’s Object Detection API
This post isn’t meant to be an in-depth explanation of machine or deep learning, but rather a practical guide to setting up object detection for projects. It covers building a custom object detection system using TensorFlow’s Object Detection API. I have also written a blog post on how to build a custom, single-object classification model using Fast AI, which is linked here!
This is part one of two on building a custom object detection system for web-based and local applications. The second part is written by my coworker, Allison Youngdahl, and will illustrate how to implement this custom object detection system in a React web application and on Google Cloud Platform (GCP).
While there are a few examples of how to implement object detection models online, many are deprecated, lack clear troubleshooting documentation, omit customization instructions, or don’t explain how to export the trained model. Additionally, many existing tutorials use a model that is too slow to be practical on mobile. For TensorFlow specifically, the Object Detection API is difficult to navigate, and troubleshooting took quite a bit of time. I hope this blog post provides some troubleshooting tips and easy, step-by-step instructions for setting up a custom object detection system.
For this blog post, I ran everything with the following specs:
macOS Catalina (version 10.15.4), 16 GB RAM, 2.3 GHz 8-core Intel Core i9.
The impetus for this project was an object detection web application that detects products in real time. During the coronavirus pandemic, this application is useful because it lets customers immediately get information about products without physically touching them. Additionally, because it runs as a web application rather than a smartphone-specific app, it is portable and easy to use, and its detection capabilities create exciting opportunities for personalization.
References & Acknowledgements
Before beginning, I’d like to acknowledge the following people, whose work this post is heavily based on or references. Tanner Gilbert has done some great work documenting how to use the Object Detection API on his YouTube channel. His work and code build on previous work by Dat Tran and EdjeElectronics on using the Object Detection API. I found the following guide by Adrià Gil incredibly useful for troubleshooting and for learning how to properly export TensorFlow models. Additionally, I’d like to thank Allison Youngdahl for proofreading this article and for her help with troubleshooting.
When building the web application, the team looked into several tools for object detection. The two that came to mind first were TensorFlow and Fast AI, but the team also looked into MediaPipe. The team eventually chose TensorFlow because of the available documentation on porting TensorFlow models to web applications. At the time of writing, Fast AI doesn’t have an explicit tutorial on multiple object detection, a desired feature of the web application. The team steered away from MediaPipe due to a similar lack of documentation. This blog post walks through TensorFlow’s Object Detection API for multiple object detection, which was used to build the model for the web application.
TensorFlow’s Object Detection API
TensorFlow’s Object Detection API is an open-source framework built on top of TensorFlow for constructing, training, and deploying object detection models. New models are still being added; at the time of writing, the most recent addition was in March 2020. By employing transfer learning (repurposing a pre-trained model for items outside its original training dataset), the Object Detection API can detect multiple custom objects, provided you have an appropriately built and sized dataset.
Building a Custom Model with TensorFlow’s Object Detection API
Disclaimer: For the object detection API, I am writing the instructions assuming that you are using a Mac. If you are following along and use Windows, I cannot guarantee that the same steps will work for you. Thank you for your understanding.
Step 1) Clone the Repository and Install Dependencies
The first step is to clone the TensorFlow models repository and set up the Object Detection API. Tanner Gilbert has dockerized this process, which is available here. If doing it manually, you can begin by first cloning the TensorFlow models repository by typing:
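git clone https://github.com/tensorflow/models
After cloning the repository, it’s a good idea to install all the dependencies, but first we should install Anaconda. Assuming that you have Homebrew installed, install Anaconda via:
brew cask install anaconda
(Newer Homebrew versions have removed brew cask; there, the equivalent is brew install --cask anaconda.)
Then, add the following line to your ~/.bash_profile so that /usr/local/bin, where Homebrew links the tools it installs, comes first on your PATH:
export PATH=/usr/local/bin:$PATH
The terminal command to do this is below; open a new terminal (or run source ~/.bash_profile) afterwards for it to take effect:
echo 'export PATH=/usr/local/bin:$PATH' >> ~/.bash_profile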
Now, let’s set up a conda environment. I found a pretty helpful conda cheat sheet online. You can check whether Anaconda was installed properly by running conda info. To create a conda environment with a specific version of Python, run the following (replacing the parentheses as necessary):
conda create --name (name of project) python=(version you want)
For example:
conda create --name tf-object-detection python=3.7.4
Now, activate the environment:
conda activate tf-object-detection
Now, let’s install those pesky dependencies. I would suggest using conda, but you can also use pip; both options are shown below.
With pip:
pip install --user Cython
pip install --user contextlib2
pip install --user pillow
pip install --user lxml
pip install --user jupyter
pip install --user matplotlib
With conda:
conda install Cython
conda install contextlib2
conda install pillow
conda install lxml
conda install jupyter
conda install matplotlib
conda install tensorflow=1
Installing the COCO API
COCO is a large image dataset designed for object detection, segmentation, person keypoint detection, stuff segmentation, and caption generation. If you want to use the dataset and its evaluation metrics, you need to clone the cocoapi repository, build its Python API with make, and copy the pycocotools subfolder to the tensorflow/models/research directory:
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cp -r pycocotools <path_to_tensorflow>/models/research/
Here’s what the copy step looked like on my local machine:
cp -r pycocotools /Users/**put username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/models/research
Using make won’t work on Windows. To install the cocoapi on Windows, the following command can be used:
pip install "git+https://github.com/philferriere/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI"
Protobuf Installation & Compilation
The TensorFlow Object Detection API uses .proto files. These files need to be compiled into .py files for the Object Detection API to work properly. Google provides a program called Protobuf that can compile these files. Protobuf can be downloaded here. Place the downloaded file anywhere you want (for example, in the Desktop folder). The specific file I needed to download was protoc-3.11.4-osx-x86_64.zip. After extracting the folder, go into models/research and use protoc to compile the .proto files in the object_detection/protos directory into Python files.
The official installation guide uses protoc like this:
./bin/protoc object_detection/protos/*.proto --python_out=.
This command should work if you installed and did everything correctly. The steps below are for when, for some reason, it isn’t working (which is mostly an issue on Windows). Sometimes the * wildcard, which stands for all files, doesn’t work, so you can use this Python script to run the command for each .proto file:
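import os
import sys

# Usage: python use_protobuf.py <path to protos directory> <path to protoc binary>
directory, protoc_path = sys.argv[1], sys.argv[2]
for file in os.listdir(directory):
    if file.endswith(".proto"):  # skip __init__.py and already-generated *_pb2.py files
        os.system(protoc_path + " " + os.path.join(directory, file) + " --python_out=.")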
This file needs to be saved inside the research folder; I named it use_protobuf.py. I had renamed the downloaded protoc folder from protoc_macosx_version to protoc and moved it to the research folder. Again, use_protobuf.py is only needed if the original protoc command isn’t working! To use it, go into the console and type:
python use_protobuf.py <path to directory> <path to protoc file>
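In my case, I had to run the following commands. The two xattr commands remove the quarantine attribute that macOS attaches to downloaded files, which can otherwise stop protoc from running:
xattr -d com.apple.quarantine protoc/bin/protoc
xattr -d com.apple.quarantine protoc/bin/
protoc/bin/protoc object_detection/protos/*.proto --python_out=.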
Adding Necessary Environment Variables & Finishing the TensorFlow Object Detection API Installation
Lastly, we need to add the research, research/object_detection, and research/slim folders to the PYTHONPATH environment variable and run the setup.py file. To add the paths on Linux or macOS, type the following in the terminal:
export PYTHONPATH=$PYTHONPATH:<PATH_TO_TF>/TensorFlow/models/research
export PYTHONPATH=$PYTHONPATH:<PATH_TO_TF>/TensorFlow/models/research/object_detection
export PYTHONPATH=$PYTHONPATH:<PATH_TO_TF>/TensorFlow/models/research/slim
These exports only last for the current terminal session, so re-run them whenever you open a new terminal. In my case, I would run the following:
export PYTHONPATH=$PYTHONPATH:/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/models/research
export PYTHONPATH=$PYTHONPATH:/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/models/research/object_detection
export PYTHONPATH=$PYTHONPATH:/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/models/research/slim
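The exports only reach shells that inherit them, so a Jupyter kernel started some other way may still fail to find object_detection or nets. As a hedged alternative, the same folders can be put on sys.path from inside Python. This is a minimal sketch: add_research_paths is a hypothetical helper, the argument is assumed to be your models/research folder, and the change lasts only for the running Python process.

```python
import os
import sys


def add_research_paths(research_dir):
    """Put models/research, research/object_detection and research/slim on sys.path.

    Mirrors the three PYTHONPATH exports, but only for the current Python process.
    Returns the paths that were newly added.
    """
    research_dir = os.path.abspath(research_dir)
    added = []
    for path in (research_dir,
                 os.path.join(research_dir, "object_detection"),
                 os.path.join(research_dir, "slim")):
        if path not in sys.path:
            sys.path.insert(0, path)  # searched before site-packages, like PYTHONPATH
            added.append(path)
    return added
```

For example, calling add_research_paths with the path to your models/research folder at the top of a notebook, before import object_detection, should have the same effect as the exports for that notebook only.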
To run the setup.py file we need to navigate to ../models/research and run:
# From within /models/research/
python setup.py build
python setup.py install
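Before opening the notebook, one quick way to see whether the pieces installed above are visible to the active Python is to ask importlib where each module would be loaded from, without actually importing it. This is a hedged sketch: the module list (tensorflow, object_detection, nets from research/slim, and pycocotools) is my assumption about what this setup needs, and check_setup is a hypothetical helper name.

```python
import importlib.util
import sys

# Modules this walkthrough relies on (assumed): TensorFlow, the Object
# Detection API, `nets` from research/slim, and the COCO API.
REQUIRED_MODULES = ("tensorflow", "object_detection", "nets", "pycocotools")


def check_setup(modules=REQUIRED_MODULES):
    """Map each module name to True if the current interpreter can find it.

    importlib.util.find_spec only locates a top-level module; it does not import
    it, so a missing package shows up as False instead of raising ImportError.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_setup().items():
        print(f"{name:<18}{'found' if found else 'NOT FOUND'}")
```

A NOT FOUND next to nets or object_detection usually points back at the PYTHONPATH step rather than at the installation itself.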
Now, run object_detection_tutorial.ipynb from the object_detection folder (Tanner Gilbert has created this helpful Jupyter Notebook, which is available here). You can also check that everything is working by importing object_detection inside a Python shell: import object_detection. If no error is raised, it’s likely working. If things go well, your Jupyter notebook will look like the following:
Gathering Data for Transfer Learning
Now that the TensorFlow Object Detection API is ready to go, we need to gather the images needed for training. To train a robust model, we need lots of pictures (at least 50 for each item being trained, plus about 50 images with several of the items in the same photo) that vary as much as possible from each other. That means they should have different lighting conditions, different backgrounds, and lots of random objects in them. You can either take the pictures yourself or download them from the internet. I’ve included a separate repository that walks through formatting images and exporting them here. The only difference is that you should also run conda install pyqt.
In the Fix_Image folder, there is a folder called images, which should be empty. Before training the model or creating the testing and training directories, it’s essential to reduce the resolution of the images (at least, for the way I’m doing it in TensorFlow). This is crucial to prevent the training process from taking too long. Let’s say that you have taken your photos and added them to the images folder. Next, make sure you’re in the Fix_Image directory and run the following command in the terminal:
python transform_image_resolution.py -d images/ -s 800 600
This Python script automatically changes the resolution of all the photos in the images folder. The script has run properly if you get no output. You can check that it actually worked by going into the images folder and looking at the resized images; they will look different from the original photos!
After you have all the images, move about 80% of them to the object_detection/images/train directory and the other 20% to the object_detection/images/test directory. Make sure that both directories contain a good variety of classes.
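Moving the files by hand works, but the split can also be scripted. The sketch below shuffles the image files with a fixed seed and moves about 80% into a train folder and the rest into a test folder. split_images and its parameters are hypothetical names; it only moves .jpg/.jpeg/.png files (if you have already labeled, the matching .xml files would need to move too), and a random split doesn’t guarantee that every class lands in both folders, so the result is still worth checking.

```python
import os
import random
import shutil

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def split_images(source_dir, train_dir, test_dir, train_fraction=0.8, seed=42):
    """Move about train_fraction of the images in source_dir to train_dir, the rest to test_dir.

    Returns (number moved to train_dir, number moved to test_dir).
    """
    images = sorted(
        name for name in os.listdir(source_dir)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    random.Random(seed).shuffle(images)  # fixed seed -> the same split on every run
    n_train = round(len(images) * train_fraction)
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for i, name in enumerate(images):
        target = train_dir if i < n_train else test_dir
        shutil.move(os.path.join(source_dir, name), os.path.join(target, name))
    return n_train, len(images) - n_train
```

Sorting before shuffling makes the result depend only on the file names and the seed, not on the order os.listdir happens to return.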
It’s important to note that having .png and .jpg copies with the same name may mess things up when generating XML files, as noted by folks who have followed my instructions. For example, if you have picture1.jpg and picture1.png, this may cause issues when generating XML files (the next step). One way to avoid duplicate photo names is the following method (which I haven’t tested):
Download the following script here. Place the downloaded files in a folder called google-images-download. Then navigate to the folder containing the Python script and run the following (where “item of interest” is the object you want to collect images for):
python google_images_download.py --keywords "(item of interest)" --limit 100 --format jpg
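To catch the picture1.jpg / picture1.png problem before labeling, a small check can list base names that appear with more than one image extension. This is a hypothetical helper, not part of the linked scripts, and it assumes the images sit directly in one folder.

```python
import os
from collections import defaultdict

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_duplicate_stems(directory):
    """Return {base_name: [extensions]} for images in directory that share a base name.

    For example, picture1.jpg and picture1.png are reported together, since their
    LabelImg annotations would both be named picture1.xml.
    """
    by_stem = defaultdict(list)
    for name in os.listdir(directory):
        stem, ext = os.path.splitext(name)
        if ext.lower() in IMAGE_EXTENSIONS:  # ignore .xml and other non-image files
            by_stem[stem].append(ext.lower())
    return {stem: sorted(exts) for stem, exts in by_stem.items() if len(exts) > 1}
```

An empty dictionary means no collisions; otherwise, rename or delete one copy of each reported image.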
With all the pictures gathered, we come to the next step: labeling the data. Labeling is the process of drawing bounding boxes around the desired objects. LabelImg is a great tool for creating an object detection dataset.
LabelImg supports two formats, PascalVOC and YOLO. For this tutorial, make sure to select PascalVOC. LabelImg saves an XML file containing the label data for each image. These files will be used to create TFRecord files, which are used to train the model. The code and documentation for this part are available on my GitHub in the following repository.
At the end, the training and test folders should look something like the following (I’m only showing a snippet of the test folder).
When labeling, it’s good practice to keep the labels the same (e.g. row_label and the labels specified in the .pbtxt file). However, I did not follow this: I set row_label == Air_Force_1 but labeled the shoe as airforce1 when labeling images. This didn’t impact the performance of the model or what was displayed to the user; however, it’s good convention to name things consistently when labeling.
Generating Training Data
With the images labeled, we need to create TFRecords that can serve as input data for training the object detector. To create the TFRecords, we will use two scripts from Dat Tran’s raccoon detector: xml_to_csv.py and generate_tfrecord.py.
After downloading both scripts, we first need to change the main method in xml_to_csv.py so that it converts the XML files in both the train and test folders to CSV correctly.
# Old code:
def main():
    image_path = os.path.join(os.getcwd(), 'annotations')
    xml_df = xml_to_csv(image_path)
    print('Successfully converted xml to csv.')
--------------------------------------------------------------------
# New code:
def main():
    for folder in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images', folder)
        xml_df = xml_to_csv(image_path)
        # Writes images/train_labels.csv and images/test_labels.csv
        xml_df.to_csv('images/' + folder + '_labels.csv', index=False)
    print('Successfully converted xml to csv.')
You’ll have to place generate_tfrecord.py and xml_to_csv.py inside the object_detection folder if they aren’t already there. Now we can transform our XML files to CSV by opening the command line and typing python xml_to_csv.py. This creates two files in the images directory: test_labels.csv and train_labels.csv. If you get a “no module named ‘pandas’” error, just run conda install pandas.
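For reference, the conversion that xml_to_csv.py performs can be sketched with only the standard library: read each LabelImg PascalVOC file, emit one row per labeled object, and write everything to one CSV. This is not Dat Tran’s code (which uses pandas); the function names are hypothetical, and the column layout (filename, width, height, class, xmin, ymin, xmax, ymax) is my assumption about the CSV that generate_tfrecord.py reads.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

# Assumed CSV layout: one row per labeled bounding box.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def parse_voc_xml(xml_data):
    """Turn one LabelImg PascalVOC annotation (str or bytes) into one row per <object>."""
    root = ET.fromstring(xml_data)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Pixel coordinates; int(float(...)) also accepts values written like "10.0".
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append([filename, width, height, obj.findtext("name")] + coords)
    return rows


def folder_to_csv(folder, csv_path):
    """Write every *.xml annotation in folder to csv_path; return the number of boxes."""
    rows = []
    for xml_file in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        with open(xml_file, "rb") as f:  # bytes let ElementTree honor any encoding header
            rows.extend(parse_voc_xml(f.read()))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)
```

Usage would look like folder_to_csv("images/train", "images/train_labels.csv"), once per folder.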
Next, open the generate_tfrecord.py file and replace the labelmap inside the class_text_to_int method with your own label map.
# OLD: TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'basketball':
        return 1
    elif row_label == 'shirt':
        return 2
    elif row_label == 'shoe':
        return 3
# NEW CODE: replace the above with this
def class_text_to_int(row_label):
    if row_label == 'Air_Force_1':
        return 1
    elif row_label == 'Huaraches':
        return 2
    elif row_label == 'Air_Max':
        return 3
Now the TFRecords can be generated by typing:
python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record
python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record
These two commands generate train.record and test.record files, which can be used to train our object detector. If you get an error like module 'tensorflow' has no attribute 'app', it’s because you switched to TensorFlow 2.0 at some point. There are two solutions: (1) downgrade to TensorFlow 1, or (2) modify generate_tfrecord.py by changing line 17 from
import tensorflow as tf to
import tensorflow.compat.v1 as tf.
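It can help to know what generate_tfrecord.py does with those CSV rows: it groups them by image and stores each box corner divided by the image width or height (so coordinates fall between 0 and 1), together with the integer id from class_text_to_int. The stdlib sketch below mimics only that bookkeeping and writes no TFRecords. Unlike the real script, which reads width and height from the image file, it uses the CSV’s width and height columns, and it raises a KeyError for an unknown label where class_text_to_int would return None. group_boxes is a hypothetical name.

```python
import csv
from collections import defaultdict


def group_boxes(csv_lines, class_to_id):
    """Group CSV rows by image and scale box corners to the 0-1 range.

    csv_lines: iterable of CSV text lines with a header row
    (filename, width, height, class, xmin, ymin, xmax, ymax).
    class_to_id: dict from label text to integer id.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        width, height = float(row["width"]), float(row["height"])
        grouped[row["filename"]].append({
            "class_id": class_to_id[row["class"]],  # KeyError flags a label typo early
            "xmin": int(row["xmin"]) / width,
            "xmax": int(row["xmax"]) / width,
            "ymin": int(row["ymin"]) / height,
            "ymax": int(row["ymax"]) / height,
        })
    return dict(grouped)
```

Running it over train_labels.csv is a cheap way to spot a label that doesn’t match class_text_to_int before generating the records.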
Getting Ready for Training the Model — Creating a Label Map
The label map maps an id to a name. We will put it in a folder called training, located in the object_detection root directory. The label map for my detector can be seen below. The id number of each item should match the id of the same item in the generate_tfrecord.py file. My file was originally called pet_label_map.pbtxt, which I renamed to labelmap.pbtxt after the fact. If you are creating things from scratch, just create a directory called training and create a .pbtxt file with the following information (changing the names to match the labels you are using).
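Because the ids in labelmap.pbtxt and in class_text_to_int have to agree, one option is to keep a single dictionary and derive both from it. This is a minimal sketch, not the scripts’ actual code: LABELS, the dictionary version of class_text_to_int, and label_map_pbtxt are illustrative, and the ids 1 to 3 are my assumption based on the order used in class_text_to_int (the API reserves id 0 for the background class, so ids start at 1). The generated text follows the item { id, name } layout of the API’s label map files.

```python
# Illustrative single source of truth for class names and ids (assumed ids).
LABELS = {"Air_Force_1": 1, "Huaraches": 2, "Air_Max": 3}


def class_text_to_int(row_label, labels=LABELS):
    """Dictionary version of the if/elif chain; returns None for unknown labels."""
    return labels.get(row_label)


def label_map_pbtxt(labels=LABELS):
    """Render the labels as item { id name } blocks for labelmap.pbtxt, sorted by id."""
    blocks = [
        f"item {{\n  id: {label_id}\n  name: '{name}'\n}}\n"
        for name, label_id in sorted(labels.items(), key=lambda kv: kv[1])
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(label_map_pbtxt())
```

Writing the output of label_map_pbtxt() to training/labelmap.pbtxt, and importing the same LABELS in generate_tfrecord.py, would keep the two in sync.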
Creating the Training Configuration
Lastly, we need to create a training configuration file. Originally, I used faster_rcnn_inception, which, like many other models, can be downloaded from the TensorFlow detection model zoo. This model was too slow, so I ended up going with ssd_mobilenet_v2_coco.
Because we are using a MobileNet model, we can choose one of its predefined configurations. We will use and modify the pipeline.config file provided in object_detection/samples/configs. You can download the model and config file from TensorFlow’s model zoo and put them in the training folder in the root folder (the training folder created earlier).
Copy the config file to the training directory and make the following changes!
* Line 3: change num_classes to the number of objects you want to detect (3 in my case).
* Line 159: change fine_tune_checkpoint to the path of the model.ckpt checkpoint inside the downloaded model folder:
```
fine_tune_checkpoint: "/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/training/ssd_mobilenet_v2_coco_2018_03_29/model.ckpt"
```
* Line 166: change input_path to the path of the train.record file:
```
input_path: "/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/train.record"
```
* Line 179: change input_path to the path of the test.record file:
```
input_path: "/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/test.record"
```
* Lines 164 and 175: change label_map_path to the path of the label map:
```
label_map_path: "/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/training/labelmap.pbtxt"
```
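Line numbers shift between config versions, so matching on field names can be less fragile than editing by line. This hedged sketch rewrites the four fields above with regular expressions. It assumes, as in the sample configs, that the training reader’s input_path appears before the text eval_input_reader and the evaluation reader’s appears after it; update_pipeline_config is a hypothetical helper, not part of the Object Detection API (which has its own protobuf-based config utilities).

```python
import re


def update_pipeline_config(text, num_classes, checkpoint, train_record, test_record, label_map):
    """Return pipeline.config text with the fields listed above filled in.

    Assumes the train reader's input_path comes before the text
    "eval_input_reader" and the eval reader's input_path comes after it.
    """
    def set_value(pattern, value, s):
        # Keep the "field: " prefix (group 1); a function replacement stops re
        # from treating backslashes in Windows-style paths as escapes.
        return re.sub(pattern, lambda m: m.group(1) + value, s)

    text = set_value(r"(num_classes:\s*)\d+", str(num_classes), text)
    text = set_value(r'(fine_tune_checkpoint:\s*)"[^"]*"', f'"{checkpoint}"', text)
    text = set_value(r'(label_map_path:\s*)"[^"]*"', f'"{label_map}"', text)
    train_part, marker, eval_part = text.partition("eval_input_reader")
    train_part = set_value(r'(input_path:\s*)"[^"]*"', f'"{train_record}"', train_part)
    eval_part = set_value(r'(input_path:\s*)"[^"]*"', f'"{test_record}"', eval_part)
    return train_part + marker + eval_part
```

To use it, read training/pipeline.config into a string, pass it through with your own paths, and write the result back. Because the pattern for fine_tune_checkpoint requires a colon right after the field name, fine_tune_checkpoint_type is left alone.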
Training the Model
Before continuing, I want to provide screenshots of my working environment because I understand that there are a lot of files and folders to keep track of.
For the following steps, I’m assuming that TensorFlow 1 is being used. To train the model, run the following command from the root folder, changing model_dir like so:
Going from this (don’t run this):
python model_main.py --logtostderr --model_dir=training/ --pipeline_config_path=/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/training/pipeline.config
To this (please read below and make the necessary changes before running):
python model_main.py --logtostderr --model_dir=/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model --pipeline_config_path=/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/training/pipeline.config
If you encounter an AttributeError: module 'tensorflow' has no attribute 'contrib' error, it’s because you are using TensorFlow 2.0. Revert to 1.x by running the following in the command line:
# uninstall TensorFlow 2.0
conda remove tensorflow
# install TensorFlow 1.x
conda install tensorflow=1
I then encountered another issue, which I resolved by opening the pipeline.config file and deleting the following line: batch_norm_trainable: true. This field is deprecated, so it has to be removed. This is the config file inside the training folder created earlier. If you get an error about a module called nets not being available, check your Python paths and re-run the PYTHONPATH export commands from the “Adding Necessary Environment Variables & Finishing the TensorFlow Object Detection API Installation” section.
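If you regenerate or re-download configs often, the same deletion can be scripted. A small sketch, assuming the field sits on its own line as in the sample configs; remove_config_field is a hypothetical helper.

```python
import re


def remove_config_field(text, field="batch_norm_trainable"):
    """Delete every line of a pipeline.config string that sets `field`."""
    # ^ and $ match per line with re.MULTILINE; "\n?" also removes the line break.
    pattern = rf"^[ \t]*{re.escape(field)}:.*\n?"
    return re.sub(pattern, "", text, flags=re.MULTILINE)
```

The result can be written back to training/pipeline.config; other fields and the surrounding braces are left untouched.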
If everything is set up correctly, run the training command above. Training should begin shortly, and you should see something like the following:
Every few minutes, the current loss is logged to TensorBoard. Open TensorBoard by opening a second command line, navigating to the object_detection folder, activating the conda environment, and typing:
tensorboard --logdir=/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model
Recently, it seems that Tensorboard isn’t supporting the latest version of protobuf. If you face any issues, try running the following:
pip3 uninstall protobuf
pip uninstall protobuf
pip install protobuf==3.8
pip3 install protobuf==3.8
After you run the TensorBoard command, it serves a webpage at localhost:6006, where you can see what’s happening on the training end. The training scripts save checkpoints about every five minutes. Train the model until it reaches a satisfying loss, then terminate the training process by pressing Ctrl+C. I usually train for about two days because I’m not using a GPU.
If the steps were followed exactly, there will be a training folder under the object_detection folder with all the .ckpt files; otherwise, the root folder will hold them. You’ll need these files for the next part.
Here’s where things get interesting. At this point, when I used model_main.py, it showed that I could train until around step 8000. When I originally wrote this code, I had used a different script to train the model (it isn’t recommended, as it uses a deprecated way of training). Feel free to use this method if model_main.py isn’t working:
python train.py --logtostderr --train_dir=/Users/**insert username here**/Desktop/Tensorflow-Object-Detection-API-Train-Model --pipeline_config_path=/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/training/pipeline.config
If model_main.py isn’t working, check whether TensorFlow is installed in different places on your computer. When I wasn’t using Anaconda, I had to run pip install tensorflow==1.15 because Python was picking up TensorFlow 2 instead of TensorFlow 1.15 (which is what Anaconda installs if you tell it to install tensorflow=1).
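One way to spot competing installs is to look for a tensorflow folder in every directory Python searches. The sketch below does that with sys.path; find_package_copies is a hypothetical helper. Running python -c "import tensorflow as tf; print(tf.__version__, tf.__file__)" in each environment also shows which copy actually gets imported.

```python
import os
import sys


def find_package_copies(package="tensorflow", search_paths=None):
    """List each directory on the search path that contains a `package` folder.

    Python imports the first match, so more than one hit means one copy may be
    shadowing the version you expect.
    """
    hits = []
    for entry in (sys.path if search_paths is None else search_paths):
        candidate = os.path.join(entry or os.getcwd(), package)  # "" means the current directory
        if os.path.isdir(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for path in find_package_copies():
        print(path)
```

Hits are printed in search order, so the first line is the copy Python will load.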
At the end of everything, you should have the following (the sizes/numbers are going to be different):
Exporting the Inference Graph
Now that we have a trained model, we need to generate an inference graph, which can be used to run the model. To do so, we first need to find the highest saved step number: navigate to the directory with the generated .ckpt files and look for the model.ckpt file with the biggest index. If you kept your .ckpt files in the training directory created earlier, you can create the inference graph by typing the following command, where XXXX is the highest number:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path /Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/training/pipeline.config --trained_checkpoint_prefix training/model.ckpt-XXXX --output_directory inference_graph
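Rather than scanning the listing by eye, the highest step can be found by parsing the checkpoint file names. The comparison has to be numeric: as plain strings, model.ckpt-950 would sort after model.ckpt-8909. This is a hedged stdlib sketch that assumes TensorFlow 1’s naming (model.ckpt-N.index next to the .meta and .data files); latest_checkpoint_step is a hypothetical helper, and tf.train.latest_checkpoint(directory) is the TensorFlow-native alternative, which reads the checkpoint file the training script writes.

```python
import os
import re

CHECKPOINT_INDEX = re.compile(r"model\.ckpt-(\d+)\.index")


def latest_checkpoint_step(directory):
    """Return the largest N among model.ckpt-N.index files in directory, or None."""
    steps = [
        int(match.group(1))
        for match in map(CHECKPOINT_INDEX.fullmatch, os.listdir(directory))
        if match
    ]
    return max(steps) if steps else None  # numeric max, so 8909 beats 950
```

For example, latest_checkpoint_step("training") gives the number to put in place of XXXX.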
If your .ckpt files were generated in the root folder, create a folder called train_output_1 (an arbitrary name; I wanted to consolidate all the training files in one folder) and place all the files (e.g. the .ckpt and checkpoint files) in that folder. This is the folder referred to in the command I ran to get things working with the latest version I was able to train:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path /Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/training/pipeline.config --trained_checkpoint_prefix /Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/train_output_1/model.ckpt-8909 --output_directory inference_graph
Here, the highest index I received was 8909. When I ran this, I got an error saying that I was missing a module called nets. This is caused by a really annoying issue with the Python export paths, and you need to run something similar to the following to fix it:
export PYTHONPATH="$PYTHONPATH:/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/models/research:/Users/**insert username here**/Desktop/deep-learning-multiple-shoe-training-and-porting-model-tensorflow/Tensorflow-Object-Detection-API-Train-Model/models/research/slim"
If everything goes well, the output should be under your models/research/object_detection folder in inference_graph and look like:
Brief Aside on Hosting Models
There are multiple ways to host a model. One way is to host it with TensorFlow Serving and Docker. Please see my coworker Allison Youngdahl’s post on how to integrate this model into a React web app!
Running the Model Locally
I’ve included a Jupyter Notebook file, modified from Tanner Gilbert’s, to help you test whether your model is working. First, download the notebook here. Next, navigate to the object_detection folder and place the file there. Open Jupyter Notebook, click on the .ipynb file, and run through the code.
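The notebook’s inference step produces arrays such as detection_boxes, detection_scores, and detection_classes that describe every candidate box, and the API’s visualization utility only draws those scoring at least 0.5 by default. If you want the detections as text instead of a drawn image, here is a hedged sketch: summarize_detections is a hypothetical helper that assumes boxes in the API’s normalized [ymin, xmin, ymax, xmax] order and a category_index dictionary like the one the notebook builds from labelmap.pbtxt.

```python
def summarize_detections(boxes, scores, classes, category_index, min_score=0.5):
    """Return (label, score, box) for every detection scoring at least min_score.

    boxes: sequence of [ymin, xmin, ymax, xmax] in normalized 0-1 coordinates.
    scores, classes: matching sequences (class ids may arrive as floats).
    category_index: {class_id: {"id": class_id, "name": label}}.
    """
    results = []
    for box, score, cls in zip(boxes, scores, classes):
        if score >= min_score:
            name = category_index.get(int(cls), {}).get("name", "unknown")
            results.append((name, round(float(score), 3), tuple(box)))
    return results
```

Assuming your copy of the notebook stores the model output in a dictionary named output_dict, you would pass output_dict["detection_boxes"], output_dict["detection_scores"], output_dict["detection_classes"], and category_index. Lowering min_score shows weaker guesses, which can help when judging whether the model needs more training.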
If there is a problem with Jupyter Notebook, check that your TensorFlow version is 1.x and that you’ve installed matplotlib (pip install matplotlib). If Jupyter Notebook keeps crashing, try pip install 'ipykernel<5.0.0'.
Under the “Variables” section, change the paths to match your files, including the path to the .pbtxt file (which is likely in the training directory created earlier). I also suggest creating a folder in the root folder called “test_images_for_post_training” containing the images you want to test. If you follow the Jupyter Notebook and run things in order, you should first see the following:
Then, there are two cells you can run after this part that will enable a video stream with your model. An example output will look like the following: