In 2020, Year 4 of the Autodrive Challenge I, I was tasked with developing the pedestrian detection capabilities of Zeus, the self-driving car at aUToronto. As a member of the Perception team, I worked alongside Brian Cheong on this task. In past rounds of the competition, we had used a SqueezeDet model for pedestrian detection. This time, however, we tried a YoloV3 model with newly trained weights and fine-tuned hyperparameters. YoloV3 is a variant of YOLO (You Only Look Once), a popular object detection algorithm. It is powerful and well suited to real-time object detection tasks, and so was employed for pedestrian detection in this project.
Summary Of Results
These are some of the trials with the best results. The implementation of YoloV3 is open source, although the specific implementation details, changes, and hyperparameters we tried cannot be disclosed for confidentiality reasons. Furthermore, while JAAD and NuScenes are publicly available datasets, the Scale dataset was collected and labelled by aUToronto. Nevertheless, I'll provide some visualizations of the results and describe some interesting takeaways.
YoloV3 In Action
These are some videos of a test run of the trained models on Zeus, our self-driving car, at the University of Toronto Institute for Aerospace Studies (UTIAS). The videos were taken by the Blackfly cameras on Zeus.
Pedestrian detection output from Blackfly cameras mounted on Zeus. Attribution: aUToronto
The trained YoloV3 model detects pedestrians well; however, there are false positives on dark objects against the white snow, and detections qualitatively appear less confident for objects that are far away.
YoloV3 robustly identifies a dynamic pedestrian. Attribution: aUToronto
Additionally, recall appears to be quite good; however, precision suffers in cases where the model classifies the deer dummy as a pedestrian.
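To make the recall/precision trade-off concrete, here is a minimal sketch of how the two metrics are computed from detection counts. The counts below are illustrative only, not real evaluation numbers from our trials: a detector that finds most pedestrians (few false negatives, so high recall) but also fires on a deer dummy (false positives) loses precision.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts: 90 pedestrians correctly detected, 10 deer/dark
# objects misclassified as pedestrians, 5 pedestrians missed.
p, r = precision_recall(tp=90, fp=10, fn=5)
print(f"precision={p:.3f} recall={r:.3f}")  # recall is high, precision drops
```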
YoloV3 identifies a moving pedestrian and pedestrian dummy but misclassifies the deer as a pedestrian. Attribution: aUToronto
YoloV5: Evolution Of Our 2D Object Detectors
By 2021, Year 1 of the Autodrive Challenge II, the state of the art in object detection had progressed quite significantly. Ultimately, we decided to go with a YoloV5 model, as its implementation was particularly good. Built on PyTorch, which made training incredibly efficient, and offering several model sizes, YoloV5 had the potential to be the best detector across a range of driving scenarios. I trained the pedestrian and vehicle detectors on YoloV5 and arrived at the best-performing model. Implementation details have to be withheld as aUToronto is currently in an ongoing competition. Below you can find a visualization of YoloV5 identifying a pedestrian and a vehicle quite far away, without misclassifying the deer as a pedestrian.
YoloV5 detects a pedestrian, pedestrian dummy, vehicle in the far distance, and doesn't misclassify other objects.
We discovered that YoloV5 was an incredibly expressive model architecture and also applied it to other 2D object detection problems in Perception, including but not limited to traffic light and traffic sign detection. At the Round 2 Year 1 competition, we also had a situation where a deer detector had to be trained on the fly; freezing the backbone of our pre-trained YoloV5 was the key to creating a deer detector from only a thousand or so training images, collected and labelled within a day, used to train the classifier head. This story is described in a newsletter by the university if you are interested in learning more.
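The backbone-freezing trick above can be sketched in PyTorch. This is not our competition code: `TinyDetector` is a toy stand-in for a real detector (the actual YoloV5 modules differ), and it only illustrates the pattern of disabling gradients on the pretrained feature extractor so that a small labelled dataset is used to train only the new head.

```python
import torch
from torch import nn

class TinyDetector(nn.Module):
    """Toy model: a pretrained 'backbone' feature extractor plus a new classifier 'head'."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.head = nn.Linear(8, 2)  # hypothetical classes, e.g. deer vs. not-deer

    def forward(self, x):
        feats = self.backbone(x).mean(dim=(2, 3))  # global average pool
        return self.head(feats)

model = TinyDetector()

# Freeze the pretrained backbone: its weights keep the features learned
# from the large original dataset and are not updated during fine-tuning.
for p in model.backbone.parameters():
    p.requires_grad = False

# Optimize only the parameters that still require gradients (the head).
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Because gradients flow only into the head, training converges quickly even with on the order of a thousand labelled images, which is what made the one-day deer detector feasible.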