I recently joined UofT's Formula Racing Team (UTFR) as a lead of their deep learning perception team to help build up the perception system of a driverless formula racing car. As part of this role, I'm responsible for recommending the best perception hardware, performing cone detection, and estimating depth to localize the vehicle in 3D space. The wiring, controls, and mechanical design are being handled by other members of the team, and I'm working with a group of brilliant engineers who are all trying to build this incredible system.
It wasn't always like this. For a while, UTFR was primarily an electrical and mechanical engineering team, and the formula racing cars they built actually had to be driven by people. As one would expect of a formula racing team, UTFR has a history of racing against other universities.
UTFR at an annual SHOOTOUT competition.
Localization And Cone Detection
Unlike the operational domain of self-driving cars, such as at aUToronto (the self-driving car team I am on), the perception design space in formula racing is more constrained and focused. Instead of identifying pedestrians, traffic lights, etc., all that is needed for this level of autonomy (with only one car on the track at a time) is cone detection (with 3D coordinate information about where each cone is located), which lays out the drivable path. However, there is one significant challenge: formula racing cars are incredibly fast, with speeds ranging from 50 to 120 km/h. This means that the detectors have to be incredibly robust, work in real-time, and ideally have a long range so that the track in the distance can be identified early enough to handle curves in the road. The following video is another university's attempt at this problem. Our goal is to achieve something of this form, hopefully with a better system.
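To put the speed requirement in concrete terms, here's a quick back-of-the-envelope calculation of how far the car travels between detections (the 10 Hz detection rate below is an illustrative assumption, not our actual spec):

```python
# Back-of-the-envelope latency budget for the perception pipeline.
# The 10 Hz detection rate is an illustrative assumption.

TOP_SPEED_KMH = 120              # upper end of the speed range above
speed_ms = TOP_SPEED_KMH / 3.6   # km/h -> m/s, roughly 33.3 m/s

detection_rate_hz = 10           # assumed end-to-end detection rate
metres_per_frame = speed_ms / detection_rate_hz

print(f"{speed_ms:.1f} m/s -> {metres_per_frame:.1f} m of track per detection")
# At 120 km/h the car covers roughly 3.3 m between consecutive detections,
# which is why both long detection range and real-time inference matter.
```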
Our goal and inspiration. This is the autonomous system of another university in action.
The first thing I attempted was a classical approach to traffic cone detection. By performing colour segmentation on the image, I identify all objects that are orange, yellow, or blue. By applying Canny edge detection and identifying convex hulls in the image, I bound each of these objects of interest as shapes. By writing a program that checks whether each convex hull points upwards and whether the x-coordinate of its tip falls within the x-extent of its base, I created a heuristic that separates valid convex hulls from invalid ones. For each valid convex hull, a bounding box is generated, thereby identifying the cones in the image. Below you will find demos of an implementation of this program.
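The validity heuristic can be sketched roughly as follows. This is an illustrative reconstruction, not the exact implementation: the function name, the way the tip and base points are selected, and the 20% tolerance band are all my own choices here.

```python
# Sketch of the upward-pointing-hull heuristic described above.
# A hull is a list of (x, y) points in image coordinates (y grows downward).
# The 0.2 tolerance band and all names are illustrative assumptions.

def is_valid_cone_hull(hull_points):
    """Return True if the hull looks like an upward-pointing cone."""
    ys = [p[1] for p in hull_points]
    tip_y, base_y = min(ys), max(ys)      # smallest y = highest point in image
    band = 0.2 * (base_y - tip_y + 1)     # tolerance band for tip/base points

    # Tip: points near the top of the hull; base: points near the bottom.
    tip_xs = [x for x, y in hull_points if y - tip_y < band]
    base_xs = [x for x, y in hull_points if base_y - y < band]

    # Valid if the tip sits horizontally within the x-extent of the base.
    tip_x = sum(tip_xs) / len(tip_xs)
    return min(base_xs) <= tip_x <= max(base_xs)

# A triangle-like hull with the tip centred over its base, vs. a skewed one:
cone_like = [(50, 10), (30, 90), (70, 90)]
skewed = [(120, 10), (30, 90), (70, 90)]
print(is_valid_cone_hull(cone_like), is_valid_cone_hull(skewed))  # True False
```

In the real pipeline the hull points would come from `cv2.convexHull` on the Canny-edge contours of the colour-segmented regions.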
Conclusion of experiment: Convex-hull based cone detection isn't a robust solution.
A team member of mine began working on a Haar-cascade-based approach, which performs much better as a cone detector than the convex-hull-based approach. Meanwhile, I began focusing on creating a depth estimation program. The problem: a severe lack of resources. The solution: bootstrapping a system, because none of us give up so easily! You can view the cascade classifier below, paired with the depth estimator I worked on.
Depth estimation with two monocular cameras.
Since we don't have RGBD (depth) cameras, I decided to take some inspiration from how the human visual system works and built my own depth estimation program using two monocular cameras, disparity information between the cameras, triangulation, and some creative mathematics.
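The mathematical core of such a system is standard stereo triangulation: for two parallel cameras separated by a baseline B, with focal length f in pixels, a point matched in both images with horizontal disparity d sits at depth Z = f·B/d. A minimal sketch (the focal length and baseline below are illustrative placeholders, not our actual calibration values):

```python
# Minimal stereo depth sketch: depth from disparity via triangulation.
# The focal length and baseline are illustrative placeholders, not our
# actual camera calibration.

FOCAL_PX = 700.0    # focal length in pixels (from camera calibration)
BASELINE_M = 0.12   # distance between the two camera centres, in metres

def depth_from_disparity(x_left, x_right):
    """Depth (metres) of a point matched in the left and right images."""
    disparity = x_left - x_right  # in pixels; larger disparity = closer point
    if disparity <= 0:
        return float("inf")       # degenerate match: treat as infinitely far
    return FOCAL_PX * BASELINE_M / disparity

# A cone whose centre appears at x=400 in the left image and x=372 in the
# right image (28 px disparity):
print(round(depth_from_disparity(400, 372), 2))  # 3.0 (metres)
```

In practice the matched x-coordinates come from detecting the same cone in both camera images (e.g. the centres of the two bounding boxes), and the accuracy depends heavily on how well the cameras are calibrated and rectified.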
An excited me after getting the depth estimation system working (testing it with an off-the-shelf face classifier).
Conducting some tests in front of the UofT Formula Racing (UTFR) Workshop.
Testing the Haar cascade classifier for cones, an HSV colour segmenter, and the depth estimation program all at once.
Stress testing the depth estimation and cone detection system at a far distance.
This approach has the capacity to run really quickly (after some optimizations), which is advantageous for the real-time, resource-constrained environment that we're in. However, it may suffer from accuracy issues when there are multiple cones, as there will be on an actual formula race track. This experiment motivates the use of deep learning approaches for cone detection.
Develop a deep learning approach using object detectors such as YOLOv5-Small, YOLOv7, etc. (Timeline: after 1-2 months, we'll have a well-trained detector that can run on a Jetson Nano and perform cone detection at scale.)