Differences

This shows you the differences between two versions of the page.

--- user:deniz001 [2021/02/18 19:15] – [0. Introduction] deniz001
+++ user:deniz001 [2021/02/18 21:00] – [4. Object Tracking] deniz001
@@ Line 35: / Line 35: @@
 The idea is simple: we first "detect" or "select" a bounding box around the drone, then use a tracking algorithm to follow the drone in each frame, and finally drive the PTU with a PID controller to keep the drone at the center of the frame.
-//**All the source code and usage demo can be found here:**// https://gitlab.com/poseidon42/object-tracker
+//**All the source code for this project and a usage demo can be found here:**// https://gitlab.com/poseidon42/object-tracker
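As a rough illustration of the control idea described above, the following minimal Python sketch turns the pixel offset between the bounding box center and the frame center into a command with a textbook PID controller. The class name ''PID'', the helper ''center_error'', the gains, and the frame size are assumptions made for this example and are not taken from the project repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook PID controller (hypothetical helper, not the project's code)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        """Return the control output for the given error and time step in seconds."""
        self.integral += error * dt
        # No derivative term on the very first call, since there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def center_error(bbox, frame_size):
    """Offset (ex, ey) in pixels of the bbox center from the frame center.

    bbox is (x, y, w, h) with (x, y) the top-left corner; frame_size is (width, height).
    """
    x, y, w, h = bbox
    frame_w, frame_h = frame_size
    return (x + w / 2.0) - frame_w / 2.0, (y + h / 2.0) - frame_h / 2.0


# Example: one control step for the pan axis (gains are placeholder values).
pan_pid = PID(kp=0.05, ki=0.0, kd=0.01)
ex, ey = center_error((300, 200, 40, 40), (640, 480))
pan_command = pan_pid.update(ex, dt=1 / 30)
```

In a real setup one controller per axis (pan and tilt) would be updated once per frame, and the sign and scaling of the command depend on how the PTU interprets it.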
 ==== 1. Data Collection ====
-Data is very important if we use the second option that we train a neural network. There are a lot of resources to load datasets but I could not find an efficient one for drone images therefore I have written a python script that downloads a good number of user specified object images from google to create my own datasets. The script can be found in the project.
+Data is very important if we take the second option and train a neural network. There are many resources for loading datasets, but I could not find a suitable one for drone images, so I wrote a Python script that downloads a good number of images of a user-specified object from Google to build my own datasets.
 ==== 2. Image Augmentation ====
-Data augmentation aims to increase the accuracy of the training by creating different versions of the images that we gathered using the python script to collect our data. This way we can use the original images in the validation step and use the augmented images in the training step. Again the python script can be found in the project folder.
+Data augmentation aims to increase the accuracy of the training by creating different versions of each original image that we gathered with the Python data collection script. This way we can use the original images in the validation step and the augmented images in the training step.
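The split described above can be sketched as follows. This is only an illustration: the page does not say which augmentations the project applies, so the horizontal and vertical flips here are assumptions, images are represented as plain nested lists of pixel rows, and the function names are hypothetical.

```python
def hflip(image):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror an image (list of pixel rows) top to bottom."""
    return [row[:] for row in reversed(image)]


def build_splits(originals, augmentations):
    """Return (train, val): augmented copies for training, originals for validation."""
    train = [augment(image) for image in originals for augment in augmentations]
    val = list(originals)
    return train, val


# Example with a single 2x2 "image".
originals = [[[1, 2], [3, 4]]]
train, val = build_splits(originals, [hflip, vflip])
```

With real images the same structure applies, with each augmentation function replaced by an image library operation.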
 ==== 3. Object Detection ====
 In object detection we aim to find an instance of an object of a certain class in an image or a frame of a video stream and output the bounding box coordinates of that instance. In our case, one of the drones from our laboratory is an instance of the drone class.
-Object detection can be achieved using machine learning and deep learning based algorithms. In machine learning approaches we need to feature engineer in order to define features. While in deep learning, we do not need to manually define the features but rather the CNN(Convolutional Neural Network) finds its way to define the features.
+Object detection can be achieved with classical machine learning or deep learning based algorithms. Classical machine learning approaches require feature engineering to define the features by hand, whereas in deep learning the CNN (Convolutional Neural Network) learns the features by itself. Since the deep learning approaches do not require feature engineering and give better results, I decided to use deep learning in the object detection step.
 Example of Machine Learning approaches:
@@ Line 77: / Line 77: @@
 == Why not track by running detection in each frame? ==
-  * There can be multiple objects entering and exiting the scene of the camera over time in frames, in that case there is no possibility to match or connect the objects in the current frame with the previous frames that the camera was recording in the past.
+  * Multiple objects can enter and leave the camera's view over time. In that case, detection alone gives no way to match the objects in the current frame with the objects seen in previous frames.
-  * The object may suddenly go out of the camera's view in the next frame then another same type of object may get in the frame, in this scenario there is no way that the system can figure out which object that the system was actually tracking. To sum up, there is no way for the system to have an idea about object's current and past movements but in this project the purpose is to track a unique object so that we can calculate where the object goes and create a motion map.
+  * The object may suddenly leave the camera's view and another instance of the same object class (another drone) may enter the frame. In this scenario the system cannot tell which object it was actually tracking. To sum up, detection alone gives the system no knowledge of the object's current and past movements, whereas the purpose of this project is to track one unique object so that we can calculate where it goes and create a motion map.
   * Motion of the object or the camera can introduce blur or noise into the frames, so the object may look very different and detection may fail.
-  * The object may have a very viewpoint that trained neural network was not prepared for that or the object may go far away so that the detection is not available(as the object gets far from the camera's view there will be a huge change in the scale of the object this would cause a failure in detection).
+  * The object may appear from viewpoints that the trained neural network was not prepared for.
+  * The object may move far away from the camera. The resulting large change in the scale of the object would most likely cause detection to fail.
   * Low resolution, the number of pixels
-  * Tracking algorithms are faster than object detection because trackers do not intend to learn all the detailed information of the object like the detection algorithms.(In the future when we have much more powerful hardware we may go for detection in each frame rather than tracking!)
+  * Tracking algorithms are faster than object detection algorithms: detection algorithms aim to learn all the detailed information about the object, while trackers do not, so object detection has a high computational cost.
+  * The object that the system is supposed to track may be hidden behind an obstacle. A tracker can estimate that the object is behind the obstacle, whereas an object detector cannot do that unless another system supplies it with this information.
 ==== 4. Object Tracking ====
-By object tracking we can uniquely identify an object instance, so the drone instance.
+An object tracking algorithm lets us uniquely identify an object instance and locate a moving object: it estimates the location of the target in future frames and checks whether the object in the current frame is the same one that was in the previous frame.
+The tracking process starts by defining an initial bounding box around the target object.
-The goal of an object tracker is to locate a moving object by estimating the location of the target object in the future frames and check if the object in current frame is the same as the one which was in the very previous frame.
+A good tracker must model the motion and the appearance of an object, and search the motion space to localize the object in future frames using the knowledge from past frames.
-Tracking process goes by first initially defining a bounding box of the target object.
+== Motion modelling ==
-== Motion modelling ==
+Objects do not move randomly in space; they have movement characteristics and patterns that can be modeled.
-Objects do not randomly move in the space but rather they have moving characteristics and
-patterns which can be modeled.
+Therefore, a successful object tracker must build a movement estimation model that remembers how the object moved in past frames in order to predict the region where the object can be in the next frame. This also makes the algorithm faster by reducing the size of the region of interest that the tracker needs to scan for the object.
-movement prediction model to remember how the object moved in the past frames so that we can predict the next possible location space of the object.
-An object tracker tries to understand and model the motion of an object mostly in the pixel level, that is called the motion model. it can estimate the location of an object in the future frames that would reduce the size of the image that the tracker looks for the object.
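To make the motion modelling idea concrete, here is a minimal sketch that assumes a constant-velocity model: the next center is extrapolated from the last two observed centers, and the tracker only scans a window around that prediction. The constant-velocity assumption, the function names, and the ''margin'' parameter are illustrative choices, not the model used by any particular tracker in this project.

```python
def predict_center(centers):
    """Constant-velocity prediction of the next center from past (x, y) centers."""
    if len(centers) < 2:
        # Not enough history to estimate a velocity: assume the object stays put.
        return centers[-1]
    (x_prev, y_prev), (x_last, y_last) = centers[-2], centers[-1]
    return (2 * x_last - x_prev, 2 * y_last - y_prev)


def search_window(center, box_size, frame_size, margin=2.0):
    """Region of interest (x0, y0, x1, y1) around the predicted center, clipped to the frame.

    The window is `margin` times the size of the object's bounding box.
    """
    cx, cy = center
    box_w, box_h = box_size
    frame_w, frame_h = frame_size
    half_w, half_h = margin * box_w / 2, margin * box_h / 2
    x0 = max(0, int(cx - half_w))
    y0 = max(0, int(cy - half_h))
    x1 = min(frame_w, int(cx + half_w))
    y1 = min(frame_h, int(cy + half_h))
    return x0, y0, x1, y1


# Example: the object moved 10 px right and 5 px down between the last two frames.
predicted = predict_center([(100, 100), (110, 105)])
roi = search_window(predicted, box_size=(40, 20), frame_size=(640, 480))
```

Scanning only the returned window instead of the full 640x480 frame is what reduces the amount of data the tracker has to process.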
 == Appearance Modelling ==
-A good tracker must understand the appearance of the object that the tracker tracks, they must learn to differentiate the object from the background which is in the image.
+An instance of an object also has appearance characteristics.
+A good tracker must understand the appearance of the object it tracks, using the previous frames to train the appearance model, and it must also learn to differentiate the object from the background in the image.
-== Motion Detection ==
+To sum up, if the tracker has good models of the object's appearance and behavior, it can use this knowledge to find the exact location of the target object in the current frame.
-A good tracker must learn to estimate the motion of the object in order to have a guess about the space that the target possibly can be present in the frame.
+== Types of object trackers ==
-== Object Localization ==
+**Offline learning trackers** are trained before tracking starts. They are also used when we have recorded media; in that case we can use future frames as well to make tracking predictions.
-Focus the attention on the region of interest in the frame. Data reduction
-A good tracker uses the motion estimation and figures out the possible region where the target may be locating in the current frame and scan this area using the model that the tracker created about the object's appearance in the past frames and finally find the exact location of the target in that current frame.
-Offline trackers are used when we have a recorded media, in that case we use also the future frames to make tracking predictions. While online trackers can only use the past frames to model the appearance, and the motion of the object for tracking estimations.
+**Online learning trackers** train themselves on the object that is given to the tracker by drawing a bounding box around it. These trackers learn from the sequence of frames from the initial frame up to the frame just before the current one.
-Online learning trackers train itself to learn about the object(which is initially selected and the bounding box is inputted to the tracker for learning) using the array of frames that start from the initial frame till the frame that is one before the current frame.
+A decision has to be made:
+  - Use an online tracker that could train itself.
+  - Use an offline tracker that has already been trained.
+  - Train an offline tracker to identify only the drones.
+  - Train an offline tracker to identify drones and many other objects.
-Offline learning trackers are trained offline and they do not learn anything during the tracking process. An offline tracker may be trained to identify an object before the tracking starts.
+Offline trackers do not need to learn anything during the tracking process. That sounds faster, but training is not an easy task because we can never train a CNN for every possibility. Online learning trackers, in contrast, only learn about the object that we are interested in at that moment. For example, if the object is red and the background contains no red, tracking is easy; in the opposite case it may be very challenging. This is not a physics problem that we can explain and formulate mathematically, but an engineering problem that requires experimenting, and each tracker has its advantages in different cases. Therefore, I decided to implement several tracking algorithms so that the user can choose which one to use in each scenario.
+Most of the traditional trackers available in OpenCV are not based on deep learning CNNs; my favorite among them is KCF.
-Most of the traditional trackers that are available in OpenCV are not based on Deep Learning. (KCF is the best one)
 CNN(Convolutional Neural Network) based offline trackers: GOTURN
 CNN(Convolutional Neural Network) based online trackers: MDNet (Multi-Domain Network), the best deep learning based one
-Tracking algorithms available:
+Tracking algorithms available in this system:
-  * __**Boosting Tracker:**__ A real-time object tracking based on a novel online version of the AdaBoost algorithm. The classifier uses the surrounding background as negative examples in update step to avoid the drifting problem.
+  * __**Boosting Tracker:**__
   * __**MIL Tracker:**__
   * __**KCF Tracker:**__
-  * __**KCF Tracker:**__
+  * __**MEDIANFLOW Tracker:**__
-  * __**KCF Tracker:**__
+  * __**GOTURN Tracker:**__
-  * __**KCF Tracker:**__
+  * __**MOSSE Tracker:**__
-  * __**KCF Tracker:**__
+  * __**CSRT Tracker:**__
-  * __**KCF Tracker:**__
+  * __**MDNet Tracker:**__
+  * __**ROLO Tracker:**__
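For the trackers in this list that ship with OpenCV, a user-selectable tracker could be created roughly as in the sketch below. It is an assumption about how the selection might be wired up, not the project's actual code. The factory names (for example ''TrackerKCF_create'') are OpenCV's Python factories, but depending on the OpenCV version and build some of them live under ''cv2.legacy'' or require the contrib modules, which is why the helper checks both places. MDNet and ROLO are not part of OpenCV and are therefore not covered. The ''cv2'' module is passed in as an argument so that the helper itself has no hard dependency on OpenCV.

```python
# Map from the names used on this page to OpenCV factory function names.
TRACKER_FACTORIES = {
    "BOOSTING": "TrackerBoosting_create",
    "MIL": "TrackerMIL_create",
    "KCF": "TrackerKCF_create",
    "MEDIANFLOW": "TrackerMedianFlow_create",
    "GOTURN": "TrackerGOTURN_create",
    "MOSSE": "TrackerMOSSE_create",
    "CSRT": "TrackerCSRT_create",
}


def create_tracker(cv2_module, name):
    """Create a tracker by name, looking in cv2 first and then in cv2.legacy."""
    factory_name = TRACKER_FACTORIES.get(name.upper())
    if factory_name is None:
        raise ValueError(f"{name} is not an OpenCV tracker known to this helper")
    for namespace in (cv2_module, getattr(cv2_module, "legacy", None)):
        factory = getattr(namespace, factory_name, None)
        if factory is not None:
            return factory()
    raise ValueError(f"{name} is not available in this OpenCV build")


# Typical usage (requires OpenCV, a first frame, and an initial bounding box):
#   import cv2
#   tracker = create_tracker(cv2, "KCF")
#   tracker.init(frame, bbox)          # bbox is (x, y, w, h)
#   ok, bbox = tracker.update(frame)   # call once per new frame
```

Keeping the name lookup in one place makes it easy to let the user switch algorithms and compare them in different scenarios.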
 ==== 5. PID Controller ====