Perform classification, object detection, and transfer learning using convolutional neural networks (CNNs, or ConvNets), and create customized detectors
Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When looking at images or video, humans can recognize and locate objects of interest in a matter of moments. The goal of object detection is to replicate this intelligence using a computer. The best approach for object detection depends on your application and the problem you are trying to solve.
Deep learning techniques require a large number of labeled training images, so using a GPU is recommended to decrease the time needed to train a model. Deep learning-based approaches to object detection use convolutional neural networks (CNNs or ConvNets), such as R-CNN, YOLO, and single-shot detection (SSD). You can train a custom object detector, or use a pretrained object detector by leveraging transfer learning, an approach that enables you to start with a pretrained network and then fine-tune it for your application. Convolutional neural networks require Deep Learning Toolbox™. Training and prediction are supported on a CUDA®-capable GPU, which requires Parallel Computing Toolbox™. For more information, see Computer Vision Toolbox Preferences and Parallel Computing Support in MathWorks Products (Parallel Computing Toolbox).
Machine learning techniques for object detection include aggregate channel features (ACF), support vector machine (SVM) classification using histogram of oriented gradients (HOG) features, and the Viola-Jones algorithm for human face or upper-body detection. You can choose to start with a pretrained object detector or create a custom object detector to suit your application.
Apps
Image Labeler | Label images for computer vision applications |
Video Labeler | Label video for computer vision applications |
Functions
Detect Objects
Deep Learning Detectors
rcnnObjectDetector | Detect objects using R-CNN deep learning detector |
fastRCNNObjectDetector | Detect objects using Fast R-CNN deep learning detector |
fasterRCNNObjectDetector | Detect objects using Faster R-CNN deep learning detector |
ssdObjectDetector | Detect objects using SSD deep learning detector (Since R2020a) |
yolov2ObjectDetector | Detect objects using YOLO v2 object detector |
yolov3ObjectDetector | Detect objects using YOLO v3 object detector (Since R2021a) |
yolov4ObjectDetector | Detect objects using YOLO v4 object detector (Since R2022a) |
Feature-based Detectors
readAprilTag | Detect and estimate pose for AprilTag in image (Since R2020b) |
readArucoMarker | Detect and estimate pose for ArUco marker in image (Since R2024a) |
generateArucoMarker | Generate ArUco marker images (Since R2024a) |
readBarcode | Detect and decode 1-D or 2-D barcode in image (Since R2020a) |
acfObjectDetector | Detect objects using aggregate channel features |
peopleDetectorACF | Detect people using aggregate channel features |
vision.CascadeObjectDetector | Detect objects using the Viola-Jones algorithm |
vision.ForegroundDetector | Foreground detection using Gaussian mixture models |
vision.PeopleDetector | (To be removed) Detect upright people using HOG features |
vision.BlobAnalysis | Properties of connected regions |
Detect Objects Using Point Features
detectBRISKFeatures | Detect BRISK features |
detectFASTFeatures | Detect corners using FAST algorithm |
detectHarrisFeatures | Detect corners using Harris–Stephens algorithm |
detectKAZEFeatures | Detect KAZE features |
detectMinEigenFeatures | Detect corners using minimum eigenvalue algorithm |
detectMSERFeatures | Detect MSER features |
detectORBFeatures | Detect ORB keypoints |
detectSIFTFeatures | Detect scale invariant feature transform (SIFT) features (Since R2021b) |
detectSURFFeatures | Detect SURF features |
extractFeatures | Extract interest point descriptors |
matchFeatures | Find matching features |
Select Detected Objects
selectStrongestBbox | Select strongest bounding boxes from overlapping clusters using nonmaximal suppression (NMS) |
selectStrongestBboxMulticlass | Select strongest multiclass bounding boxes from overlapping clusters using nonmaximal suppression (NMS) |
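To illustrate the idea behind the selection functions above, here is a minimal Python sketch of greedy nonmaximal suppression. It is not the toolbox implementation. The function names `iou` and `nms`, the `(x, y, width, height)` box layout, and the 0.5 overlap threshold are assumptions chosen for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, overlap_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    # Visit boxes from the highest to the lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= overlap_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # → [0, 2]
```

The second box is suppressed because its overlap with the higher-scoring first box (81/119, about 0.68) exceeds the threshold. A multiclass variant would typically apply the same procedure separately per class label.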
Train Custom Object Detectors
Load Training Data
boxLabelDatastore | Datastore for bounding box label data (Since R2019b) |
groundTruth | Ground truth label data |
imageDatastore | Datastore for image data |
objectDetectorTrainingData | Create training data for an object detector |
combine | Combine data from multiple datastores |
Train Feature-Based Object Detectors
trainACFObjectDetector | Train ACF object detector |
trainCascadeObjectDetector | Train cascade object detector model |
trainImageCategoryClassifier | Train an image category classifier |
Train Deep Learning Based Object Detectors
trainRCNNObjectDetector | Train R-CNN deep learning object detector |
trainFastRCNNObjectDetector | Train Fast R-CNN deep learning object detector |
trainFasterRCNNObjectDetector | Train Faster R-CNN deep learning object detector |
trainSSDObjectDetector | Train an SSD deep learning object detector (Since R2020a) |
trainYOLOv2ObjectDetector | Train YOLO v2 object detector |
trainYOLOv3ObjectDetector | Train YOLO v3 object detector (Since R2024a) |
trainYOLOv4ObjectDetector | Train YOLO v4 object detector (Since R2022a) |
Augment and Preprocess Training Data for Deep Learning
balanceBoxLabels | Balance bounding box labels for object detection (Since R2020a) |
bboxcrop | Crop bounding boxes (Since R2019b) |
bboxerase | Remove bounding boxes (Since R2021a) |
bboxresize | Resize bounding boxes (Since R2019b) |
bboxwarp | Apply geometric transformation to bounding boxes (Since R2019b) |
bbox2points | Convert rectangle to corner points list |
imwarp | Apply geometric transformation to image |
imcrop | Crop image |
imresize | Resize image |
randomAffine2d | Create randomized 2-D affine transformation (Since R2019b) |
centerCropWindow2d | Create rectangular center cropping window (Since R2019b) |
randomWindow2d | Randomly select rectangular region in image (Since R2021a) |
integralImage | Calculate 2-D integral image |
Design Object Detection Deep Neural Networks
R-CNN (Regions With Convolutional Neural Networks)
rcnnBoxRegressionLayer | Box regression layer for Fast and Faster R-CNN |
fasterRCNNLayers | Create a Faster R-CNN object detection network (Since R2019b) |
rpnSoftmaxLayer | Softmax layer for region proposal network (RPN) |
rpnClassificationLayer | Classification layer for region proposal networks (RPNs) |
regionProposalLayer | Region proposal layer for Faster R-CNN |
roiAlignLayer | Non-quantized ROI pooling layer for Mask R-CNN (Since R2020b) |
roiInputLayer | ROI input layer for Fast R-CNN |
roiMaxPooling2dLayer | Neural network layer used to output fixed-size feature maps for rectangular ROIs |
roialign | Non-quantized ROI pooling of dlarray data (Since R2021b) |
YOLO v2 (You Only Look Once version 2)
yolov2Layers | Create YOLO v2 object detection network |
yolov2TransformLayer | Create transform layer for YOLO v2 object detection network |
yolov2OutputLayer | Create output layer for YOLO v2 object detection network |
spaceToDepthLayer | Space to depth layer (Since R2020b) |
Focal Loss
focalCrossEntropy | Compute focal cross-entropy loss (Since R2020b) |
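As a rough illustration of what a focal cross-entropy loss computes, here is a minimal Python sketch for the binary case. It is not the toolbox implementation. The function name `focal_cross_entropy`, the use of a single balancing weight `alpha` for all elements, and the defaults `gamma=2.0` and `alpha=0.25` (values commonly used in the focal loss literature) are assumptions for this example.

```python
import math


def focal_cross_entropy(probs, targets, gamma=2.0, alpha=0.25):
    """Mean binary focal loss.

    probs   -- predicted probabilities of the positive class, each in [0, 1]
    targets -- ground-truth labels, each 0 or 1
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, t in zip(probs, targets):
        # p_t is the probability the model assigned to the true class.
        p_t = p if t == 1 else 1.0 - p
        p_t = max(p_t, eps)
        # (1 - p_t)^gamma shrinks the loss of well-classified examples.
        total += -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently correct prediction contributes far less than a confidently wrong one.
easy = focal_cross_entropy([0.9], [1])
hard = focal_cross_entropy([0.1], [1])
```

With `gamma=0` and `alpha=1`, the expression reduces to ordinary cross-entropy. Larger `gamma` values focus training on hard examples, which helps when background regions vastly outnumber objects.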
SSD (Single Shot Detector)
ssdMergeLayer | Create SSD merge layer for object detection (Since R2020a) |
Anchor Boxes
estimateAnchorBoxes | Estimate anchor boxes for deep learning object detectors (Since R2019b) |
Visualize Detection Results
cuboid2img | Project cuboids from 3-D world coordinates to 2-D image coordinates (Since R2022b) |
insertObjectAnnotation | Annotate truecolor or grayscale image or video |
insertObjectMask | Insert masks in image or video stream (Since R2020b) |
insertShape | Insert shapes in image or video |
showShape | Display shapes on image, video, or point cloud (Since R2020b) |
Evaluate Predicted Results
evaluateObjectDetection | Evaluate object detection data set against ground truth (Since R2023b) |
objectDetectionMetrics | Object detection quality metrics (Since R2023b) |
mAPObjectDetectionMetric | Mean average precision (mAP) metric for object detection (Since R2024a) |
bboxOverlapRatio | Compute bounding box overlap ratio |
bboxPrecisionRecall | Compute bounding box precision and recall against ground truth |
Blocks
Deep Learning Object Detector | Detect objects using trained deep learning object detector (Since R2021b) |
Topics
Get Started
- Getting Started with Object Detection Using Deep Learning
Perform object detection using deep learning neural networks.
- Choose an Object Detector
Compare object detection deep learning models, such as YOLOX and YOLOv4.
- Local Feature Detection and Extraction
Learn the benefits and applications of local feature detection and extraction.
- Get Started with Cascade Object Detector
Train a custom classifier.
- Point Feature Types
Choose functions that return and accept points objects for several types of features.
- Getting Started with OCR
Detect and recognize text in multiple languages, and train OCR models to recognize custom text.
- Image Classification with Bag of Visual Words
Use the Computer Vision Toolbox™ functions for image category classification by creating a bag of visual words.
- Coordinate Systems
Specify pixel indices, spatial coordinates, and 3-D coordinate systems.
Training Data for Object Detection and Instance Segmentation
- Get Started with the Image Labeler
Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification.
- Get Started with the Video Labeler
Interactively label rectangular ROIs for object detection, pixels for semantic segmentation, polygons for instance segmentation, and scenes for image classification in a video or image sequence.
- Datastores for Deep Learning (Deep Learning Toolbox)
Learn how to use datastores in deep learning applications.
- Training Data for Object Detection and Semantic Segmentation
Create training data for object detection or semantic segmentation using the Image Labeler or Video Labeler.
- Get Started with Image Preprocessing and Augmentation for Deep Learning
Preprocess data for deep learning applications with deterministic operations such as resizing, or augment training data with randomized operations such as random cropping.
Get Started With Deep Learning
- Deep Learning in MATLAB (Deep Learning Toolbox)
Discover deep learning capabilities in MATLAB® using convolutional neural networks for classification and regression, including pretrained networks and transfer learning, and training on GPUs, CPUs, clusters, and clouds.
- Pretrained Deep Neural Networks (Deep Learning Toolbox)
Learn how to download and use pretrained convolutional neural networks for classification, transfer learning, and feature extraction.