Implement illegal dumping detection using AI - Smart video capture from an RTSP camera stream

Vincent

2024-09-01

This post is the first part of the series and focuses on capturing videos from an RTSP camera stream. If you want to read the introduction first, you can access this link.

RTSP in cameras

RTSP (Real-Time Streaming Protocol) is an application layer protocol designed for telecommunications and entertainment systems to control the delivery of multimedia data.

RTSP is commonly used in IP cameras, security cameras, and surveillance systems because of its ability to handle media streaming efficiently. There are some reasons why RTSP is often used for cameras:

Low latency: RTSP enables real-time streaming with minimal delay, making it ideal for live video surveillance and monitoring.
Efficient bandwidth usage: RTSP works well over limited bandwidth to maintain smooth streaming.
Standard protocol: RTSP is an industry-standard protocol, so many software applications and devices support it.
Control features: RTSP allows for basic controls such as play, pause, stop, and seek.
Multicast support: It supports multicasting, allowing multiple users to view the same stream.
Integration with other protocols: RTSP works with other protocols, such as RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol), to manage media data and ensure synchronization and quality of the stream.
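
To make the control features more concrete, here is a minimal sketch of what RTSP requests look like on the wire. RTSP is a text-based protocol similar to HTTP: each request has a request line, a CSeq sequence header, optional extra headers, and a blank line at the end. The helper name build_rtsp_request, the camera address, and the session id are placeholders made up for illustration. In practice a library such as OpenCV or FFmpeg sends these messages for you.

```python
from typing import Dict, Optional


def build_rtsp_request(method: str, url: str, cseq: int,
                       headers: Optional[Dict[str, str]] = None) -> str:
    """Format one RTSP/1.0 request as plain text (hypothetical helper)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF, and an empty line terminates the request.
    return "\r\n".join(lines) + "\r\n\r\n"


# Placeholder camera address (documentation IP range), for illustration only.
CAMERA_URL = "rtsp://192.0.2.10:554/stream1"

# DESCRIBE asks the camera for a description of the stream (usually SDP).
describe = build_rtsp_request("DESCRIBE", CAMERA_URL, cseq=2,
                              headers={"Accept": "application/sdp"})

# PLAY starts delivery; the session id here is a made-up placeholder.
play = build_rtsp_request("PLAY", CAMERA_URL, cseq=4,
                          headers={"Session": "12345678", "Range": "npt=0.000-"})

print(describe)
print(play)
```

The media itself does not travel in these messages; once PLAY is accepted, the video frames are delivered over RTP.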

Why do we need smart video capture from an RTSP camera stream?

The camera runs 24/7 and normally records everything, forwarding the footage to a drive folder, so the storage it needs keeps growing over time. For this project I only want videos of suspicious activities, which is why smart video capture is needed.

How can we do that?

My approach is quite simple: write a script that uses a lightweight AI model for motion and object detection. The input is the RTSP stream coming from the camera, and the output is a set of short video clips. For example, my camera points at the area in front of my house, so the script gives me video clips whenever people walk or run past, or when vehicles, motorcycles, or bicycles go by.

Determine the AI model

Object detection is a technique used in computer vision for the identification and localization of objects within an image or a video. Object detection models use convolutional neural networks (CNNs) to classify the objects and regressor networks to accurately predict the bounding box coordinates for each detected object.

You Only Look Once (YOLO) is a state-of-the-art, real-time object detection algorithm introduced in 2015 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi in their famous research paper “You Only Look Once: Unified, Real-Time Object Detection”. Several reasons make YOLO popular for object detection [1], [2]:

Speed
Detection accuracy
Good generalization

YOLO models can also be combined with object tracking algorithms such as ByteTrack [4] to follow the same instance of an object throughout a video [3].

Trigger recording if movement is detected

To detect movement, I will use a background subtractor [5] to distinguish moving objects from the static background. YOLOv8 identifies objects in the video, and a filter is applied to focus on specific classes of interest. For each detected object, the program checks if the corresponding area in the foreground mask shows significant changes, indicating movement. If movement surpasses a defined threshold, video recording is initiated. The video is continuously recorded until no movement is detected for a set persistence period, ensuring that brief pauses do not stop the recording prematurely. Finally, once recording concludes, the video file is finalized and saved, making it ready for further use.
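
The per-object check described above can be sketched as follows. This is only an illustration under a few assumptions: the detections are given as (class_id, box) pairs that would normally come from the YOLO output, the class ids follow the COCO numbering used by the pretrained YOLO models, and the function names and the 0.2 threshold are made up for this example. It uses a synthetic mask so it runs without a camera.

```python
import numpy as np

# COCO class ids: 0 person, 1 bicycle, 2 car, 3 motorcycle.
CLASSES_OF_INTEREST = {0, 1, 2, 3}


def box_has_movement(fg_mask, box, min_moving_fraction=0.2):
    """Return True if enough pixels inside `box` are marked as foreground.

    fg_mask: 2-D uint8 mask from the background subtractor (255 = moving pixel).
    box: (x1, y1, x2, y2) in pixel coordinates.
    min_moving_fraction: assumed threshold, tune it for your scene.
    """
    height, width = fg_mask.shape[:2]
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    # Clamp the box to the frame so the slice never leaves the image.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x2 <= x1 or y2 <= y1:
        return False
    region = fg_mask[y1:y2, x1:x2]
    # With detectShadows=True, MOG2 marks shadows as 127, so count only 255.
    moving_fraction = np.count_nonzero(region == 255) / region.size
    return moving_fraction >= min_moving_fraction


def any_object_moving(fg_mask, detections, min_moving_fraction=0.2):
    """detections: iterable of (class_id, (x1, y1, x2, y2)) pairs."""
    return any(
        class_id in CLASSES_OF_INTEREST
        and box_has_movement(fg_mask, box, min_moving_fraction)
        for class_id, box in detections
    )


# Synthetic example: a 100x100 mask with one moving 20x20 patch.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 255
detections = [
    (0, (35, 35, 65, 65)),   # person overlapping the moving patch
    (2, (0, 0, 20, 20)),     # parked car, no foreground pixels
    (16, (40, 40, 60, 60)),  # dog: moving, but not a class of interest
]
print(any_object_moving(mask, detections))
```

Checking the mask only inside detected boxes is what keeps things like swaying trees or lighting changes from triggering a recording when no person or vehicle is present.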

A simplified version of the code, which uses only the background-subtraction check on the whole frame, is:

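import os
import time

import cv2

# Placeholder settings (not given in the original post); adjust for your camera.
rtsp_url = "rtsp://<user>:<password>@<camera-ip>:554/<path>"
output_dir = "recordings"
movement_threshold = 5.0  # mean foreground-mask value that counts as movement
frame_rate = 20.0

cap = cv2.VideoCapture(rtsp_url)

# MOG2 models the static background; moving pixels show up in the foreground mask.
back_sub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=100, detectShadows=True)

fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = None
temp_video_path = None

os.makedirs(output_dir, exist_ok=True)

while True:
    ret, frame = cap.read()
    if not ret:
        print("End of video stream or capture error.")
        break

    fg_mask = back_sub.apply(frame)
    movement_detected = fg_mask.mean() > movement_threshold

    if movement_detected:
        if out is None:
            # Open a new clip only when no recording is already in progress.
            print("Movement detected, starting recording.")
            height, width, _ = frame.shape
            temp_video_path = os.path.join(output_dir, f"{int(time.time())}.mp4")
            out = cv2.VideoWriter(temp_video_path, fourcc, frame_rate, (width, height))

        out.write(frame)

    elif out is not None:
        print(f"Stopping recording and saving {temp_video_path}")
        out.release()
        out = None

# Finalize any clip still open when the stream ends, then free the capture.
if out is not None:
    out.release()
cap.release()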
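
The loop above stops recording on the first frame without movement, so it does not yet include the persistence period mentioned earlier. One possible way to add it is a small state machine that keeps recording until a number of consecutive quiet frames has passed. The class name RecordingGate and the persistence_frames parameter are my own naming for this sketch, not part of OpenCV.

```python
class RecordingGate:
    """Decide when to start and stop a clip from per-frame movement flags."""

    def __init__(self, persistence_frames):
        # Number of consecutive quiet frames tolerated before stopping.
        self.persistence_frames = persistence_frames
        self.recording = False
        self.idle_frames = 0

    def update(self, movement_detected):
        """Return "start", "continue", "stop", or "idle" for the current frame."""
        if movement_detected:
            self.idle_frames = 0
            if not self.recording:
                self.recording = True
                return "start"
            return "continue"
        if not self.recording:
            return "idle"
        # Quiet frame while recording: keep going until the limit is exceeded.
        self.idle_frames += 1
        if self.idle_frames > self.persistence_frames:
            self.recording = False
            self.idle_frames = 0
            return "stop"
        return "continue"


# Simulated movement flags for nine frames, with a persistence of two frames.
gate = RecordingGate(persistence_frames=2)
flags = [False, True, False, False, True, False, False, False, False]
actions = [gate.update(flag) for flag in flags]
print(actions)
```

In the capture loop, "start" would open the VideoWriter and write the frame, "continue" would write the frame, and "stop" would release the writer. A persistence period given in seconds can be converted to frames by multiplying it by the frame rate.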

Conclusion

The approach shown above solves my specific problem, and since it works even with a webcam, it is a low-cost solution. Note that many modern CCTV cameras, such as Arlo Pro and Hikvision, already have built-in features that detect motion and automatically record video clips.

I’ll see you guys in the second part, where I will implement a simple API application so I can send recorded videos.

References

[1] - YOLO Object Detection Explained - https://www.datacamp.com/blog/yolo-object-detection-explained
[2] - YOLO Object Detection Explained: Evolution, Algorithm, and Applications - https://encord.com/blog/yolo-object-detection-guide/
[3] - Track Objects - https://supervision.roboflow.com/how_to/track_objects/
[4] - An Introduction to BYTETrack: Multi-Object Tracking by Associating Every Detection Box - https://www.datature.io/blog/introduction-to-bytetrack-multi-object-tracking-by-associating-every-detection-box
[5] - How to Use Background Subtraction Methods - https://docs.opencv.org/4.x/d1/dc5/tutorial_background_subtraction.html