Deployable Multi-Camera System for Robust Object Detection and Human Activity Understanding

Supervisors: Julien Nembrini, Yong-Joon Thoo

Student: TBD

Project status: Open

Year: 2025

Object detection and human pose estimation in indoor environments are central to a wide range of applications, from assistive technologies for people with disabilities (e.g., blind and low-vision individuals, wheelchair users) to occupant behavior monitoring in smart buildings. Within these research contexts, we are developing a ROS-based detection and visualization pipeline that combines YOLO object detection with RealSense depth data to detect and localize objects in 3D. However, the current setup relies on a single camera, which limits spatial coverage (due to occlusions and a restricted field of view) and reduces robustness under challenging lighting or camera placement.
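
For concreteness, below is a minimal sketch of the depth-based localization step: lifting the centre of a 2D detection box to a 3D point with the pyrealsense2 deprojection helper. The function name and bounding-box format are illustrative assumptions; the actual pipeline runs inside ROS nodes and may differ.

```python
# Minimal sketch (illustrative, not the project's actual pipeline code):
# lift the centre of a 2D detection box to a 3D point using a RealSense
# depth frame and pyrealsense2's deprojection helper.
import pyrealsense2 as rs

def detection_to_point(depth_frame, bbox):
    """Deproject the centre of an (x1, y1, x2, y2) box into camera coordinates."""
    u = int((bbox[0] + bbox[2]) / 2)
    v = int((bbox[1] + bbox[3]) / 2)
    depth = depth_frame.get_distance(u, v)  # depth in metres at that pixel
    intrin = depth_frame.profile.as_video_stream_profile().get_intrinsics()
    # Returns [x, y, z] in the depth camera's coordinate frame
    return rs.rs2_deproject_pixel_to_point(intrin, [u, v], depth)
```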

The goal of this project is to extend this system into a multi-camera setup that can detect objects and estimate human poses more reliably across a shared space. This includes:

  • Camera coordination: Align the reference frames of multiple RealSense cameras using visual markers (e.g., a cube with ArUco markers); see the alignment sketch after this list.

  • Cross-view fusion: Detect objects and human poses independently in each view and fuse the results into a single, unified scene representation; a simple fusion baseline is sketched after this list.

  • Evaluation: Compare detection coverage and pose accuracy in single-camera vs. multi-camera configurations.
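
As a starting point for the camera-coordination task, the sketch below estimates each camera's pose relative to a shared ArUco marker using OpenCV's ArucoDetector API (OpenCV ≥ 4.7) and chains the resulting transforms. The marker dictionary, marker size, and calibration inputs are placeholder assumptions.

```python
# Sketch of marker-based extrinsic alignment (assumes OpenCV >= 4.7).
# Marker size, dictionary, and camera intrinsics are placeholders.
import numpy as np
import cv2

MARKER_SIZE = 0.10  # marker side length in metres (assumed)

# Marker corners in the marker's own frame (z = 0 plane), in the order
# returned by the detector: top-left, top-right, bottom-right, bottom-left.
OBJ_POINTS = np.array([
    [-MARKER_SIZE / 2,  MARKER_SIZE / 2, 0],
    [ MARKER_SIZE / 2,  MARKER_SIZE / 2, 0],
    [ MARKER_SIZE / 2, -MARKER_SIZE / 2, 0],
    [-MARKER_SIZE / 2, -MARKER_SIZE / 2, 0],
], dtype=np.float32)

def marker_pose(image, camera_matrix, dist_coeffs):
    """Return the 4x4 pose of the first detected marker in this camera's frame."""
    detector = cv2.aruco.ArucoDetector(
        cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50),
        cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(image)
    if ids is None:
        return None
    img_points = corners[0].reshape(4, 2).astype(np.float32)
    ok, rvec, tvec = cv2.solvePnP(OBJ_POINTS, img_points,
                                  camera_matrix, dist_coeffs)
    if not ok:
        return None
    T = np.eye(4)
    T[:3, :3] = cv2.Rodrigues(rvec)[0]
    T[:3, 3] = tvec.ravel()
    return T

# If cameras A and B both see the same marker, points in B's frame map to
# A's frame via: T_A_B = marker_pose(img_a, K_a, d_a) @ np.linalg.inv(
#                        marker_pose(img_b, K_b, d_b))
```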
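
For cross-view fusion, one simple baseline (an assumption, not a prescribed method) is to merge detections that agree on class and lie close together once all views are expressed in the shared frame:

```python
# Naive fusion baseline (an assumption, not the prescribed method): merge
# per-camera 3D detections of the same class that lie within `radius`
# metres of an existing cluster, then average each cluster's positions.
import numpy as np

def fuse_detections(detections, radius=0.25):
    """detections: iterable of (class_name, xyz) pairs, xyz in the shared frame."""
    clusters = []  # list of (class_name, [xyz, ...])
    for cls, xyz in detections:
        xyz = np.asarray(xyz, dtype=float)
        for c_cls, points in clusters:
            if c_cls == cls and np.linalg.norm(points[0] - xyz) < radius:
                points.append(xyz)
                break
        else:
            clusters.append((cls, [xyz]))
    return [(cls, np.mean(points, axis=0)) for cls, points in clusters]
```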

The student will then apply the resulting system in one of the following application areas and conduct an evaluation in the chosen context:

  • Assistive augmented reality (AR) for blind and low-vision (BLV) users:

    Use the system to provide spatialized object information to users through an AR headset. The RealSense cameras are aligned with the AR headset’s coordinate frame using ArUco markers, allowing visual overlays and guidance cues to appear anchored to the correct objects in the real world. The student will evaluate how multi-camera integration improves the reliability, spatial accuracy, and usability of these AR visualizations for object interaction in realistic scenarios (see the transform sketch after this list).

  • Occupant behavior monitoring:

    Apply the system to real-time human pose estimation across a room or open space. Evaluate ease of deployment, improvements in pose tracking, and the potential for activity recognition or behavior analysis.
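
For the AR option, the anchoring described above reduces to a transform chain. The sketch below, with illustrative names and independent of any particular headset SDK, re-expresses a camera-frame point in the headset's frame via a marker that both devices have localized:

```python
# Illustrative transform chain for AR anchoring: re-express a point seen
# by a RealSense camera in the headset's frame via a shared marker.
# All names here are illustrative assumptions.
import numpy as np

def to_headset_frame(p_cam, T_headset_marker, T_cam_marker):
    """Map a 3D point from camera coordinates to headset coordinates.

    T_headset_marker: 4x4 pose of the marker in the headset's frame.
    T_cam_marker:     4x4 pose of the marker in the camera's frame.
    """
    T_headset_cam = T_headset_marker @ np.linalg.inv(T_cam_marker)
    p_h = T_headset_cam @ np.append(np.asarray(p_cam, dtype=float), 1.0)
    return p_h[:3]
```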

Keywords: Multi-camera setup, Computer Vision (Object Detection & Human Pose Estimation), Indoor Tracking, Robotics

Desired Skills (or willingness to learn):

  • Python, C++, and OpenCV

  • 3D geometry basics (coordinate transforms, pose estimation)

  • Experience or interest in YOLO, SSD, OpenPifPaf, or similar models is a bonus

  • Familiarity with ROS (Robot Operating System), Augmented Reality, and/or Unity is a plus