A Trinocular System for Pedestrian Localization by Combining Template Matching with Geometric Constraint Optimization.
Jinjing Zhao, Sen Huang, Yancheng Li, Jingjing Xu, Shengyong Xu
Abstract
Pedestrian localization is a fundamental sensing task for intelligent outdoor systems. To overcome the limitations of accuracy and efficiency in conventional binocular approaches, this study introduces a trinocular stereo vision framework that integrates template matching with geometric constraint optimization. The system employs a trinocular camera configuration arranged in an equilateral triangle, which enables complementary perspectives beyond a standard horizontal baseline. Based on this setup, an initial depth estimate is obtained through multi-scale template matching on the primary binocular pair. The additional vertical viewpoint is then incorporated by enforcing three-view geometric consistency, yielding refined and more reliable depth estimates. We evaluate the method on a custom outdoor trinocular dataset. Experimental results demonstrate that the proposed approach achieves a mean absolute error of 0.435 m with an average processing time of 3.13 ms per target. This performance surpasses both the binocular Semi-Global Block Matching (0.536 m) and RAFT-Stereo (0.623 m for the standard model and 0.621 m for the real-time model without fine-tuning). When combined with the YOLOv8-s detector, the system can localize pedestrians in 7.52 ms per frame, maintaining real-time operation (>30 Hz) for up to nine individuals, with a total end-to-end latency of approximately 32.56 ms.
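The two-stage idea described above (an initial depth from the primary pair, then a check against a third view) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rectified-pair relation Z = fB/d applied to each camera pair, and the relative tolerance `rel_tol` are all assumptions chosen for illustration, and the simple accept-and-average rule stands in for the paper's geometric constraint optimization.

```python
from typing import Optional


def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Standard rectified-stereo triangulation: Z = f * B / d.

    focal_px     -- focal length in pixels (assumed known from calibration)
    baseline_m   -- distance between the two camera centres, in metres
    disparity_px -- pixel offset of the matched template along the baseline
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def refine_depth(z_primary: float, z_secondary: Optional[float], rel_tol: float = 0.1) -> float:
    """Hypothetical consistency rule (an assumption, not the paper's optimizer).

    If the depth from a second camera pair agrees with the primary estimate
    to within rel_tol (relative to the primary depth), return their mean;
    otherwise, or if no second estimate exists, keep the primary estimate.
    """
    if z_secondary is None:
        return z_primary
    if abs(z_primary - z_secondary) <= rel_tol * z_primary:
        return 0.5 * (z_primary + z_secondary)
    return z_primary


# Example with made-up numbers: f = 1000 px, B = 0.2 m, d = 20 px.
z_h = depth_from_disparity(1000.0, 0.2, 20.0)
z_refined = refine_depth(z_h, 10.4)
```

Under these assumed values the primary estimate is about 10 m, and a second estimate of 10.4 m falls within the tolerance, so the two are averaged. A second estimate far from the primary one would be rejected and the primary estimate kept.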