Tree-Guided Transformer for Sensor-Based Ecological Image Feature Extraction and Multitarget Recognition in Agricultural Systems.
Yiqiang Sun, Zigang Huang, Linfeng Yang, Zihuan Wang, Mingzhuo Ruan, Jingchao Suo, Shuo Yan
Abstract
Farmland ecosystems present complex pest-predator co-occurrence patterns, posing significant challenges for image-based multitarget recognition and ecological modeling in sensor-driven computer vision tasks. To address these issues, this study introduces a tree-guided Transformer framework enhanced with a knowledge-augmented co-attention mechanism, enabling effective feature extraction from sensor-acquired images. A hierarchical ecological taxonomy (Phylum-Family-Species) guides prompt-driven semantic reasoning, while an ecological knowledge graph enriches visual representations by embedding co-occurrence priors. A multimodal dataset containing 60 pest and predator categories with annotated images and semantic descriptions was constructed for evaluation. Experimental results demonstrate that the proposed method achieves 90.4% precision, 86.7% recall, and 88.5% F1-score in image classification, along with 82.3% hierarchical accuracy. In detection tasks, it attains 91.6% precision and 86.3% mAP@50, with 80.5% co-occurrence accuracy. For hierarchical reasoning and knowledge-enhanced tasks, F1-scores reach 88.5% and 89.7%, respectively. These results highlight the framework's strong capability in extracting structured, semantically aligned image features under real-world sensor conditions, offering an interpretable and generalizable approach for intelligent agricultural monitoring.
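As a quick consistency check on the classification figures, the sketch below recomputes the F1-score from the reported precision and recall. It assumes the standard definition of F1 as the harmonic mean of precision and recall applied to the aggregate values; the abstract does not state the averaging scheme, so a per-class (macro) average could differ slightly. The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Classification precision and recall as reported in the abstract.
reported_precision = 0.904
reported_recall = 0.867

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.885"
```

Under this assumption the recomputed value agrees with the reported 88.5% F1-score for image classification.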