Neural networks : the official journal of the International Neural Network Society
Continual low-rank scaled dot-product attention.
Ginés Carreto Picón, Illia Oleksiienko, Lukas Hedegaard, Arian Bakhtiarnia, Alexandros Iosifidis
Published: 2025
DOI: 10.1016/j.neunet.2025.108517
Abstract
Transformers are widely used for their ability to capture data relations in sequence processing, with great success for a wide range of static tasks. However, the computational and memory footprint of their main component, i.e., the Scaled Dot-produc…