Adaptive Intelligence

Enabling Continual Learning on the Edge

Deep neural networks today excel at learning complex representations from massive datasets — yet they remain inherently static. Once trained, they struggle to incorporate new knowledge without overwriting the old, a phenomenon known as catastrophic forgetting. Moreover, their reliance on global gradient backpropagation requires storing large activation maps and synchronizing updates across all layers — a process that is energy-intensive and ill-suited for on-device intelligence.

In contrast, natural intelligence learns continuously and adaptively. The brain integrates new experiences without erasing the past, leveraging distributed representations and sparse updates to maintain a stable sense of the world.

A central question motivates my research:

Can we engineer learning systems that continuously evolve and improve on-device, without relying on cloud retraining or external memory?

This question guides my work on adaptive intelligence — algorithms that allow deep networks to learn new tasks sequentially while preserving prior knowledge, all under the strict memory and energy constraints of embedded devices.

The first step toward this goal was to overcome catastrophic forgetting. In CODE-CL (Conceptor-Based Gradient Projection for Deep Continual Learning) (Apolinario et al., 2025), we introduced a biologically inspired framework in which each task's learned knowledge is encoded as a conceptor matrix: a compact representation of the activation subspace spanned by that task.
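
To make the idea concrete, below is a minimal NumPy sketch of how a conceptor might be computed from one task's layer activations, assuming the standard conceptor formulation C = R(R + α⁻²I)⁻¹ with R the activation correlation matrix. The function name, default aperture value, and normalization are illustrative and are not taken verbatim from the CODE-CL implementation.

```python
import numpy as np

def compute_conceptor(activations: np.ndarray, alpha: float = 10.0) -> np.ndarray:
    """Illustrative conceptor computation for one task.

    activations: (num_samples, feature_dim) layer outputs collected for the task
    alpha: aperture hyperparameter controlling how much of the activation
           subspace the conceptor captures (value chosen for illustration)
    """
    # Correlation matrix of the task's activations
    R = activations.T @ activations / activations.shape[0]
    d = R.shape[0]
    # Standard conceptor formulation: C = R (R + alpha^-2 I)^-1
    return R @ np.linalg.inv(R + (alpha ** -2) * np.eye(d))
```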

When learning a new task, CODE-CL projects incoming gradients onto the pseudo-orthogonal complement of previously learned subspaces, preventing destructive interference while still allowing information flow along shared directions. This mechanism effectively balances stability and plasticity: it preserves important gradient directions from past tasks while enabling forward knowledge transfer for new ones. Through extensive experiments on benchmark datasets such as Split CIFAR-100, Split MiniImageNet, and 5-Datasets, CODE-CL demonstrated higher accuracy and reduced forgetting compared to state-of-the-art continual learning methods — all with minimal memory overhead and high computational efficiency.
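
The projection step itself can be sketched in a few lines. The snippet below assumes past tasks are summarized by a single conceptor C_past over a layer's input features and uses the complement I − C_past as a soft projector; how CODE-CL combines conceptors across tasks and layers follows the paper, and the names here are hypothetical.

```python
import numpy as np

def project_gradient(weight_grad: np.ndarray, C_past: np.ndarray) -> np.ndarray:
    """Project a weight gradient away from directions used by past tasks.

    weight_grad: (out_dim, in_dim) gradient of a linear layer's weights
    C_past:      (in_dim, in_dim) conceptor summarizing past tasks' input subspace
    """
    d = C_past.shape[0]
    # Soft projector onto the complement of the past-task subspace
    not_C = np.eye(d) - C_past
    # Restrict the update to directions that interfere little with past tasks
    return weight_grad @ not_C

# Illustrative update step:
# W -= lr * project_gradient(grad_W, C_past)
```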

Yet while CODE-CL addressed how to remember, it also exposed the challenge of learning efficiently on-device. Training deep networks, even for continual adaptation, still demands storing large activation maps for backpropagation, a severe limitation for edge devices with only a few hundred kilobytes of on-chip memory.

To address this, we developed LANCE (Low-Rank Activation Compression for Efficient On-Device Continual Learning) (Apolinario & Roy, 2025), a framework that rethinks backpropagation itself. LANCE performs a one-shot higher-order singular value decomposition (HOSVD) to identify a reusable low-rank subspace for each layer’s activations. Instead of recomputing decompositions every iteration, activations are projected into this fixed subspace throughout training — drastically reducing both memory and compute cost. This design reduces activation storage by up to 250× and computational energy by 1.5×, while maintaining accuracy within 2% of full backpropagation.
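
A rough NumPy sketch of the mechanism is given below. It assumes the per-mode bases are obtained once from mode-wise SVDs of a single calibration batch of activations (e.g. shape N x C x H x W) and then frozen; the ranks, function names, and projection details are illustrative rather than the exact LANCE procedure.

```python
import numpy as np

def mode_unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Unfold a tensor along the given mode into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_bases(activations: np.ndarray, ranks: tuple) -> list:
    """One-shot HOSVD: a truncated orthonormal basis per mode,
    computed from a single batch of activations and then reused."""
    bases = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(mode_unfold(activations, mode), full_matrices=False)
        bases.append(U[:, :r])  # keep the top-r directions for this mode
    return bases

def compress(activations: np.ndarray, bases: list) -> np.ndarray:
    """Project activations into the fixed low-rank subspace
    (the compressed tensor that would be stored for the backward pass)."""
    core = activations
    for mode, U in enumerate(bases):
        # Contract this mode with U^T, then restore the axis order
        core = np.moveaxis(np.tensordot(U.T, core, axes=([1], [mode])), 0, mode)
    return core
```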

Crucially, these fixed low-rank subspaces naturally extend to continual learning. Each new task is allocated to orthogonal components of the subspace, allowing models to acquire new knowledge without overwriting the old, and doing so entirely on-device. LANCE thus unifies compression and continual adaptation into a single mechanism, enabling edge devices to fine-tune and learn continually without relying on cloud resources or replay buffers.
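
One simple way to picture this allocation, continuing the sketch above, is to reserve a disjoint block of basis columns for each task; since the columns of each basis are orthonormal, the blocks are mutually orthogonal. The slicing rule below is purely illustrative and is not the allocation scheme used in LANCE.

```python
def task_bases(bases: list, task_id: int, cols_per_task: int) -> list:
    """Assign each task a disjoint block of columns of the frozen per-mode
    bases, so updates for different tasks occupy orthogonal components."""
    lo, hi = task_id * cols_per_task, (task_id + 1) * cols_per_task
    return [U[:, lo:hi] for U in bases]

# Illustrative usage: tasks 0 and 1 train against non-overlapping components
# bases_t0 = task_bases(bases, task_id=0, cols_per_task=8)
# bases_t1 = task_bases(bases, task_id=1, cols_per_task=8)
```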

Together, CODE-CL and LANCE define a roadmap toward adaptive edge intelligence — systems that learn efficiently, preserve knowledge over time, and adapt to changing environments. By coupling subspace-based memory preservation with low-rank activation compression, this research bridges algorithmic design and hardware co-optimization, paving the way for energy-efficient, lifelong learning in next-generation embedded and neuromorphic AI systems.

References

  1. CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning
    M. P. E. Apolinario, S. Choudhary, and K. Roy
    International Conference on Computer Vision (ICCV), 2025
  2. LANCE: Low-Rank Activation Compression for Efficient On-Device Continual Learning
    M. P. E. Apolinario and K. Roy
    arXiv preprint, 2025