Publications | Marco P. Apolinario

2025

WACV

LLS: Local Learning Rule for Deep Neural Networks Inspired by Neural Activity Synchronization

M. P. E. Apolinario, A. Roy, and K. Roy

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

Training deep neural networks (DNNs) using traditional backpropagation (BP) presents challenges in terms of computational complexity and energy consumption, particularly for on-device learning, where computational resources are limited. Various alternatives to BP, including random feedback alignment, forward-forward, and local classifiers, have been explored to address these challenges. Although these methods have their advantages, they can struggle with complex visual tasks or demand considerable computational resources. In this paper, we propose a novel Local Learning rule inspired by neural activity Synchronization phenomena (LLS) observed in the brain. LLS uses fixed periodic basis vectors to synchronize neuron activity within each layer, enabling efficient training without additional trainable parameters. We demonstrate the effectiveness of LLS and its variants, LLS-M and LLS-MxM, on multiple image classification datasets, achieving accuracy comparable to BP with reduced computational complexity and minimal additional parameters. Furthermore, the performance of LLS on the Visual Wake Words (VWW) dataset highlights its suitability for on-device learning tasks, making it a promising candidate for edge hardware implementations.
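
As a rough illustration of a layer-local loss built on fixed periodic basis vectors with no trainable parameters, the Python sketch below assigns each class a fixed cosine vector (a hypothetical choice of basis; the abstract does not specify its exact form), scores a layer's activations by their dot product with each basis vector, and returns a softmax cross-entropy loss together with its gradient with respect to the activations. It is a simplified sketch under these assumptions, not the LLS implementation, and `periodic_basis` and `local_loss_and_grad` are hypothetical names.

```python
import math


def periodic_basis(num_classes, dim):
    # Hypothetical fixed basis: class k gets a cosine of frequency k + 1 over `dim` units.
    # These vectors are never trained.
    return [[math.cos(2 * math.pi * (k + 1) * i / dim) for i in range(dim)]
            for k in range(num_classes)]


def local_loss_and_grad(activations, label, basis):
    # Score each class by the dot product between the layer activations and its basis vector.
    scores = [sum(b_i * a_i for b_i, a_i in zip(b, activations)) for b in basis]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[label])
    # dLoss/dActivations = sum_k (p_k - y_k) * b_k, computed entirely within the layer.
    grad = [0.0] * len(activations)
    for k, b in enumerate(basis):
        coeff = probs[k] - (1.0 if k == label else 0.0)
        for i, b_i in enumerate(b):
            grad[i] += coeff * b_i
    return loss, grad


if __name__ == "__main__":
    basis = periodic_basis(num_classes=3, dim=8)
    acts = list(basis[1])  # activations aligned with the basis vector of class 1
    print(local_loss_and_grad(acts, 1, basis)[0] < local_loss_and_grad(acts, 0, basis)[0])
```

Because the basis is fixed, each layer's weights would be updated from its own `grad` rather than from an error signal propagated back through later layers.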

ICCV

CODE-CL: Conceptor-Based Gradient Projection for Deep Continual Learning

M. P. E. Apolinario, S. Choudhary, and K. Roy

International Conference on Computer Vision (ICCV), 2025

Continual learning (CL), the ability to progressively acquire and integrate new concepts, is essential for intelligent systems that must adapt to dynamic environments. However, deep neural networks suffer from catastrophic forgetting (CF) when learning tasks sequentially, as training on new tasks often overwrites previously learned knowledge. To address this, recent approaches constrain updates to orthogonal subspaces using gradient projection, effectively preserving the gradient directions that are important for previous tasks. While effective in reducing forgetting, these approaches inadvertently hinder forward knowledge transfer (FWT), particularly when tasks are highly correlated. In this work, we propose Conceptor-based gradient projection for Deep Continual Learning (CODE-CL), a novel method that leverages conceptor matrix representations, a form of regularized reconstruction, to adaptively handle highly correlated tasks. CODE-CL mitigates CF by projecting gradients onto pseudo-orthogonal subspaces of previous task feature spaces while simultaneously promoting FWT. It achieves this by learning a linear combination of shared basis directions, which efficiently balances stability and plasticity and enables knowledge transfer between overlapping input feature representations. Extensive experiments on continual learning benchmarks validate CODE-CL's efficacy, demonstrating superior performance, reduced forgetting, and improved FWT compared to state-of-the-art methods.
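
The conceptor idea can be sketched in plain Python. The block below uses the standard conceptor definition C = R(R + α⁻²I)⁻¹ from the conceptor literature, where R is the correlation matrix of feature vectors collected on a previous task and α is an aperture parameter, and then projects a new gradient g to (I − C)g, which damps directions heavily used by earlier tasks without zeroing them outright. This is a generic illustration under those assumptions, not CODE-CL itself (which additionally learns a linear combination of shared basis directions); the aperture value and all helper names are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(M):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [M | I].
    n = len(M)
    A = [row[:] + e for row, e in zip(M, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [row[n:] for row in A]


def conceptor(samples, aperture):
    # samples: feature vectors from a previous task; R is their correlation matrix.
    n, m = len(samples[0]), len(samples)
    R = [[sum(s[i] * s[j] for s in samples) / m for j in range(n)] for i in range(n)]
    # C = R (R + aperture^-2 I)^-1 : a soft projector onto the directions the features occupy.
    reg = [[R[i][j] + (aperture ** -2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return matmul(R, inverse(reg))


def project_gradient(grad, C):
    # g <- (I - C) g : suppress directions the conceptor marks as used by old tasks.
    n = len(grad)
    return [grad[i] - sum(C[i][j] * grad[j] for j in range(n)) for i in range(n)]
```

With previous-task features lying only along the first axis, `project_gradient([1.0, 1.0], conceptor([[1.0, 0.0], [-1.0, 0.0]], aperture=10.0))` gives roughly `[0.0099, 1.0]`: the component along the occupied direction is almost removed, while the orthogonal component passes through unchanged.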

IJCNN

TESS: A Scalable Temporally and Spatially Local Learning Rule for Spiking Neural Networks

M. P. E. Apolinario, K. Roy, and C. Frenkel

International Joint Conference on Neural Networks (IJCNN), 2025

The demand for low-power inference and training of deep neural networks (DNNs) on edge devices has intensified the need for algorithms that are both scalable and energy-efficient. While spiking neural networks (SNNs) allow for efficient inference by processing complex spatio-temporal dynamics in an event-driven fashion, training them on resource-constrained devices remains challenging due to the high computational and memory demands of conventional error backpropagation (BP)-based approaches. In this work, we draw inspiration from biological mechanisms such as eligibility traces, spike-timing-dependent plasticity, and neural activity synchronization to introduce TESS, a temporally and spatially local learning rule for training SNNs. Our approach addresses both temporal and spatial credit assignment by relying solely on signals locally available within each neuron, so that computational and memory overheads scale linearly with the number of neurons, independently of the number of time steps. Despite relying on local mechanisms, TESS performs comparably to the backpropagation through time (BPTT) algorithm, within ∼1.4 accuracy points, on challenging computer vision scenarios relevant at the edge, such as the IBM DVS Gesture dataset, CIFAR10-DVS, and temporal versions of CIFAR10 and CIFAR100. By matching BPTT performance while keeping time and memory complexity low, TESS enables efficient and scalable on-device learning at the edge.

TMLR

S-TLLR: STDP-inspired Temporal Local Learning Rule for Spiking Neural Networks

M. P. E. Apolinario and K. Roy

Transactions on Machine Learning Research (TMLR), 2025

Spiking Neural Networks (SNNs) are biologically plausible models that have been identified as potentially well suited for deploying energy-efficient intelligence at the edge, particularly for sequential learning tasks. However, training SNNs poses significant challenges because it requires precise temporal and spatial credit assignment. The back-propagation through time (BPTT) algorithm, while the most widely used method for addressing these issues, incurs high computational cost due to its temporal dependency. In this work, we propose S-TLLR, a novel three-factor temporal local learning rule inspired by the Spike-Timing Dependent Plasticity (STDP) mechanism, aimed at training deep SNNs on event-based learning tasks. Furthermore, S-TLLR is designed to have low memory and time complexity, independent of the number of time steps, making it suitable for online learning on low-power edge devices. To demonstrate the scalability of the proposed method, we conducted extensive evaluations on event-based datasets spanning a wide range of applications, such as image and gesture recognition, audio classification, and optical flow estimation. S-TLLR achieves accuracy comparable to BPTT (within ±2% for most tasks), while reducing memory usage by 5–50× and multiply-accumulate (MAC) operations by 1.3–6.6×, particularly when updates are restricted to the last few time steps.
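
To make the idea of a three-factor, trace-based local rule concrete, the Python sketch below keeps one exponentially decaying presynaptic trace per input (a common STDP-style eligibility mechanism) and accumulates weight changes as learning signal × postsynaptic factor × presynaptic trace. The only state carried across time steps is one trace value per input, independent of the number of time steps, which is the property the abstract emphasizes; the optional `last_k` argument restricts updates to the final few time steps. The decay constant, the postsynaptic factor, and the form of the learning signal are hypothetical placeholders, not S-TLLR's actual definitions.

```python
def three_factor_updates(pre_spikes, post_terms, learning_signals,
                         decay=0.5, lr=1.0, last_k=None):
    """Accumulate weight changes dw[j][i] for one layer over T time steps.

    pre_spikes[t][i]: presynaptic spike (0 or 1) of input i at step t
    post_terms[t][j]: hypothetical postsynaptic factor of neuron j at step t
    learning_signals[t][j]: hypothetical third factor (e.g. a local error) for neuron j
    """
    T = len(pre_spikes)
    n_in, n_out = len(pre_spikes[0]), len(post_terms[0])
    trace = [0.0] * n_in  # the only temporal state: one value per input
    dw = [[0.0] * n_in for _ in range(n_out)]
    first_update = 0 if last_k is None else T - last_k
    for t in range(T):
        # Decaying presynaptic trace: recent input spikes weigh more (causal, STDP-like term).
        trace = [decay * x + s for x, s in zip(trace, pre_spikes[t])]
        if t < first_update:
            continue  # the trace keeps evolving, but no update is accumulated yet
        for j in range(n_out):
            # Eligibility (post factor x pre trace) gated by the third factor.
            gate = lr * learning_signals[t][j] * post_terms[t][j]
            for i in range(n_in):
                dw[j][i] += gate * trace[i]
    return dw
```

The caller would then add or subtract `dw` from the weights, depending on the sign convention chosen for the learning signal.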

2024

DATE

Unearthing the Potential of Spiking Neural Networks

S. Chowdhury, A. Kosta, D. Sharma, M. P. E. Apolinario, and K. Roy

In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024

Spiking neural networks (SNNs) offer a promising, more energy-efficient alternative to traditional analog neural networks (ANNs), especially for sequential tasks. The internal memory that SNNs obtain through the membrane potential equips them with innate lightweight temporal processing capabilities. However, the unique advantages of this temporal dimension of SNNs have not yet been effectively harnessed. To that end, this article delves deeper into the what, why, and where of SNNs. Taking event-based optical flow as an exemplary task in vision-based navigation, we highlight that the true potential of SNNs lies in sequential tasks. The event-driven recurrent dynamics of spiking neurons, combined harmoniously with event camera inputs, enable SNNs to outperform corresponding ANNs on optical flow with fewer parameters. Furthermore, we demonstrate that SNNs can be synergistically combined with ANNs to form SNN-ANN hybrids that obtain the best of both worlds in terms of accuracy, energy, memory, and training efficiency. Additionally, the emergence of various near-memory and in-memory computing techniques has propelled efficient implementation of these approaches. Overall, the immediate future of SNNs looks exciting as we discover their niche: sequential tasks with low power requirements.

WACV

HALSIE – Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities

S. Biswas, A. Kosta, C. Liyanagedera, M. P. E. Apolinario, and K. Roy

In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024

Standard frame-based algorithms fail to retrieve accurate segmentation maps in challenging real-time applications like autonomous navigation, owing to the limited dynamic range and motion blur prevalent in traditional cameras. Event cameras address these limitations by asynchronously detecting changes in per-pixel intensity to generate event streams with high temporal resolution, high dynamic range, and no motion blur. However, event camera outputs cannot be used directly to generate reliable segmentation maps, as they only capture information at pixels in motion. To supply the missing contextual information, we postulate that fusing spatially dense frames with temporally dense events can generate semantic maps with fine-grained predictions. To this end, we propose HALSIE, a hybrid approach to learning segmentation by simultaneously leveraging image and event modalities. To enable efficient learning across modalities, our hybrid framework comprises two input branches, a Spiking Neural Network (SNN) branch and a standard Artificial Neural Network (ANN) branch, which process event and frame data respectively while exploiting their corresponding neural dynamics. Our hybrid network outperforms state-of-the-art methods on the DDD17 and MVSEC semantic segmentation benchmarks and shows comparable performance on the DSEC-Semantic dataset with up to 33.23× reduction in network parameters. Further, our method shows up to 18.92× improvement in inference cost compared to existing SOTA approaches, making it suitable for resource-constrained edge applications.

2023

CVPR

Live Demonstration: ANN vs SNN vs Hybrid Architectures for Event-based Real-time Gesture Recognition and Optical Flow Estimation

A. Kosta, M. P. E. Apolinario, and K. Roy

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2023

Spiking Neural Networks (SNNs) have recently emerged as a promising solution for handling asynchronous data from event-based cameras. Unlike widely used non-spiking artificial neural networks (so-called ANNs), their inherent recurrence allows them to effectively capture the temporal information in events. However, SNNs are not well suited to run on GPUs and still require specialized neuromorphic hardware to process events efficiently. Hybrid SNN-ANN architectures aim to obtain the best of both worlds, with initial SNN layers capturing input temporal information followed by standard ANN layers for ease of training and deployment on GPUs. In this work, we implement ANN, SNN, and hybrid architectures for real-time gesture recognition and optical flow estimation on standard GPUs. We compare the architectures in terms of prediction accuracy, number of parameters, latency, and computational power when executing them in real time on a standard laptop. Our implementation suggests that the hybrid architecture offers the best trade-off among accuracy, compute efficiency, and latency on readily available GPU platforms.

TETC

Hardware/Software co-design with ADC-Less In-memory Computing Hardware for Spiking Neural Networks

M. P. E. Apolinario, A. Kosta, U. Saxena, and K. Roy

IEEE Transactions on Emerging Topics in Computing, Sep 2023

Spiking Neural Networks (SNNs) are bio-plausible models that hold great potential for realizing energy-efficient implementations of sequential tasks on resource-constrained edge devices. However, commercial edge platforms based on standard GPUs are not optimized to deploy SNNs, resulting in high energy consumption and latency. While analog In-Memory Computing (IMC) platforms can serve as energy-efficient inference engines, they are burdened by the immense energy, latency, and area requirements of high-precision ADCs (HP-ADCs), which overshadow the benefits of in-memory computation. We propose a hardware/software co-design methodology to deploy SNNs on an ADC-Less IMC architecture that uses sense amplifiers as 1-bit ADCs in place of conventional HP-ADCs, alleviating these issues. Our framework incurs minimal accuracy degradation by performing hardware-aware training and scales beyond simple image classification to more complex sequential regression tasks. Experiments on the complex tasks of optical flow estimation and gesture recognition show that progressively increasing hardware awareness during SNN training allows the model to adapt to and learn the errors caused by the non-idealities of ADC-Less IMC. The proposed ADC-Less IMC also offers significant energy and latency improvements over HP-ADC IMC, 2–7× and 8.9–24.6× respectively, depending on the SNN model and the workload.
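
The source of the non-ideality can be illustrated with a toy model. In the Python sketch below, a long dot product is split across crossbars of `xbar_rows` rows (a hypothetical array size), and each crossbar's analog partial sum is read out by a sense amplifier acting as a 1-bit ADC: it outputs 1 if the partial sum exceeds a threshold and 0 otherwise, and these bits are accumulated digitally. Comparing the result with the ideal dot product shows the kind of error that hardware-aware training has to absorb. The {0, 1} output encoding and the zero threshold are assumptions, not the paper's circuit model.

```python
def crossbar_dot(weights, inputs, xbar_rows, threshold=0.0):
    """Return (ideal, adc_less) results for one output column.

    ideal: full-precision dot product (what a high-precision ADC would approximate)
    adc_less: digital sum of 1-bit sense-amplifier outputs, one per crossbar partial sum
    """
    ideal = sum(w * x for w, x in zip(weights, inputs))
    adc_less = 0
    for start in range(0, len(inputs), xbar_rows):
        stop = start + xbar_rows
        # Analog partial sum accumulated along one crossbar column.
        partial = sum(w * x for w, x in zip(weights[start:stop], inputs[start:stop]))
        # Sense amplifier as a 1-bit ADC: only the comparison result survives.
        adc_less += 1 if partial > threshold else 0
    return ideal, adc_less


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.75]
    spikes = [1, 1, 1, 0]  # binary spike inputs
    print(crossbar_dot(w, spikes, xbar_rows=2))
```

With `xbar_rows=2`, the two partial sums are -0.5 and 0.25, so the 1-bit readout gives 1 while the ideal dot product is -0.25; the mapping from partial sums to the network's output is therefore distorted in a way the training procedure must learn to tolerate.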

2021

BTSym

Method of Estimating River Levels with Reflective Tapes Using Artificial Vision Techniques

L. E. López Huamán, M. P. E. Apolinario, and S. G. Huamán Bustamante

In Iano Y., Arthur R., Saotome O., Kemper G., Borges Monteiro A.C. (eds) Proceedings of the 5th Brazilian Technology Symposium. Smart Innovation, Systems and Technologies, Sep 2021

Measuring the water level of a river is a necessary step for flood prevention. Early recognition of this factor could reduce the vulnerability of the surrounding population. In this work, we use video frames as images to measure the water level indirectly. We applied digital image processing techniques to these images for segmentation and edge detection, together with multiple-view geometry techniques such as planar projective transformation. The proposed method estimates the water level at specific locations where a reflective tape with a two-color pattern can be installed as a water-level indicator. We segment the section of the reflective tape that lies above the water, from which the water level is estimated. The estimate relies on relating the real distances (in centimeters) between four collinear points, seen from a perpendicular view, to the distances (in pixels) between the same points in an image that has undergone a projective transformation. We tested the method in a water tank built for this work; with the camera located 1.5 m from a wall and 2 m high, the percentage error ranged from 0.01% to 2.06%.
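
One standard way to relate real and pixel distances of four collinear points under a projective transformation is the cross-ratio, which such transformations leave unchanged. The Python sketch below computes the cross-ratio of four image points and uses it to recover the unknown real position of the fourth point (for example, where the waterline crosses the tape) from the known real positions of the other three. Whether the paper uses exactly this formulation is an assumption, and the point positions and the simulated projective map in the example are hypothetical.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (A, B; C, D) of four collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def recover_fourth(real_a, real_b, real_c, image_points):
    """Real coordinate of the 4th point, given the real coordinates of the first three
    and the image (pixel) coordinates of all four points along the same line."""
    cr = cross_ratio(*image_points)  # identical in the image and in the real world
    # Solve cr = ((c - a)(d - b)) / ((c - b)(d - a)) for the unknown real coordinate d.
    num = cr * (real_c - real_b) * real_a - (real_c - real_a) * real_b
    den = cr * (real_c - real_b) - (real_c - real_a)
    return num / den


if __name__ == "__main__":
    # Hypothetical 1D projective map standing in for the camera's view of the tape.
    def project(x):
        return (2 * x + 1) / (0.1 * x + 1)

    pixels = [project(x) for x in (0.0, 10.0, 20.0, 30.0)]  # marks at 0, 10, 20, 30 cm
    print(recover_fourth(0.0, 10.0, 20.0, pixels))  # close to 30.0
```

Pixel distances alone are distorted by perspective, which is why the sketch compares ratios of distances rather than the distances themselves.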

2019

IEEELatam

Open Set Recognition of Timber Species Using Deep Learning for Embedded Systems

M. P. E. Apolinario, D. A. Urcia Paredes, and S. G. Huaman Bustamante

IEEE Latin America Transactions, Dec 2019

Reliable and rapid identification of timber species is a highly relevant issue for many countries in South America, especially Peru, which has the second-largest extent of tropical forest, because it is needed for the effective management of forest resources, such as the inspection and control of the timber trade. Current identification methods follow a closed-set recognition approach, so they are not reliable enough for practical applications: identifying timber species is by nature an open-set recognition problem. For that reason, in this work we propose a convolutional neural network with two main characteristics: it can run on a real-time embedded system, and it handles the open-set recognition problem, that is, it can discriminate between known and unknown species. To evaluate it, we perform tests on two timber species datasets and run experiments on the Raspberry Pi 3B+ embedded system to measure energy consumption. The model discriminates unknown species with accuracy and F1 score above 91% on both image sets used. In addition, on the Raspberry Pi 3B+ our model shows lower maximum power (by 10–12%) and lower computational resource usage (by 5–13%) than a classical convolutional model and MobileNetV2.
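
A common baseline for open-set recognition, shown here only as an illustration, rejects an input as unknown when the classifier's highest softmax probability falls below a threshold. The abstract does not state which open-set mechanism the proposed network uses, so this max-probability rule, the threshold value, and the species names below are assumptions.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def open_set_predict(logits, class_names, threshold=0.9):
    """Return a known class name, or "unknown" if the model is not confident enough."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown"  # reject: likely a species outside the training set
    return class_names[best]


if __name__ == "__main__":
    names = ["species_a", "species_b", "species_c"]  # hypothetical labels
    print(open_set_predict([5.0, 0.0, 0.0], names))
    print(open_set_predict([1.0, 1.0, 1.0], names))
```

A closed-set classifier would always return one of the known labels; the threshold gives the model a way to answer "none of these".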

INTERCON

Estimation of 2D Velocity Model using Acoustic Signals and Convolutional Neural Networks

M. P. E. Apolinario, S. G. Huaman Bustamante, G. Morales, and D. Diaz

In 2019 IEEE XXVI International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Dec 2019

Estimating the parameters of a system from indirect measurements of that same system, known as the inverse problem, arises in many fields of engineering, including underwater acoustics, especially in media that are not sufficiently transparent. In such cases, identifying the shape of objects from acoustic signals alone is challenging, because it relies on echoes produced by objects whose density differs from that of the medium. In general, these echoes are difficult to interpret, since their information is usually noisy and redundant. In this paper, we propose an encoder-decoder convolutional neural network that estimates both the location and the shape of the objects producing the reflected signals, which yields a 2D velocity model. The model was trained on data generated with the finite-difference method and achieved 98.58% intersection over union (IoU), 75.88% precision, and 64.69% sensitivity.
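
For reference, the three reported metrics can be computed for a binary object mask as below, treating each cell of the estimated grid as either object or medium. The abstract does not say how the metrics are aggregated (for example, per class or over the whole grid), so this single-class, cell-wise version is an assumption.

```python
def segmentation_metrics(pred, target):
    """IoU, precision, and sensitivity (recall) for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)      # object predicted as object
    fp = sum(1 for p, t in zip(pred, target) if p and not t)  # medium predicted as object
    fn = sum(1 for p, t in zip(pred, target) if not p and t)  # object missed
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    sensitivity = tp / (tp + fn) if tp + fn else 1.0
    return iou, precision, sensitivity


if __name__ == "__main__":
    print(segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

Precision penalizes cells wrongly marked as object, sensitivity penalizes object cells that were missed, and IoU penalizes both at once.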

2018

INTERCON

Deep Learning Applied to Identification of Commercial Timber Species from Peru

M. P. E. Apolinario, S. G. Huaman Bustamante, and G. C. Orellana

In 2018 IEEE XXV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Aug 2018

Automatic identification of timber species is both a necessity and a challenge in several respects, especially for government institutions in charge of monitoring forestry resources. In this paper, we propose a methodology for developing an efficient computational model that identifies wood samples of seven commercial timber species, chosen according to the availability of samples properly classified by specialists. To this end, we created wood image sets for the seven species using a portable digital microscope connected to a personal computer. These images were divided into patches and grouped into training, validation, and test sets, which were used to train a convolutional neural network. The network consists of four layers: two convolutional layers with max pooling and two fully connected layers at the output. Beforehand, three image patch sizes were evaluated to find the one yielding the highest accuracy, precision, and sensitivity. The results show good performance, with an accuracy of 94.05% and precision and sensitivity values around 90% under the proposed conditions.