Publications
Publications by category in reverse chronological order.
2025
- LLS: Local Learning Rule for Deep Neural Networks Inspired by Neural Activity Synchronization. M. P. E. Apolinario, A. Roy, and K. Roy. Accepted at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
Training deep neural networks (DNNs) using traditional backpropagation (BP) presents challenges in terms of computational complexity and energy consumption, particularly for on-device learning where computational resources are limited. Various alternatives to BP, including random feedback alignment, forward-forward, and local classifiers, have been explored to address these challenges. These methods have their advantages, but they can encounter difficulties when dealing with intricate visual tasks or demand considerable computational resources. In this paper, we propose a novel Local Learning rule inspired by neural activity Synchronization phenomena (LLS) observed in the brain. LLS utilizes fixed periodic basis vectors to synchronize neuron activity within each layer, enabling efficient training without the need for additional trainable parameters. We demonstrate the effectiveness of LLS and its variations, LLS-M and LLS-MxM, on multiple image classification datasets, achieving accuracy comparable to BP with reduced computational complexity and minimal additional parameters. Furthermore, the performance of LLS on the Visual Wake Word (VWW) dataset highlights its suitability for on-device learning tasks, making it a promising candidate for edge hardware implementations.
2024
- Unearthing the Potential of Spiking Neural Networks. S. Chowdhury, A. Kosta, D. Sharma, M. P. E. Apolinario, and K. Roy. In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024
Spiking neural networks (SNNs) offer a promising alternative to traditional analog neural networks (ANNs), especially for sequential tasks, with enhanced energy efficiency. The internal memory in SNNs obtained through the membrane potential equips them with innate lightweight temporal processing capabilities. However, the unique advantages of this temporal dimension of SNNs have not yet been effectively harnessed. To that end, this article delves deeper into the what, why and where of SNNs. By considering event-based optical flow as an exemplary task in vision-based navigation, we highlight that the true potential of SNNs lies in sequential tasks. The event-driven recurrent dynamics of a spiking neuron, merged harmoniously with event camera inputs, enable SNNs to outperform corresponding ANNs with a lower number of parameters for optical flow. Furthermore, we demonstrate that SNNs can be synergistically combined with ANNs to form SNN-ANN hybrids to obtain the best of both worlds in terms of accuracy, energy, memory, and training efficiency. Additionally, the emergence of various near-memory and in-memory computing techniques has propelled efficient implementation of these approaches. Overall, the immediate future of SNNs looks exciting, as we discover the niche of SNNs, comprising sequential tasks with low power requirements.
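The membrane-potential memory mentioned in this abstract can be illustrated with a leaky integrate-and-fire (LIF) neuron. The following is a minimal sketch under assumed settings, not the model used in the paper: the leak factor `beta`, the `threshold`, and the soft (subtractive) reset are all illustrative choices.

```python
import numpy as np

def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Run one leaky integrate-and-fire neuron over a 1D input sequence.

    beta and threshold are illustrative values, not taken from the paper.
    Returns the spike train and the membrane potential after each step.
    """
    u = 0.0                      # membrane potential carries state across time steps
    spikes, potentials = [], []
    for x in inputs:
        u = beta * u + x         # leaky integration of the input current
        s = 1 if u >= threshold else 0
        u -= s * threshold       # soft reset: subtract the threshold on a spike
        spikes.append(s)
        potentials.append(u)
    return np.array(spikes), np.array(potentials)

spikes, potentials = lif_forward([0.6, 0.6, 0.0, 0.6, 0.6])
print(spikes)
```

No single input of 0.6 reaches the threshold on its own; the neuron spikes only because earlier inputs remain in the membrane potential, which is the lightweight temporal memory the abstract refers to.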
- CODE-CL: COnceptor-Based Gradient Projection for DEep Continual Learning. M. P. E. Apolinario and K. Roy. Under Review, 2024
Continual learning, or the ability to progressively integrate new concepts, is fundamental to intelligent beings, enabling adaptability in dynamic environments. In contrast, artificial deep neural networks face the challenge of catastrophic forgetting when learning new tasks sequentially. To alleviate the problem of forgetting, recent approaches aim to preserve essential weight subspaces for previous tasks by limiting updates to orthogonal subspaces via gradient projection. While effective, this approach can lead to suboptimal performance, particularly when tasks are highly correlated. In this work, we introduce COnceptor-based gradient projection for DEep Continual Learning (CODE-CL), a novel method that leverages conceptor matrix representations, a computational model inspired by neuroscience, to more flexibly handle highly correlated tasks. CODE-CL encodes directional importance within the input space of past tasks, allowing new knowledge integration in directions modulated by 1−S, where S represents the direction’s relevance for prior tasks. Additionally, we analyze task overlap using conceptor-based representations to identify highly correlated tasks, facilitating efficient forward knowledge transfer through scaled projection within their intersecting subspace. This strategy enhances flexibility, allowing learning in correlated tasks without significantly disrupting previous knowledge. Extensive experiments on continual learning image classification benchmarks validate CODE-CL’s efficacy, showcasing superior performance with minimal forgetting, outperforming most state-of-the-art methods.
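As a rough illustration of the 1−S modulation described in this abstract, the sketch below builds a conceptor matrix from past-task inputs and scales a gradient by I − C. It assumes the standard conceptor definition C = R(R + α⁻²I)⁻¹; the aperture `alpha`, the synthetic data, and the function names are hypothetical. It omits the task-overlap analysis and scaled projection, so it should not be read as the CODE-CL algorithm itself.

```python
import numpy as np

def conceptor(X, alpha=4.0):
    """Conceptor C = R (R + alpha^-2 I)^-1 for inputs X of shape (n_samples, d).

    alpha (the aperture) is an assumed value chosen for illustration.
    """
    n, d = X.shape
    R = X.T @ X / n                                   # input correlation matrix (d x d)
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(d))

def project_gradient(grad, C):
    """Scale a weight gradient of shape (out_dim, d) by (I - C) along the input dimension."""
    return grad @ (np.eye(C.shape[0]) - C)

rng = np.random.default_rng(0)
# Synthetic past-task inputs: some directions carry far more energy than others.
scales = np.array([5, 5, 1, 1, 0.1, 0.1, 0.01, 0.01])
X_old = rng.normal(size=(256, 8)) * scales
C = conceptor(X_old)
S = np.linalg.eigvalsh(C)          # relevance of each direction for the past task, in [0, 1)

grad = rng.normal(size=(4, 8))     # hypothetical gradient for a new task
grad_proj = project_gradient(grad, C)
print(np.linalg.norm(grad), np.linalg.norm(grad_proj))
```

Directions with S close to 1 are almost fully blocked, while directions with S close to 0 pass through nearly unchanged. This is softer than a hard orthogonal projection, which removes the protected subspace entirely.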
- HALSIE – Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities. S. Biswas, A. Kosta, C. Liyanagedera, M. P. E. Apolinario, and K. Roy. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
Standard frame-based algorithms fail to retrieve accurate segmentation maps in challenging real-time applications like autonomous navigation, owing to the limited dynamic range and motion blur prevalent in traditional cameras. Event cameras address these limitations by asynchronously detecting changes in per-pixel intensity to generate event streams with high temporal resolution, high dynamic range, and no motion blur. However, event camera outputs cannot be directly used to generate reliable segmentation maps as they only capture information at the pixels in motion. To augment the missing contextual information, we postulate that fusing spatially dense frames with temporally dense events can generate semantic maps with fine-grained predictions. To this end, we propose HALSIE, a hybrid approach to learning segmentation by simultaneously leveraging image and event modalities. To enable efficient learning across modalities, our proposed hybrid framework comprises two input branches, a Spiking Neural Network (SNN) branch and a standard Artificial Neural Network (ANN) branch to process event and frame data respectively, while exploiting their corresponding neural dynamics. Our hybrid network outperforms the state-of-the-art semantic segmentation benchmarks on DDD17 and MVSEC datasets and shows comparable performance on the DSEC-Semantic dataset with up to 33.23× reduction in network parameters. Further, our method shows up to 18.92× improvement in inference cost compared to existing SOTA approaches, making it suitable for resource-constrained edge applications.
2023
- S-TLLR: STDP-inspired Temporal Local Learning Rule for Spiking Neural Networks. M. P. E. Apolinario and K. Roy. Under Review, 2023
Spiking Neural Networks (SNNs) are biologically plausible models that have been identified as potentially apt for deploying energy-efficient intelligence at the edge, particularly for sequential learning tasks. However, training SNNs poses a significant challenge due to the need for precise temporal and spatial credit assignment. The back-propagation through time (BPTT) algorithm, while being the most widely used method for addressing these issues, incurs a high computational cost due to its temporal dependency. Moreover, BPTT and its approximations solely utilize causal information derived from the spiking activity to compute the synaptic updates, thus neglecting non-causal relationships. In this work, we propose S-TLLR, a novel three-factor temporal local learning rule inspired by the Spike-Timing Dependent Plasticity (STDP) mechanism, aimed at training SNNs on event-based learning tasks. S-TLLR considers both causal and non-causal relationships between pre- and post-synaptic activities, achieving performance comparable to BPTT and enhancing performance relative to methods using only causal information. Furthermore, S-TLLR has low memory and time complexity, which is independent of the number of time steps, rendering it suitable for online learning on low-power devices. To demonstrate the scalability of our proposed method, we have conducted extensive evaluations on event-based datasets spanning a wide range of applications, such as image and gesture recognition, audio classification, and optical flow estimation. In all the experiments, S-TLLR achieved high accuracy with a 1.1–10× reduction in the number of computations.
- Live Demonstration: ANN vs SNN vs Hybrid Architectures for Event-based Real-time Gesture Recognition and Optical Flow Estimation. A. Kosta, M. P. E. Apolinario, and K. Roy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2023
Spiking Neural Networks (SNNs) have recently emerged as a promising solution to handle asynchronous data from event-based cameras. Their inherent recurrence allows temporal information in events to be effectively captured unlike widely used non-spiking artificial neural networks (so-called ANNs). However, SNNs are not suitable to run on GPUs and still require specialized neuromorphic hardware to process events efficiently. Hybrid SNN-ANN architectures aim to obtain the best of both worlds with initial SNN layers capturing input temporal information followed by standard ANN layers for ease of training and deployment on GPUs. In this work, we implement ANN, SNN, and hybrid architectures for real-time gesture recognition and optical flow estimation on standard GPUs. We compare different architectures in terms of prediction accuracy, number of parameters, latency, and computational power when executing them in real time on a standard laptop. Our implementation suggests that the hybrid architecture offers the best trade-off in terms of accuracy, compute efficiency, and latency on readily available GPU platforms.
- Hardware/Software co-design with ADC-Less In-memory Computing Hardware for Spiking Neural Networks. M. P. E. Apolinario, A. Kosta, U. Saxena, and K. Roy. IEEE Transactions on Emerging Topics in Computing, Sep 2023
Spiking Neural Networks (SNNs) are bio-plausible models that hold great potential for realizing energy-efficient implementations of sequential tasks on resource-constrained edge devices. However, commercial edge platforms based on standard GPUs are not optimized to deploy SNNs, resulting in high energy and latency. While analog In-Memory Computing (IMC) platforms can serve as energy-efficient inference engines, they are burdened by the immense energy, latency, and area requirements of high-precision ADCs (HP-ADC), overshadowing the benefits of in-memory computations. We propose a hardware/software co-design methodology to deploy SNNs into an ADC-Less IMC architecture using sense-amplifiers as 1-bit ADCs replacing conventional HP-ADCs and alleviating the above issues. Our proposed framework incurs minimal accuracy degradation by performing hardware-aware training and is able to scale beyond simple image classification tasks to more complex sequential regression tasks. Experiments on complex tasks of optical flow estimation and gesture recognition show that progressively increasing the hardware awareness during SNN training allows the model to adapt and learn the errors due to the non-idealities associated with ADC-Less IMC. Also, the proposed ADC-Less IMC offers significant energy and latency improvements, 2–7× and 8.9–24.6×, respectively, depending on the SNN model and the workload, compared to HP-ADC IMC.
2021
- Method of Estimating River Levels with Reflective Tapes Using Artificial Vision Techniques. L.E. López Huamán, M. P. E. Apolinario, and S.G. Huamán Bustamante. In Iano Y., Arthur R., Saotome O., Kemper G., Borges Monteiro A.C. (eds) Proceedings of the 5th Brazilian Technology Symposium. Smart Innovation, Systems and Technologies, Sep 2021
Water level measurement in a river is a necessary step for flood prevention. Early recognition of this factor could reduce the vulnerability of the surrounding population. In this work, we use video frames as images to measure the water level indirectly. We apply digital image processing techniques to these images to perform segmentation and edge detection, together with multiple view geometry techniques such as projective transformation in a plane. The proposed method estimates the water level at specific locations where a reflective tape with a two-color pattern can be installed as a water level indicator. We segment the section of the reflective tape that is above the water and estimate the water level from it, using the relation between the real distances (in centimeters) among four collinear points, seen from a perpendicular view, and the distances (in pixels) among those same points in an image that has undergone a projective transformation. We tested the method in a water tank built for this work, and the tests produced a percentage error in the range of 0.01–2.06% with the camera located 1.5 m from a wall and 2 m high.
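The relation between pixel and real distances along the tape can be sketched as a one-dimensional projective map. The example below is an illustration under stated assumptions rather than the authors' procedure: three reference marks with known heights fix the map r = (a·p + b)/(c·p + 1), and the waterline is the fourth collinear point. All function names and numeric values are hypothetical.

```python
import numpy as np

def fit_line_homography(pixels, cms):
    """Fit r = (a*p + b) / (c*p + 1) from three (pixel, cm) pairs on one line."""
    p = np.asarray(pixels, dtype=float)
    r = np.asarray(cms, dtype=float)
    # Rearranged per point: a*p + b - c*p*r = r, which is linear in (a, b, c).
    A = np.column_stack([p, np.ones_like(p), -p * r])
    return np.linalg.solve(A, r)

def pixel_to_cm(p, params):
    """Map a pixel coordinate along the tape to a real position in centimeters."""
    a, b, c = params
    return (a * p + b) / (c * p + 1.0)

# Hypothetical ground-truth map, used here only to generate consistent example data.
def true_map(p):
    return (0.5 * p - 10.0) / (0.001 * p + 1.0)

ref_pixels = [100.0, 300.0, 500.0]            # pixel positions of three reference marks
ref_cms = [true_map(p) for p in ref_pixels]   # their known real positions in centimeters
params = fit_line_homography(ref_pixels, ref_cms)

waterline_px = 420.0                          # segmented tape/water boundary (assumed)
print(pixel_to_cm(waterline_px, params))
```

Fitting three points and mapping a fourth is equivalent to using the cross-ratio of four collinear points, which a projective transformation preserves.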
2019
- Open Set Recognition of Timber Species Using Deep Learning for Embedded Systems. M. P. E. Apolinario, D. A. Urcia Paredes, and S. G. Huaman Bustamante. IEEE Latin America Transactions, Dec 2019
Reliable and rapid identification of timber species is highly relevant for many countries in South America and especially for Peru, which ranks second in extent of tropical forest, because it is necessary for effective management of forest resources, such as inspection and control of the timber trade. Since current identification methods are based on a closed set recognition approach, they are not reliable enough for practical use, because identifying timber species is by nature an open set recognition problem. For that reason, in this work we propose a convolutional neural network with two main characteristics: it can run on a real-time embedded system, and it can handle the open set recognition problem, that is, it can discriminate between known and unknown species. To evaluate it, tests are performed on two timber species datasets, and experiments are run on the Raspberry Pi 3B+ embedded system to measure energy consumption. The results show that the model discriminates unknown species with accuracy and F1 score above 91% on the two image sets used. In addition, our proposed model obtains a lower maximum power value (10–12%) and lower computational resource usage (5–13%) than a classical convolutional model and MobileNetV2 measured on the Raspberry Pi 3B+.
- Estimation of 2D Velocity Model using Acoustic Signals and Convolutional Neural Networks. M. P. E. Apolinario, S. G. Huaman Bustamante, G. Morales, and D. Diaz. In 2019 IEEE XXVI International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Dec 2019
Estimating the parameters of a system from indirect measurements of that system is a problem that occurs in many fields of engineering, known as the inverse problem. It also arises in underwater acoustics, especially in media that are not transparent enough. In those cases, identifying the shape of objects using only acoustic signals is a challenge because it relies on echoes produced by objects whose densities differ from that of the medium. In general, these echoes are difficult to interpret since their information is usually noisy and redundant. In this paper, we propose a convolutional neural network model with an encoder-decoder configuration to estimate both the location and the shape of the objects that produce the reflected signals. This model allows us to obtain a 2D velocity model. The model was trained with data generated by the finite-difference method, and it achieved 98.58% in the intersection over union metric, 75.88% in precision, and 64.69% in sensitivity.
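For reference, the three metrics reported in this abstract can be computed from binary masks as sketched below. This is a generic formulation with hypothetical example masks. How the paper aggregates the metrics across classes or samples is not stated here, so the single-mask computation is an assumption.

```python
import numpy as np

def mask_metrics(pred, target):
    """Return (IoU, precision, sensitivity) for two binary masks of equal shape.

    Assumes at least one positive pixel in each mask, so no denominator is zero.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)       # predicted object, truly object
    fp = np.sum(pred & ~target)      # predicted object, truly background
    fn = np.sum(~pred & target)      # missed object pixels
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)     # also called recall
    return iou, precision, sensitivity

# Hypothetical 2x3 masks: 1 marks a pixel belonging to an object.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, precision, sensitivity = mask_metrics(pred, target)
print(iou, precision, sensitivity)
```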
2018
- Deep Learning Applied to Identification of Commercial Timber Species from Peru. M. P. E. Apolinario, S. G. Huaman Bustamante, and G. C. Orellana. In 2018 IEEE XXV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Aug 2018
Automatic identification of timber species is a necessity and a challenge in several aspects, especially for government institutions in charge of monitoring forestry resources. In this paper, we propose a methodology to develop an efficient computational model to identify wood samples of seven commercial timber species chosen according to availability of samples properly classified by specialists. For this, we created image sets of wood from seven timber species using a portable digital microscope connected to a personal computer. These images were divided into patches and grouped into training, validation, and test sets, with which a convolutional neural network was trained. It consists of four layers: two convolutional layers with max pooling and two fully connected layers at the output. Beforehand, three image patch sizes were evaluated to find the highest accuracy, precision, and sensitivity for the identification. The results show good performance of the computational model, with an accuracy of 94.05% and precision and sensitivity values around 90%, under the proposed conditions.