Frame-based event-vision sensors (EVS), while practical for in-sensor computing due to their hardware-compatible data streams, suffer from spurious events generated by underlying sensor noise, which hinders reliable deployment. This paper proposes an efficient Spiking Neural Network (SNN) architecture that processes sequences of Binary Activity Frames (BAFs) to perform real-time, on-chip denoising. Evaluated on the DVSCLEAN dataset, our method is orders of magnitude more efficient than existing deep learning approaches, consuming over 4,000 times less energy, while delivering superior signal purity (up to 31.13 dB SNR) and high signal integrity (an F1-score of up to 0.9847). Our analysis further identifies an optimal operating range of 60-90 FPS, confirming that the proposed SNN is a viable solution for robust visual processing in resource-constrained EVS systems.
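The abstract does not detail the proposed architecture; as a minimal illustrative sketch only, the snippet below shows the general idea of spiking, frame-based denoising: a per-pixel leaky integrate-and-fire (LIF) layer integrates a sequence of Binary Activity Frames so that sustained activity crosses threshold while sporadic noise events mostly do not. The decay, threshold, frame size, and noise rate are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the paper's architecture): a single per-pixel LIF layer
# that consumes a (T, H, W) stack of Binary Activity Frames and emits a
# denoised binary frame per step. All parameters are illustrative assumptions.
import numpy as np

def lif_denoise(baf_sequence, decay=0.8, threshold=1.5):
    """Run a leaky integrate-and-fire neuron over each pixel of a BAF stack."""
    membrane = np.zeros(baf_sequence.shape[1:], dtype=np.float32)
    outputs = []
    for frame in baf_sequence:
        # Leak the previous potential, then integrate the incoming binary events.
        membrane = decay * membrane + frame.astype(np.float32)
        spikes = membrane >= threshold            # pixels that fire this step
        membrane[spikes] = 0.0                    # reset fired pixels
        outputs.append(spikes.astype(np.uint8))   # denoised binary frame
    return np.stack(outputs)

# Toy usage: a persistently active 4x4 patch keeps firing, while isolated
# random noise events rarely accumulate enough potential to cross threshold.
rng = np.random.default_rng(0)
frames = (rng.random((30, 64, 64)) < 0.02).astype(np.uint8)  # sparse noise
frames[:, 20:24, 20:24] = 1                                   # sustained signal
clean = lif_denoise(frames)
signal_spikes = int(clean[:, 20:24, 20:24].sum())
noise_spikes = int(clean.sum()) - signal_spikes
print(f"spikes in signal patch: {signal_spikes}, elsewhere: {noise_spikes}")
```

In this toy setting the temporal integration acts as the denoiser: signal pixels fire regularly, whereas noise pixels fire only when two events happen to arrive within a few frames of each other, which is rare at low noise rates.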