Decoding human vision from neural signals has attracted long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models have improved visual decoding and reconstruction based on functional Magnetic Resonance Imaging (fMRI) and electroencephalography (EEG). However, combining these two modalities to decode visual stimuli remains difficult, partly due to a lack of training data. In this study, we present an end-to-end zero-shot framework for visual reconstruction from combined fMRI-EEG signals. It consists of multiple tailored brain encoders and a fusion module, which project neural signals from different sources into a shared subspace aligned with the CLIP embedding space, followed by a two-stage, multi-pipe fMRI-EEG-to-image generation strategy. In stage one, fMRI and EEG signals are embedded and aligned with high-level CLIP embeddings, and a diffusion prior model refines the combined embedding into image priors. In stage two, this combined embedding is fed to a pre-trained diffusion model. The experimental results indicate that our fMRI-EEG-based zero-shot framework achieves SOTA reconstruction performance, highlighting the portability, low cost, and high temporal and spatial resolution of combined fMRI-EEG, enabling a wide range of BCI applications.
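The stage-one alignment idea can be illustrated with a minimal NumPy sketch: two modality-specific encoders project fMRI and EEG inputs into a CLIP-sized shared space, the features are fused, and a symmetric InfoNCE-style contrastive loss aligns the fused embedding with CLIP image embeddings. All dimensions, the linear stand-in encoders, and the additive fusion below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only, not from the paper)
N_VOXELS = 1000          # flattened fMRI voxels
N_EEG = 63 * 250         # flattened EEG channels x time points
CLIP_DIM = 512           # shared CLIP embedding size
BATCH = 8

# Linear projections standing in for the tailored brain encoders
W_fmri = rng.normal(0.0, 0.02, (N_VOXELS, CLIP_DIM))
W_eeg = rng.normal(0.0, 0.02, (N_EEG, CLIP_DIM))

def l2norm(x):
    """Normalize rows to unit length, as CLIP-style embeddings are."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def encode_and_fuse(fmri, eeg):
    """Project each modality into the shared space and fuse additively."""
    z_f = l2norm(fmri @ W_fmri)
    z_e = l2norm(eeg @ W_eeg)
    return l2norm(z_f + z_e)  # simple additive fusion (an assumption)

def info_nce(z_brain, z_img, temperature=0.07):
    """Symmetric InfoNCE loss: matched (brain, image) pairs sit on the diagonal."""
    logits = (z_brain @ z_img.T) / temperature
    idx = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()                  # diagonal = positives

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

fmri = rng.normal(size=(BATCH, N_VOXELS))
eeg = rng.normal(size=(BATCH, N_EEG))
clip_img = l2norm(rng.normal(size=(BATCH, CLIP_DIM)))

z = encode_and_fuse(fmri, eeg)
loss = info_nce(z, clip_img)
```

In a real training loop the projections would be learned networks and `clip_img` would come from a frozen CLIP image encoder; the loss here only shows the shape of the alignment objective.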
fMRI-EEG-based visual decoding and generation framework. The fMRI and EEG encoders are designed as flexible, replaceable components. After alignment with image features, the combined fMRI-EEG features are passed through a two-stage generator to obtain reconstructed images.
@article{visualstimulireconstruction2024fmrieeg,
  title={Visual stimuli reconstruction from simultaneous fMRI-EEG signals},
  author={Dorin, Daniil and Kiselev, Nikita and Nasyrov, Ernest and Semkin, Kirill and Grabovoy, Andrey and Strijov, Vadim},
  year={2024},
}