Nature image identification and voxels activity modeling using fMRI encoding model

In may 2016. I tried to repeat a pioneering research published on NATURE (Kay et al., 2008) using a data-driven Voxel-Wise Model (Naselaris et al., 2011). I successfully constructed a liner receptive field model for every voxels using the Gabor wavelet filter according to the encoding theory of the primary visual cortex (Hubel and Wiesel, 1968). Without noise ceiling procedures, I also got remarkable accuracy in the identification task (about 48.8%, meanwhile the chance level is about 0.8%).

My code is now open sccess on Github - Voxels-activity-modeling-using-fMRI-data.

Important Statement

I did this stuff in my freshman year in university. At that time both my coding technique and English writing skills are not so sufficient. So please not evaliated my programming skills and English level based on the content below.

About the OPEN ACCESS data

The data comes from CRCNS - Collaborative Research in Computational .

Data set:

Kay, K.N.; Naselaris, T.; Gallant, J. (2011): fMRI of human visual areas in response to natural images.



Figure 1. A given example of the natural image stimulus

Subjects viewed 1870 natural images while recording their brain activities using fMRI. 1750 of these are for training, the remaining is the test set.

After the pre-processing procedure. They used GLM to estimate response for every images. Then they choose about 25000 voxels from V1, V2, V3, V3a, V3b, V4 and LatOcc, etc.




Figure 2. The overview of the data processing procedures. Mainly in three parts, feature extractions, voxels selection, and main training stage.

In order to model voxels’ tunning dynamic. We use this three-stage method. Including feature extractions, voxels selection, and main training(modeling) stages.

Procedures & Results

Extract features

According to the theory about the encoding mechanism of the early visual cortex (EVC). Neurons in EVC firing for some particular orientation and spatial frequency in a particular visual position, which we called the receptive field. In fMRI, the activity patterns in human visual cortex can also reveal what stimulus orientation a person is viewing(Kamitani & Tong, 2005).

To pick up infomation of orientation and spatial frequency in a natural image. We used the garbor wavelet to project onto every images in different spacial location, orientation, and spatial frequency, which we called the Gabor wavelet pyramid.

As for the Gabor wavelet pyramid. Wavelets occur at five (or, in some cases, six) spatial frequencies. This panel depicts one wavelet at each of the first five spatial frequencies. At each spatial frequency f cycles per field-of-view (FOV), wavelets are positioned on an f × f grid, as indicated by the translucent lines.Orientation and phase. At each grid position, wavelets occur at eight orientations and two phases. This panel depicts a complete set of wavelets for a single grid position. Dashed lines indicate the bounds of the mask associated with each wavelet.(Kay et al., 2008) Showed below.


Figure 3. Gabor wavelet pyramid design.

There are five different spatial frequency(1,2,4,8,16 FOV). For every single spatial frequency, it has (1,4,16,64,256) types of different position in the visual field. For every position, there are 2 orthogonal phase and 8 different angles(0:22.5:157.5).

Together, there are 5456 different gabor wavelets in the Gabor wavelet pyramid. I used the gabor_fn.m function to generate a gabor wavelet for the given sf,angle,phase,posi. from Wikipedia

Then, I wrote a function gaborfit.m to generate a set of gabor wavelet image directly. It produce a martix called gab(128x128x5456). Generate 5456 different gabor wavelet image(128x128).

To compute the projection. I wrote the compscript.m (section 1). To compute the projection of 5456 wavelets for 1750 images. The projections for each quadrature pair of wavelets are then squared, summed, and square-rooted, yielding a measure of contrast energy. The result will be in the martix called proj(1750x2728).

To remove wavelets outside the field of the natural image (outside the circle). I wrote a function called gabdelete in gabdelete.m. The results were put into a martix called projdeleteedge(1750x2728).

Training algorithm


Figure 4. Gabor wavelet pyramid model..

Here we used multi-variation liner regression to build up the model. We used the features which we cauculated before using gabor wavelet pyramid model.

We used the traditional square-root error cost function together with gradient descent with monument to optimize the cost function.

$h=h- \varepsilon g$

$g=\left[ \left[ X’(Xh-y) \right]+\alpha g \right]$

To prevent over-fitting, we used early stopping method. A randomly selected 20% of data-set were removed and kept as a stopping set. Iterations proceeded until the squared error on the stopping set nolonger decreased, or until the squared error on the training set no longer decreased.

I wrote two function to running this regression. inittrain.m and linereg.m

The initrain.m is used to pre-processing the dataset, devided it into training set and stopping set for early stopping, and remove NaN data points.

The linereg.m is used for running the optimzation of the cost function.

Pre-training stage

Trainning that model for 25000 voxels was time-consuming. In order to improve efficiency. We carried out a pre-training stage.

In this stage, we tried to reduce dimension firstly. We did not take orentation into our computation: we averged the 8 gabor wavelet projection for each position.

Then we used that new features to fit the model. Using the weight for every location, we can reconstruct a population receptive field.

This method is pretty likely to build up a data driven pRF(population receptive field).


Figure 5. A given example of the reconstructed pRF. Left img represent a well-fitted voxel, its’ receptive field wass very consentrate and sparse. Image in right represent a traditional poorly fitted voxel, its’ receptive field was very confusion.

I worte _gaborfield.m to generate a field map using gaussian functions.
The RFvis.m is used for reconstructing a population receptive field map for a given voxels, using the weight computed in the pre-trainning stage and the gabor_field.m, sum it up to form a receptive field map.
We need to remove those poorly fitted voxels. We have two ways to select voxels.

1. Gaussian Fitted

We can using Gaussian function to fit the reconstructed pRF. The Gaussian RMS width(field vaule in fig 5.) can represent the ‘goodness of fit’, but sometimes some poorly-fitted could also be selected because of some overfitting problem.

2. R-Squared sorting

To quantify the ‘goodness of fit’. We can also using R-Squared vaule directly. We cauculate R2 of those 25000 voxels and make a sorting. And we can choose the top 500 voxels to do the futher analyze.

After the R-Squared sorting procedure, the top-10 voxels showed below.


Figure 5. Top 10 well-fitted voxels. Figure shows the reconstructed pRF of top 10 well-fitted voxels, all of these have a relatively ‘sparse’ and ‘concentrative’ receptive field

The anatomical distribution of those 500 voxels is shown below.


Figure 6. The anatomical distribution of those 500 voxels.

As we can see, a huge amount of voxels were from early visual cortex(V1,V2,V3).

I wrote causelectedh.m to select best 500 voxels, and to run the main-trainning stage below.

Main-training stage

Have choosen 500 well-fitted voxels. We fitted those good voxels with full features(with 8 orientations).
I then used this model to predict 500 voxels’ activity based on 120 natural images in the test set(120 images).
Then, We computed R-Squared vaule between actually response value and the predicted vaule of those 500 voxels among those 120 images, that generated a 120x120 image.


Figure 6. The correlation map of the actually response value and the predicted vaule of those 500 voxels. The (i,j)position’s color represent the R-Squared vaule between actually response value of the number i th image and the predicted vaule of the j th image of those 500 voxels.

I wrote corxcmp.m to predict the activity using test set images, and generate this final result.

In the identifation task. We re-match the target’s brain activities with the best correlated predicted brain activities. In my work, without noise ceiling procedures, the identifation performance is about 58.8%, meanwhile, the chance level is about 0.83%.

Using the optimized wights in the main-training stage. We could generate a tuning properties image. It could tell us which orinentation and spatial frequency contribute more to the activity of the voxels.

As a given example, for the No.23198 voxel, we visualized the weight distribution of the 8 orintations and 5 spatial frequency (5x8).


Figure 7. The weight distribution of the 8 orintation and 5 spatial frequency for the voxel 23198. .

Furthermore, we ploted the tuning properties(based on spatial frequency and orientation) separately.


Figure 8.The tuning properties for the orintations(L) and spatial frequency(R). .

We observed that this voxel prefered 45 degree, and relevantly prefer higher special frequency. Which is consist with the encoding theory of the early visual cortex.


Gilbert, C. D., & Wiesel, T. N. (1985). Intrinsic connectivity and receptive field properties in visual cortex. Vision research, 25(3), 365-374.

Chen, N., P. Cai, T. Zhou, B. Thompson, and F. Fang. 2016. ‘Perceptual learning modifies the functional specializations of visual cortical areas’, Proc Natl Acad Sci U S A.

Hubel, D. H., and T. N. Wiesel. 1968. ‘Receptive fields and functional architecture of monkey striate cortex’, Journal of Physiology, 195: 215-43.

Huth, A. G., W. A. de Heer, T. L. Griffiths, F. E. Theunissen, and J. L. Gallant. 2016. ‘Natural speech reveals the semantic maps that tile human cerebral cortex’, Nature, 532: 453-8.

Kandel, E., and J. Schwartz. 2013. Principles of Neural Science, Fifth Edition (McGraw-Hill Education).

Kay, K. N., T. Naselaris, R. J. Prenger, and J. L. Gallant. 2008. ‘Identifying natural images from human brain activity’, Nature, 452: 352-5.

Miyawaki, Y., H. Uchida, O. Yamashita, M. A. Sato, Y. Morito, H. C. Tanabe, N. Sadato, and Y. Kamitani. 2008. ‘Visual image reconstruction from human brain activity using a combination of multiscale local image decoders’, Neuron, 60: 915-29.

Naselaris, T., K. N. Kay, S. Nishimoto, and J. L. Gallant. 2011. ‘Encoding and decoding in fMRI’, NeuroImage, 56: 400-10.

Norman, K. A., S. M. Polyn, G. J. Detre, and J. V. Haxby. 2006. ‘Beyond mind-reading: multi-voxel pattern analysis of fMRI data’, Trends Cogn Sci, 10: 424-30.

Sprague, T. C., & Serences, J. T. (2013). Attention modulates spatial priority maps in the human occipital, parietal and frontal cortices. Nat Neurosci, 16(12), 1879-1887. doi:10.1038/nn.3574