Analysis of fMRI data with AFNI: Connectivity analyses


Note: before attempting the techniques here you should be familiar with the basic steps in an AFNI analysis of fMRI data, as outlined here:

Analysis of fMRI data with AFNI: Fundamental steps

Many researchers are more interested in how different regions in the brain interact than in whether specific regions activate in isolation. This makes sense if the brain is considered to be at least somewhat modular. To address questions concerning the interaction of different brain regions, or their relationship as part of an interconnected network, it is desirable to devise means of testing their functional connectivity. To this end, we can adopt a number of approaches, ranging from simple analyses that make use of individual differences in patterns of brain activation, to more complex analyses of BOLD time series, including structural equation modeling (SEM). Below I outline some of the possibilities using AFNI software.

NOTE: All of these techniques are correlational. Do not kid yourself that you can use them to infer causality. You cannot, no matter how sophisticated the statistics, get around the fact that you are associating two or more observed variables.

  1. Using individual differences in brain activation to infer connectivity

  2. Simple time-based correlation approaches to functional connectivity

  3. Structural equation modeling and functional connectivity

1. Using individual differences in brain activation to infer connectivity

The simplest approach to addressing questions concerning the covariation in functional activation between different parts of the brain is to ask how individual differences in functional activation in one brain region are correlated with individual differences in activation in one or more other regions. That is, if certain subjects show greater activation in region 1, where else in the brain do those same subjects show greater activation (positive correlation) or lesser activation (negative correlation)?

1.1. Cluster-cluster correlation: To perform this type of test, we could simply take the functional contrast values extracted from a set of functionally-derived clusters (as we get from a standard fMRI analysis) or a priori defined regions of interest, for each subject, and calculate the correlation coefficients between them. For example, let's imagine that in an experiment measuring BOLD response to fear versus neutral facial expressions, a group analysis revealed clusters of fear-neutral activation in amygdala (AMYG), anterior cingulate (ACC) and fusiform gyrus (FUSI). Extracting the fear-neutral percent signal change contrast estimates from each of these clusters for every subject might give us something like:

[Per-subject table of fear-neutral percent signal change values for AMYG, ACC and FUSI, plus an anxiety score, not reproduced here.]
Calculating the correlations between regions would reveal that activation in amygdala correlates 0.78 with ACC activation and 0.6 with fusiform activation, while fusiform and ACC correlate at 0.87. We could go further and include a measure of anxiety (last column), which correlates significantly with all three brain regions. As a further step, we could use partial correlations or multiple regression (within a statistical software package) to determine the unique and shared (overlapping) variance in the variables, or the degree to which activation in one or more of the brain regions uniquely explains variance in subject anxiety.
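As a rough sketch of this kind of analysis in Python with NumPy (the per-subject values below are synthetic stand-ins, not the data from the example), the region-by-region correlations and a partial correlation might be computed as:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj = 20

# Synthetic per-subject fear-neutral contrast values for three regions,
# plus an anxiety score (illustration only, not the values from the text).
amyg = rng.normal(0.5, 0.2, n_subj)
acc = 0.8 * amyg + rng.normal(0.0, 0.1, n_subj)
fusi = 0.6 * amyg + rng.normal(0.0, 0.15, n_subj)
anxiety = 10.0 * amyg + rng.normal(0.0, 1.0, n_subj)

# Cross-subject correlation matrix between the three regions and anxiety
r = np.corrcoef(np.vstack([amyg, acc, fusi, anxiety]))

def residualize(y, x):
    """Residuals of y after regressing out an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation of amygdala and anxiety, controlling for ACC:
# correlate the residuals after regressing each variable on ACC.
r_partial = np.corrcoef(residualize(amyg, acc),
                        residualize(anxiety, acc))[0, 1]
```

A full statistical package would add p-values and multiple-regression variance partitioning on top of this basic computation.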

At this stage, the astute reader might be asking “Why even bother with this analysis? When all three brain regions are activated, isn't it obvious that they will be correlated?” Much of the time this is true, and an analysis of this type contributes little extra information. However, it is not always the case that brain regions that commonly show activation in a group analysis will show correlation in the degree of activation across subjects. Indeed, one can imagine a situation in which a brain region that activates to emotional facial expressions, but particularly so in socially anxious individuals, will show negative across-subject correlation with a different brain region that reacts to emotional facial expressions, but particularly so in happy, non-socially anxious individuals.

1.2. Voxelwise correlation with a seed cluster: Perhaps we are interested in the more exploratory question of which areas of the brain show cross-subject correlation with a predetermined region or cluster. In this case, we can use the regression analysis technique explained here to perform a voxelwise cross-subject correlation analysis. Instead of using a behavioral measure as the predictor in a regression, we can use the extracted contrast values from a functionally-derived cluster or a priori defined region of interest as the predictor. Using the example above, we could perform a whole brain regression analysis using each subject's amygdala contrast value as the predictor, and each subject's whole brain contrast map as the dependent variable. The resulting statistical map could then be thresholded in the standard way to yield clusters of “activation”, where activation in this case represents significant cross-subject correlation between the respective brain regions and the seed region (e.g. the amygdala).
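The voxelwise variant boils down to correlating one seed value per subject with every voxel of the per-subject contrast maps. A minimal NumPy sketch follows (the array names and sizes are hypothetical; in practice the regression would be run with AFNI's group analysis tools on real datasets):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_vox = 20, 1000  # hypothetical numbers of subjects and voxels

seed = rng.normal(size=n_subj)            # per-subject amygdala contrast values
maps = rng.normal(size=(n_subj, n_vox))   # per-subject whole-brain contrast maps

# Vectorized cross-subject correlation of every voxel with the seed values
seed_c = seed - seed.mean()
maps_c = maps - maps.mean(axis=0)
r = (seed_c @ maps_c) / (np.linalg.norm(seed_c) * np.linalg.norm(maps_c, axis=0))

# Convert r to a t-statistic for thresholding (df = n_subj - 2)
df = n_subj - 2
t = r * np.sqrt(df / (1 - r**2))
```

Thresholding t (with an appropriate cluster-size correction) then gives the clusters of cross-subject “activation” described above.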

Using either the cluster-cluster technique or the voxelwise technique, it is possible to extend the analyses to ask questions about how two brain regions might interact to affect other brain regions, or how activation in a brain region might interact with a behavioral individual difference variable to influence activation in other brain regions. For example, in addition to including contrast values from two brain regions (say amygdala and ACC) as predictors, we can multiply the de-meaned contrast values (i.e. the contrast value for a given subject minus the group mean) from the two clusters together and use the resulting value as a predictor that represents the interaction of the two regions. Other techniques such as mediation analysis are extensions of these basic regression techniques.
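A minimal sketch of building such an interaction regressor (hypothetical names and synthetic values):

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj = 20
amyg = rng.normal(size=n_subj)  # hypothetical per-subject contrast values
acc = rng.normal(size=n_subj)

# De-mean each predictor (subject value minus the group mean),
# then multiply them to form the interaction regressor.
amyg_c = amyg - amyg.mean()
acc_c = acc - acc.mean()
interaction = amyg_c * acc_c

# Design matrix: intercept, the two main effects, and their interaction
X = np.column_stack([np.ones(n_subj), amyg_c, acc_c, interaction])
```

The interaction column then enters the group regression alongside the two main-effect predictors, exactly as a behavioral covariate would.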

Interpretation of cross-subject connectivity analyses. An important distinction exists between this type of analysis and those based upon time series of individual subjects' BOLD signal, as described below. In these analyses we can draw inferences about how brain activation between two or more brain regions corresponds across subjects. i.e. “those individuals who show greater activation in region A show lower activation in region B”. This is not the same as “when individuals show greater activation in region A they show lower activation in region B”. The latter requires an examination of the within-subject covariation of activation in each brain region. One type of analysis is not better than the other. They simply answer different questions. You should be sure to interpret and present your results accordingly.

<back to menu>

2. Simple time-based correlation approaches to functional connectivity

To assess functional connectivity within subjects, we need to turn to the time series of BOLD signal. The basic approach is:

  1. Decide upon a 'seed' region – we want to know what other areas of the brain show functional connectivity with this region. This region could be defined a priori, based on an anatomically-derived region of interest, or it could be functionally defined, based upon supra-threshold clusters of activation from either a single-subject or multi-subject fMRI analysis.

  2. Extract the time series from this region (usually the mean time series for all voxels in the region).

  3. Use this time series as a regressor in a single-subject analysis using the standard GLM methods.

  4. Convert the parameter estimate for that regressor to a correlation coefficient (or an analogous standardised value).

  5. Carry out group-level analysis on the correlation coefficient to determine mean functional connectivity across a group, differences in functional connectivity between groups, or the relationship of functional connectivity to one or more covariates of interest such as behavioral measures.

For the purposes of this guide, let's assume that the seed ROI is cluster 5 of brick 2 of the file group_glm_thresh+tlrc and that individual subject functional data (slice-time corrected and motion corrected from two functional scan runs) is stored as follows:

subject001/run_1/run_1_mc+orig
subject001/run_2/run_2_mc+orig
subject002/run_1/run_1_mc+orig
subject002/run_2/run_2_mc+orig
... etc.

2.1. Extracting the time series from the seed region: The first thing we need to do is get the ROI and the subject's functional data in the same space. Here we have a choice of either transforming the functional data into standardised (e.g. Talairach) space, or transforming the ROI back from standardised space to individual subject space. There are advantages and disadvantages to both methods.

The first method is easier, since all you need do is apply the same transformation that you previously applied to the individual subject statistical maps, this time to the run_1 and run_2 data. The disadvantage is that this creates much larger datasets if you resample the data to a higher resolution (say 2mm cubic voxels), which uses up more disk space and also makes GLM analysis much slower (approx. 8 times slower if going from 4mm cubic voxels to 2mm cubic voxels). It can also be argued that it's better to perform individual-subject GLM on non-resampled data.

The second approach is theoretically better, uses less memory, and the GLM will run faster. But you need to be able to apply the reverse transform to the ROI mask for each subject. How to do that? We use both the subject's tlrc anatomical and orig functional datasets, as follows:

3dfractionize -prefix roi_mask_subj001 -template subject001/run_1/run_1_mc+orig -warp subject001/T1High+tlrc -input 'group_glm_thresh+tlrc[2]' -preserve -clip 0.5

This command will use the transformation information stored in the header of subject001/T1High+tlrc to reverse-warp the input ROI dataset group_glm_thresh+tlrc[2] to the same space and coordinates as the template dataset subject001/run_1/run_1_mc+orig, saving the data in a new dataset called roi_mask_subj001+orig. The clip value, which should range from 0 to 1, determines how inclusive or exclusive to make the resulting voxel values (see the help for 3dfractionize). Smaller values will tend to make bigger clusters in the original space, whereas larger values will make smaller clusters.

NOTE: At this stage, it is a very good idea to overlay the new ROI mask dataset on subject001's anatomy to make sure that the respective clusters are located where they should be. If not, then something's gone wrong with the reverse transform.

Now we need to extract the time series from the subject's functional datasets:

3dmaskave -quiet -mask roi_mask_subj001+orig -mrange 5 5 subject001/run_1/run_1_mc+orig > subj001_clust5_timeseries.1D
3dmaskave -quiet -mask roi_mask_subj001+orig -mrange 5 5 subject001/run_2/run_2_mc+orig >> subj001_clust5_timeseries.1D

These commands extract the mean time series of each scan run from cluster 5 of the mask dataset, saving the time series into a 1D file that can then be used in individual subject GLM analysis. To look at the time series (recommended):

1dplot subj001_clust5_timeseries.1D

2.2. Single-subject GLM with cluster time series as a regressor

We are now able to run the single subject GLM exactly as we normally would for a regular fMRI analysis, but this time, instead of using regressors for each experimental condition, we use the time series extracted from the cluster.

NOTE: We use 3dDeconvolve here, and not simply 3dfim, since it gives us more flexibility to model signal variation due to extraneous sources such as motion.

3dDeconvolve -input run_1/run_1_mc+orig run_2/run_2_mc+orig \
-nfirst 0 -polort 2 \
-num_stimts 7 -basis_normall 1 \
-stim_file 1 subj001_clust5_timeseries.1D \
-stim_label 1 clust5_corr \

Here we are using the “old” format of specifying the regressor, where we have an explicit time series 1D file and don't specify a model. The rest of the command is like a normal GLM, as follows (here we use motion covariates, which is essential in this type of analysis):

-stim_file 2 run_all_motion.txt[1] \
-stim_label 2 roll \
-stim_base 2 \
-stim_maxlag 2 1 \
-stim_file 3 run_all_motion.txt[2] \
-stim_label 3 pitch \
-stim_base 3 \
-stim_maxlag 3 1 \
-stim_file 4 run_all_motion.txt[3] \
-stim_label 4 yaw \
-stim_base 4 \
-stim_maxlag 4 1 \
-stim_file 5 run_all_motion.txt[4] \
-stim_label 5 I_S \
-stim_base 5 \
-stim_maxlag 5 1 \
-stim_file 6 run_all_motion.txt[5] \
-stim_label 6 R_L \
-stim_base 6 \
-stim_maxlag 6 1 \
-stim_file 7 run_all_motion.txt[6] \
-stim_label 7 A_P \
-stim_base 7 \
-stim_maxlag 7 1 \
-fitts clust5_fit_ts -errts clust5_error_ts \
-xjpeg clust5_matrix.jpg -rout -tout -fout -bucket clust5_glm_out

Notice that we've also added the -rout flag, which outputs the R-squared statistics; these can be converted to correlation coefficients. The output file, clust5_glm_out+orig, will contain a t-stat sub-brick labeled 'clust5_corr', which can be used as the threshold in AFNI to see which voxels in the brain showed significant temporal correlation (which can be interpreted as “functional connectivity”) with the seed region.

To get the actual correlation coefficient, we can use the following command (assuming that the r-squared value for clust5_corr is sub-brick 6 and its associated t-stat is in sub-brick 7):

3dcalc -prefix clust5_corr -a 'clust5_glm_out+orig[6]' -b 'clust5_glm_out+orig[7]' -expr "sqrt(a)*abs(b)/b"

(We've used the t-statistic in the multiplier abs(b)/b to give us the sign of the correlation coefficient)
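The same conversion can be sketched numerically; the R-squared and t values below are made up for illustration:

```python
import numpy as np

# Hypothetical R-squared and t-stat values read from the GLM output sub-bricks
r_squared = np.array([0.25, 0.04, 0.49])
t_stat = np.array([3.1, -1.2, -4.5])

# r = sqrt(R^2), signed by the t-statistic; this mirrors sqrt(a)*abs(b)/b
r = np.sqrt(r_squared) * np.sign(t_stat)
```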

2.3. Group-level analysis on the correlation coefficient: This is a simple matter of transforming the correlation coefficient datasets for each subject to standard space, smoothing and analysing with ANOVA or regression just as you would normally with functional contrasts. The difference is in the interpretation of the data, which will now indicate areas that show significant temporal correlation with the seed region (in the case of a group average), or significant differences in the temporal correlation with the seed region (in the case of a group difference), or even differences in the degree of temporal correlation with the seed region that are predicted by one or more external behavioural or physiological variables (in the case of a regression analysis).

<back to menu>

3. Structural equation modeling and functional connectivity

AFNI now provides programs to apply the methods of structural equation modeling (SEM) to fMRI data. Structural equation modeling allows the testing of more complex models of functional connectivity, consisting of multiple interconnected brain regions and their interactions with other variables (such as behavioural, diagnostic or physiological measures). SEM is not magic – if you give it junk data, it will give you a junk model. It is also worth considering that the more complex your model, and therefore the more parameters that need to be estimated, the less reliable the results are likely to be. One good idea is to estimate a model with one half of your data, and then test it with the other half. More on this below.

AFNI web page on SEM: You'll want to refer to this for additional information.

3.1. Getting started: Extracting time series from regions of interest. The procedure here is exactly the same as that outlined in section 2.1 above. You need to identify your regions of interest and extract the time series from each one. The regions might be anatomically defined ROIs, or statistical clusters from a group analysis.

For the sake of this tutorial, let's assume that we have 3 regions of interest: amyg, ins, acc, and that the time series for each subject are stored in 1D files as follows:

subj001_amyg.1D subj001_ins.1D subj001_acc.1D

subj002_amyg.1D subj002_ins.1D subj002_acc.1D etc.

3.2. Singular value decomposition. In this step, we run a singular value decomposition for each ROI on the matrix of time series for all N subjects:

1dsvd subj001_amyg.1D subj002_amyg.1D ... subj00N_amyg.1D

You can think of this step as performing a Principal Components Analysis (PCA): we use it to identify the first principal component of the time series, which should correspond to activation in your functional paradigm. You need to find the part of the output that pertains to the left-hand matrix in the SVD:

++ Left Vectors [U]:
25.639 7.9888 ... 0.5319
------------ ------------ ------------
00: -0.38698 0.075116 ... -0.090827
01: -0.09261 0.033281 ... -0.077916
...
300: -0.42006 -0.040904 ... 0.35085

The values in the top row are the eigenvalues for each eigenvector, which is the column of numbers below each eigenvalue. Identify the largest eigenvalue (i.e. the largest value in the top row), and save the column below it to a 1D file, say sv_amyg.1D. This should be the first principal component, also called the first eigenvector.
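In NumPy terms, this SVD step looks like the sketch below (synthetic data; with real data each column of M would be one subject's ROI time series), including the sign alignment described in the next steps:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 300, 12  # time points, subjects (hypothetical sizes)

# Columns = each subject's ROI time series (synthetic, for illustration)
M = rng.normal(size=(T, N))

# U corresponds to 1dsvd's "Left Vectors [U]"; s holds the values
# printed in the top row of that output.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

first_pc = U[:, np.argmax(s)]  # the column under the largest top-row value

# Align the sign of the component with the mean time series across subjects
mean_ts = M.mean(axis=1)
if np.dot(first_pc, mean_ts) < 0:
    first_pc = -first_pc
```

NumPy returns the singular values in descending order, so the first column of U is the component you would save to sv_amyg.1D.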

You should plot the 1D file to make sure it seems to have signal fluctuations that look about right for your experimental paradigm.

1dplot sv_amyg.1D

Now obtain the mean time series for the ROI (across all subjects):

1deval -a subj001_amyg.1D -b subj002_amyg.1D ... -expr "(a+b+c+...)/N" > amyg_mean.1D

and calculate its dot product with the first principal component:

3ddot -dodot sv_amyg.1D amyg_mean.1D

If the value is negative, then flip the sign of the first principal component:

1deval -a sv_amyg.1D -expr "-a" > sv_amyg_flip.1D
mv sv_amyg_flip.1D sv_amyg.1D

(Redirecting the output straight back onto sv_amyg.1D would truncate the file before 1deval reads it, so we write to a temporary file and then rename it.)

We also need to calculate the residual error variance for the ROI, which is given by:

psi_amyg = 1 - s_1^2 / (s_1^2 + s_2^2 + ... + s_N^2)

where s_i is the ith singular value (the ith value in the top row of the 1dsvd output). In other words, psi_amyg is the proportion of variance left unexplained by the first principal component.
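Assuming psi is defined as one minus the proportion of variance captured by the first component, with the top-row values from 1dsvd treated as singular values (that definition is our assumption here; check the AFNI SEM page), the computation is:

```python
import numpy as np

# Top-row values from the 1dsvd output for this ROI (hypothetical numbers)
s = np.array([25.639, 7.9888, 3.2, 1.1, 0.5319])

# Residual error variance: 1 minus the fraction of total variance explained
# by the first (largest) component; variances are the squared values.
psi_amyg = 1.0 - s[0]**2 / np.sum(s**2)
```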

Repeat these steps for each ROI. Save the residual error variances from all ROIs in a single-column 1D file, say roi_errors.1D, with one value per ROI in the same order as the ROIs in your correlation matrix (here: amyg, ins, acc).
3.3. Calculating inter-ROI correlations and degrees of freedom.

We can now calculate the intercorrelation matrix between the different ROI principal components:

1ddot -dem sv_amyg.1D sv_ins.1D sv_acc.1D > roi_correls.1D

To estimate the effective degrees of freedom associated with the analysis, we have to take into account the degree of temporal autocorrelation in the ROI principal components. That is, if we have time series of length T samples, because the ith value of a component will typically be correlated with the (i+1)th value, and so on, it would not be correct to assume that each time series consisted of T independent samples. If all samples were independent, then the DOF would equal T-P, with P the number of ROIs. Because there is autocorrelation, however, the DOF will effectively be less than that number.

To estimate the first order autocorrelation for each ROI:

1ddot -dem sv_amyg.1D'{0..298}' sv_amyg.1D'{1..299}'

The effective degrees of freedom for the analysis is then given by:

dof = (T - P) * (1 - (AR_1^2 + AR_2^2 + ... + AR_P^2) / P)

where T = number of time points, P = number of ROIs, and AR_i = first-order autocorrelation of the ith ROI. With zero autocorrelation this reduces to T - P; the stronger the autocorrelations, the smaller the effective DOF.
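A sketch of one such correction in NumPy (the exact formula used with 1dSEM should be checked against the AFNI SEM page; the form below, which shrinks T - P by the mean squared lag-1 autocorrelation, is an assumption on our part):

```python
import numpy as np

rng = np.random.default_rng(4)
T, P = 300, 3  # time points and number of ROIs (hypothetical)

# Hypothetical ROI principal-component time series, one column per ROI
pcs = rng.normal(size=(T, P))

def lag1_autocorr(x):
    """First-order autocorrelation: correlate x[0:T-1] with x[1:T]."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

ar = np.array([lag1_autocorr(pcs[:, i]) for i in range(P)])

# Effective DOF: T - P shrunk by the average squared lag-1 autocorrelation
dof = (T - P) * (1 - np.mean(ar**2))
```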

3.4. Model validation with SEM. In this mode, we postulate a model of how the different ROIs should be connected, and SEM will test our model. Let's imagine that our theory-based model is the following:

amyg connects to acc

acc connects to amyg

ins connects to acc and amyg

This we could instantiate in a matrix:

#       amyg    ins     acc
amyg    0       1       1
ins     0       0       0
acc     1       1       0

Save this into a 1D file, model.1D
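If you prefer to build the model file programmatically, here is a sketch; the rows are read as the receiving region, which is consistent with the connection list above:

```python
import numpy as np

# Theta (model) matrix from above: a 1 in row i, column j allows a path
# into region i from region j (0 = no path).
# Order of rows/columns: amyg, ins, acc.
model = np.array([
    [0, 1, 1],   # amyg receives paths from ins and acc
    [0, 0, 0],   # nothing connects into ins
    [1, 1, 0],   # acc receives paths from amyg and ins
])

np.savetxt("model.1D", model, fmt="%d")
```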

Now we can run the SEM analysis as follows:

1dSEM -theta model.1D -C roi_correls.1D -psi roi_errors.1D -DF <dof>

where you should substitute the DOF value you obtained above for <dof> in the command. The program will produce output like the following:

++ Program 1dSEM: AFNI version=AFNI_2007_01_15_xxxx [32-bit]
++ Authored by: Daniel Glen, Gang Chen
Finding optimal theta values
Total number of iterations 237
Connection coefficients matrix: 3 x 3
01:      0.0000      0.0000      0.5615
02:      0.4512      0.0000      0.6133
03:      -0.5866      0.0000      0.0000

The output shows the estimated coefficients for each theorised connection.