Note: this introduction is not intended as a guide to the AFNI GUI, for which Hillary Schaefer has created a good introduction:

http://tezpur.keck.waisman.wisc.edu/~oakes/afni/afni_intro.pdf

Also note that for any AFNI command line program, a list and description of all its command line options can be viewed by typing the name of the command by itself, or followed by "-help", e.g.:

**3dvolreg**

**to3d -help**

The help contents are also available at the official AFNI web site, along with more explanatory manuals.

**ACKNOWLEDGMENTS:** All of the material here represents
a selective synthesis of the (terrific) official AFNI documentation,
advice gleaned from colleagues and on the web, and my own experiences
in analysing fMRI data. Many thanks to all, especially the AFNI crew
for such powerful yet understandable software.

NOTE: I am assuming that a recent (i.e. 2005 or later) version of AFNI is being used in these instructions. If in doubt about which version of AFNI is set up as your default, type the following command on the Linux command line:

**which afni**

This should tell you the path to the version that is set up as your default. To change the default version, it is necessary to edit your login startup files. If you use the BASH shell (the default in our lab), you can edit the .bash_profile file in your home directory with a text editor, and add the following lines to the end:

**export AFNI_PATH=/apps/linux/afni-latest/**

**export AFNI_PLUGINPATH=$AFNI_PATH**

**export PATH=$AFNI_PATH:$PATH**

**export AFNI_GLOBAL_SESSION=$AFNI_PATH**

Alternatively, if you want to use the most recent version of the AFNI commands without editing your startup files, you can precede the commands listed below with /apps/linux/afni-latest/ (no space between the path and the command name). E.g.:

**/apps/linux/afni-latest/3dvolreg**

This will work on any of the main Linux application servers.

The purpose of this document is to provide an overview of a typical processing pathway for the analysis of fMRI data with AFNI software. Throughout, I emphasize the commands that are used to process data. Virtually a full analysis could, at least in principle, be done with command line programs, without ever looking at the data itself. This would be a VERY BAD IDEA. I cannot emphasize enough how important it is to examine your data (using AFNI's excellent data visualisation capabilities) at all stages of data processing. The old adage “garbage in…garbage out” is very well suited to fMRI data analysis – if you don't look at your data and consider what it means, whether results are reasonable, and make an effort to visually identify potential problems, then you are likely to end up with garbage.

Enough of the lecture – on with the overview.

1. reconstruction of raw functional images (P-files; specific to The Waisman Laboratory)

2. conversion of anatomical images to AFNI format (specific to The Waisman Laboratory)

3. slice timing correction and motion correction of functional images

4. single-subject analysis of functional images with General Linear Modeling (GLM)

5. scaling parameter estimates to percent signal change and calculating SNR

6. group analysis of individual statistical maps using regression

Below I detail each of these steps, with some example data.

**NOTE: These instructions are for P-files collected since the
scanner upgrade in April 2005. For P-files collected before that
time, you should use the instructions
here for reconstruction.**

The functional data is saved from the scanner in k-space form. That is, the data is actually the Fourier transform of the image data itself. We use a program called epirecon to Fourier transform the data back to image space. Epirecon also applies a number of corrections for various types of image distortion. Rather than call epirecon directly, however, we use another program called sip, which in turn calls epirecon. Sip also creates some standard directories and log files which aid in later analysis and debugging.

To use sip, we first create a *.SIF file for the files we wish to reconstruct. A *.SIF file is nothing more than a text file that contains information about the files to be reconstructed, and how we want them to be reconstructed. To reconstruct the P-files, we simply invoke sip with the name of the *.SIF file as an argument:

**sip trainingBlockS001.SIF**

A template for a *.SIF file is shown below – you can copy this text into a new text file and use it as a template.

- # file trainingBlockS001.SIF
- # specifies the template from which this *.SIF file was created
- SIF template = template.SIT
- # specifies the study name
- Study identifier = trainingBlock
- # specifies the subject number
- Subject identifier = 001
- # specifies the base directory in which sub-directories corresponding to each scan run will be created
- Working directory = /study/training/afni/block/s001
- # specifies the name of the log file to be created
- SIF log filename = /study/training/afni/block/s001/trainingBlockS001.log
- # specifies the P-files for each scan run. The items “run_1” and “run_2” here are the names to be given to the sub-directories for each scan run
- run_1 Pfile = /study/training/afni/block/s001/raw/P05120.7
- run_2 Pfile = /study/training/afni/block/s001/raw/P06656.7
- # specifies the ref.dat file that was created during the scan session and that is used for calibration during reconstruction
- Reference filename = /study/training/afni/block/s001/raw/ref.dat.s001
- # Spatial reconstruction filter = Fermi, Hamming, or None
- Reconstruction filter = Fermi
- # specifies the type of file to create. Usually you will want to specify BRIK (for AFNI) or ANA (for ANALYZE) here. Other options are SDT, FNI, IFH
- Reconstructed files to create = BRIK
- # specifies the scaling factor to be applied during reconstruction – specify 0 to use whole-brain autoscaling
- Scaling factor = 0
- # specifies how many images at the start of each run to discard. These images correspond to the time that it takes for the scanner to stabilize
- Pre-images to discard = 5
- # other reconstruction parameters that generally should be left as shown here
- ZX = NA
- ZY = NA
- Apply BP asymm correction = Y

Apart from the file names specific to the study and subject being analysed, a few other options will vary according to your study’s needs:

The choice of reconstruction filter depends on whether you want to filter your data now and not later (use “Hamming”), apply no filtering now and wait until just before Talairach transforming your data (use “None”), or apply a small amount of filtering now to remove ghosting and other artifacts, as done by GE (use “Fermi”), while still filtering the data further before Talairach transformation.

The choice of output file format might depend to some extent on which software is going to be used for analysis. In general, we suggest the AFNI format, since it has some features not available in other formats, but can easily be converted to other formats at a later stage if desired. The AFNI format consists of a binary BRIK file that contains the data, and a text HEAD file that contains useful information about the data. One of the AFNI-specific features is that information about the position of the imaged brain volume in relation to the scanner coordinates can be stored in the header. This makes it possible for AFNI to align or overlay fMRI data with completely different acquisition parameters: for example, a high resolution anatomical image acquired axially can be aligned with a low resolution functional image acquired sagittally. In addition to information about the format and position of data in the BRIK file, the HEAD file also contains a command history of the AFNI commands used to produce the file. More details about AFNI dataset formats are available in the official AFNI documentation.

The reconstruction will produce a subdirectory for each scan run. Inside the subdirectory will be a reconstruction log file, called recon.log, which is a plain text file containing a record of the slice by slice reconstruction. You should open this file in a text editor and look at it for any error messages. Also in the directory will be the reconstructed data (i.e. a *.HEAD and a *.BRIK file if you chose AFNI format).

**NOTE: These instructions are for scans collected since the
scanner upgrade in April 2005. For scans collected before that time,
you should use the instructions
here.**

To convert anatomical files to AFNI format, we use the AFNI command to3d. In your data directory, you'll have a bunch of subdirectories such as:

S1_3-Plane

S2_EFGRE3D

etc. Each of these contains a different type of anatomical scan, but the overall conversion process should be the same. To take the example of the S2_EFGRE3D directory, which contains the high resolution, T1-weighted anatomical scan, this directory contains a numerically ordered set of files:

S2_EFGRE3D.0001

S2_EFGRE3D.0002

...

S2_EFGRE3D.0124

To convert these to AFNI format, cd into the directory where you want the AFNI format file stored, and then execute the following command:

**to3d -session . -prefix T1High /study/<path to files>/S2_EFGRE3D.***

where **<path to files>**
completes the full Linux path to the directory containing the files
that you want converted.

Make sure you visually inspect the created file in AFNI. Also, overlay your reconstructed functional files on the anatomical file and visually inspect to make sure they are properly aligned. This can be done by choosing the anatomical file as the underlay, choosing the functional file as the overlay, and setting the overlay threshold to a value such that only voxels in and immediately around the brain are shown.

For a given brain volume, each slice image is acquired at a slightly different time. The exact timing of the slices depends on the acquisition setup. Often slices are acquired either in a straightforward sequence, or in an interleaved sequence in which the even numbered slices are acquired first, followed by the odd numbered slices. Particularly in the latter case, the time offset between adjacent slices will lead to differences in signal that can be substantial, particularly in event-related studies. Although such differences will not affect individual voxel analyses, any process that involves averaging or interpolating the functional time series across slices will be affected by slice timing differences. Such processes include three-dimensional spatial smoothing, cluster averaging, motion correction, and spatial normalization (e.g. to Talairach space). When any of these processes are to be applied to the functional data, it might be desirable to correct for slice timing differences. In AFNI this can be done using the command 3dTshift, but a simpler way is to use the -tshift option during motion correction (see example below). The correction is applied prior to motion correction, since motion correction involves interpolating between adjacent slices, and such interpolation will be inaccurate if adjacent slices represent different time points.

Motion correction seeks to correct for small subject head motion during the scan. The command 3dvolreg is used as shown below.

**3dvolreg -prefix run_1_vr -tshift -Fourier -verbose -base run_1+orig[0] -dfile run_1_motion.txt run_1+orig**

You can use different images as the base image - in this case and
in most cases you will want to use an image taken with the same scan
parameters that was acquired closest in time to your anatomical
images. The example here uses the first image of the first scan run,
which would be appropriate if the anatomical scans were acquired
before the functional scans. As another example, if an experiment
acquired 5 functional scans, each of length 150 TRs, *before*
the anatomical scans, the following command might be appropriate:

**3dvolreg -prefix run_1_vr -tshift -Fourier -verbose -base run_5+orig[149] -dfile run_1_motion.txt run_1+orig**

This command specifies the final time point (in AFNI, time indices start at 0, so a 150 TR scan run has time points 0 to 149) of the last functional run as the reference, or base image.

The 3dvolreg command creates a new AFNI dataset, consisting of a HEAD file and a BRIK file with the prefix given in the command line.

To see the extent to which there were any large translations or rotations, you can look at the motion text file (use 1dplot for a graphical output).

**1dplot -volreg run_1_motion.txt[1..6]**

(The [1..6] designates the 6 columns of the motion file containing translation and rotation estimates).

3dvolreg can handle small motions fairly well, but larger motion (> 1mm) might not be properly corrected. To see whether this is the case, check the motion-corrected dataset using AFNI. How do the time series look (i.e. are there obvious discontinuities in the data that could be motion-related)? Is there still apparent motion when scrolling through the time index? Make notes!
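As a quick complement to the visual check, the motion file can also be scanned numerically. The sketch below is a hypothetical helper (the sample numbers are made up) that flags any TR whose translation estimates exceed 1 mm; it assumes the -dfile columns are index, roll, pitch, yaw (degrees), then dS, dL, dP (mm).

```shell
# Hypothetical check of a 3dvolreg -dfile output: flag any TR whose
# translation estimates (assumed columns 5-7: dS, dL, dP, in mm)
# exceed 1 mm. The two sample rows here are made-up data.
printf '0 0.01 -0.02 0.00 0.10 0.05 -0.08\n1 0.03 0.00 0.01 1.40 0.22 -0.15\n' > run_1_motion.txt

awk '{
  for (i = 5; i <= 7; i++)                          # dS, dL, dP columns
    if ($i > 1 || $i < -1) {
      print "TR " $1 ": translation of " $i " mm"
      break
    }
}' run_1_motion.txt
```

For the sample rows above, this prints `TR 1: translation of 1.40 mm`, pointing you to the time point worth inspecting in the AFNI viewer.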

NOTE: For earlier versions of AFNI there was a bug in the 3dTshift routine, such that samples were shifted in the wrong temporal direction. It is important that you use a recent version of AFNI for motion correction and slice-time correction: if in doubt, use the latest version.

The data can be filtered prior to linear model fitting, using 3dFourier, which applies a rather brutal frequency domain filter to the time-series. For event-related designs, this is not really necessary now that the AFNI GLM program, 3dDeconvolve, includes options for more precise baseline drift estimation. For block designs it might still be desirable to eliminate high frequency noise terms:

**3dFourier -prefix run_1_vr_lp -lowpass 0.05 -retrend
run_1_vr+orig**

Here the low pass cutoff value was based upon the spectrum of the input stimulus model. This will tend to smooth the data before linear fitting, thus providing a better model fit. You should always be conservative when choosing frequency cutoffs, and visually inspect the time series in AFNI after temporal filtering.

At some point you will probably want to spatially smooth or blur your data. This is done for two main reasons. For one, as with any smoothing operation, spatial smoothing will tend to average out high spatial frequency noise in your image. Thus areas of activation with a greater spatial extent will remain, but those with a small spatial extent will be eliminated. Second, before aggregating data across subjects, spatial blurring needs to be performed because subjects’ brain anatomies vary, even after normalization to a standard brain space (e.g. Talairach space), and thus their areas of functional activation are likely not to exactly overlap.

There are basically three points in the data analysis process at which spatial smoothing can be performed. Data can be smoothed during reconstruction (i.e. using the Hamming or Fermi filter option). This has the advantage of eliminating unwanted noise and artifacts from the images before they can be compounded by further processing. At least some spatial smoothing at this stage is thus probably warranted. However, spatial smoothing during reconstruction is only performed in-plane. That is, it is applied separately for each brain slice, and thus does not account for unwanted high spatial frequency variation between slices, nor does it help align functional activation across subjects in the slice direction.

A second time to spatially smooth is just before first level statistical analysis of individual subject data. This is the last stage at which the (essentially) raw data time series will be used. After individual subject statistical analysis, further processing is applied to statistical meta-data, to which it is arguably less valid to apply spatial smoothing. Spatial smoothing at this point is advised if i) you want to interpret statistical maps from single subjects, since it will give you improved spatial signal to noise ratio (SNR) and thus greater sensitivity, or ii) you want to include the variance estimates from individual subject statistical analysis in higher level (e.g. group-level) statistical modeling, as is done in FMRISTAT, FSL and SPM, but not currently in AFNI. This second option might be preferable if you have unbalanced data – i.e. different numbers of observations for different conditions, scan sessions, or subjects.

The third possibility is to apply spatial smoothing to the statistical maps produced by individual subject statistical analysis, questions of validity notwithstanding (this is discussed briefly below). Here the advantage is primarily one of processing speed, since instead of smoothing every time point (and you probably have hundreds of time points), you would only need to smooth a smaller number of statistical images.

To opt for the second approach, we can spatially smooth the functional dataset using the command 3dmerge:

**3dmerge -1blur_fwhm 5 -doall -session . -prefix run_1_vr_lp_5mm
run_1_vr_lp+orig**

This will apply a Gaussian blur with a full width at half maximum (FWHM) of 5mm to the dataset.
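For reference, the FWHM of a Gaussian blur relates to its standard deviation by FWHM = 2*sqrt(2*ln 2)*sigma (about 2.355*sigma), so the 5 mm kernel above can be translated into sigma terms with a one-liner:

```shell
# Convert a 5 mm FWHM Gaussian blur into its equivalent sigma,
# using FWHM = 2*sqrt(2*ln 2)*sigma.
awk 'BEGIN { fwhm = 5; printf "sigma = %.2f mm\n", fwhm / (2 * sqrt(2 * log(2))) }'
```

This prints `sigma = 2.12 mm`, which is useful when comparing AFNI's FWHM-based options with software that specifies blurs in sigma units.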

For an introduction to the principles behind such analyses, see this brief introduction to GLM in functional imaging.

To go beyond the simple example given here, see these explanations of including motion estimates as covariates and estimating the hemodynamic response function (“deconvolution”).

In AFNI, the command that implements single-subject GLM is called
3dDeconvolve. This is not, however, a completely appropriate name,
because 3dDeconvolve doesn’t necessarily perform a deconvolution
analysis. It *can* do deconvolution, but it more generally does
GLM – please use the term GLM rather than deconvolution (it is
correct for all types of analysis that can be performed using
3dDeconvolve).

Note that a number of newer features make the 3dDeconvolve command simpler, yet more powerful. These are explained in detail at:

http://afni.nimh.nih.gov/afni/doc/misc/3dDeconvolveSummer2004

Before performing the GLM analysis, we need to make one or more stimulus timing files, which are text files containing the onset times (in seconds) for each type of experimental stimulus for which we want to separately model the BOLD response. These timing files have the file extension .1D by convention, and there should be a separate timing file for each stimulus condition to be modeled. The times are given in seconds, on the same time scale as the TR of the -input dataset. There are two formats for timing files:

1. A single column of numbers, in which case each time is relative to the start of the first imaging run ("global times"). This is the older technique and generally not preferred for multiple scan run analyses.

2. If there are 'R' runs catenated together (either directly on the command line, or as represented in the -concat option), the second format is to give the times within each run separately. In this format, the timing file would have R rows, one per run, with the times for each run occupying one row. For example, with two runs (R=2):

12.3 19.8 23.7 29.2 39.8 52.7 66.6

16.8 21.8 32.7 38.9 41.9 55.5 71.2

These times will be converted to global times by the 3dDeconvolve program, by adding the time offset for each imaging run, thus eliminating the possibility of human error in making the calculation. When using the multi-row input style, you may have the situation where a particular class of stimulus does not occur at all in a given imaging run. To encode this, the corresponding row of the timing file should consist of a single '*' character.

**NOTE**: The times are relative
to the start of the data time series as input to 3dDeconvolve. If the
first few points of each imaging run have been cut off (e.g. during
reconstruction), then the actual stimulus times must be adjusted
correspondingly (e.g., if 5 time points were dropped with TR=2, then
the actual stimulus times should be reduced by 10.0 in the stimulus
timing files).
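If the raw onset logs have not yet been adjusted, the 10.0 s shift from the note above can be applied mechanically. The snippet below is a hypothetical helper (file names and sample times are made up) that subtracts 10.0 from every onset in a multi-row timing file, passing any '*' (empty-run) row through unchanged:

```shell
# Hypothetical helper: shift every onset back by 10.0 s
# (5 dropped time points x TR = 2 s, as in the note above).
# A row consisting of a single '*' is passed through unchanged.
printf '22.3 29.8 33.7\n*\n' > raw_onsets.1D

awk '{
  if ($1 == "*") { print; next }                    # empty-run marker
  for (i = 1; i <= NF; i++) $i = sprintf("%.1f", $i - 10.0)
  print
}' raw_onsets.1D > adjusted_onsets.1D

cat adjusted_onsets.1D
```

For the sample input this writes a first row of `12.3 19.8 23.7` followed by the untouched `*` row, ready to pass to -stim_times.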

Now for fitting the linear model using 3dDeconvolve. The best way to explain this is with a simple example. Say we have an event-related experiment in which subjects are shown a sequence of positive and negative images, presented in a pseudorandom order. They undergo two scan runs, each one 180 seconds long. During reconstruction we drop 5 TRs off each scan run (with a TR of 2 seconds, that is 10 seconds), meaning that we have 170 seconds of fMRI data left for each scan run. Let’s imagine that the time onsets (corrected for the 5 dropped TRs) for the two types of stimuli for the two runs are given in 2 files:

pos_onsets.1D:

10.0 19.0 30.9 40.8 53.1 65.0 74.7 84.6 94.5 103.3 115.7 124.7 134.8 144.2 153.7 165.7

16.0 25.4 34.8 43.0 53.5 60.6 73.1 85.5 97.3 107.0 118.3 128.1 140.4 151.9 160.9 170.0

neg_onsets.1D:

15.0 23.7 34.6 46.8 58.6 68.6 80.4 92.9 104.1 115.4 128.2 139.8 148.8 161.0 173.6 183.9

10.0 19.4 31.5 41.6 49.0 57.2 66.6 74.4 85.4 93.0 103.6 115.3 127.8 135.1 144.4 151.4

Let’s also assume that the two motion-corrected scan runs are stored in the files run_1_vr+orig and run_2_vr+orig (remember that the extensions .HEAD/.BRIK are not required when referring to AFNI format files in AFNI commands). The start of the command is simple enough:

**3dDeconvolve -input run_1_vr+orig run_2_vr+orig -nfirst 0 -polort 2 \**

(the backslash at the end of the line here is a shell convention indicating that the command continues on the following line)

Here we specify the files to be analysed. AFNI automatically takes
into account that there are 2 scan runs (because there are 2 files),
and will calculate a separate baseline for each scan run. Analysis
should start from the first time point (-nfirst 0) and we want to
model the baseline with a polynomial of degree 2. Roughly speaking, a
2nd-degree polynomial will model the baseline with a
constant value, a linear drift, and a quadratic-shaped curve, which
should be sufficient to capture most signal drift in each of these
fairly short scan runs. For longer runs, it might be desirable to use
a polort of 3 or 4, thus including cubic and quartic baseline trends.
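To make -polort 2 concrete: for each run the baseline is modeled as b0 + b1*t + b2*t^2, i.e. three regressor columns per run. The toy snippet below prints the first few rows of those columns (t = TR index); note that recent versions of 3dDeconvolve use orthogonal Legendre polynomials internally rather than plain powers of t, but the two sets of regressors span the same baseline shapes.

```shell
# Illustration of a 2nd-degree polynomial baseline model for one run:
# columns are the constant, linear, and quadratic regressors
# (1, t, t^2) for the first four time points.
awk 'BEGIN { for (t = 0; t < 4; t++) print 1, t, t * t }'
```

The output rows are `1 0 0`, `1 1 1`, `1 2 4`, `1 3 9`; a -polort 3 model would simply add a fourth, cubic column.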

**NOTE**: It is no longer necessary to concatenate the runs
before analysis, nor is it necessary to make their baselines equal.
Each scan run will have its baseline individually modeled with a
polynomial of the specified degree.

Now to specify the experimental conditions and timing:

**-num_stimts 2 -basis_normall 1 \**

**-stim_times 1 pos_onsets.1D 'GAM' \**

**-stim_label 1 positive \**

**-stim_times 2 neg_onsets.1D 'GAM' \**

**-stim_label 2 negative \**

The first line here specifies that we have two stimulus conditions, and that the model time series for each should be normalised to have a peak amplitude of 1. The latter allows us to convert estimated b-weights to estimates of % signal change later in the analysis chain.

Then we specify the timing file for each condition, and what model we want to use for the hemodynamic response. Here we have chosen the simplest option, a gamma variate function ('GAM'). Other options are:

- 'GAM' = 0 parameter gamma variate
- 'GAM(p,q)' = 2 parameter gamma variate
- 'SPMG' = 2 parameter SPM gamma variate + derivative
- 'POLY(b,c,n)' = n parameter polynomial expansion
- 'SIN(b,c,n)' = n parameter sine series expansion
- 'TENT(b,c,n)' = n parameter tent function expansion
- 'BLOCK(d,p)' = 1 parameter block stimulus of duration 'd'
- 'EXPR(b,c) exp1 ... expn' = n parameter; arbitrary expressions

Each of these models has advantages and disadvantages that we won’t go into here, but basically they represent different degrees of tradeoff between a well-specified, but constrained model with few estimated parameters versus a poorly specified, but flexible model with many estimated parameters. The former (exemplified by the simple 'GAM' model) affords more statistical power, but cannot properly model different shapes or latencies of hemodynamic response. More details of each model type are given at:

http://afni.nimh.nih.gov/afni/doc/misc/3dDeconvolveSummer2004

Now we can specify contrasts between different experimental conditions. In this instance, there are only a few possibilities:

**-num_glt 2 \**

**-glt_label 1 neg_pos_diff \**

**-gltsym 'SYM: +negative
-positive' \**

This will calculate the linear contrast negative-positive, giving us the difference in estimated response to the two conditions. It is not necessary to specify the reverse contrast, since all contrasts in AFNI are purely mathematical, i.e. they do not mask out negative values (unlike in some other software packages).

**-glt_label 2 neg_pos_ave \**

**-gltsym 'SYM: +0.5*negative
+0.5*positive' \**

This will calculate the mean estimated response to the two conditions.

Finally, we need to specify the type of output we want:

**-fitts fit_ts -errts error_ts \**

**-xjpeg glm_matrix.jpg -tout -fout -bucket glm_out**

(note the absence of a backslash on the last line, since it’s the last line of the command)

The first of these lines specifies that we want the fitted time series and error time series to be output for each voxel, in the form of two 3D+time AFNI datasets. These can be useful for diagnostic purposes and for calculating signal to noise ratios.

The second line specifies that we want a graphical representation of the GLM design matrix output to a JPEG file called glm_matrix.jpg, and that we want all the estimated b-coefficients, fit statistics, and contrast estimates output to an AFNI bucket file called glm_out+orig.

Putting this all together, we have:

**3dDeconvolve -input run_1_vr+orig run_2_vr+orig -nfirst 0 -polort 2 \**

**-num_stimts 2 -basis_normall 1 \**

**-stim_times 1 pos_onsets.1D 'GAM' \**

**-stim_label 1 positive \**

**-stim_times 2 neg_onsets.1D 'GAM' \**

**-stim_label 2 negative \**

**-num_glt 2 \**

**-glt_label 1 neg_pos_diff \**

**-gltsym 'SYM: +negative -positive' \**

**-glt_label 2 neg_pos_ave \**

**-gltsym 'SYM: +0.5*negative +0.5*positive' \**

**-fitts fit_ts -errts error_ts \**

**-xjpeg glm_matrix.jpg -tout -fout -bucket glm_out**

When we run this, AFNI will churn away for some minutes, reporting progress as it goes. When it is finished, you should use AFNI to view the statistical maps overlaid on the original EPI data or on the anatomical images. You should assess the goodness of fit of this model just as you would for a linear regression or GLM analysis with any other type of data. The beta weights and associated t-tests give you an indication of the functional activation in response to each experimental condition for this subject.

The process of applying GLM to individual subject data is far more flexible, and potentially far more complex than can be presented here. For a more complete treatment, see:

http://afni.nimh.nih.gov/pub/dist/edu/latest/afni_3dDeconvolve/regression.htm

Before combining the results from each subject into a group-level analysis, it is useful to convert each subject’s parameter estimates to a common scale. This way, differences between subjects in the overall scaling of fMRI data (which are meaningless because fMRI is an arbitrarily scaled, purely relative measure) are removed. There is no one correct way to scale each subject’s data, but the method most commonly used (and which seems to give the best combination of sensitivity and reliability) is percent signal change. This is simply calculated as the estimated signal change for a given experimental condition (or contrast between conditions) expressed as a percentage of the baseline signal.

**NOTE**: the following discussion and method is only
appropriate when using the 'GAM' model, or similar
*single-parameter* models. For other, multi-parameter models,
percent signal change should be calculated on the basis of the
estimated response functions, which are output from 3dDeconvolve
using the -iresp option.

Now it becomes apparent why we specified the -basis_normall option in the 3dDeconvolve command, above. By doing this, our condition regressors were normalised to a peak amplitude of 1, which means that the parameter estimates for a given condition are estimates of the amplitude of the BOLD response to that condition. Thus we can simply use 3dcalc to perform the arithmetic on the appropriate sub-bricks of the output GLM bucket dataset. First we need to determine which sub-bricks of the statistical bucket dataset contain the beta coefficients and contrasts that we want to scale. We use 3dinfo for this:

**3dinfo glm_out+orig**

This command will give us information about the contents of the statistical bucket file, including lines like the following:

-- At sub-brick #0 'Run #1 P_0 Coef' datum type is short: 0 to 32767 [internal] [* 0.712808] 0 to 23356.6 [scaled]

-- At sub-brick #6 'Run #2 P_0 Coef' datum type is short: 0 to 32767 [internal] [* 0.722834] 0 to 24671.6 [scaled]

-- At sub-brick #12 'positive[0] Coef' datum type is short: -26364 to 21149 [internal] [* 0.000433336] -11.4245 to 9.16462 [scaled]

-- At sub-brick #16 'negative[0] Coef' datum type is short: -26364 to 21149 [internal] [* 0.000433336] -11.4245 to 9.16462 [scaled]

-- At sub-brick #20 'neg_pos_diff LC[0] Coef' datum type is short: -18586 to 29350 [internal] [* 0.00044168] -8.20907 to 12.9633 [scaled]

-- At sub-brick #24 'neg_pos_ave LC[0] Coef' datum type is short: -26364 to 21149 [internal] [* 0.000433336] -11.4245 to 9.16462 [scaled]

This tells us that sub-bricks 0 and 6 contain the baseline estimates (P_0 denotes the parameter for the 0-order baseline regressor, which is a constant equal to the estimated baseline) for each run, and sub-bricks 12, 16, 20, and 24 contain the beta coefficients and estimated linear contrasts. Thus:

**3dcalc -prefix glm_estimates_perc -fscale \**

**-a glm_out+orig[0] -b glm_out+orig[6] \**

**-c glm_out+orig[12,16,20,24] \**

**-expr "step(mean(a,b)-500)*100*c/mean(a,b)"**

The -fscale option forces 3dcalc to internally use the full range of possible integer values to store the results; this avoids rounding and truncation errors, but is otherwise invisible to the user.

The step(mean(a,b)-500) term here returns 1 if the mean of the two baselines is greater than 500, and 0 otherwise. This is used to mask out all voxels with a mean baseline signal below a certain level (500 is usually appropriate for our lab’s internal files when reconstructed with autoscaling as described above, but you should check by overlaying the estimated baseline sub-bricks of the glm_out file on the subject’s anatomical image and setting the threshold to 500). It also ensures that we don’t get ridiculously high percent signal change values for voxels outside the brain, which have very low baseline signal but possibly high parameter estimates due to noise.
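The arithmetic inside the 3dcalc expression can be checked on paper, or with a toy per-voxel version of it. In the sketch below, each row holds made-up values for a (run 1 baseline), b (run 2 baseline), and c (a beta coefficient), and awk mirrors the mask-then-scale expression:

```shell
# Toy per-voxel illustration of the percent signal change expression:
# percent = step(mean(a,b)-500) * 100 * c / mean(a,b)
# Each input row is a made-up voxel: a, b (baselines), c (beta).
printf '1000 1040 10.2\n980 1020 -5.0\n120 90 8.0\n' | awk '{
  m = ($1 + $2) / 2                                 # mean(a,b)
  mask = (m - 500 > 0) ? 1 : 0                      # step(mean(a,b)-500)
  printf "%.3f\n", mask * 100 * $3 / m
}'
```

The first voxel yields 1.000 (a 1% signal change), the second -0.500, and the third 0.000 because its mean baseline (105) falls below the 500 mask threshold.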

This technique will produce a new bucket dataset with only the
percent signal change parameter and contrast estimates – we have
dropped all the t-statistics, F-statistics, R^2 estimates
etc., which is fine because none of these are used in the higher
level (group) analysis. However, two things that are worth including
at this stage are the mean baseline estimate, and an estimate of the
signal to noise ratio (SNR). The former we can easily get by
averaging the baseline estimates from the different runs using
3dcalc:

**3dcalc -prefix glm_baseline -fscale \**

**-a glm_out+orig[0] -b glm_out+orig[6] \**

**-expr "step(mean(a,b)-500)*mean(a,b)"**

To estimate the SNR, we need both this mean baseline estimate and the residual time series that was saved with the -errts option in the 3dDeconvolve command. The residual time series is our best estimate of the noise in the data. To calculate the SNR, we divide the mean baseline estimate (the signal) by the standard deviation of the residual time series (the noise). To calculate the latter:

**3dTstat -prefix err_ts_stdev -stdev -datum short error_ts+orig**

Hence to calculate the SNR:

**3dcalc -prefix snr -fscale \**

**-a glm_baseline+orig -b err_ts_stdev+orig \**

**-expr "a/b"**
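To see what these two commands compute, here is a toy single-voxel version with made-up numbers: the standard deviation of a short residual series, then the baseline divided by it:

```shell
# Toy single-voxel SNR calculation: mean baseline divided by the
# sample standard deviation of a (made-up) residual time series.
printf '2\n-1\n0\n1\n-2\n' | awk -v baseline=1000 '
  { s += $1; ss += $1 * $1; n++ }
  END {
    sd = sqrt((ss - s * s / n) / (n - 1))           # sample std deviation
    printf "SNR = %.1f\n", baseline / sd
  }'
```

For this series (sd about 1.58) and a baseline of 1000, the result is an SNR of about 632, far higher than real voxel-level fMRI data, but the arithmetic is the same.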

Now that we have both the mean baseline and the SNR, we want to put these into the same bucket dataset that contains the percent signal change parameter and contrast estimates. To do this, we can use the 3dbucket command:

**3dbucket -session . -prefix glm_out_perc glm_baseline+orig snr+orig glm_estimates_perc+orig**

We now have the basic dataset that we will want to use for second level, group analysis.

At this stage, if satisfied with the individual subject results, we would transform the statistical images from each subject to a standard space (so that they are aligned) and perform a second level (i.e. group) analysis. This involves two steps. First, we need to calculate the transformation from the individual subject brain space to the standard space, which we do based on the high resolution anatomicals, since they have the best anatomical detail. Then we apply the same spatial transform to our functional statistical data, thus bringing them into the standard space. To perform both these steps, we can choose from two options:

**8.1. Manual Talairach transformation technique (traditional)**

In this technique, which is the traditional one in AFNI, we manually mark each subject's high resolution anatomical image and then have AFNI use those markers to transform the data into Talairach Space. First we need to landmark and transform a high resolution anatomical image of each subject's brain, using interactive landmarking in AFNI, which will produce a new file with +tlrc in place of +orig in the filename. This step is discussed here:

http://afni.nimh.nih.gov/pub/dist/edu/latest/afni08_talairach/afni08_talairach.htm

We then normalise the statistical map using the adwarp program.

**adwarp -apar T1High+tlrc -dpar glm_out_perc+orig -prefix glm_out_perc -dxyz 2 -verbose**

In this example we use the (already marked and Talairach transformed) high resolution T1-weighted anatomical image as the basis for normalizing the statistical data into Talairach space. After normalization we interpolate and resample the data to a 2mm cubic voxel size. The default interpolation method, linear interpolation, was used in this example. This interpolation method is appropriate for beta coefficients and contrast estimates, but possibly not for parameters such as t-values and F-values. Since typically beta coefficients or contrast estimates will be the only parameters carried into the next analysis step, this is a reasonable approach.

**8.2. Automatic transformation technique (new)**

In this newer technique, AFNI can automatically calculate the transform of each subject's anatomy to a template anatomy of our choosing. A number of templates are included in the AFNI distribution, including:

TT_N27+tlrc: This is the “Colin” brain – one subject scanned 27 times and averaged

TT_icbm452+tlrc: This is the International Consortium for Brain Mapping template, created by averaging 452 normal brains

TT_avg152T1+tlrc: The Montreal Neurological Institute template, created by averaging 152 normal brains

It might also be desirable to use a different template, for example a custom made template if examining a childhood population.

**IMPORTANT NOTE: Please read the information here (slides 24 on) for a description of the templates and the differences between the automatic and manual transformation techniques. It is very important that you understand the differences in coordinates that arise from using different templates, and that you use only a single template for your study. It is also worth noting that the manual technique might give better alignment across subjects for structures close to the anterior and posterior commissures, such as the amygdala, striatum, etc.**

For automatic transformation, the first step is to calculate and apply the transform from individual subject space to template space based on the subject's high resolution anatomical:

**@auto_tlrc -base TT_icbm452+tlrc -suffix _icbm452 -input T1High+orig**

This step should strip the skull off the anatomical image, and then calculate the best spatial transform to warp the individual's anatomical data into the same space as the template image (in this case the icbm452 template). A few things can go wrong at this stage, so you should open up AFNI and display your transformed anatomical overlaid on the icbm452 template to make sure the alignment looks okay. Also look at the transformed anatomical as an underlay to make sure that the skull stripping has done a decent job (e.g. hasn't cut off large chunks of brain). If the stripping wasn't good, it will be necessary to manually skull strip your anatomical, either using AFNI programs or perhaps the BET program from the FSL package.

Assuming all went well with transforming the anatomical, you can now apply the same transformation to your functional data, re-sampling the data to 2 mm cubic voxels at the same time:

**@auto_tlrc -apar T1High_icbm452+tlrc -input glm_out_perc+orig -suffix _icbm452 -dxyz 2**

As mentioned above, spatial smoothing can be applied to the statistical maps rather than to the raw data. The main reason for smoothing at this stage rather than earlier is that it is typically far less computationally expensive, since a statistical map contains no time dimension. In addition, because the voxels have been interpolated and up-sampled (e.g. to 2mm cubic voxels), the resulting statistical map will be smoother and have a more Gaussian spatial distribution (which can be advantageous when correcting for multiple comparisons at a later stage).

However, it must be noted that in smoothing beta coefficients and contrast estimates we are smoothing meta-data; i.e. we are smoothing estimates of effects rather than actual fMRI data. This should be a reasonable approach, since such effect estimates are clearly and directly interpretable as estimates of fMRI signal values.

We blur the individual normalized statistical maps using 3dmerge:

**3dmerge -1blur_fwhm 5 -doall -session . -prefix glm_out_5mm glm_out_perc+tlrc**

This will apply a Gaussian blur with a full width at half maximum (FWHM) of 5mm to the statistical map.
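To get a feel for what a 5mm FWHM kernel looks like at this voxel size, here is an illustrative 1-D sketch (the actual blur is 3-D, and 3dmerge's edge handling differs; this is purely to show the kernel shape):

```python
import math

def gaussian_kernel(fwhm_mm, voxel_mm, radius_vox=3):
    # convert FWHM to sigma: FWHM = sigma * 2*sqrt(2*ln 2)
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weights = [math.exp(-0.5 * ((i * voxel_mm) / sigma) ** 2)
               for i in range(-radius_vox, radius_vox + 1)]
    total = sum(weights)
    return [w / total for w in weights]   # normalise so weights sum to 1

k = gaussian_kernel(5.0, 2.0)   # 5mm FWHM blur on a 2mm grid
print(len(k), round(sum(k), 6))   # -> 7 1.0
```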

The final stage of analysis is to perform a group level analysis. In many cases you might have between-subjects factors to include in your analysis. In this first example, we will deal with a simple, completely within-subjects design, in which the question is which areas of the brain become activated in response to the negative stimuli. First we need to determine which sub-brick of the statistical bucket dataset contains the percent signal change estimates for the negative condition. We can use 3dinfo for this, or we can simply recall that our dataset contains the mean baseline in the first sub-brick (sub-brick 0), the SNR in the second sub-brick (sub-brick 1), and then the percent signal change estimates of the 2 parameters and 2 contrasts of interest in sub-bricks 2 to 5. Thus sub-brick 3 should correspond to the percent signal change estimate of the response to the negative stimuli.

To perform the analysis we would use a 1-sample t-test as follows:

**3dttest -session . -prefix ttest -base1 0 -set2 \**

**subj1/glm_out_5mm+tlrc[3] \**

**subj2/glm_out_5mm+tlrc[3] \**

**subj3/glm_out_5mm+tlrc[3] \**

**subj4/glm_out_5mm+tlrc[3] \**

**subj5/glm_out_5mm+tlrc[3] \**

**subj6/glm_out_5mm+tlrc[3] \**

**subj7/glm_out_5mm+tlrc[3] \**

**subj8/glm_out_5mm+tlrc[3] \**

**subj9/glm_out_5mm+tlrc[3] \**

**subj10/glm_out_5mm+tlrc[3] \**

**subj11/glm_out_5mm+tlrc[3] \**

**subj12/glm_out_5mm+tlrc[3] \**

**subj13/glm_out_5mm+tlrc[3] \**

**subj14/glm_out_5mm+tlrc[3] \**

**subj15/glm_out_5mm+tlrc[3]**

This tests whether the percent signal change estimated response for the negative condition (sub-brick 3) is significantly different from 0 (the -base1 0 term).

A slightly different example arises when there are two groups whose functional activation in response to the task we want to compare. In this case we perform a two-group t-test. Let's say we now want to compare the contrast between positive and negative images, which is in sub-brick 4:

**3dttest -session . -prefix ttest \**

**-set1 \**

**subj1/glm_out_5mm+tlrc[4] \**

**subj2/glm_out_5mm+tlrc[4] \**

**subj3/glm_out_5mm+tlrc[4] \**

**subj4/glm_out_5mm+tlrc[4] \**

**subj5/glm_out_5mm+tlrc[4] \**

**subj6/glm_out_5mm+tlrc[4] \**

**subj7/glm_out_5mm+tlrc[4] \**

**subj8/glm_out_5mm+tlrc[4] \**

**-set2 \**

**subj9/glm_out_5mm+tlrc[4] \**

**subj10/glm_out_5mm+tlrc[4] \**

**subj11/glm_out_5mm+tlrc[4] \**

**subj12/glm_out_5mm+tlrc[4] \**

**subj13/glm_out_5mm+tlrc[4] \**

**subj14/glm_out_5mm+tlrc[4] \**

**subj15/glm_out_5mm+tlrc[4] \**

**subj16/glm_out_5mm+tlrc[4]**

If instead we had a single group that was measured twice, say before and after a treatment, we would perform a paired t-test by adding the -paired option. Note that with -paired, the two sets must contain the same subjects in the same order; in the hypothetical example below, the subjN_pre and subjN_post directories hold each subject's pre- and post-treatment datasets:

**3dttest -session . -paired -prefix ttest \**

**-set1 \**

**subj1_pre/glm_out_5mm+tlrc[4] \**

**subj2_pre/glm_out_5mm+tlrc[4] \**

**subj3_pre/glm_out_5mm+tlrc[4] \**

**subj4_pre/glm_out_5mm+tlrc[4] \**

**subj5_pre/glm_out_5mm+tlrc[4] \**

**subj6_pre/glm_out_5mm+tlrc[4] \**

**subj7_pre/glm_out_5mm+tlrc[4] \**

**subj8_pre/glm_out_5mm+tlrc[4] \**

**-set2 \**

**subj1_post/glm_out_5mm+tlrc[4] \**

**subj2_post/glm_out_5mm+tlrc[4] \**

**subj3_post/glm_out_5mm+tlrc[4] \**

**subj4_post/glm_out_5mm+tlrc[4] \**

**subj5_post/glm_out_5mm+tlrc[4] \**

**subj6_post/glm_out_5mm+tlrc[4] \**

**subj7_post/glm_out_5mm+tlrc[4] \**

**subj8_post/glm_out_5mm+tlrc[4]**
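The -paired option simply tests the within-subject differences against zero; that is, a paired t-test on two sets equals a one-sample t-test on the per-subject differences, as this sketch with hypothetical numbers shows:

```python
import math

def one_sample_t(values, mu=0.0):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (mean - mu) / math.sqrt(var / n)

def paired_t(set1, set2):
    # paired t-test == one-sample t-test on the differences
    diffs = [a - b for a, b in zip(set1, set2)]
    return one_sample_t(diffs)

before = [0.5, 0.7, 0.4, 0.9, 0.6]   # hypothetical pre-treatment estimates
after = [0.3, 0.4, 0.2, 0.6, 0.5]    # hypothetical post-treatment estimates
print(round(paired_t(before, after), 3))
```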

At this stage you have completed a basic voxelwise statistical analysis of fMRI data.

Many experimental designs will incorporate more than two experimental conditions, more than two experimental groups, or both. In such cases, analysis of variance (ANOVA) is appropriate. Consider an example experiment in which subjects saw pictures of happy, fearful, or neutral faces, and we want to determine which areas of the brain show differential activation to the different types of faces. In addition, let us assume that there were two experimental groups: a patient group and a control group. For this experiment we could use the 3dANOVA3 command to perform a 3-factor ANOVA.

For this design, the within-subject condition (face expression) is a fixed factor, group is a fixed factor, and subjects should be considered a random factor nested within group (since we want to be able to generalise beyond this particular set of subjects to the general population from which they are “randomly” sampled). So we need to perform a 3-way mixed effects ANOVA. The command for this is 3dANOVA3.

Let us assume that the beta coefficients from a 3 condition (happy, fearful, neutral) GLM performed on individual subject data are contained in sub-bricks 4, 5 and 6 respectively of the GLM output bucket files. Then the command below could be used:

**3dANOVA3 -type 5 -alevels 2 -blevels 3 -clevels 8 \**

Here we specify some general aspects of the design. The -type option specifies the type of design. The options are:

1 A,B,C fixed; AxBxC

2 A,B,C random; AxBxC

3 A fixed; B,C random; AxBxC

4 A,B fixed; C random; AxBxC

5 A,B fixed; C random; AxB,BxC,C(A)

We also specify that the first factor (group) has 2 levels, the second factor (face) has 3 levels, and the final factor (subject) has 8 levels.

Next, we specify the datasets to analyse, including their respective factor levels:

**-dset 1 1 1 control1/glm_out_5mm+tlrc[4] \**

**-dset 1 1 2 control2/glm_out_5mm+tlrc[4] \**

**-dset 1 1 3 control3/glm_out_5mm+tlrc[4] \**

**-dset 1 1 4 control4/glm_out_5mm+tlrc[4] \**

**-dset 1 1 5 control5/glm_out_5mm+tlrc[4] \**

**-dset 1 1 6 control6/glm_out_5mm+tlrc[4] \**

**-dset 1 1 7 control7/glm_out_5mm+tlrc[4] \**

**-dset 1 1 8 control8/glm_out_5mm+tlrc[4] \**

**-dset 1 2 1 control1/glm_out_5mm+tlrc[5] \**

**-dset 1 2 2 control2/glm_out_5mm+tlrc[5] \**

**-dset 1 2 3 control3/glm_out_5mm+tlrc[5] \**

**-dset 1 2 4 control4/glm_out_5mm+tlrc[5] \**

**-dset 1 2 5 control5/glm_out_5mm+tlrc[5] \**

**-dset 1 2 6 control6/glm_out_5mm+tlrc[5] \**

**-dset 1 2 7 control7/glm_out_5mm+tlrc[5] \**

**-dset 1 2 8 control8/glm_out_5mm+tlrc[5] \**

**-dset 1 3 1 control1/glm_out_5mm+tlrc[6] \**

**-dset 1 3 2 control2/glm_out_5mm+tlrc[6] \**

**-dset 1 3 3 control3/glm_out_5mm+tlrc[6] \**

**-dset 1 3 4 control4/glm_out_5mm+tlrc[6] \**

**-dset 1 3 5 control5/glm_out_5mm+tlrc[6] \**

**-dset 1 3 6 control6/glm_out_5mm+tlrc[6] \**

**-dset 1 3 7 control7/glm_out_5mm+tlrc[6] \**

**-dset 1 3 8 control8/glm_out_5mm+tlrc[6] \**

**-dset 2 1 1 patient1/glm_out_5mm+tlrc[4] \**

**-dset 2 1 2 patient2/glm_out_5mm+tlrc[4] \**

**-dset 2 1 3 patient3/glm_out_5mm+tlrc[4] \**

**-dset 2 1 4 patient4/glm_out_5mm+tlrc[4] \**

**-dset 2 1 5 patient5/glm_out_5mm+tlrc[4] \**

**-dset 2 1 6 patient6/glm_out_5mm+tlrc[4] \**

**-dset 2 1 7 patient7/glm_out_5mm+tlrc[4] \**

**-dset 2 1 8 patient8/glm_out_5mm+tlrc[4] \**

**-dset 2 2 1 patient1/glm_out_5mm+tlrc[5] \**

**-dset 2 2 2 patient2/glm_out_5mm+tlrc[5] \**

**-dset 2 2 3 patient3/glm_out_5mm+tlrc[5] \**

**-dset 2 2 4 patient4/glm_out_5mm+tlrc[5] \**

**-dset 2 2 5 patient5/glm_out_5mm+tlrc[5] \**

**-dset 2 2 6 patient6/glm_out_5mm+tlrc[5] \**

**-dset 2 2 7 patient7/glm_out_5mm+tlrc[5] \**

**-dset 2 2 8 patient8/glm_out_5mm+tlrc[5] \**

**-dset 2 3 1 patient1/glm_out_5mm+tlrc[6] \**

**-dset 2 3 2 patient2/glm_out_5mm+tlrc[6] \**

**-dset 2 3 3 patient3/glm_out_5mm+tlrc[6] \**

**-dset 2 3 4 patient4/glm_out_5mm+tlrc[6] \**

**-dset 2 3 5 patient5/glm_out_5mm+tlrc[6] \**

**-dset 2 3 6 patient6/glm_out_5mm+tlrc[6] \**

**-dset 2 3 7 patient7/glm_out_5mm+tlrc[6] \**

**-dset 2 3 8 patient8/glm_out_5mm+tlrc[6] \**

We now specify the output that we want:

**-fa group -fb face -fab groupByFace \**

This line specifies that we want the F-test for the group main effect, the F-test for the face main effect, and the F-test for the group by face interaction.

Next we can look at specific contrasts:

**-acontr 1 0 controls -acontr 0 1 patients \**

**-bcontr 1 0 0 happy -bcontr 0 1 0 fearful -bcontr 0 0 1 neutral \**

These two lines specify that we want the factor level means and statistical tests of whether those means are significantly different from 0.

**-acontr 1 -1 contr_pat_diff \**

**-bcontr 1 -1 0 happ_fear_diff -bcontr 1 0 -1 happ_neut_diff -bcontr 0 1 -1 fear_neut_diff -bcontr 0.5 0.5 -1 emo_neut_diff \**

These lines test specific contrasts for each factor, averaged across the levels of the other factors.

**-aBcontr 1 -1 : 1 cont_pat_diff_happ -aBcontr 1 -1 : 2 cont_pat_diff_fear -aBcontr 1 -1 : 3 cont_pat_diff_neut \**

**-Abcontr 1 : 1 -1 0 happ_fear_diff_cont -Abcontr 2 : 1 -1 0 happ_fear_diff_pat \**

These two lines test contrasts for one factor calculated within a specific level of the other factor.

There are obviously many more contrasts that could be specified than the ones we have here. Bear in mind that you should really only be looking at these contrasts if i) you have an a priori hypothesis about a specific contrast, ii) the main effect F-test for a given factor is significant and you want to know which factor level differences are driving the main effect, or iii) the interaction of two factors is significant and you need to know which differences are driving the interaction. *Don’t fall victim to a fishing expedition in which you test every single possible contrast, and possibly wind up with a catch of junk. If you must do exploratory analyses, then you should guard against Type I error by adopting a suitably more stringent threshold.*

**-bucket anova**

Finally, we specify that all the results should be saved in a statistical bucket dataset called anova+tlrc.
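Typing out 48 -dset lines by hand is error-prone. A small Python helper can assemble the 3dANOVA3 command instead (a sketch only; adjust the directory names and sub-brick numbers to your own study):

```python
# sketch: build the long 3dANOVA3 -dset argument list with loops
groups = {1: 'control', 2: 'patient'}   # factor A levels
face_bricks = {1: 4, 2: 5, 3: 6}        # factor B level -> sub-brick (happy, fearful, neutral)
n_subj = 8                              # factor C levels per group

args = ['3dANOVA3', '-type', '5', '-alevels', '2', '-blevels', '3', '-clevels', '8']
for a in sorted(groups):
    for b in sorted(face_bricks):
        for c in range(1, n_subj + 1):
            dset = '%s%d/glm_out_5mm+tlrc[%d]' % (groups[a], c, face_bricks[b])
            args += ['-dset', str(a), str(b), str(c), dset]
args += ['-fa', 'group', '-fb', 'face', '-fab', 'groupByFace', '-bucket', 'anova']

print(' '.join(args))   # paste into the shell, or pass to subprocess
```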

At this stage you have completed a basic statistical analysis of fMRI data.

It is increasingly common for researchers to want to correlate one or more behavioural or physiological measures with a particular brain activation contrast across a group of subjects. For example, we might be interested in how negative-neutral BOLD contrast correlates with individual differences in some measure of anxiety (e.g. measures on an anxiety self report scale, or mean negative-neutral difference in the amplitude eyeblink startle). In this case, regression is an appropriate analysis technique.

Let us assume that the negative-neutral contrast from a GLM performed on individual subject data is stored in sub-brick 4 of the GLM output bucket files, which are stored in a sub-directory as follows:

subjectN/glm_out_5mm+tlrc

where N is the subject number. Let's additionally assume that we have an anxiety rating for each subject as follows:

```
subject  anxiety
1        11
2        8
3        1
4        3
5        6
6        4
7        15
8        11
```

Then the command below could be used to run a voxelwise regression, using the anxiety ratings as predictors of the BOLD contrast:

**3dRegAna -rows 8 -cols 1 \**

This first line specifies that we would like to perform a regression with 8 datasets (in this case subjects) and 1 predictor variable (in this case the anxiety score).

**-xydata 11 subject001/glm_out_5mm+tlrc[4] \**

**-xydata 8 subject002/glm_out_5mm+tlrc[4] \**

**-xydata 1 subject003/glm_out_5mm+tlrc[4] \**

**-xydata 3 subject004/glm_out_5mm+tlrc[4] \**

**-xydata 6 subject005/glm_out_5mm+tlrc[4] \**

**-xydata 4 subject006/glm_out_5mm+tlrc[4] \**

**-xydata 15 subject007/glm_out_5mm+tlrc[4] \**

**-xydata 11 subject008/glm_out_5mm+tlrc[4] \**

These lines specify the respective predictors (anxiety scores in this case) and corresponding “dependent” measures (BOLD contrast maps in this case). I enclose “dependent” in quotes here because although the BOLD contrast is treated as a dependent measure in the analysis, the analysis is correlational, so unless the predictor variables represent a directly manipulated quantity (usually not the case), we cannot interpret the results in causal terms.

Now we need to specify a few more details about the model:

**-model 1:0 \**

This sub-command often gets people confused. The purpose of this
option is to specify the reduced and full models. Basically you can
think of the full model as including all predictors of interest as
well as predictors of no interest, whereas the reduced model only
includes the predictors of no interest. In the current example we
have 1 predictor of interest (anxiety score) and 1 predictor of no
interest (the regression constant term, which we did not specify
since it's included by default). Predictor variables are numbered
sequentially from 0 (constant), 1 (first non-constant predictor), 2
(second non-constant predictor) etc. So in this example, we specify
-model 1:0 which means the reduced model is composed of the 0^{th}
predictor (the constant), and the full model is composed of
everything in the reduced model, plus the 1^{st} non-constant
predictor (the anxiety score). Further down the page there is an
example of a multiple regression with non-constant predictors of
interest and no interest, which you can look at to see how the model
option is specified for such cases.
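The logic of full versus reduced models can be sketched numerically: fit both models by least squares and compare the error each leaves behind (hypothetical BOLD values; 3dRegAna performs this comparison at every voxel):

```python
def sse_reduced(y):
    # reduced model: constant only -> error around the mean
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def sse_full(x, y):
    # full model: constant + one predictor, ordinary least squares
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y)) /
             sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

anxiety = [11, 8, 1, 3, 6, 4, 15, 11]             # predictor (table above)
bold = [0.9, 0.7, 0.2, 0.3, 0.5, 0.4, 1.1, 0.8]   # hypothetical contrast values
ss_red, ss_full = sse_reduced(bold), sse_full(anxiety, bold)
n, q, p = len(bold), 1, 2   # q predictors added; p parameters in full model
f = ((ss_red - ss_full) / q) / (ss_full / (n - p))
print(round(f, 2))   # large F -> anxiety explains BOLD variance
```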

**-rmsmin 0 \**

This command can be used to accept the constant-only model for voxels with a root mean square (RMS) value of less than the specified value, thus speeding up execution. If set to zero, the program will attempt to fit a full regression model to every voxel, which in many cases is the best option, since it's difficult to know beforehand what a reasonable RMS cutoff is, especially when analysing percent signal change contrast maps.

**-bucket 0 anxiety_regression_out**

This specifies that we want the default output saved in the AFNI bucket dataset anxiety_regression_out+tlrc.HEAD/BRIK. This dataset will contain sub-bricks with the predictor parameter estimates (coefficients), associated t-tests, and an F-test of the full (versus reduced) model.

NOTE: Do not make the mistake of thinking that the first
coefficient and t-test pair refer to the 1^{st} non-constant
predictor. They do not. They refer to the constant term in the model
(often, but not always, of no interest).

Let's look at an example in which we have another predictor of interest, say a rating of depression, as well as a predictor of no interest, say the age of the subjects. We would set up the model as follows:

**3dRegAna -rows 8 -cols 3 \**

This first line specifies that we would like to perform a regression with 8 datasets (in this case subjects) and 3 predictor variables (anxiety, depression, age).

**-xydata 11 5 32 subject001/glm_out_5mm+tlrc[4] \**

**-xydata 8 8 21 subject002/glm_out_5mm+tlrc[4] \**

**-xydata 1 4 43 subject003/glm_out_5mm+tlrc[4] \**

.... and so on for the rest of the subjects

**-model 1 2 : 0 3**

Here we specify that the reduced model consists of the constant term and the 3^{rd} non-constant predictor (age), while the full model additionally includes the anxiety and depression scores (the 1^{st} and 2^{nd} non-constant predictors).

The rest of the command would be the same as before. Now in the output, the model F-test refers to how significantly anxiety and depression predict BOLD response, taking into account (i.e. covarying for) age. By including age in the model, we hopefully explain some variance that would otherwise be left unexplained and end up in the error term. Thus we can improve our sensitivity by including appropriate covariates. Note also though, that if age covaries with either of our predictors of interest, then we might reduce our sensitivity, a consequence of predictor collinearity.

At this stage you have completed a basic statistical analysis of fMRI data.

One major consideration when conducting voxelwise statistical tests is the sheer number of tests being performed, which inflates Type I error. A common approach to correcting for this is to use a clustering method that determines how big a contiguous cluster of voxels, each significant at an uncorrected threshold of P_{u}, has to be in order to be significant at a threshold P_{c} that is corrected for multiple comparisons. To determine how big such a cluster needs to be, there are 3 common methods:

1. __Gaussian Random Field Theory__: Uses the mathematical properties of random fields to derive a corrected cluster size.

2. __Permutation tests__: Uses the data itself to create a null distribution of cluster sizes, from which the cluster size corresponding to a desired corrected P can be read off.

3. __Monte Carlo simulation__: Creates multiple simulated null datasets, and from them a distribution of cluster sizes, from which the cluster size corresponding to a desired corrected P can be read off.

We could go into a lot of detail about the relative merits of each method here, but that would be somewhat irrelevant for AFNI analysis, since the Monte Carlo method is the only one currently implemented in AFNI (other techniques might become available in the near future).
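The idea behind the Monte Carlo approach can be illustrated with a deliberately simplified 1-D sketch: threshold random null "maps" at P_{u}, record the largest run of contiguous supra-threshold "voxels" in each, and read off the cluster size exceeded in fewer than 5% of the null simulations. (AlphaSim does this in 3-D with spatially smoothed noise; this toy version omits smoothing, so its numbers are not usable thresholds.)

```python
import random

def max_cluster_1d(above):
    # longest run of contiguous supra-threshold voxels
    best = run = 0
    for a in above:
        run = run + 1 if a else 0
        best = max(best, run)
    return best

def monte_carlo_threshold(n_vox=1000, p_u=0.01, iters=1000, alpha=0.05, seed=0):
    # build a null distribution of largest-cluster sizes, then take the
    # cluster size reached in fewer than alpha of the simulations
    rng = random.Random(seed)
    sizes = sorted(
        max_cluster_1d([rng.random() < p_u for _ in range(n_vox)])
        for _ in range(iters))
    return sizes[int((1.0 - alpha) * iters)] + 1

print(monte_carlo_threshold())
```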

In order to carry out
the Monte Carlo simulation, we need some basic information about the
data itself, as well as the uncorrected P_{u} that we plan on
using for creating clusters, and a few other parameters that control
the clustering operation.

First we need an estimate of how spatially smooth the noise in the dataset is. To estimate this, we use the program 3dFWHM:

**3dFWHM -dset err_ts+orig[0] -mask '3dcalc( -a glm_out+orig[0] -expr step(a-500) )'**

Here we have used the residual time series from an individual subject GLM to estimate the spatial smoothness. We have masked the time series with the thresholded baseline estimate so as to exclude voxels from outside the brain. This command will return output like the following:

```
Gaussian filter widths:
sigmax = 1.72   FWHMx = 4.05
sigmay = 1.56   FWHMy = 3.67
sigmaz = 1.70   FWHMz = 4.00
```

So for this dataset, the estimated full-width-at-half-maximum (FWHM) spatial smoothness is approximately 4mm. We could then do this for all our subjects and average the results. However, if we applied spatial blurring of more than 4mm to our statistical percent signal change estimates, we should instead use the FWHM of the blur that we applied. In the example above, we used a spatial blur of 5mm, so the smoothness estimate we use in the Monte Carlo simulations should be 5mm.
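The sigma and FWHM columns in this output are just two parameterisations of the same Gaussian, related by FWHM = sigma * 2*sqrt(2*ln 2), roughly 2.355 * sigma, which you can verify against the numbers above:

```python
import math

def sigma_to_fwhm(sigma):
    # FWHM = sigma * 2*sqrt(2*ln 2) ~= 2.3548 * sigma
    return sigma * 2.0 * math.sqrt(2.0 * math.log(2.0))

print(round(sigma_to_fwhm(1.72), 2))   # -> 4.05
print(round(sigma_to_fwhm(1.56), 2))   # -> 3.67
print(round(sigma_to_fwhm(1.70), 2))   # -> 4.0
```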

The next step is to decide upon the uncorrected P_{u} that we plan on using for creating clusters. This choice is fairly arbitrary. If we choose a fairly liberal voxelwise threshold, say P_{u} = 0.01, we will be able to detect large clusters of voxels activated at that liberal threshold, but we will miss small clusters made up of highly significantly activated voxels. If, on the other hand, we use a conservative threshold, say P_{u} = 0.0001, we will be able to detect small clusters of highly activated voxels, but not larger clusters of less activated voxels. The final choice of P_{u} depends on the size of the clusters that we are looking for, as well as on what our overall statistical power is likely to be.

Let's say for the current example that we choose P_{u} = 0.01.

The other parameter that we need to determine is what we want our cluster connection radius to be. This specifies how close two voxels need to be to one another in order to be considered contiguous. Usually we would use the voxel size of the datasets being analysed, in this case 2mm.

So now we can run the simulation:

**AlphaSim -mask glm_out_perc+tlrc[0] -fwhm 5 -rmm 2 -pthr 0.01 -iter 1000**

The -iter 1000 option specifies that we want to run 1000 simulations, which is usually enough. The program will take quite a few minutes to run, and then produce output like the following:

```
Cl Size  Frequency  Cum Prop  p/Voxel     Max Freq  Alpha
1        229840     0.328718  0.01012002  0         1.000000
2        108491     0.483882  0.00947785  0         1.000000
3        66742      0.579336  0.00887161  0         1.000000
4        51187      0.652544  0.00831218  0         1.000000
5        39186      0.708588  0.00774011  0         1.000000
```

and so on. What you now need to do is read down the right-most column (Alpha), which gives the corrected P_{c} for each minimum cluster size (left-hand column).

```
84 4 0.999920 0.00001630 4 0.058000
85 5 0.999927 0.00001536 5 0.054000
86 1 0.999929 0.00001417 1 0.049000
87 5 0.999936 0.00001393 4 0.048000
```

So from this we can see that according to Monte Carlo simulations,
thresholding the statistical images with an uncorrected P_{u}
= 0.01, and clustering with a cluster connection radius of 2mm, the
resulting clusters need to be at least 86 voxels in size in order to
achieve a corrected P_{c} < 0.05. We can now use this
information to threshold and cluster our group analysis statistical
maps, which is explained in the next section.

We can now use the information from Monte Carlo simulations to threshold and cluster our group analysis statistical maps, so as to get thresholded statistical images corrected for multiple comparisons. This we must do separately for each contrast or F-test that we are interested in. For the preceding example, imagine that we are interested in the F-test for the interaction of group by face. First we need to find out where this information is stored in the anova+tlrc bucket file:

**3dinfo anova+tlrc**

This produces output such as:

```
-- At sub-brick #0 'group:Inten' datum type is short: 0 to 32767 [internal]
       [* 0.000644032] 0 to 21.103 [scaled]
-- At sub-brick #1 'group:F-stat' datum type is short: 0 to 2562 [internal]
       [* 0.01] 0 to 25.62 [scaled]
       statcode = fift; statpar = 1 14
-- At sub-brick #2 'face:Inten' datum type is short: 0 to 32767 [internal]
       [* 0.000571107] 0 to 18.7135 [scaled]
-- At sub-brick #3 'face:F-stat' datum type is short: 0 to 3689 [internal]
       statcode = fift; statpar = 2 28
-- At sub-brick #4 'groupByFace:Inten' datum type is short: 0 to 32767 [internal]
       [* 0.000571007] 0 to 19.7125 [scaled]
-- At sub-brick #5 'groupByFace:F-stat' datum type is short: 0 to 3489 [internal]
       statcode = fift; statpar = 2 28
```

This tells us that the sub-brick we want to threshold with is sub-brick 5. We can also keep the intensity sub-brick for the interaction, although this would make more sense for a t-test of a specific contrast, since the intensity sub-brick in that case contains the value of the contrast itself (e.g. percent signal change contrast between two conditions).

Before we run the cluster command, we need to determine the
F-statistic that corresponds to a P_{u} of 0.01, for the
degrees of freedom that we have for the interaction F-test. This can
be read off from tables, calculated using Excel or an online
calculator, or read off from the threshold slider in AFNI if viewing
the interaction statistical map. In this case, the correct
F-statistic is 5.45.

Now we can use the 3dmerge command to perform the clustering:

**3dmerge -session . -prefix group_by_face_clust -1thresh 5.45 -1clust 2 -86 anova+tlrc[4,5]**

**3dmerge -session . -prefix group_by_face_clust_order -1thresh 5.45 -1clust_order 2 -86 anova+tlrc[4,5]**

The first command will identify clusters of voxels that exceed an F-value of 5.45, using a cluster connectivity radius of 2mm and a minimum cluster size of 86 voxels, with sub-brick 5 as the thresholding brick, and will save the thresholded, clustered data in a new bucket dataset called group_by_face_clust+tlrc. The second command will do the same thing, except that all voxel intensities within a cluster will be replaced by the cluster size index (largest cluster=1, next largest=2, ...). This will be useful for extracting data from specific clusters later on. The reason we specify the minimum cluster size as a negative number is that a negative value denotes cluster size in voxels; a positive 86 would instead denote 86mm^{3}.

Some of the other options of the 3dmerge command might be useful. In particular, you can experiment with the -1erode and -1dilate options, which are useful for separating clusters that are joined by a narrow “neck”.

We would typically want to carry out the same basic operation on all the other F-tests and contrasts of interest. When this is done, it is possible to combine all the resulting clustered datasets into one big bucket file using the command 3dbucket.

This basically completes the voxelwise group analysis of data.

One remaining step that you might want to take is to extract each
subject’s individual percent signal change parameter estimates from
the cluster regions identified in the preceding step, for entry into
a statistics package or for graphing results. To do this, you can use
the 3dmaskave command. Say, for example, that when looking at the
face by group clusters identified in the preceding step, we were
interested in a cluster in prefrontal cortex, which happens to be the
5^{th} largest cluster in the group by face interaction
dataset. Thus voxels in this cluster would have a value of 5 in the
group_by_face_clust_order+tlrc dataset. We can thus use this to
extract mean voxel percent signal change for individual subjects as
follows:

**3dmaskave -mask group_by_face_clust_order+tlrc -mrange 5 5 control001/glm_out_5mm+tlrc[4]**

This command would print on the screen the mean percent signal change for this cluster for control subject 1 for the happy face condition. We could repeat this command for the other face conditions, as well as for all the other subjects. The mean percent signal change values could then be used to create graphs with error bars etc.

Alternatively, if we want to output the mean percent signal change for every cluster at once, we could use the following command:

**3dROIstats -mask group_by_face_clust_order+tlrc control001/glm_out_5mm+tlrc[4]**

This would cycle through each cluster in group_by_face_clust_order+tlrc and output the mean for each one for control subject 1, face condition happy.

Because these commands simply print their output to the screen, you’ll probably want to use them with the Linux redirection operators > and >> to save the output to a text file. Obviously these commands are probably best used in a script that cycles through subjects and conditions. For example, the following simple Python script will cycle through 6 subjects and extract the cluster means from 4 designated clusters and two conditions.

```
#!/usr/bin/python
# filename: extract_clusters.py
# simple script to cycle through 6 subjects and extract the cluster means
# from 4 designated clusters and two conditions
# means are first appended to a temporary text file, with subject number,
# cluster number, and condition brick appended to a separate temporary text
# file. The two files are then combined using 1dcat to give a properly
# indexed file which could be imported into a stats package or spreadsheet
# program

import os, sys

# set up study-specific directories and file names
top_dir = '/study/my_study/analysis/'
cluster_mask_file = top_dir + 'group_by_face_clust_order+tlrc'
subject_stats_filename = 'glm_out_5mm+tlrc'
output_file = top_dir + 'clust_means.txt'

# specify the subjects, clusters and bricks that you want to extract
subjects = ['001','002','004','008','009','010']  # same as the subject directory names
clusters = [2,5,6,8]        # the cluster numbers that we want to extract
condition_bricks = [4,6]    # the bricks in the individual subject data files
                            # that contain the conditions of interest

# check for existence of the temporary files
if os.path.isfile('tmp.txt') or os.path.isfile('tmp2.txt'):
    print 'You first need to delete or rename the file(s) tmp.txt and tmp2.txt'
    sys.exit()

# check for existence of the output file
if os.path.isfile(output_file):
    print 'You first need to delete or rename the output file ' + output_file
    sys.exit()

# now loop through subjects, clusters and bricks to get data
for subject in subjects:
    for cluster in clusters:
        for brick in condition_bricks:
            subject_dir = top_dir + subject + '/'
            data_file = subject_dir + subject_stats_filename + '[' + str(brick) + ']'
            command = '3dmaskave -mask ' + cluster_mask_file + ' -mrange ' + \
                      str(cluster) + ' ' + str(cluster) + ' ' + data_file + ' >> tmp.txt'
            print command
            os.system(command)
            command = 'echo ' + subject + ' ' + str(cluster) + ' ' + str(brick) + ' >> tmp2.txt'
            print command
            os.system(command)

# create the output file with appropriate column headings
command = 'echo subject cluster brick contrast > ' + output_file
print command
os.system(command)

# now combine the two temporary files and then delete them
command = '1dcat tmp2.txt tmp.txt[0] >> ' + output_file
print command
os.system(command)
os.remove('tmp.txt')
os.remove('tmp2.txt')
```

Assuming the script is saved as extract_clusters.py, this script can easily be run by typing:

**python extract_clusters.py**

You can learn more about Python scripting here.