preprocess

Purpose:

Scan and identify input data, assign options and output paths based on a template file, and then reconstruct, convert, motion-correct, and fieldmap-correct as necessary.

Usage:

Usage: preprocess [-hvRsbademfcADEMFC] <path-to-data>

Options:

  -h, --help            show this help message and exit
  -v, --verbose         Print stuff to screen.
  -R, --recompute       Recompute all files.
  --epi-key=EPI_KEYS    Keyword and value that must be present in the p-file
                        header if it is to be processed.  For example, suppose
                        EPIs were collected for two paradigms during the same
                        session.  If one contained 198 frames and the other
                        210, specifying the keyword tdim:198 would cause the
                        program to process only EPI runs containing 198
                        frames.  Valid keywords are "tdim" (number of frames),
                        "SeriesNumber", and/or "plane" (axial, sagittal, or
                        coronal).  Multiple keywords can be entered (e.g.,
                        "--epi-key=tdim:198 --epi-key=plane:sagittal").
  --hog-disk            Save all intermediate files.
  --base-firstepi       Align EPI runs to reference EPI image instead of first
                        or last timeseries image.
  --dry_run             Analyze data and create output yaml file but don't
                        compute anything.
  --debug-tmp           Write tmp files to /tmp/debug_tmp. For debugging
                        purposes only.
  --skull-strip         Skull strip the anatomical reference image.
  --align-epis          Align EPIs with the anatomical reference by
                        concatenating the transformation computed for motion
                        correction with the transformation from the EPI's
                        "anat_ref" image to the high-res anatomical.
  -s SKIP, --skip=SKIP  Number of frames to skip at the beginning of each
                        run.
  -o OUTDIR, --output_dir=OUTDIR
                        Directory where output data are written.  This
                        overrides the value in the template file.
  --followon-script=FOLLOWON_SCRIPT
                        Script to be run after preprocess completes.  It will
                        be called with the output path as the only argument.
  --no_fmapcor          Skip fieldmap correction even if a fieldmap exists.
  -a, --anat            Convert structural images.
  -d, --dti             Process dti images.
  -e, --epi             Reconstruct epi images.
  -m, --motion          Motion-correct epi images.
  -f, --fmap            Fieldmap-correct epi images.
  -c, --compute_fmap    Compute fieldmap-correction.
  -A, --Anat            Convert structural images, (recompute).
  -D, --Dti             Process dti images, (recompute).
  -E, --Epi             Reconstruct epi images, (recompute).
  -M, --Motion          Motion-correct epi images, (recompute).
  -F, --Fmap            Fieldmap-correct epi images, (recompute).
  -C, --Compute_fmap    Compute fieldmap-correction, (recompute).
  -V, --version         Display svn version.
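
The --epi-key filter described above can be pictured with a short Python sketch. It is not taken from the preprocess source: the function names parse_epi_keys and run_matches and the header dictionaries are hypothetical, and the sketch assumes that when several keywords are given, a run must match all of them.

```python
VALID_KEYWORDS = ("tdim", "SeriesNumber", "plane")


def parse_epi_keys(epi_keys):
    """Turn ["tdim:198", "plane:sagittal"] into {"tdim": "198", "plane": "sagittal"}."""
    parsed = {}
    for item in epi_keys:
        keyword, _, value = item.partition(":")
        if keyword not in VALID_KEYWORDS:
            raise ValueError("unknown epi-key keyword: %s" % keyword)
        parsed[keyword] = value
    return parsed


def run_matches(header, parsed_keys):
    """Assumption: a run is kept only if every keyword matches its header."""
    return all(str(header.get(key)) == value for key, value in parsed_keys.items())


# Hypothetical header summaries for two EPI runs from the same session.
runs = [
    {"SeriesNumber": 3, "tdim": 198, "plane": "axial"},
    {"SeriesNumber": 4, "tdim": 210, "plane": "axial"},
]
keys = parse_epi_keys(["tdim:198"])
selected = [run["SeriesNumber"] for run in runs if run_matches(run, keys)]
print(selected)
```

With the keyword tdim:198, only the 198-frame run (series 3 in this made-up session) is selected.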

Operation

The program begins by looking for a template file as described below. The template file defines options and the output directory structure. The script then examines every file below the directory specified on the command line to determine whether it contains data the script knows how to process. A data structure is built during this scan that contains the input data, parameters, and output filename for each action to be taken. It is stored in the "log" directory in the file "log/preprocess_info.yaml". This file is human-readable and can be used to see exactly what the script did. The program is designed to work with any directory structure, but it assumes that all data below the path you specify belong to the same subject.

After all files have been scanned, the program sequentially processes each data item. Three log files are created: one contains all commands that are executed (preprocess.bsh), another contains all commands that aborted (preprocess_failed.log), and a third contains the output of every command (preprocess.log).
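
The three log files can be pictured with a small Python sketch. It is not the preprocess implementation: the function name run_logged is hypothetical, and the sketch assumes that "aborted" means a non-zero exit status. Only the three file names come from the description above.

```python
import os
import subprocess
import tempfile


def run_logged(commands, logdir):
    """Run shell commands in order, recording them in three log files."""
    os.makedirs(logdir, exist_ok=True)
    failed = []
    with open(os.path.join(logdir, "preprocess.bsh"), "w") as script, \
            open(os.path.join(logdir, "preprocess.log"), "w") as log, \
            open(os.path.join(logdir, "preprocess_failed.log"), "w") as fail:
        for cmd in commands:
            script.write(cmd + "\n")          # every command that is executed
            result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT,
                                    universal_newlines=True)
            log.write(result.stdout)          # output of every command
            if result.returncode != 0:        # assumption: non-zero means aborted
                fail.write(cmd + "\n")
                failed.append(cmd)
    return failed


with tempfile.TemporaryDirectory() as tmp:
    logdir = os.path.join(tmp, "log")
    failed = run_logged(["echo reconstruct run1", "exit 3"], logdir)
    with open(os.path.join(logdir, "preprocess.log")) as handle:
        log_text = handle.read()
print(failed)
```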

The default behaviour is to process everything unless the output file already exists. Everything will be recomputed if the -R option is given. Specific types of processing can be selected with the options listed above. In all cases, the lower-case option will not overwrite existing data; the upper-case option will. The "skip" option on the command line overrides the value in a template file.
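
The overwrite rule can be summarized in a few lines of Python. This is a sketch under stated assumptions, not the script's own logic: should_process is a hypothetical name, and recompute=True stands for either an upper-case option or -R.

```python
import os
import tempfile


def should_process(output_path, recompute):
    """Decide whether a processing step should run.

    recompute=False models a lower-case option (never overwrite);
    recompute=True models an upper-case option or -R (always redo).
    """
    return recompute or not os.path.exists(output_path)


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "run1.nii")
    open(existing, "w").close()               # pretend this output already exists
    missing = os.path.join(tmp, "run2.nii")
    decisions = [
        should_process(existing, recompute=False),  # lower-case, output exists
        should_process(existing, recompute=True),   # upper-case, output exists
        should_process(missing, recompute=False),   # lower-case, no output yet
    ]
print(decisions)
```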

Templates

preprocess uses a template file to determine many options and paths. The default template file can be found at the Methods list website. The template file is an ascii file that can be read directly by the software.

Template files are processed hierarchically as follows: First, the default template file (see below) is read and all values are initialized. Then, the directory one level above the subject directory is examined for a study-specific template file (e.g., if the raw data for the first subject are stored in /study/mystudy/data/sub1, /study/mystudy/data is examined). Values in this file are used to overwrite values read from the default file. Finally, the subject-level directory (e.g., /study/mystudy/data/sub1) is examined for a subject-specific template, and its values overwrite those in the first two template files. Note that the study-specific and subject-specific template files only need to contain variables that are changed.
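
The hierarchy amounts to three layers of overrides. The Python sketch below illustrates the idea with plain dictionaries; merge_templates is a hypothetical name, and the sketch assumes that a later template replaces whole top-level entries, which may differ from what preprocess actually does for nested blocks.

```python
def merge_templates(default, study=None, subject=None):
    """Apply study-level, then subject-level values on top of the defaults."""
    merged = dict(default)
    for override in (study, subject):
        if override:
            merged.update(override)   # later templates win
    return merged


default = {"skip": 5, "epi_file_format": "brik", "logdir": "log"}
study = {"skip": 3}                       # study-specific template
subject = {"epi_file_format": "nii"}      # subject-specific template
merged = merge_templates(default, study, subject)
print(merged)
```

Only the changed variables appear in the study and subject layers; everything else falls through from the default.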

There are a few syntax requirements, but they are simple. Here are some basic rules:

  • The code "#!fmri_file_template" must appear at the beginning of the first line of the file.
  • The code "---" is a delimiter used by yaml. Keep them where they are.
  • Indentation is important. Always use the same indentation for a given block. For example, the "anat" block has two items, "outdir" and "format". The parser figures this out because both are indented 4 spaces from the left margin.
  • The colon is used instead of an equal.
  • A space must follow the colon.
  • Brackets denote a list. Members of the list are separated by commas.
  • The &id_*** variables are required by the parser. Each of the lines where one appears (e.g. anat:) defines the beginning of a Python dictionary, and the indented elements after it are the dictionary's members. The name of the dictionary and its "id" must be unique. For example, epis could have names of epi1, epi2, ... and ids of &id001, &id002, ...
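
To make the rules above concrete, here is roughly the Python data structure that a small template would turn into once parsed. The values are taken from the default template; the variable name template is only illustrative, and the &id labels are YAML anchors that do not appear as keys in the result.

```python
# Python equivalent of this template fragment:
#
#   skip: 5
#   anat: &id_anat
#       outdir: anat
#       format: brik
#   epidir_dflt: &id001
#       type: epi
#       names: [epi_run1, epi_run2]
#
template = {
    "skip": 5,                                   # "keyword: value" becomes a dictionary entry
    "anat": {                                    # an indented block becomes a nested dictionary
        "outdir": "anat",
        "format": "brik",
    },
    "epidir_dflt": {
        "type": "epi",
        "names": ["epi_run1", "epi_run2"],       # brackets become a Python list
    },
}
print(sorted(template["anat"]))
```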

Example template files.

  • This version specifies the output directory and the EPI names. The names in the list of names are applied in the order the EPIs were acquired.

    #!fmri_file_template
    
    top_outdir: /Users/jmo/data1/test_data/processed/fmfmri
    
    epidir_dflt: &id001       # First set of epis.
        type: epi       # Type of data. The only value currently used is "epi"
        acq_order: 0    # Acquisition order. 
        outdir: "epis"  # Directory for first set of epis.
        names: [warm, pain, istroop, pain_cstroop, cstroop, pain_istroop] # EPI names.
        pepolar: 0       # 0 = default phase-encode direction, 1 = reversed.
    ---
    
    
  • In addition to the output directory and the EPI names, this version specifies the EPI output directories and several processing options:
    • Skip three frames at the beginning of each run rather than the default of 5.
    • Send an email to somebody@wisc.edu if an error occurs. This only works on computers running sendmail.
    • Save the EPIs in two subdirectories named "gonogo" and "faces" with names "run1" and "run2". Note that "acq_order" defines the order in which subdirectories are filled. In this example, the gonogo data were acquired first (acq_order=0) and the faces data were acquired second (acq_order=1).
    • If fsl_flip is set to True, the EPIs will be reoriented to an axial view and saved in nifti format.
      
      #!fmri_file_template
      
      ---
      top_outdir:  /Users/jmo/data1/test_data/processed/asthma2
      fsl_flip: False
      
      skip: 3                     # Number of frames to skip
      epi_file_format: brik       # Output EPI file format.
      email: somebody@wisc.edu 
      
      gonogo: &id001              # First set of epis.
          type: epi               # Type of data.
          acq_order: 0            # Order in acquisition. 0, 1, 2, ...
          outdir: epis     # Directory for first set of epis.
          names: [run1, run2] # List of names.
      faces: &id002               # Second set of epis.
          type: epi               # Type of data.
          pepolar: 0
          acq_order: 1            # Acquisition order. 0, 1, 2, ...
          outdir: epis      # Output directory
          names: [run3, run4] # List of names.
      ---
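
The way "acq_order" and "names" combine can be sketched in Python. This is an illustration only: assign_names is a hypothetical function, the dictionary mirrors the template above, and the sketch assumes that names left over after a shortened session are simply not used.

```python
def assign_names(template, n_runs):
    """Pair each acquired EPI run with an (outdir, name) label.

    EPI blocks are visited in increasing acq_order, and the names inside
    each block are used in the order listed.
    """
    blocks = [value for value in template.values()
              if isinstance(value, dict) and value.get("type") == "epi"]
    blocks.sort(key=lambda block: block["acq_order"])
    labels = [(block["outdir"], name) for block in blocks for name in block["names"]]
    return labels[:n_runs]


template = {
    "skip": 3,
    "faces": {"type": "epi", "acq_order": 1, "outdir": "epis", "names": ["run3", "run4"]},
    "gonogo": {"type": "epi", "acq_order": 0, "outdir": "epis", "names": ["run1", "run2"]},
}
labels = assign_names(template, n_runs=3)   # e.g. a session that stopped after three runs
print(labels)
```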
      
      

Where to put them

The program will begin looking for a template file in the directory specified on the command line, i.e., the directory containing the "anatomicals" and "raw" subdirectories. If none is found it will look in the next directory above it.

Example:

If your raw data are stored in /study/mystudy with directories for each subject named "sub001, sub002, ...", you should put a global template file in /study/mystudy. The preprocess script is run when the data are uploaded from the scanner, so it will find this directory and, for a typical scan, will correctly process the data. For special cases, say where a session has to be stopped early or when the EPIs are run out of order, a subject-specific template file can be put in that subject's directory and then it will be the one found. This makes it possible to use one template for all subjects or to use individual templates for specific subjects.
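
The search order described here can be sketched in Python as follows. The function find_template and the file name template.yaml are hypothetical; the sketch assumes a template is recognized purely by the "#!fmri_file_template" marker at the start of its first line, and it checks only the given directory and its parent.

```python
import os
import tempfile

MARKER = "#!fmri_file_template"


def find_template(data_dir):
    """Return the first template found in data_dir, else in its parent, else None."""
    data_dir = os.path.abspath(data_dir)
    for directory in (data_dir, os.path.dirname(data_dir)):
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                continue
            with open(path, errors="replace") as handle:
                if handle.readline().startswith(MARKER):
                    return path
    return None


with tempfile.TemporaryDirectory() as study:
    sub001 = os.path.join(study, "sub001")
    sub002 = os.path.join(study, "sub002")
    os.mkdir(sub001)
    os.mkdir(sub002)
    # One global template for the study, one special case for sub002.
    for folder in (study, sub002):
        with open(os.path.join(folder, "template.yaml"), "w") as handle:
            handle.write(MARKER + "\n---\nskip: 3\n---\n")
    found_001 = os.path.relpath(find_template(sub001), study)
    found_002 = os.path.relpath(find_template(sub002), study)
print(found_001, found_002)
```

Here sub001 falls back to the study-level file, while sub002 picks up its own template.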

Syntax

This template defines the naming convention to be used by the preprocess script. Edit the fields in the "Value" column to change the file structure. This file follows the "yaml" (yet-another-markup-language) syntax. We chose this method because it is the geekiest sounding format name we could find. Secondary considerations were that it is readable and editable by humans but can be easily read into a Python data structure.

For this format, indentation is very important. That is how it differentiates between attributes and sub-attributes. There are only a few features of the syntax that we use. They are:

  • The file will only be recognized as a template file if the first line begins with the string "#!fmri_file_template". This isn't yaml syntax; it is detected by the preprocess script.
  • The "---" characters are yaml delimiters and should be left alone.
  • The colon (:) after each keyword is a yaml delimiter and is equivalent to an equal sign.
  • The brackets delimit list structures. They are used to hold lists of filenames. There must be a comma between each pair of names in the list.
  • The items such as "faces: &id001" mark the beginning of a single substructure. It is important to maintain the same indentation for each element of the structure. The name of these substructures ("faces" here) is not used directly as a file name but is used internally by the preprocess script.
  • Comments are the same as in bash scripts: they start with "#" and can be put anywhere.
  • The subject number must be in double quotes. If quotes are not used, the software will interpret any subject id starting with zero as a base 8 (octal) number and yield weird results. The preprocess script will catch this error.
  • Setting the subject tag to "same" will create the EPI subdirectories in the directory specified by "proc". See the example below.
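
The octal problem is easy to reproduce. The Python lines below are only an illustration of the arithmetic, assuming the YAML parser treats an unquoted number with a leading zero as base 8; the variable names are made up.

```python
# An unquoted subject id of 064 read as an octal number:
unquoted = int("064", 8)     # 6 * 8 + 4
# The same id in double quotes stays a string:
quoted = "064"
print(unquoted, quoted)
```

The unquoted id silently becomes the integer 52, so output would not land in a directory named 064.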

Default template

Download this template and edit it to create your own.


#!fmri_file_template    # The first part of this line MUST be present or
#                         this file won't be recognized as a template.

# Template defining the naming convention to be used by the 
# preprocess script. Edit the fields in the "Value" column to change the file structure.
# This file follows the "yaml" (yet-another-markup-language) syntax. For this format,
# indentation is very important.  That is how it differentiates between attributes and
# sub-attributes.  The only other special syntax in the file below is "---", the colon 
# after each tag-name, the brackets around lists of items separated by commas, and the 
# "&id00n" variables.  These latter variables should be numbered sequentially as below.
#
# Here are some basic syntax rules:
# 1. The code "#!fmri_file_template" must appear at the beginning of the first line of the file.
#
# 2. The code "---" is a delimiter used by yaml. Keep them where they are.
#
# 3. Indentation is important.  Always use the same indentation for a given block.  For
#                               example, the "anat" block has two items, "outdir" and
#                               "format".  The parser figures this out because both
#                               are indented 4 spaces from the left margin.
#
# 4. The colon is used instead of an equal. 
#
# 5. A space must follow the colon.
#
# 6. Brackets denote a list.  Members of the list are separated by commas.
#
# 7. The &id_*** variables are required by the parser. Each of the lines where one appears
#    (e.g. anat:) defines the beginning of a Python dictionary, and the indented elements
#    after it are the dictionary's members.  The name of the dictionary and its "id" must 
#    be unique. For example, epis could have names of epi1, epi2, ... and ids of &id001, &id002, ...
#
# File type codes: brik=BRIK, nii=one-file nifti, ni1 = two_file nifti

#keyword   Value                  Meaning
#-------   -------                -------
---
# Global variables.
top_outdir: ""         # Directory for output data (defaults to raw data directory)
subject: "same"        # Subdirectory for processed data. MUST be in quotes.
                       # If set to "same" use the same name as the data, e.g.,
                       #     if the data are in /study/mystudy/sub001, and proc=/study/mystudy/processed,
                       #     the data will be stored in /study/mystudy/processed/sub001.
fsl_flip: False        # If true, all output images will be flipped physically such that they are 
                       # in LPI, PSL, or LSP orientation.  This is a workaround for a bug in fslview that
                       # requires this orientation. The header will correctly represent the orientation,
                       # so files can still be viewed in AFNI, SPM, VoxBo, or mricron.

# Structural images:
anat: &id_anat         # Structural image info.
    outdir: anat       # Directory where anatomical images should be stored.
    format: brik       # File format for structural images. 'brik', 'nii', or 'n+1'

# DTI processing
dti: &id_dti           # DTI info
    outdir: dti        # Directory where DTI images should be stored.
    format: nii        # Default type is nifti one-file.
    pepolar: 0         # Default phase encode direction. (pe axis read from header.)

# Log file location.
logdir: log            # Directory for log files

# Field maps Processing
fmap: &id_fmap         # Fieldmap info
    outdir: fieldmap   # Directory where fieldmaps should be stored.
    echo_spacing: .688

# EPI processing.
first_epi: epi_setup   # Directory for first two EPI images.
epi_type: brik         # Output epi file type.
skip: 5                # Number of frames to skip.
epi_motion_interp: -Fourier # Interpolation method argument for 3dvolreg.
epi_file_format: brik  # Format of final epi files.
email: noname@wisc.edu # Email address where completion status should be sent. Set
                       #       to "noname" or "noname@whatever.whatever" for no email.
epidir_dflt: &id001       # First set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 0       # Acquisition order. 
    outdir: "run_1"    # Directory for first set of epis.
    names: [epi_run1, epi_run2, epi_run3, epi_run4, epi_run5, epi_run6, ] # EPI  names.
    pepolar: 0         # 0 = default phase-encode direction, 1 = reversed.
---

Example of a study containing a faces task and a go-nogo task.

#!fmri_file_template    # The first part of this line MUST be present.
---
top_outdir: /study/jjo/tmp/BRDEVEL/processed
subject: "064"         # Subdirectory for processed data. MUST be in quotes.
                       # If set to "same" use the same name as the data, e.g.,
                       #     if the data are in /study/mystudy/sub001, and proc=/study/mystudy/processed,
                       #     the data will be stored in /study/mystudy/processed/sub001.
fsl_flip: False

# Structural images:
anat: &id_anat
    outdir: anat
    format: brik

# DTI processing
dti: &id_dti           # Directory for all dti data.
    outdir: dti
    format: nii        # Default type is nifti one-file.
    pepolar: 0         # Default phase encode direction. (pe axis read from header.)

# Log file location.
logdir: log            # Directory for log files

# Field maps Processing
fmap: &id_fmap         # Directory for all fieldmaps.
    outdir: fieldmap
    echo_spacing: .688

# EPI processing.
first_epi: epi_setup   # Directory for first two EPI images.
epi_type: brik         # Output epi file type.
skip: 5                # Number of frames to skip.
epi_motion_interp: -Fourier
epi_file_format: brik
email: ollinger@wisc.edu
faces: &id001          # Second set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 1       # Acquisition order. Is it the first, second, third etc set of epi runs.
    outdir: fMRI/faces # Directory
    names: [faces_run1, faces_run2] # List of names.
gonogo: &id002         # First set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 2       # Acquisition order. Is it the first, second, third etc set of epi runs.
    outdir: fMRI/gonogo # Directory for first set of epis.
    names: [gonogo_run1, gonogo_run2] # List of names.
    pepolar: 0
---

This file would yield the directory structure:

                             /study/jjo/tmp/BRDEVEL/processed/064
                                           |
                                           |
   ---------------------------------------------------------------------------------
   |               |             |        |         |                |             | 
 anat          fieldmap         dti      log       fMRI           epi_setup       log
                                                    |
                                                    |
                                         -----------------------
                                         |                     |
                                       gonogo                faces
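
How the template entries combine into this directory tree can be sketched in Python. This is an illustration, not the preprocess code: output_dirs is a hypothetical function, the raw data path passed to it is made up, and the sketch assumes each output directory is simply top_outdir/subject/outdir.

```python
import posixpath


def output_dirs(template, data_dir):
    """Map each template block to the directory its output would go to."""
    subject = template["subject"]
    if subject == "same":
        # Reuse the name of the raw data directory.
        subject = posixpath.basename(data_dir.rstrip("/"))
    root = posixpath.join(template["top_outdir"], subject)
    dirs = {"log": posixpath.join(root, template["logdir"])}
    for key, value in template.items():
        if isinstance(value, dict) and "outdir" in value:
            dirs[key] = posixpath.join(root, value["outdir"])
    return dirs


template = {
    "top_outdir": "/study/jjo/tmp/BRDEVEL/processed",
    "subject": "064",
    "logdir": "log",
    "anat": {"outdir": "anat", "format": "brik"},
    "faces": {"type": "epi", "acq_order": 1, "outdir": "fMRI/faces"},
    "gonogo": {"type": "epi", "acq_order": 2, "outdir": "fMRI/gonogo"},
}
dirs = output_dirs(template, "/hypothetical/raw/064")
for key in sorted(dirs):
    print(key, dirs[key])

# With subject set to "same", the raw data directory name is reused.
same = dict(template, subject="same", top_outdir="/study/mystudy/processed")
same_dirs = output_dirs(same, "/study/mystudy/sub001")
```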

A template file for a study with six epi runs where each is stored in its own directory would be:

#!fmri_file_template
#keyword   Value                  Meaning
---
top_outdir: /study/jjo/tmp/pain_regulation/tmp_outdir
subject: "same"        # Subdirectory for processed data. MUST be in quotes.
                       # If set to "same" use the same name as the data, e.g.,
                       #     if the data are in /study/mystudy/sub001, and proc=/study/mystudy/processed,
                       #     the data will be stored in /study/mystudy/processed/sub001.
fsl_flip: False

# Structural images:
anat: &id_anat
    outdir: anat
    format: brik

# DTI processing
dti: &id_dti           # Directory for all dti data.
    outdir: dti
    format: nii        # Default type is nifti one-file.
    pepolar: 0         # Default phase encode direction. (pe axis read from header.)

# Log file location.
logdir: log            # Directory for log files

# Field maps Processing
fmap: &id_fmap         # Directory for all fieldmaps.
    outdir: fieldmap
    echo_spacing: .688

# EPI processing.
first_epi: epi_setup   # Directory for first two EPI images.
epi_type: brik         # Output epi file type.
skip: 5                # Number of frames to skip.
epi_motion_interp: -Fourier
epi_file_format: brik
email: ollinger@wisc.edu
outdir_1: &id001       # First set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 1       # Acquisition order.
    outdir: "run_1"    # Directory for first set of epis.
    names: [run_1]     # List of names.
outdir_2: &id002       # Second set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 2       # Acquisition order.
    outdir: "run_2"    # Directory for second set of epis.
    names: [run_2]     # List of names.
outdir_3: &id003       # Third set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 3       # Acquisition order.
    outdir: "run_3"    # Directory for third set of epis.
    names: [run_3]     # List of names.
outdir_4: &id004       # Fourth set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 4       # Acquisition order.
    outdir: "run_4"    # Directory for fourth set of epis.
    names: [run_4]     # List of names.
outdir_5: &id005       # Fifth set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 5       # Acquisition order.
    outdir: "run_5"    # Directory for fifth set of epis.
    names: [run_5]     # List of names.
outdir_6: &id006       # Sixth set of epis.
    type: epi          # Type of data. The only value currently used is "epi"
    acq_order: 6       # Acquisition order.
    outdir: "run_6"    # Directory for sixth set of epis.
    names: [run_6]     # List of names.
---

With a directory structure:

                     /study/jjo/tmp/pain_regulation_test/processed
                                           |
                                           |
   ---------------------------------------------------------------------------------
   |               |             |        |         |                |             |
 anat          fieldmap         dti      log        |             epi_setup       log
   |               |                                |
--------------  fieldmap_sagittal.nii               |
|            |                                      |
T1High+orig  T2+orig                                |
                                                    |
                        ---------------------------------------------------------
                        |          |          |           |          |          |
                      run_1      run_2      run_3       run_4      run_5      run_6
                        |          |          |           |          |          |
                                              |
                               -------------------------------
                               |              |              |
                          run_3+orig    run_3_m+orig   run_3_mf+orig

Last modified February 24, 2011