Uploading data from the scanner.
Program name: upload_data
- Purpose: Upload data from GE's database to a remote host.
- Usage: upload_data
- Options:
- --fast: No compression or anonymization.
- --nocheck: Omit consistency check.
- --nopreprocess: Do not invoke the preprocessing program.
- --save-local: Save data to a local file rather than directly to a remote server.
- --no-proc-pfiles: Don't process pfiles and ref.dat files.
- --root-dir=OUTPUT_PATH: Directory where images should be written.
- --regenerate_db: Regenerate list of exam numbers.
- --no_anonymize: Don't anonymize patient name and birthdate fields.
- --compress=: Compress data if set to "gzip" or "bzip". Set to "none" for no compression.
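A minimal sketch of how the options above could be declared with Python's argparse module. This is not the program's actual command-line code: the accepted `--compress` values, the `bzip` default (inferred from the statement that bzip2 is the default method), and the helper `effective_settings` are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented upload_data options (hypothetical sketch)."""
    p = argparse.ArgumentParser(
        prog="upload_data",
        description="Upload data from GE's database to a remote host.",
    )
    p.add_argument("--fast", action="store_true",
                   help="No compression or anonymization.")
    p.add_argument("--nocheck", action="store_true",
                   help="Omit consistency check.")
    p.add_argument("--nopreprocess", action="store_true",
                   help="Do not invoke the preprocessing program.")
    p.add_argument("--save-local", action="store_true",
                   help="Save data to a local file instead of the remote server.")
    p.add_argument("--no-proc-pfiles", action="store_true",
                   help="Don't process pfiles and ref.dat files.")
    p.add_argument("--root-dir", metavar="OUTPUT_PATH",
                   help="Directory where images should be written.")
    p.add_argument("--regenerate_db", action="store_true",
                   help="Regenerate list of exam numbers.")
    p.add_argument("--no_anonymize", action="store_true",
                   help="Don't anonymize patient name and birthdate fields.")
    # Assumption: accepted values and default are inferred from the text.
    p.add_argument("--compress", choices=["gzip", "bzip", "none"],
                   default="bzip", help="Compression method.")
    return p


def effective_settings(args: argparse.Namespace) -> tuple[str, bool]:
    """Hypothetical helper: --fast disables compression and anonymization."""
    compress = "none" if args.fast else args.compress
    anonymize = not (args.fast or args.no_anonymize)
    return compress, anonymize
```

Under these assumptions, `upload_data --fast` yields compression "none" with anonymization turned off, regardless of any `--compress` value.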
Assumptions
- All dicom images are stored in a set of directories within a common tree that is two levels deep.
- P-files are stored in the usual place.
- ref.dat files (any file whose name begins with "ref.dat") are copied from /usr/g/bin to /usr/g/mrraw.
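The sketch below shows how the first and last assumptions might translate into file-system lookups: directories exactly two levels below a database root, and files in the raw data directory whose names begin with "ref.dat". The function names and the `db_root` argument are hypothetical; the actual database location is not given here.

```python
from pathlib import Path


def find_series_dirs(db_root: str) -> list[Path]:
    """Return every directory exactly two levels below db_root."""
    # "*/*" matches entries at depth two; keep only directories.
    return sorted(p for p in Path(db_root).glob("*/*") if p.is_dir())


def find_ref_dat_files(raw_dir: str = "/usr/g/mrraw") -> list[Path]:
    """Return regular files in raw_dir whose names begin with "ref.dat"."""
    return sorted(p for p in Path(raw_dir).glob("ref.dat*") if p.is_file())
```

Relying on a fixed depth keeps the scan cheap: only two directory levels are listed, rather than walking an arbitrarily deep tree.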
What it does
- Opens an ssh channel to a remote server and creates the output directory.
- Reads an exam index created during a previous run. This index is a Python dictionary with one entry per exam, each holding a list of the directories where that exam's data are stored. It also contains a few other data structures used to speed up the scan.
- Scans all directories in the database for new series containing data for the exam being uploaded, and checks whether any known series have been updated.
- When new dicom images are found, the following steps are performed:
- Read the entire dicom file and parse the header.
- Replace the patient name as described under Anonymization below. Remove the month and day from the PatientBirthDate field.
- Compress the file. The default method is bzip2 because it yields compressed files 20% smaller than gzip. It is slightly slower than gzip when compressing but just as fast when decompressing.
- Append the file to a data file on the remote server.
- The raw data directory is checked for new P-files or ref.dat files. Any that are found are anonymized and transferred directly to the remote server. MD5 checksums are computed both locally and on the remote host and compared. The compressed files are then moved to /usr/g/mrraw/Trash, with the exam number appended to each filename.
- If there are no data to be processed, the program displays the message "Waiting for more data".
- Clicking the "Finish" button causes the following to happen:
- The database is rechecked for new data.
- Gating files are collected into a tarfile and transferred to the remote server.
- Any new data are processed.
- A table of contents is written to the output file.
- The exam-index data structures are written to disk.
- The program that unpacks the data file is spawned on the remote host. See "Unpacking the data" below for details.
- A consistency-check program is run. It compares header information extracted when each file was first read from GE's database with information extracted from the compressed data on the remote host. It also creates an object used by the preprocessing program, and fails if certain data are missing (for example, it expects to find a T1-weighted anatomical image). This check is probably overkill and will be omitted once the program has been burned in.
- A process is forked to run the preprocessing script, and the program then exits.
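A minimal sketch of the per-file compression and MD5 verification steps described above, using Python's standard bz2, gzip, and hashlib modules. The method names mirror the `--compress` values. How the remote digest is obtained (for example, by running `md5sum` on the remote host over the ssh channel) is an assumption, so here it is simply passed in; the function names are hypothetical.

```python
import bz2
import gzip
import hashlib


def compress_bytes(data: bytes, method: str = "bzip") -> bytes:
    """Compress one file's contents with the named method."""
    if method == "bzip":
        return bz2.compress(data)
    if method == "gzip":
        return gzip.compress(data)
    if method == "none":
        return data
    raise ValueError(f"unknown compression method: {method!r}")


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, the same form printed by md5sum."""
    return hashlib.md5(data).hexdigest()


def verify_transfer(local_data: bytes, remote_digest: str) -> bool:
    """True if the locally computed digest matches the remote one."""
    return md5_hex(local_data) == remote_digest
```

Comparing hex strings keeps the check independent of how the remote side computes its digest, as long as it reports standard hex MD5.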
Anonymization
The following rules are used to anonymize the patient name:
- Numerical digits are not changed.
- Leading or trailing single alphabetic characters are not changed.
- A prefix of "sub" is unchanged.
- All other alphabetic characters are converted to "x".
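The rules above can be sketched as a small function. This is an interpretation, not the program's own code: it assumes the "sub" prefix is matched case-sensitively, that a "single" leading or trailing character is a letter with no letter next to it, and that characters other than letters and digits (such as the DICOM "^" name separator) are left unchanged. The name `anonymize_name` is hypothetical.

```python
def anonymize_name(name: str) -> str:
    """Replace identifying letters in a patient name with "x"."""
    n = len(name)
    keep = [False] * n  # positions whose letters are left unchanged

    # A prefix of "sub" is unchanged (assumed case-sensitive).
    if name.startswith("sub"):
        keep[0:3] = [True, True, True]

    # A single letter at the start or end (no adjacent letter) is unchanged.
    if n and name[0].isalpha() and (n == 1 or not name[1].isalpha()):
        keep[0] = True
    if n and name[-1].isalpha() and (n == 1 or not name[-2].isalpha()):
        keep[-1] = True

    # Digits and, by assumption, punctuation pass through untouched;
    # every other letter becomes "x".
    return "".join(
        "x" if ch.isalpha() and not keep[i] else ch
        for i, ch in enumerate(name)
    )
```

For example, under these assumptions `anonymize_name("Smith^John")` gives `"xxxxx^xxxx"`, `anonymize_name("s07_mri")` gives `"s07_xxx"`, and `anonymize_name("sub012")` is returned unchanged.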
Unpacking the data
The data are stored in a single file on the remote host: first the dicom images, then a tar file containing the gating data, then a table of contents, and finally the byte offset of the start of the table of contents. The table of contents is a Python dictionary, serialized as YAML, that records the filename of each dicom slice and where it is located in the file.
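A minimal sketch of writing and reading a file with this layout. The exact on-disk encoding is not specified above, so several details are assumptions: the trailing offset is an 8-byte big-endian integer, table-of-contents entries are `[offset, length]` pairs, and the table of contents is serialized with the standard-library json module as a stand-in for YAML (a simple JSON object like this one is also valid YAML 1.2). The function names and the "dicom"/"gating" keys are hypothetical.

```python
import io
import json
import struct
from typing import BinaryIO, Iterable

TRAILER = struct.Struct(">Q")  # assumed: 8-byte big-endian TOC offset


def write_data_file(out: BinaryIO, slices: Iterable[tuple[str, bytes]],
                    gating_tar: bytes) -> None:
    """Write slices, then the gating tar, then the TOC, then its offset."""
    toc = {"dicom": {}, "gating": None}
    for name, blob in slices:
        toc["dicom"][name] = [out.tell(), len(blob)]
        out.write(blob)
    toc["gating"] = [out.tell(), len(gating_tar)]
    out.write(gating_tar)
    toc_offset = out.tell()
    out.write(json.dumps(toc).encode("utf-8"))
    out.write(TRAILER.pack(toc_offset))


def read_toc(f: BinaryIO) -> dict:
    """Locate the TOC via the trailing offset and parse it."""
    size = f.seek(0, io.SEEK_END)
    f.seek(size - TRAILER.size)
    (toc_offset,) = TRAILER.unpack(f.read(TRAILER.size))
    f.seek(toc_offset)
    return json.loads(f.read(size - TRAILER.size - toc_offset))


def read_slice(f: BinaryIO, toc: dict, name: str) -> bytes:
    """Return the stored (still compressed) bytes of one dicom slice."""
    offset, length = toc["dicom"][name]
    f.seek(offset)
    return f.read(length)
```

Putting the offset last means a reader never has to scan the image data: it seeks to the end, reads the fixed-size trailer, and jumps straight to the table of contents.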