How pyAFQ uses BIDS

The pyAFQ API relies heavily on the Brain Imaging Data Structure (BIDS) standard. This means that the software assumes that its inputs are organized according to the BIDS specification and that its outputs conform, wherever possible, with the BIDS specification.

Note

Derivatives of diffusion MRI processing are not yet fully described in the existing BIDS specification, but describing them is part of an ongoing effort. Wherever possible, we conform with the draft specification of BIDS DWI derivatives available here.

In this example, we will explore the use of BIDS in pyAFQ and see how BIDS allows us to extend and provide flexibility to the users of the software.

import os
import os.path as op

import matplotlib.pyplot as plt
import nibabel as nib

from AFQ import api
import AFQ.data as afd

To interact with and query BIDS datasets, we use pyBIDS.

import bids

We start with some example data. The data we will use here is generated from the Stanford HARDI dataset. The call below fetches this dataset and organizes it within the ~/AFQ_data folder in the BIDS format.

afd.organize_stanford_data(clear_previous_afq=True)

After doing that, we should have a folder that looks like this:

stanford_hardi
├── dataset_description.json
└── derivatives
    ├── freesurfer
    │   ├── dataset_description.json
    │   └── sub-01
    │       └── ses-01
    │           └── anat
    │               ├── sub-01_ses-01_T1w.nii.gz
    │               └── sub-01_ses-01_seg.nii.gz
    └── vistasoft
        ├── dataset_description.json
        └── sub-01
            └── ses-01
                └── dwi
                    ├── sub-01_ses-01_dwi.bvals
                    ├── sub-01_ses-01_dwi.bvecs
                    └── sub-01_ses-01_dwi.nii.gz

The top-level directory is our overall BIDS dataset folder. In most cases, this folder will also include a raw folder that contains the raw data. In this case, we do not include the raw folder and only have the pipeline folders that contain the outputs of preprocessing the data. In general, only the preprocessed diffusion data is required. See the “Organizing your data” section of “Using pyAFQ” for more details. Here, there are two pipelines: one folder containing Freesurfer derivatives, and another containing the DWI data that has been preprocessed with Vistasoft.

pyAFQ provides facilities to segment tractography results obtained using other software. For example, we often use qsiprep to preprocess our data and reconstruct tractographies with software such as MRTRIX. Here, we will demonstrate how to use these reconstructions in the pyAFQ segmentation and tractometry pipeline. We fetch this data and add it as a separate pipeline.

afd.fetch_stanford_hardi_tractography()

bids_path = op.join(op.expanduser('~'), 'AFQ_data', 'stanford_hardi')
tractography_path = op.join(bids_path, 'derivatives', 'my_tractography')
sub_path = op.join(tractography_path, 'sub-01', 'ses-01', 'dwi')

os.makedirs(sub_path, exist_ok=True)

# Move the downloaded tractography into the new pipeline's folder, giving it
# a BIDS-style name for this subject and session:
os.rename(
    op.join(
        op.expanduser('~'),
        'AFQ_data',
        'stanford_hardi_tractography',
        'full_segmented_cleaned_tractography.trk'),
    op.join(
        sub_path,
        'sub-01_ses-01-dwi_tractography.trk'))

# Create a dataset_description.json, so that this folder is recognized as a
# BIDS pipeline named "my_tractography":
afd.to_bids_description(
    tractography_path,
    **{"Name": "my_tractography",
        "PipelineDescription": {"Name": "my_tractography"}})

Out:

100%|##########| 11337/11337 [00:05<00:00, 1902.61 MB/s]
100%|##########| 14/14 [00:00<00:00, 39.37 MB/s]
100%|##########| 1037/1037 [00:01<00:00, 982.16 MB/s]

After we do that, our dataset folder should look like this:

stanford_hardi
├── dataset_description.json
└── derivatives
    ├── freesurfer
    │   ├── dataset_description.json
    │   └── sub-01
    │       └── ses-01
    │           └── anat
    │               ├── sub-01_ses-01_T1w.nii.gz
    │               └── sub-01_ses-01_seg.nii.gz
    ├── my_tractography
    │   ├── dataset_description.json
    │   └── sub-01
    │       └── ses-01
    │           └── dwi
    │               └── sub-01_ses-01-dwi_tractography.trk
    └── vistasoft
        ├── dataset_description.json
        └── sub-01
            └── ses-01
                └── dwi
                    ├── sub-01_ses-01_dwi.bvals
                    ├── sub-01_ses-01_dwi.bvecs
                    └── sub-01_ses-01_dwi.nii.gz
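
Before moving on, we can read the my_tractography dataset_description.json back to confirm that the pipeline was registered under the expected name. This is a small sketch that is not part of the original example; it only relies on the keys we passed to afd.to_bids_description above.

import json

# Read back the dataset_description.json written above and print the fields
# that pybids uses to identify the pipeline:
with open(op.join(tractography_path, 'dataset_description.json')) as ff:
    description = json.load(ff)

print(description["Name"])
print(description["PipelineDescription"]["Name"])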

To explore the layout of these derivatives, we will initialize a BIDSLayout class instance to help us see what is in this dataset:

layout = bids.BIDSLayout(bids_path, derivatives=True)

Because there is no raw data in this BIDS layout (only derivatives), pybids will report that there are no subjects and sessions:

print(layout)

Out:

BIDS Layout: ...runner/AFQ_data/stanford_hardi | Subjects: 0 | Sessions: 0 | Runs: 0

But a query on the derivatives will reveal the different derivatives that are stored here:

print(layout.derivatives)

Out:

{'freesurfer': BIDS Layout: ...d_hardi/derivatives/freesurfer | Subjects: 1 | Sessions: 1 | Runs: 0, 'my_tractography': BIDS Layout: ...di/derivatives/my_tractography | Subjects: 1 | Sessions: 1 | Runs: 0, 'vistasoft': BIDS Layout: ...rd_hardi/derivatives/vistasoft | Subjects: 1 | Sessions: 1 | Runs: 0}
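
Each entry in this dictionary is itself a BIDSLayout, keyed by the pipeline name taken from its dataset_description.json. As a small sketch (not part of the original example), we can loop over the pipelines and list their subjects:

for pipeline_name, pipeline_layout in layout.derivatives.items():
    # Each derivative pipeline is a full BIDSLayout and can be queried directly:
    print(pipeline_name, pipeline_layout.get_subjects())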

Later on, we will use a bids.BIDSValidator object to make sure that the files within our dataset are BIDS-compliant. First, we extract the tractography derivatives part of our layout using:

my_tractography = layout.derivatives["my_tractography"]

This variable is also a BIDS layout object. This object has a get method, which allows us to query and find specific items within the layout. For example, we can ask for files that have a suffix consistent with tractography results:

tractography_files = my_tractography.get(suffix='tractography')

Or ask for files that have a .trk extension:

tractography_files = my_tractography.get(extension='.trk')

In this case, both of these would produce the same result.

tractography_file = tractography_files[0]
print(tractography_file)

Out:

<BIDSFile filename='/home/runner/AFQ_data/stanford_hardi/derivatives/my_tractography/sub-01/ses-01/dwi/sub-01_ses-01-dwi_tractography.trk'>

We can also get some more structured information about this file:

print(tractography_file.get_entities())

Out:

{'datatype': 'dwi', 'extension': '.trk', 'session': '01', 'subject': '01', 'suffix': 'tractography'}
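
These entities can also be combined in a single query. The following sketch is not part of the original example; the entity values simply follow from the output above, and return_type='filename' asks pybids for paths rather than BIDSFile objects.

# Combine several entities in a single query:
trk_paths = my_tractography.get(
    subject='01',
    session='01',
    suffix='tractography',
    extension='.trk',
    return_type='filename')
print(trk_paths)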

We can use a bids.BIDSValidator class instance to validate that this file is compliant with the specification. Note that the validator requires the filename to be provided relative to the root of the BIDS dataset, so we split the string that contains the full path of the tractography file and keep only the part that is relative to the root of the entire BIDS layout:

tractography_full_path = tractography_file.path
tractography_relative_path = tractography_full_path.split(layout.root)[-1]

validator = bids.BIDSValidator()
print(validator.is_bids(tractography_relative_path))

Out:

True
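
The same check can be applied to every file in a derivative pipeline. Here is a small sketch (not part of the original example) that validates all of the .trk files in the ‘my_tractography’ pipeline:

for bids_file in my_tractography.get(extension='.trk'):
    # The validator expects paths relative to the root of the BIDS dataset:
    relative_path = bids_file.path.split(layout.root)[-1]
    print(relative_path, validator.is_bids(relative_path))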

Next, we specify the information we need to define the bundles that we are interested in segmenting. In this case, we are going to use a list of bundle names for the bundle info. These names refer to bundles for which we already have clear definitions of the information needed to segment them (e.g., waypoint ROIs and probability maps). For an example that includes custom definition of bundle info, see the plot_callosal_tract_profile example.

bundle_info = ["SLF", "ARC", "CST", "FP"]

Now, we can define our AFQ object, pointing to the derivatives of the ‘my_tractography’ pipeline as inputs. This is done by setting the custom_tractography_bids_filters keyword argument. We pass the bundle_info defined above. We also point to the preprocessed data through the preproc_pipeline keyword argument. Note that the pipeline name is not necessarily the name of the folder it is in; the pipeline name is defined in each pipeline’s dataset_description.json. These data were preprocessed with ‘vistasoft’, so this is the pipeline we point to here. If we were using ‘qsiprep’, we would pass that string instead, and AFQ would look for a derivatives folder called ‘stanford_hardi/derivatives/qsiprep’ and find the preprocessed DWI data within it.

Finally, to speed things up a bit, we also sub-sample the provided tractography. This is done through the segmentation_params dictionary input: to sub-sample to 10,000 streamlines, we set ‘nb_streamlines’ to 10000.

my_afq = api.AFQ(
    bids_path,
    preproc_pipeline='vistasoft',
    bundle_info=bundle_info,
    custom_tractography_bids_filters={
        "suffix": "tractography",
        "scope": "my_tractography"
    },
    segmentation_params={'nb_streamlines': 10000})

Finally, to run the segmentation and extract tract profiles, we call the export_all method. This creates all of the derivative outputs of AFQ within the ‘stanford_hardi/derivatives/afq’ folder.

my_afq.export_all()

Out:

Optimizing level 2 [max iter: 10000]
Optimizing level 1 [max iter: 1000]
Optimizing level 0 [max iter: 100]
Optimizing level 2 [max iter: 10000]
Optimizing level 1 [max iter: 1000]
Optimizing level 0 [max iter: 100]
Optimizing level 2 [max iter: 10000]
Optimizing level 1 [max iter: 1000]
Optimizing level 0 [max iter: 100]

100%|##########| 2840/2840 [00:08<00:00, 335.27it/s]
100%|##########| 1912/1912 [00:00<00:00, 2189.52it/s]
100%|##########| 2375/2375 [00:00<00:00, 4085.51it/s]
100%|##########| 1845/1845 [00:00<00:00, 4082.28it/s]
100%|##########| 1432/1432 [00:00<00:00, 5250.88it/s]
100%|##########| 1054/1054 [00:00<00:00, 3755.08it/s]
100%|##########| 1137/1137 [00:00<00:00, 2693.25it/s]
100%|##########| 7/7 [00:00<00:00, 16.80it/s]
100%|##########| 7/7 [00:00<00:00, 22.74it/s]
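
Once export_all has finished, the outputs can be inspected like any other files. As a quick sanity check (a sketch, assuming the default ‘afq’ output folder mentioned above), we can list a few of the files that were produced:

import glob

# List a handful of the files written by export_all under the (assumed)
# default output folder 'stanford_hardi/derivatives/afq':
afq_derivatives_path = op.join(bids_path, 'derivatives', 'afq')
for fname in sorted(glob.glob(
        op.join(afq_derivatives_path, '**', '*.*'), recursive=True))[:10]:
    print(fname)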

A few common issues that can hinder BIDS from working properly are:

  1. A faulty dataset_description.json file. You need to make sure that the file contains the right name for the pipeline. See above for an example of how to create one.

  2. A file naming convention that doesn’t uniquely identify a file given the BIDS filters (see the sketch below for a quick check).
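
To guard against the second issue, you can pass your filters to pybids directly and make sure that they identify exactly one file per subject and session. This sketch uses the filters from this example; adapt the entities to your own file names.

# The same filters passed as custom_tractography_bids_filters should match
# exactly one tractography file for each subject and session:
matches = layout.derivatives["my_tractography"].get(
    subject='01', session='01', suffix='tractography', extension='.trk')
print(len(matches))  # more than one match means the filters are ambiguous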

Total running time of the script: ( 16 minutes 52.145 seconds)

Gallery generated by Sphinx-Gallery