AFQ.data.s3bids

Module Contents

Classes

S3BIDSSubject

A single study subject hosted on AWS S3

HBNSubject

A subject in the HBN study

S3BIDSStudy

A BIDS-compliant study hosted on AWS S3

HBNSite

An HBN study site

Functions

get_s3_client(anon=True)

Return a boto3 s3 client

_ls_s3fs(s3_prefix, anon=True)

Return a dict of lists of files using s3fs

_get_matching_s3_keys(bucket, prefix='', suffix='', anon=True)

Generate all the matching keys in an S3 bucket.

_download_from_s3(fname, bucket, key, overwrite=False, anon=True)

Download object from S3 to local file

s3fs_nifti_write(img, fname, fs=None)

Write a nifti file straight to S3

s3fs_nifti_read(fname, fs=None, anon=False)

Lazily reads a nifti image from S3.

write_json(fname, data)

Write data to JSON file.

read_json(fname)

Read data from a JSON file.

s3fs_json_read(fname, fs=None, anon=False)

Reads json directly from S3

s3fs_json_write(data, fname, fs=None)

Writes json from a dict directly into S3

AFQ.data.s3bids.get_s3_client(anon=True)

Return a boto3 s3 client

Global boto clients are not thread safe, so this function returns independent session clients for different threads.

Parameters
anon : bool

Whether to use an anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True

Returns
s3_client : boto3.client(‘s3’)
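
The thread-safety note above can be illustrated without any AWS dependency. This is only a sketch, not AFQ's implementation: FakeClient and make_client are hypothetical stand-ins for a boto3 client and for get_s3_client. The point is that each task builds its own client instead of sharing a module-level one.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeClient:
    """Hypothetical stand-in for a boto3 S3 client (assumption, not boto3)."""

    def __init__(self, anon=True):
        self.anon = anon


def make_client(anon=True):
    """Hypothetical stand-in for get_s3_client: a new client on every call."""
    return FakeClient(anon=anon)


def worker(task_id):
    # Each task creates its own client rather than reusing a global one.
    client = make_client(anon=True)
    return task_id, client


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, range(4)))

clients = [client for _, client in results]
# All clients are still referenced here, so distinct ids mean distinct objects.
all_distinct = len({id(c) for c in clients}) == len(clients)
```

With a real client factory, the same pattern keeps one thread's requests from interfering with another's.
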
AFQ.data.s3bids._ls_s3fs(s3_prefix, anon=True)

Return a dict of lists of files using s3fs

The files are divided between subject directories/files and non-subject directories/files.

Parameters
s3_prefix : str

AWS S3 key for the study or site “directory” that contains all of the subjects

anon : bool

Whether to use an anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True

Returns
subjects : dict
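
The split between subject and non-subject entries can be sketched in plain Python. The "sub-" name test, the output key names ("subjects" and "other"), and the helper name split_subject_keys are assumptions made for illustration; the library may identify subjects differently.

```python
import posixpath


def split_subject_keys(keys):
    """Split listing entries into subject and non-subject groups.

    The "sub-" test and the dict key names are assumptions for this sketch.
    """
    groups = {"subjects": [], "other": []}
    for key in keys:
        # Look only at the last path component of each listing entry.
        name = posixpath.basename(key.rstrip("/"))
        target = "subjects" if name.startswith("sub-") else "other"
        groups[target].append(key)
    return groups


# Hypothetical listing of a study "directory".
listing = [
    "my-bucket/study/sub-01",
    "my-bucket/study/sub-02",
    "my-bucket/study/dataset_description.json",
    "my-bucket/study/derivatives",
]
groups = split_subject_keys(listing)
```
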
AFQ.data.s3bids._get_matching_s3_keys(bucket, prefix='', suffix='', anon=True)

Generate all the matching keys in an S3 bucket.

Parameters
bucket : str

Name of the S3 bucket

prefix : str, optional

Only fetch keys that start with this prefix

suffix : str, optional

Only fetch keys that end with this suffix

anon : bool

Whether to use an anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True

Yields
key : list

S3 keys that match the prefix and suffix
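
The prefix and suffix filtering can be sketched as a generator over an in-memory list of keys. The helper matching_keys and the example keys are hypothetical; the real function lists the bucket itself rather than taking keys as an argument.

```python
def matching_keys(keys, prefix="", suffix=""):
    """Yield the keys that start with `prefix` and end with `suffix`."""
    for key in keys:
        if key.startswith(prefix) and key.endswith(suffix):
            yield key


# Hypothetical keys standing in for the contents of a bucket.
all_keys = [
    "study/sub-01/dwi/sub-01_dwi.nii.gz",
    "study/sub-01/dwi/sub-01_dwi.bval",
    "study/sub-02/dwi/sub-02_dwi.nii.gz",
    "study/dataset_description.json",
]

sub01_niftis = list(
    matching_keys(all_keys, prefix="study/sub-01/", suffix=".nii.gz")
)
everything = list(matching_keys(all_keys))
```

Because the empty string is both a prefix and a suffix of every key, the defaults match everything.
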

AFQ.data.s3bids._download_from_s3(fname, bucket, key, overwrite=False, anon=True)

Download object from S3 to local file

Parameters
fname : str

File path to which to download the object

bucket : str

S3 bucket name

key : str

S3 key for the object to download

overwrite : bool

If True, overwrite file if it already exists. If False, skip download and return. Default: False

anon : bool

Whether to use an anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True
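
The overwrite behavior described above can be sketched with local files only. The helper download_if_needed, its fetch callable, and its boolean return value are assumptions for illustration; they stand in for the actual S3 transfer.

```python
import os
import tempfile


def download_if_needed(fname, fetch, overwrite=False):
    """Write fetch() to fname unless the file exists and overwrite is False.

    Returns True if the file was written, False if the download was skipped.
    """
    if os.path.exists(fname) and not overwrite:
        return False
    os.makedirs(os.path.dirname(fname) or ".", exist_ok=True)
    with open(fname, "wb") as f:
        f.write(fetch())
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sub-01", "dwi.bval")
    # First call writes the file; the second is skipped because it exists.
    first = download_if_needed(target, lambda: b"0 1000")
    second = download_if_needed(target, lambda: b"changed")
    with open(target, "rb") as f:
        kept = f.read()
    # With overwrite=True the existing file is replaced.
    third = download_if_needed(target, lambda: b"changed", overwrite=True)
    with open(target, "rb") as f:
        replaced = f.read()
```
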

class AFQ.data.s3bids.S3BIDSSubject(subject_id, study)

A single study subject hosted on AWS S3

property subject_id(self)

An identifier string for the subject

property study(self)

The study in which this subject participated

property s3_keys(self)

A dict of S3 keys for this subject’s data

The S3 keys are divided between “raw” data and derivatives

property files(self)

Local files for this subject’s dMRI data

Before the call to subject.download(), this is None. Afterward, the files are stored in a dict with keys for each Amazon S3 key and values corresponding to the local file.

__repr__(self)

Return repr(self).

_get_s3_keys(self)

Get all required S3 keys for this subject

Returns
s3_keys : dict

S3 keys organized into “raw” and “derivatives” lists

download(self, directory, include_derivs=False, overwrite=False, suffix=None, pbar=True, pbar_idx=0)

Download files from S3

Parameters
directory : str

Directory to which to download subject files

include_derivs : bool, str, or sequence of str

If True, download all derivatives files. If False, do not. If a string or sequence of strings is passed, this will only download derivatives that match the string(s) (e.g. [‘dmriprep’, ‘afq’]). Default: False

overwrite : bool

If True, overwrite files for each subject. Default: False

suffix : str

Suffix, including extension, of file(s) to download. Default: None

pbar : bool

If True, include download progress bar. Default: True

pbar_idx : int

Progress bar index for multithreaded progress bars. Default: 0
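
The three forms of include_derivs (True, False, or one or more strings) can be sketched as a small selection function. The helper select_derivatives, the example keys, and the use of substring matching are assumptions for illustration; the library's matching rule may differ.

```python
def select_derivatives(deriv_keys, include_derivs=False):
    """Return the derivative keys selected by include_derivs."""
    if include_derivs is True:
        return list(deriv_keys)
    if not include_derivs:
        return []
    # Normalize a single string to a one-element list of patterns.
    if isinstance(include_derivs, str):
        patterns = [include_derivs]
    else:
        patterns = list(include_derivs)
    # Substring matching is an assumption made for this sketch.
    return [k for k in deriv_keys if any(p in k for p in patterns)]


# Hypothetical derivative keys for one subject.
keys = [
    "study/derivatives/dmriprep/sub-01/dwi/sub-01_dwi.nii.gz",
    "study/derivatives/afq/sub-01/sub-01_profiles.csv",
    "study/derivatives/freesurfer/sub-01/mri/T1.mgz",
]

none_selected = select_derivatives(keys)
all_selected = select_derivatives(keys, True)
afq_only = select_derivatives(keys, "afq")
two_pipelines = select_derivatives(keys, ["dmriprep", "afq"])
```
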

class AFQ.data.s3bids.HBNSubject(subject_id, study, site=None)

Bases: S3BIDSSubject

A subject in the HBN study

See also

AFQ.data.S3BIDSSubject

property site(self)

The site at which this subject was a participant

__repr__(self)

Return repr(self).

_get_s3_keys(self)

Get all required S3 keys for this subject

Returns
s3_keys : dict

S3 keys organized into “raw” and “derivatives” lists

class AFQ.data.s3bids.S3BIDSStudy(study_id, bucket, s3_prefix, subjects=None, anon=True, use_participants_tsv=False, random_seed=None, _subject_class=S3BIDSSubject)

A BIDS-compliant study hosted on AWS S3

property study_id(self)

An identifier string for the study

property bucket(self)

The S3 bucket that contains the study data

property s3_prefix(self)

The S3 prefix common to all of the study objects on S3

property subjects(self)

A list of Subject instances for each requested subject

property anon(self)

Is this study using an anonymous S3 connection?

property derivative_types(self)

A list of derivative pipelines available in this study

property non_sub_s3_keys(self)

A dict of S3 keys that are not in subject directories

property local_directories(self)

A list of local directories where this study has been downloaded

property use_participants_tsv(self)

Did we use a participants.tsv file to populate the list of study subjects?

property random_seed(self)

The random seed used to retrieve study subjects

__repr__(self)

Return repr(self).

_get_subject(self, subject_id)

Return a Subject instance from a subject-ID

_get_derivative_types(self)

Return a list of available derivatives pipelines

Returns
list

list of available derivatives pipelines

_get_non_subject_s3_keys(self)

Return a dict of ‘non-subject’ files

In this context, a ‘non-subject’ file is any file or directory that is not a subject ID folder.

Returns
dict

dict with keys ‘raw’ and ‘derivatives’ and whose values are lists of S3 keys for non-subject files

_list_all_subjects(self)

Return list of subjects

Returns
list

list of participant_ids

_download_non_sub_keys(self, directory, select=('dataset_description.json',), filenames=None)

_download_derivative_descriptions(self, include_derivs, directory)

download(self, directory, include_modality_agnostic=('dataset_description.json',), include_derivs=False, include_derivs_dataset_description=True, suffix=None, overwrite=False, pbar=True)

Download files for each subject in the study

Parameters
directory : str

Directory to which to download subject files

include_modality_agnostic : bool, “all” or any subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”]

If True or “all”, also download all keys in self.non_sub_s3_keys. If a subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”], download only those files. This is useful if the non_sub_s3_keys contain files common to all subjects that should be inherited. Default: (“dataset_description.json”,)

include_derivs : bool, str, or sequence of str

If True, download all derivatives files. If False, do not. If a string or sequence of strings is passed, this will only download derivatives that match the string(s) (e.g. [“dmriprep”, “afq”]). Default: False

include_derivs_dataset_description : bool

Used only if include_derivs is not False. If True, dataset_description.json is downloaded for each derivative. Default: True

suffix : str

Suffix, including extension, of file(s) to download. Default: None

overwrite : bool

If True, overwrite files for each subject. Default: False

pbar : bool

If True, include progress bar. Default: True
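
The accepted values of include_modality_agnostic can be sketched as a selection over a flat list of non-subject keys. The helper select_modality_agnostic, the flat list, and matching on the file's base name are assumptions for illustration, not the library's implementation.

```python
import posixpath


def select_modality_agnostic(non_sub_keys, include=("dataset_description.json",)):
    """Return the non-subject keys selected by `include`."""
    if include is True or include == "all":
        return list(non_sub_keys)
    if not include:
        # False or an empty selection: download nothing extra.
        return []
    wanted = {include} if isinstance(include, str) else set(include)
    # Matching on the base name of each key is an assumption of this sketch.
    return [k for k in non_sub_keys if posixpath.basename(k) in wanted]


# Hypothetical non-subject keys at the top level of a study.
keys = [
    "study/dataset_description.json",
    "study/README",
    "study/CHANGES",
    "study/participants.tsv",
]

default_choice = select_modality_agnostic(keys)
everything = select_modality_agnostic(keys, "all")
nothing = select_modality_agnostic(keys, False)
subset = select_modality_agnostic(keys, ("README", "CHANGES"))
```
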

See also

AFQ.data.S3BIDSSubject.download

class AFQ.data.s3bids.HBNSite(site, study_id='HBN', bucket='fcp-indi', s3_prefix='data/Projects/HBN/MRI', subjects=None, use_participants_tsv=False, random_seed=None)

Bases: S3BIDSStudy

An HBN study site

See also

AFQ.data.S3BIDSStudy

property site(self)

The HBN site

_get_subject(self, subject_id)

Return a Subject instance from a subject-ID

_get_derivative_types(self)

Return a list of available derivatives pipelines

The HBN dataset is not BIDS compliant, so to get a list of available derivatives, we must peek inside every directory in derivatives/sub-XXXX/

Returns
list

list of available derivatives pipelines

_get_non_subject_s3_keys(self)

Return a dict of ‘non-subject’ files

In this context, a ‘non-subject’ file is any file or directory that is not a subject ID folder. This method is different from AFQ.data.S3BIDSStudy because the HBN dataset is not BIDS compliant.

Returns
dict

dict with keys ‘raw’ and ‘derivatives’ and whose values are lists of S3 keys for non-subject files

See also

AFQ.data.S3BIDSStudy._get_non_subject_s3_keys

download(self, directory, include_modality_agnostic=False, include_derivs=False, overwrite=False, pbar=True)

Download files for each subject in the study

Parameters
directory : str

Directory to which to download subject files

include_modality_agnostic : bool, “all” or any subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”]

If True or “all”, also download all keys in self.non_sub_s3_keys. If a subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”], download only those files. This is useful if the non_sub_s3_keys contain files common to all subjects that should be inherited. Default: False

include_derivs : bool, str, or sequence of str

If True, download all derivatives files. If False, do not. If a string or sequence of strings is passed, this will only download derivatives that match the string(s) (e.g. [“dmriprep”, “afq”]). Default: False

overwrite : bool

If True, overwrite files for each subject. Default: False

pbar : bool

If True, include progress bar. Default: True

See also

AFQ.data.S3BIDSSubject.download

AFQ.data.s3bids.s3fs_nifti_write(img, fname, fs=None)

Write a nifti file straight to S3

AFQ.data.s3bids.s3fs_nifti_read(fname, fs=None, anon=False)

Lazily reads a nifti image from S3.

Returns
nib.Nifti1Image class instance

Notes

Because the image is lazily loaded, data stored in the file is not transferred until get_fdata is called.

AFQ.data.s3bids.write_json(fname, data)

Write data to JSON file.

Parameters
fname : str

Full path to the file to write.

data : dict

A dict containing the data to write.

Returns
None

AFQ.data.s3bids.read_json(fname)

Read data from a JSON file.

Parameters
fname : str

Full path to the data-containing file

Returns
dict
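
A local JSON round trip with the same call shapes can be sketched using only the standard library. The functions write_json_sketch and read_json_sketch are hypothetical stand-ins, not the AFQ functions themselves, and the example dict is made up.

```python
import json
import os
import tempfile


def write_json_sketch(fname, data):
    """Write a dict to a JSON file (stand-in for write_json)."""
    with open(fname, "w") as f:
        json.dump(data, f)


def read_json_sketch(fname):
    """Read a dict back from a JSON file (stand-in for read_json)."""
    with open(fname) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset_description.json")
    original = {"Name": "Example study", "BIDSVersion": "1.4.0"}
    write_json_sketch(path, original)
    restored = read_json_sketch(path)
```
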
AFQ.data.s3bids.s3fs_json_read(fname, fs=None, anon=False)

Reads json directly from S3

AFQ.data.s3bids.s3fs_json_write(data, fname, fs=None)

Writes json from a dict directly into S3

Parameters
data : dict

The json to be written out

fname : str

Full path (including bucket name and extension) to the file to be written out on S3

fs : an s3fs.S3FileSystem class instance, optional

A file-system to refer to. Defaults to creating a new file-system.