AFQ.data.s3bids
Module Contents
Classes
S3BIDSSubject | A single study subject hosted on AWS S3
HBNSubject | A subject in the HBN study
S3BIDSStudy | A BIDS-compliant study hosted on AWS S3
HBNSite | An HBN study site
Functions
get_s3_client | Return a boto3 s3 client
_ls_s3fs | Return a dict of lists of files using s3fs
_get_matching_s3_keys | Generate all the matching keys in an S3 bucket.
_download_from_s3 | Download object from S3 to local file
s3fs_nifti_write | Write a nifti file straight to S3
s3fs_nifti_read | Lazily reads a nifti image from S3.
write_json | Write data to JSON file.
read_json | Read data from a JSON file.
s3fs_json_read | Reads json directly from S3
s3fs_json_write | Writes json from a dict directly into S3
- AFQ.data.s3bids.get_s3_client(anon=True)
Return a boto3 s3 client
Global boto clients are not thread safe, so we use this function to return independent session clients for different threads.
- Parameters
- anon : bool
Whether to use anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True
- Returns
- s3_client : boto3.client('s3')
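The one-client-per-thread idea described above can be sketched with the standard library alone. This is only an illustration of the pattern, not the module's implementation: make_client, client_for_current_thread and collect_clients are hypothetical names, and make_client stands in for a real factory such as get_s3_client.

```python
import threading

_thread_local = threading.local()


def make_client():
    """Hypothetical stand-in for a client factory such as get_s3_client()."""
    return object()


def client_for_current_thread(factory=make_client):
    """Return a client owned by the calling thread, creating it on first use."""
    client = getattr(_thread_local, "client", None)
    if client is None:
        client = factory()
        _thread_local.client = client
    return client


def collect_clients(n_threads=4):
    """Start n_threads workers and return the client each one obtained."""
    clients = [None] * n_threads

    def worker(idx):
        clients[idx] = client_for_current_thread()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return clients
```

Each worker receives its own object, while repeated calls from the same thread reuse the client created on first use.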
- AFQ.data.s3bids._ls_s3fs(s3_prefix, anon=True)
Return a dict of lists of files using s3fs
The files are divided between subject directories/files and non-subject directories/files.
- Parameters
- s3_prefix : str
AWS S3 key for the study or site “directory” that contains all of the subjects
- anon : bool
Whether to use anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True
- Returns
- subjects : dict
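A minimal sketch of the subject/non-subject split follows. It runs on a plain list of paths instead of a live s3fs listing. The dict keys "subjects" and "other", the "sub-" name test, and the function name split_subject_entries are assumptions made for illustration; the real return value may be organized differently.

```python
import posixpath


def split_subject_entries(paths):
    """Divide listing entries into subject and non-subject groups."""
    grouped = {"subjects": [], "other": []}
    for path in paths:
        # Look only at the last path component, ignoring a trailing slash.
        name = posixpath.basename(path.rstrip("/"))
        key = "subjects" if name.startswith("sub-") else "other"
        grouped[key].append(path)
    return grouped


# Hypothetical listing of a study "directory".
listing = [
    "my-bucket/study/sub-01",
    "my-bucket/study/sub-02",
    "my-bucket/study/dataset_description.json",
    "my-bucket/study/derivatives",
]
grouped = split_subject_entries(listing)
```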
- AFQ.data.s3bids._get_matching_s3_keys(bucket, prefix='', suffix='', anon=True)
Generate all the matching keys in an S3 bucket.
- Parameters
- bucket : str
Name of the S3 bucket
- prefix : str, optional
Only fetch keys that start with this prefix
- suffix : str, optional
Only fetch keys that end with this suffix
- anon : bool
Whether to use anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True
- Yields
- key : list
S3 keys that match the prefix and suffix
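The prefix and suffix filtering can be illustrated with a generator over already-fetched listing pages. This sketch assumes pages shaped like S3 ListObjectsV2 responses (a "Contents" list of entries with a "Key" field) and does not contact S3. The name iter_matching_keys and the sample keys are hypothetical.

```python
def iter_matching_keys(pages, prefix="", suffix=""):
    """Yield keys from listing pages that start with prefix and end with suffix.

    pages is any iterable of dicts shaped like ListObjectsV2 responses,
    each with an optional "Contents" list of {"Key": ...} entries.
    """
    for page in pages:
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.startswith(prefix) and key.endswith(suffix):
                yield key


# Fake pages standing in for paginated responses.
fake_pages = [
    {"Contents": [{"Key": "study/sub-01/dwi/sub-01_dwi.nii.gz"},
                  {"Key": "study/sub-01/dwi/sub-01_dwi.bval"}]},
    {"Contents": [{"Key": "study/sub-02/dwi/sub-02_dwi.nii.gz"},
                  {"Key": "other/README"}]},
    {},  # an empty page has no "Contents" entry
]
matches = list(iter_matching_keys(fake_pages, prefix="study/", suffix=".nii.gz"))
```

Because the function is a generator, keys are produced one at a time and the caller can stop early without walking every page.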
- AFQ.data.s3bids._download_from_s3(fname, bucket, key, overwrite=False, anon=True)
Download object from S3 to local file
- Parameters
- fname : str
File path to which to download the object
- bucket : str
S3 bucket name
- key : str
S3 key for the object to download
- overwrite : bool
If True, overwrite file if it already exists. If False, skip download and return. Default: False
- anon : bool
Whether to use anonymous connection (public buckets only). If False, uses the key/secret given, or boto’s credential resolver (client_kwargs, environment variables, config files, EC2 IAM server, in that order). Default: True
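The overwrite behavior described above can be sketched without any network access by injecting the transfer step. Here download_if_needed and fake_fetch are hypothetical names, fake_fetch stands in for the real S3 transfer, and creating missing parent directories is an assumption of this sketch.

```python
import os
import tempfile


def download_if_needed(fname, fetch, overwrite=False):
    """Call fetch(fname) unless fname exists and overwrite is False.

    Returns True if the fetch ran, False if it was skipped.
    """
    if os.path.exists(fname) and not overwrite:
        return False
    parent = os.path.dirname(fname)
    if parent:
        os.makedirs(parent, exist_ok=True)
    fetch(fname)
    return True


def fake_fetch(fname):
    """Hypothetical stand-in for the real S3 transfer; writes a marker file."""
    with open(fname, "w") as f:
        f.write("payload")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sub-01", "dwi", "sub-01_dwi.bval")
    first = download_if_needed(target, fake_fetch)    # file missing: fetch runs
    second = download_if_needed(target, fake_fetch)   # file exists: skipped
    forced = download_if_needed(target, fake_fetch, overwrite=True)
```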
- class AFQ.data.s3bids.S3BIDSSubject(subject_id, study)
A single study subject hosted on AWS S3
- property s3_keys
A dict of S3 keys for this subject’s data
The S3 keys are divided between “raw” data and derivatives
- property files
Local files for this subject’s dMRI data
Before the call to subject.download(), this is None. Afterward, the files are stored in a dict with keys for each Amazon S3 key and values corresponding to the local file.
- _get_s3_keys()
Get all required S3 keys for this subject
- Returns
- s3_keys : dict
S3 keys organized into “raw” and “derivatives” lists
- download(directory, include_derivs=False, overwrite=False, suffix=None, pbar=True, pbar_idx=0)
Download files from S3
- Parameters
- directory : str
Directory to which to download subject files
- include_derivs : bool or str
If True, download all derivatives files. If False, do not. If a string or sequence of strings is passed, this will only download derivatives that match the string(s) (e.g. [‘dmriprep’, ‘afq’]). Default: False
- overwrite : bool
If True, overwrite files for each subject. Default: False
- suffix : str
Suffix, including extension, of file(s) to download. Default: None
- pbar : bool
If True, include download progress bar. Default: True
- pbar_idx : int
Progress bar index for multithreaded progress bars. Default: 0
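How include_derivs and suffix narrow the set of keys can be sketched as a pure function over a dict of "raw" and "derivatives" lists. The substring matching rule, the choice to apply suffix to both raw and derivative keys, and the name select_keys are assumptions for illustration; the actual method may match differently.

```python
def select_keys(s3_keys, include_derivs=False, suffix=None):
    """Choose which of a subject's keys to download.

    s3_keys is a dict with "raw" and "derivatives" lists of S3 keys.
    """
    selected = list(s3_keys["raw"])
    if include_derivs is True:
        selected += s3_keys["derivatives"]
    elif include_derivs:
        # A single string or a sequence of strings: keep matching derivatives.
        if isinstance(include_derivs, str):
            patterns = [include_derivs]
        else:
            patterns = list(include_derivs)
        selected += [
            k for k in s3_keys["derivatives"] if any(p in k for p in patterns)
        ]
    if suffix is not None:
        selected = [k for k in selected if k.endswith(suffix)]
    return selected


# Hypothetical keys for one subject.
keys = {
    "raw": [
        "study/sub-01/dwi/sub-01_dwi.nii.gz",
        "study/sub-01/dwi/sub-01_dwi.bval",
    ],
    "derivatives": [
        "study/derivatives/dmriprep/sub-01/dwi/sub-01_dwi.nii.gz",
        "study/derivatives/freesurfer/sub-01/mri/T1.mgz",
    ],
}
raw_only = select_keys(keys)
with_dmriprep = select_keys(keys, include_derivs=["dmriprep"], suffix=".nii.gz")
```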
- class AFQ.data.s3bids.HBNSubject(subject_id, study, site=None)
Bases: S3BIDSSubject
A subject in the HBN study
See also
AFQ.data.S3BIDSSubject
- class AFQ.data.s3bids.S3BIDSStudy(study_id, bucket, s3_prefix='', subjects=None, anon=True, use_participants_tsv=False, random_seed=None, _subject_class=S3BIDSSubject)
A BIDS-compliant study hosted on AWS S3
- property local_directories
A list of local directories where this study has been downloaded
- property use_participants_tsv
Whether a participants.tsv file was used to populate the list of study subjects.
- _get_derivative_types()
Return a list of available derivatives pipelines
- Returns
- list
list of available derivatives pipelines
- _get_non_subject_s3_keys()
Return the S3 keys of ‘non-subject’ files
In this context, a ‘non-subject’ file is any file or directory that is not a subject ID folder
- Returns
- dict
dict with keys ‘raw’ and ‘derivatives’ and whose values are lists of S3 keys for non-subject files
- download(directory, include_modality_agnostic=('dataset_description.json',), include_derivs=False, include_derivs_dataset_description=True, suffix=None, overwrite=False, pbar=True)
Download files for each subject in the study
- Parameters
- directory : str
Directory to which to download subject files
- include_modality_agnostic : bool, “all”, or any subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”]
If True or “all”, also download all keys in self.non_sub_s3_keys. If a subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”], download only those files. This is useful if the non_sub_s3_keys contain files common to all subjects that should be inherited. Default: (“dataset_description.json”,)
- include_derivs : bool or str
If True, download all derivatives files. If False, do not. If a string or sequence of strings is passed, this will only download derivatives that match the string(s) (e.g. [“dmriprep”, “afq”]). Default: False
- include_derivs_dataset_description : bool
Used only if include_derivs is not False. If True, dataset_description.json is downloaded for each derivative. Default: True
- suffix : str
Suffix, including extension, of file(s) to download. Default: None
- overwrite : bool
If True, overwrite files for each subject. Default: False
- pbar : bool
If True, include progress bar. Default: True
See also
AFQ.data.S3BIDSSubject.download
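The accepted values of include_modality_agnostic (True, “all”, False, or a subset of file names) can be sketched as a small selection function. Matching on the final path component and the name select_non_subject_keys are assumptions for illustration, not a description of the method's internals.

```python
import posixpath


def select_non_subject_keys(
    non_sub_keys, include_modality_agnostic=("dataset_description.json",)
):
    """Pick the non-subject keys to download for a given option value."""
    if include_modality_agnostic is True or include_modality_agnostic == "all":
        return list(non_sub_keys)
    if not include_modality_agnostic:
        return []
    if isinstance(include_modality_agnostic, str):
        wanted = {include_modality_agnostic}
    else:
        wanted = set(include_modality_agnostic)
    # Keep only keys whose file name is one of the requested names.
    return [k for k in non_sub_keys if posixpath.basename(k) in wanted]


# Hypothetical non-subject keys for a study.
non_sub = [
    "study/dataset_description.json",
    "study/README",
    "study/CHANGES",
    "study/participants.tsv",
]
default_pick = select_non_subject_keys(non_sub)
all_pick = select_non_subject_keys(non_sub, "all")
none_pick = select_non_subject_keys(non_sub, False)
subset_pick = select_non_subject_keys(non_sub, ["README", "CHANGES"])
```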
- class AFQ.data.s3bids.HBNSite(site, study_id='HBN', bucket='fcp-indi', s3_prefix='data/Projects/HBN/MRI', subjects=None, use_participants_tsv=False, random_seed=None)
Bases: S3BIDSStudy
An HBN study site
See also
AFQ.data.S3BIDSStudy
- _get_derivative_types()
Return a list of available derivatives pipelines
The HBN dataset is not BIDS compliant, so to get a list of available derivatives, we must peek inside every directory in derivatives/sub-XXXX/
- Returns
- list
list of available derivatives pipelines
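Collecting pipeline names from a derivatives/sub-XXXX/<pipeline>/ layout can be sketched as follows. It parses a plain list of keys and does not list anything on S3. The exact key layout, the sample keys, and the name derivative_types_from_keys are assumptions for illustration.

```python
def derivative_types_from_keys(keys):
    """Collect pipeline names from keys laid out as
    .../derivatives/sub-XXXX/<pipeline>/<file or subdirectory>."""
    pipelines = set()
    for key in keys:
        parts = key.split("/")
        if "derivatives" not in parts:
            continue
        idx = parts.index("derivatives")
        # Require a subject folder, a pipeline folder, and something inside it.
        if len(parts) > idx + 3 and parts[idx + 1].startswith("sub-"):
            pipelines.add(parts[idx + 2])
    return sorted(pipelines)


# Hypothetical keys for one site.
site_keys = [
    "site/derivatives/sub-0001/dmriprep/sub-0001_dwi.nii.gz",
    "site/derivatives/sub-0002/afq/sub-0002_tracts.trk",
    "site/derivatives/sub-0002/dmriprep/sub-0002_dwi.nii.gz",
    "site/sub-0001/dwi/sub-0001_dwi.nii.gz",
]
pipelines = derivative_types_from_keys(site_keys)
```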
- _get_non_subject_s3_keys()
Return the S3 keys of ‘non-subject’ files
In this context, a ‘non-subject’ file is any file or directory that is not a subject ID folder. This method differs from the one in AFQ.data.S3BIDSStudy because the HBN dataset is not BIDS compliant.
- Returns
- dict
dict with keys ‘raw’ and ‘derivatives’ and whose values are lists of S3 keys for non-subject files
See also
AFQ.data.S3BIDSStudy._get_non_subject_s3_keys
- download(directory, include_modality_agnostic=False, include_derivs=False, overwrite=False, pbar=True)
Download files for each subject in the study
- Parameters
- directory : str
Directory to which to download subject files
- include_modality_agnostic : bool, “all”, or any subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”]
If True or “all”, also download all keys in self.non_sub_s3_keys. If a subset of [“dataset_description.json”, “CHANGES”, “README”, “LICENSE”], download only those files. This is useful if the non_sub_s3_keys contain files common to all subjects that should be inherited. Default: False
- include_derivs : bool or str
If True, download all derivatives files. If False, do not. If a string or sequence of strings is passed, this will only download derivatives that match the string(s) (e.g. [“dmriprep”, “afq”]). Default: False
- overwrite : bool
If True, overwrite files for each subject. Default: False
- pbar : bool
If True, include progress bar. Default: True
See also
AFQ.data.S3BIDSSubject.download
- AFQ.data.s3bids.s3fs_nifti_read(fname, fs=None, anon=False)
Lazily reads a nifti image from S3.
- Returns
- nib.Nifti1Image class instance
Notes
Because the image is lazily loaded, data stored in the file is not transferred until get_fdata is called.
- AFQ.data.s3bids.write_json(fname, data)
Write data to JSON file.
- Parameters
- fname : str
Full path to the file to write.
- data : dict
A dict containing the data to write.
- Returns
- None
- AFQ.data.s3bids.read_json(fname)
Read data from a JSON file.
- Parameters
- fname : str
Full path to the data-containing file
- Returns
- dict
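The write_json and read_json pair amounts to a local JSON round trip. The sketch below shows equivalent behavior with the standard json module. The functions write_json_sketch and read_json_sketch are hypothetical stand-ins, and the library's own versions may differ in details such as formatting.

```python
import json
import os
import tempfile


def write_json_sketch(fname, data):
    """Serialize a dict to a JSON file at fname."""
    with open(fname, "w") as f:
        json.dump(data, f)


def read_json_sketch(fname):
    """Load a JSON file and return its contents as a dict."""
    with open(fname) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset_description.json")
    original = {"Name": "example", "BIDSVersion": "1.4.0"}
    write_json_sketch(path, original)
    restored = read_json_sketch(path)
```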
- AFQ.data.s3bids.s3fs_json_write(data, fname, fs=None)
Writes json from a dict directly into S3
- Parameters
- data : dict
The json to be written out
- fname : str
Full path (including bucket name and extension) to the file to be written out on S3
- fs : an s3fs.S3FileSystem class instance, optional
A file system to refer to. Defaults to creating a new file system.
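Writing JSON through a file-system object, instead of a local path, can be sketched against any object that offers an fsspec-style open(path, mode). The InMemoryFS class below is a hypothetical stand-in used so the example runs without S3 access. In real use the fs argument would be an s3fs.S3FileSystem instance, and json_write_to_fs is an illustrative name, not the library function.

```python
import io
import json


class _CapturingBuffer(io.StringIO):
    """Text buffer that copies its contents into a dict when closed."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        self._path = path

    def close(self):
        # Save the text once, before the underlying buffer is released.
        if not self.closed:
            self._store[self._path] = self.getvalue()
        super().close()


class InMemoryFS:
    """Hypothetical stand-in exposing an fsspec-style open(path, mode)."""

    def __init__(self):
        self.store = {}

    def open(self, path, mode="w"):
        if mode != "w":
            raise ValueError("this sketch only supports text writes")
        return _CapturingBuffer(self.store, path)


def json_write_to_fs(data, fname, fs):
    """Serialize data as JSON into fname on the given file system."""
    with fs.open(fname, "w") as f:
        json.dump(data, f)


fs = InMemoryFS()
target = "my-bucket/derivatives/afq/sub-01/meta.json"
json_write_to_fs({"subject": "sub-01", "n_streamlines": 1000}, target, fs)
```

Passing the file system in explicitly lets several writes share one connection, which is presumably why the fs parameter is optional and not always created anew.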