.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/demo_afq_dataset.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_demo_afq_dataset.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_demo_afq_dataset.py:


=====================================
Load and interact with an AFQ dataset
=====================================

This example loads AFQ data from CSV files and manipulates that data using
scikit-learn transformers and estimators. First we fetch the Weston-Havens
dataset described in Yeatman et al. [1]_. This dataset contains tractometry
features from 77 subjects ages 6-50. Next, we split the dataset into training
and test sets, impute missing values, and fit a LASSO model, all using
:class:`AFQDataset` methods.

Predictive performance for the default LASSO model is abysmal. It is only used
here to demonstrate the use of scikit-learn estimators. In a research setting,
one might use more advanced estimators, such as the SGL [2]_, a gradient
boosting machine, or a neural network.

Finally, we convert the AFQDataset to a tensorflow dataset and fit a basic
one-dimensional CNN to predict age from the features. This last step requires
that AFQ-Insight has been installed with::

    pip install afqinsight[tf]

or that tensorflow has been separately installed with::

    pip install tensorflow

.. [1]  Jason D. Yeatman, Brian A. Wandell, & Aviv A. Mezer, "Lifespan
    maturation and degeneration of human brain white matter,"
    Nature Communications, vol. 5:1, pp. 4932, 2014.
    DOI: 10.1038/ncomms5932

.. [2]  Adam Richie-Halford, Jason Yeatman, Noah Simon, and Ariel Rokem,
    "Multidimensional analysis and detection of informative features in human
    brain white matter," PLOS Computational Biology, 2021.
    DOI: 10.1371/journal.pcbi.1009136

.. GENERATED FROM PYTHON SOURCE LINES 38-50

.. code-block:: default

    import afqinsight.nn.tf_models as nn
    import os.path as op
    import tensorflow as tf

    from afqinsight.datasets import download_weston_havens
    from afqinsight import AFQDataset
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import Lasso
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split


.. GENERATED FROM PYTHON SOURCE LINES 51-63

Fetch example data
------------------

The :func:`download_weston_havens` function downloads the data used in this
example and places it in the ``~/.cache/afq-insight/weston_havens`` directory.
If the directory does not exist, it is created. The data follows the format
expected by the :func:`load_afq_data` function: a file called ``nodes.csv``
that contains AFQ tract profiles and a file called ``subjects.csv`` that
contains information about the subjects. The two files are linked through the
``subjectID`` column, which should exist in both of them. For more information
about this format, see also the AFQ-Browser documentation (items 2 and 3).

.. GENERATED FROM PYTHON SOURCE LINES 63-66

.. code-block:: default

    workdir = download_weston_havens()


.. GENERATED FROM PYTHON SOURCE LINES 67-73

Read in the data
----------------

Next, we read in the data. The :func:`AFQDataset.from_files` static method
expects the filenames of a ``nodes.csv`` file and a ``subjects.csv`` file, and
returns a dataset object.

.. GENERATED FROM PYTHON SOURCE LINES 73-81

.. code-block:: default

    dataset = AFQDataset.from_files(
        fn_nodes=op.join(workdir, "nodes.csv"),
        fn_subjects=op.join(workdir, "subjects.csv"),
        dwi_metrics=["md", "fa"],
        target_cols=["Age"],
    )


.. GENERATED FROM PYTHON SOURCE LINES 82-87

Train / test split
------------------

We can pass the dataset to the :func:`train_test_split` function just as we
would an array.

.. GENERATED FROM PYTHON SOURCE LINES 87-90

.. code-block:: default

    dataset_train, dataset_test = train_test_split(dataset, test_size=1 / 3)


.. GENERATED FROM PYTHON SOURCE LINES 91-96

Impute missing values
---------------------

Next we fit an imputer on the training set only and use it to transform the
features in both the training and the test sets.

.. GENERATED FROM PYTHON SOURCE LINES 96-101

.. code-block:: default

    imputer = dataset_train.model_fit(SimpleImputer(strategy="median"))
    dataset_train = dataset_train.model_transform(imputer)
    dataset_test = dataset_test.model_transform(imputer)


.. GENERATED FROM PYTHON SOURCE LINES 102-107

Fit a LASSO model
-----------------

Next we fit a LASSO estimator to the training data and print the score of that
model on both the training and the test datasets.

.. GENERATED FROM PYTHON SOURCE LINES 107-115

.. code-block:: default

    estimator = dataset_train.model_fit(Lasso())
    y_pred = dataset_test.model_predict(estimator)
    train_score = dataset_train.model_score(estimator)
    test_score = dataset_test.model_score(estimator)
    print("LASSO train score:", train_score)
    print("LASSO test score: ", test_score)


.. GENERATED FROM PYTHON SOURCE LINES 116-132

Convert to tensorflow datasets
------------------------------

Next we convert the train and test datasets to tensorflow datasets and use one
of AFQ-Insight's built-in one-dimensional CNNs to predict age. This part of the
example will only work if you have either installed AFQ-Insight with
tensorflow using::

    pip install afqinsight[tf]

or installed tensorflow separately using::

    pip install tensorflow

This model also performs poorly. It turns out that predicting age in this
dataset requires a bit more work.

.. GENERATED FROM PYTHON SOURCE LINES 132-155

.. code-block:: default

    tfset_train = dataset_train.as_tensorflow_dataset()
    tfset_test = dataset_test.as_tensorflow_dataset()

    # Use a single batch size for both splits (8, matching the batching
    # that was actually applied before).
    batch_size = 8
    tfset_train = tfset_train.batch(batch_size)
    tfset_test = tfset_test.batch(batch_size)

    print("CNN Architecture")
    model = nn.cnn_lenet(
        input_shape=(100, 40), output_activation=None, n_classes=1, verbose=True
    )
    model.compile(
        loss="mean_squared_error",
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        metrics=["mean_squared_error"],
    )
    model.fit(tfset_train, epochs=500, validation_data=tfset_test, verbose=0)
    print()
    print("CNN R^2 score: ", r2_score(dataset_test.y, model.predict(tfset_test)))


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.000 seconds)


.. _sphx_glr_download_auto_examples_demo_afq_dataset.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: demo_afq_dataset.py <demo_afq_dataset.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: demo_afq_dataset.ipynb <demo_afq_dataset.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
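
The CNN example above reports an R^2 score, and scikit-learn regressors such as
``Lasso`` also return R^2 from their ``score`` method, so it helps to have the
definition in hand when reading "poor" results. The following is a minimal
pure-Python sketch of the coefficient of determination for a single target. The
helper name ``r2`` and the toy ages are illustrative assumptions; they are not
part of AFQ-Insight and are not taken from the Weston-Havens data.

.. code-block:: python

    def r2(y_true, y_pred):
        """Coefficient of determination: 1 - SS_res / SS_tot."""
        mean_true = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean_true) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot

    # Toy ages (hypothetical values, not taken from the dataset).
    ages = [6.0, 12.0, 25.0, 38.0, 50.0]
    mean_age = sum(ages) / len(ages)

    perfect = r2(ages, ages)                      # every prediction is exact
    baseline = r2(ages, [mean_age] * len(ages))   # always predict the mean age
    reversed_guess = r2(ages, ages[::-1])         # worse than predicting the mean

    print("perfect:  ", perfect)
    print("baseline: ", baseline)
    print("reversed: ", reversed_guess)

A score of 1 means every prediction is exact, a score near 0 means the model
does no better than always predicting the mean age, and a negative score means
it does worse than that baseline.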