Tutorial 2: Load physio data in BIDS format
This tutorial shows how to load physiological data stored in BIDS format. For details on the BIDS specification for physiological recordings, please visit https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/06-physiological-and-other-continuous-recordings.html
In niphlem we have a function, niphlem.input_data.load_bids_physio, dedicated to loading data in this format. Let’s import it and show its documentation.
[1]:
from niphlem.input_data import load_bids_physio
print(load_bids_physio.__doc__)
Load physiological data in BIDS format.
Parameters
----------
data_file : str, pathlike
Path to recording bids physio file.
json_file : str, pathlike
Path to the sidecar json file of the input bids physio.
resample_freq : float, optional
Frequency to resample the data. The default is None.
sync_scan : bool, optional
Whether we want the signal to be synchronized
with the scanner times. The default is True.
Returns
-------
signal : ndarray
The signal, where each columns corresponds to a particular
recording, whose names can wh be identfied in the meta_info
dictionary returned, and the rows to observations.
meta_info : dict
Meta information that at least contains the sampling frequency,
the start time of the signals, and the name of each signal column.
As we can see, this function accepts four arguments, with the last two being optional:
The first argument, data_file, is the path to the file containing the data that we want to load. According to the BIDS specification, physiological recording files should use the “_physio” suffix and be gzip-compressed TSV files. They should therefore end with “_physio.tsv.gz”, and niphlem will raise an error if they do not.
The second argument, json_file, is a json sidecar file that contains meta information about the data. Again, according to BIDS, it should be a json file containing at least three required fields: “SamplingFrequency”, “StartTime” and “Columns”. Niphlem therefore checks that a json file is passed and that these fields are present. It also checks that the number of names in “Columns” matches the number of columns in the data, and gives a warning message otherwise.
The third argument, resample_freq, is optional and resamples the data to a given frequency. Different physiological recordings are often acquired at different frequencies, so this parameter lets the user bring all the recordings to the same frequency.
The last argument, sync_scan, is also optional and ensures that the physiological data start at the same time as the scanner.
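The checks described above can be sketched with the standard library alone. This is only an illustrative approximation, not niphlem’s actual code: the helper name check_sidecar, the example file name and the small in-memory sidecar are all assumptions made for the demo.

```python
import json

# Fields that the text above lists as required in the json sidecar
REQUIRED_FIELDS = ("SamplingFrequency", "StartTime", "Columns")


def check_sidecar(data_file, sidecar, n_data_columns):
    """Hypothetical helper mirroring the checks described in the text.

    Raises ValueError for a wrong file suffix or a missing required field,
    and returns whether the number of names in "Columns" matches the data.
    """
    if not str(data_file).endswith("_physio.tsv.gz"):
        raise ValueError("data file should end with '_physio.tsv.gz'")
    missing = [field for field in REQUIRED_FIELDS if field not in sidecar]
    if missing:
        raise ValueError("sidecar is missing required fields: %s" % missing)
    return len(sidecar["Columns"]) == n_data_columns


# Made-up sidecar content and file name, for illustration only
sidecar = json.loads(
    '{"SamplingFrequency": 200, "StartTime": 0.0, "Columns": ["pulse", "scanner"]}'
)
columns_match = check_sidecar("sub-01_task-rest_physio.tsv.gz", sidecar, n_data_columns=2)
print(columns_match)
```

In this sketch a column mismatch is reported through the return value, whereas the text above says niphlem gives a warning message in that case.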
Let’s see how we can load data through this function for a couple of our recordings.
[2]:
ecg_file = "./data/demo/physio/bids/sub-06_ses-04_task-resting_run-01_recording-ECG_physio.tsv.gz"
ecg_json = "./data/demo/physio/bids/sub-06_ses-04_task-resting_run-01_recording-ECG_physio.json"
puls_file = "./data/demo/physio/bids/sub-06_ses-04_task-resting_run-01_recording-pulse_physio.tsv.gz"
puls_json = "./data/demo/physio/bids/sub-06_ses-04_task-resting_run-01_recording-pulse_physio.json"
Let’s start with ECG data:
[3]:
ecg_data, meta_ecg = load_bids_physio(ecg_file, ecg_json)
print("ECG data has %d observations and %d columns." % ecg_data.shape)
print("And they have the following meta information:")
print(meta_ecg)
ECG data has 211791 observations and 7 columns.
And they have the following meta information:
{'Columns': ['ECG3', 'ECG2', 'ECG4', 'ECG1', 'ECG_TRIGGER', 'PULS_TRIGGER', 'scanner'], 'SamplingFrequency': 400, 'StartTime': 0.0, 'TaskName': 'resting'}
As we can see, our ECG data contain signals from four electrodes, two triggers and scanner ticks, acquired during a resting-state task at a sampling frequency of 400 Hz. The first data point corresponds to time=0, i.e. the start time of the scanner.
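Since each column of the returned array corresponds to a name in the “Columns” entry, a channel can be located by name rather than by a hard-coded position. A minimal sketch, using the meta information printed above; the helper name column_index is hypothetical and not part of niphlem.

```python
def column_index(meta_info, name):
    """Return the position of a named recording within the signal columns."""
    return meta_info["Columns"].index(name)


# Meta information copied from the output above
meta_ecg = {
    "Columns": ["ECG3", "ECG2", "ECG4", "ECG1",
                "ECG_TRIGGER", "PULS_TRIGGER", "scanner"],
    "SamplingFrequency": 400,
    "StartTime": 0.0,
    "TaskName": "resting",
}

ecg1_idx = column_index(meta_ecg, "ECG1")
scanner_idx = column_index(meta_ecg, "scanner")
print(ecg1_idx, scanner_idx)  # 3 6
```

With the array loaded in the previous cell, ecg_data[:, ecg1_idx] would then select that electrode, because rows are observations and columns are recordings.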
Now let’s see what happens if we set sync_scan to False.
[4]:
ecg_data, meta_ecg = load_bids_physio(ecg_file, ecg_json, sync_scan=False)
print("ECG data has %d observations and %d columns." % ecg_data.shape)
print("And they have the following meta information:")
print(meta_ecg)
ECG data has 216021 observations and 7 columns.
And they have the following meta information:
{'Columns': ['ECG3', 'ECG2', 'ECG4', 'ECG1', 'ECG_TRIGGER', 'PULS_TRIGGER', 'scanner'], 'SamplingFrequency': 400, 'StartTime': -10.575000000000001, 'TaskName': 'resting'}
As we can see, we have more observations than before, because we are now including the physiological points acquired before the scanner started to collect data.
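The extra observations can be checked against the reported StartTime with a little arithmetic. The numbers below are copied from the two outputs above; the variable names are only illustrative.

```python
import math

sampling_frequency = 400                   # Hz, from the meta information
n_synced = 211791                          # observations with sync_scan=True
n_unsynced = 216021                        # observations with sync_scan=False
start_time_unsynced = -10.575000000000001  # StartTime with sync_scan=False

# Samples recorded before the scanner started
extra_samples = n_unsynced - n_synced
# Duration of that pre-scan segment in seconds
extra_seconds = extra_samples / sampling_frequency

print(extra_samples)  # 4230
# The pre-scan duration agrees with the negative StartTime, up to rounding
print(math.isclose(extra_seconds, -start_time_unsynced))  # True
```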
Now let’s do the same with pulse-ox data:
[5]:
puls_data, meta_puls = load_bids_physio(puls_file, json_file=puls_json)
print("Pulse-ox data has %d observations and %d columns." % puls_data.shape)
print("And they have the following meta information:")
print(meta_puls)
Pulse-ox data has 105895 observations and 4 columns.
And they have the following meta information:
{'Columns': ['pulse', 'ECG_TRIGGER', 'PULS_TRIGGER', 'scanner'], 'SamplingFrequency': 200, 'StartTime': 0.0, 'TaskName': 'resting'}
As we can see, the pulse-ox data were instead acquired at a frequency of 200 Hz. We can bring this signal to the same frequency as the ECG by resampling it to 400 Hz.
[6]:
puls_data, meta_puls = load_bids_physio(puls_file, json_file=puls_json, resample_freq=400)
print("Pulse-ox data has been resampled to 400 Hz, "
"so now they have %d observations" % puls_data.shape[0])
print("And they have the following meta information:")
print(meta_puls)
Pulse-ox data has been resampled to 400 Hz, so now they have 211791 observations
And they have the following meta information:
{'Columns': ['pulse', 'ECG_TRIGGER', 'PULS_TRIGGER', 'scanner'], 'SamplingFrequency': 400.0, 'StartTime': 0.0, 'TaskName': 'resting'}
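To give an intuition for what resampling does, here is a minimal sketch based on linear interpolation onto a new time grid. This is an assumption chosen for illustration: the tutorial does not show which method niphlem uses internally, so the function resample_linear is hypothetical and its sample counts and values may differ from those of load_bids_physio.

```python
def resample_linear(signal, old_freq, new_freq):
    """Resample a 1-D sequence by linear interpolation (illustrative only)."""
    if len(signal) < 2:
        return list(signal)
    duration = (len(signal) - 1) / old_freq      # time of the last sample
    n_new = int(round(duration * new_freq)) + 1  # points on the new grid
    resampled = []
    for k in range(n_new):
        pos = k * old_freq / new_freq            # fractional index in the input
        left = min(int(pos), len(signal) - 2)    # left neighbour, kept in range
        frac = pos - left
        resampled.append(signal[left] * (1 - frac) + signal[left + 1] * frac)
    return resampled


# Toy signal sampled at 200 Hz, brought to 400 Hz
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], old_freq=200, new_freq=400)
print(upsampled)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

Doubling the frequency inserts one interpolated point between each pair of original samples, which is why the resampled pulse-ox data above have roughly twice as many observations.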
Note: load_bids_physio will fail if the files passed are not BIDS compliant, for example if the data and json files do not share the same name pattern. Please make sure that your data are in BIDS format and that you pass the files correctly.
[7]:
load_bids_physio(puls_file, ecg_json)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_2344130/4107982115.py in <module>
----> 1 load_bids_physio(puls_file, ecg_json)
~/anaconda3/lib/python3.8/site-packages/niphlem/input_data.py in load_bids_physio(data_file, json_file, resample_freq, sync_scan)
354 # Check that both files have the same name without extensions
355 if data_file.split(".tsv.gz")[0] != json_file.split(".json")[0]:
--> 356 raise ValueError("data file and json file do not have the same "
357 "name (without extensions), which invalidates "
358 " BIDS specification")
ValueError: data file and json file do not have the same name (without extensions), which invalidates BIDS specification
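One way to avoid this error is to derive the sidecar path from the data path instead of typing both by hand. The sketch below reuses the comparison visible in the traceback above; the helper names sidecar_for and same_bids_name are hypothetical and not part of niphlem.

```python
def sidecar_for(data_file):
    """Build the json sidecar path that shares the data file's name."""
    suffix = ".tsv.gz"
    if not data_file.endswith(suffix):
        raise ValueError("expected a file ending with '.tsv.gz'")
    return data_file[: -len(suffix)] + ".json"


def same_bids_name(data_file, json_file):
    """Same comparison as the check shown in the traceback."""
    return data_file.split(".tsv.gz")[0] == json_file.split(".json")[0]


# Paths used earlier in this tutorial
puls_file = "./data/demo/physio/bids/sub-06_ses-04_task-resting_run-01_recording-pulse_physio.tsv.gz"
ecg_json = "./data/demo/physio/bids/sub-06_ses-04_task-resting_run-01_recording-ECG_physio.json"

puls_json = sidecar_for(puls_file)
print(same_bids_name(puls_file, puls_json))  # True
print(same_bids_name(puls_file, ecg_json))   # False
```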