Data Cleaning Algorithms

Although we provide cleaning algorithms, it is important to recognize that in some circumstances a cleaning algorithm cannot separate noise from the signals of interest. For example, segments of EEG data that are clipped at the full range of the amplifier or A/D converter cannot be cleaned. The same is true for other types of data, such as ECG and GSR. This is not a limitation of the cleaning algorithms: clipping reduces the information in the data to zero at every point where the data sit at the saturation level of the amplifier or A/D converter.
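
One way to screen a recording for clipped segments before submitting it is sketched below. This is only an illustration, not part of the cleaning service: the function name, the rail values, and the minimum run length are assumptions, and the real saturation limits depend on your amplifier or A/D converter.

```python
def find_clipped_runs(samples, low, high, min_run=3):
    """Return (start, end) index pairs, end exclusive, where the signal
    sits at either rail for at least `min_run` consecutive samples."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        at_rail = value <= low or value >= high
        if at_rail and start is None:
            start = i                      # a candidate run begins here
        elif not at_rail and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the recording.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


# Hypothetical 12-bit converter range, chosen only for illustration.
signal = [10, 500, 2047, 2047, 2047, 2047, 300, -2048, -2048, 5]
print(find_clipped_runs(signal, low=-2048, high=2047))
```

Segments reported by a check like this carry no recoverable information and are best excluded or re-recorded.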

Moreover, persons using these algorithms should continue to do their best to collect noise-free data. Once noise begins to dominate the data, the performance of the cleaning algorithms will diminish. Generally speaking, the noisier the dataset, the more minutes of data should be collected.

For best performance, it is important that the same equipment is always used for the same user account.  For example, if you have two different EEG machines in your lab, a separate URDA-EEG account should be used for each machine.  If this requirement is not adhered to, non-optimal results might be generated.

Description of Algorithms for Cleaning The Electroencephalogram (EEG)

“(1) Wavelet”

General Description: This algorithm is designed for cleaning EEG recorded during “stationary” behavior, such as an eyes-open or eyes-closed data collection paradigm. It assumes that the useful information contained in the EEG data is stationary; that is, it assumes that there is very little change in the ongoing statistics of the signals of interest. Hence, this algorithm is best suited to removing momentary artifacts in EEG collected while a participant is in an eyes-closed or eyes-open state. Of course, a wandering mind will introduce varied brain function into the EEG and result in some non-stationarity. Try the algorithm and see if it is right for you.

For best performance: Provide data sampled at 256 samples per second or more. Even better, sample at 512 samples per second. This helps the algorithm separate noise from signal. Please use sampling rates that are a power of 2. Do not low-pass or bandpass filter the data; this will be done for you, and the cleaning algorithm requires the high frequencies. Do not notch filter the line noise; this will be done automatically for you. Best results are obtained if 5 minutes or more of continuously recorded data are provided. We recommend cleaning your data as a continuous segment before segmenting it into small epochs for analysis.
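
The recommendations above can be turned into a simple check to run before uploading a recording. The sketch below is an assumption about how one might do this locally; the function name and the warning messages are hypothetical and are not part of the service.

```python
def check_recording(sampling_rate_hz, n_samples):
    """Return a list of warnings based on the recording guidance above."""
    warnings = []
    rate = int(sampling_rate_hz)
    if rate != sampling_rate_hz or rate <= 0:
        return ["sampling rate should be a positive whole number"]
    if rate < 256:
        warnings.append("sampling rate is below 256 samples per second")
    # A power of 2 has exactly one bit set, so rate & (rate - 1) is zero.
    if (rate & (rate - 1)) != 0:
        warnings.append("sampling rate is not a power of 2")
    duration_minutes = n_samples / rate / 60.0
    if duration_minutes < 5.0:
        warnings.append("recording is shorter than 5 minutes")
    return warnings


print(check_recording(512, 512 * 60 * 5))   # meets every recommendation
print(check_recording(250, 250 * 60))       # triggers three warnings
```

An empty list means the recording meets the stated recommendations; it says nothing about whether the data have been filtered beforehand, which cannot be checked from these two numbers.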

How it works: This algorithm uses statistics calculated in the wavelet domain to identify the time location and wavelet scale (level) of artifact-related wavelet coefficients. Once artifact wavelet coefficients are identified, they are removed in the wavelet domain so as to preserve, as much as possible, the integrity of the non-artifact information contained in the EEG data. No ICA or PCA unmixing of the EEG is used by this cleaning algorithm. This particular algorithm also includes a process called “cycle-spinning”, which makes the wavelet transform “translation invariant”.
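
The general idea can be illustrated with a toy example. The sketch below is not the algorithm used by the service: the Haar wavelet, the per-level statistic (a multiple `k` of the median absolute coefficient), the choice to zero flagged coefficients, and all function names are assumptions made for illustration.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def suppress_outliers(x, levels, k):
    """Zero detail coefficients that are unusually large for their level."""
    approx = list(x)
    kept = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        # Assumed statistic: k times the median absolute coefficient.
        limit = k * statistics.median(abs(c) for c in detail)
        kept.append([0.0 if abs(c) > limit else c for c in detail])
    for detail in reversed(kept):
        approx = haar_inverse(approx, detail)
    return approx


def cycle_spin_clean(x, levels=3, k=5.0):
    """Average the cleaned result over all 2**levels circular shifts."""
    n = len(x)
    shifts = 2 ** levels
    if n % shifts != 0:
        raise ValueError("length must be a multiple of 2**levels")
    total = [0.0] * n
    for s in range(shifts):
        shifted = x[s:] + x[:s]                       # rotate left by s
        cleaned = suppress_outliers(shifted, levels, k)
        restored = cleaned[n - s:] + cleaned[:n - s]  # rotate right by s
        total = [t + v for t, v in zip(total, restored)]
    return [t / shifts for t in total]


# Toy input: a slow sine wave with one large spike standing in for an artifact.
noisy = [math.sin(2.0 * math.pi * i / 32.0) for i in range(256)]
noisy[100] += 50.0
cleaned = cycle_spin_clean(noisy)
print(round(noisy[100], 2), round(cleaned[100], 2))
```

In this toy run the spike is strongly attenuated, while samples well away from it are returned unchanged up to rounding, because none of their coefficients are flagged. Averaging over every circular shift is what cycle-spinning refers to: no single alignment of the wavelet grid is favored.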

Important Features: Only segments of data that are determined to contain artifact are ‘touched’ by the algorithm. Data in time intervals without artifact are not modified by the algorithm. Hence, this algorithm causes zero distortion to the clean data and (as best as possible) improves the signal quality in noisy sections of data.

The output of this algorithm provides both the cleaned data and an indicator of the time locations of the artifacts identified by the algorithm.

“(2) Wavelet Subspace”

General Description: This algorithm is more aggressive at removing artifact than the Wavelet Only v1 algorithm.  More description to follow…

For best performance: Provide data sampled at 256 samples per second or more. Even better, sample at 512 samples per second. This helps the algorithm separate noise from signal. Please use sampling rates that are a power of 2. Do not low-pass or bandpass filter the data; this will be done for you, and the cleaning algorithm requires the high frequencies. Do not notch filter the line noise; this will be done automatically for you. Best results are obtained if 5 minutes or more of continuously recorded data are provided. We recommend cleaning your data as a continuous segment before segmenting it into small epochs for analysis.

“(3) Wavelet Feature”

General Description: This algorithm targets specific features in the EEG and, by doing so, gives the user a little more control over the cleaning process. The algorithm is currently experimental and requires supervision on the server side.

Description of Algorithms for Cleaning The Electrocardiogram (ECG)

No description is available.


General Definitions Pertaining to All Algorithms

Stationary Behavior – this implies that there is no change in psychological parameters or brain function of the participant during the interval in which EEG data are recorded. Of course, this is not possible in the strictest sense: participants will always have ‘stray’ thoughts and normal brain processing related to changes in the environment. However, it is expected that the behavior be as unchanging as possible. For example, a paradigm of “eyes-closed” data collection or “eyes-open” data collection (with a clear mind) might be considered “stationary behavior”.

Translation invariant – this means that the wavelet transform will work optimally at all time locations of events in the data.  The event to be matched to the wavelet does not have to occur at a particular time location in the analysis window.
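
A small example shows why this matters. The sketch below uses a one-level Haar transform as an assumed stand-in for the wavelet; the function names are hypothetical. With a single fixed pairing of samples, a step edge is detected only when it falls inside a pair. Examining both pairings makes the response the same wherever the edge occurs.

```python
import math


def haar_detail(x):
    """Level-1 Haar detail coefficients of an even-length sequence."""
    return [(x[i] - x[i + 1]) / math.sqrt(2.0) for i in range(0, len(x), 2)]


def max_detail_all_shifts(x):
    """Largest |detail| over both pairings (shift 0 and circular shift 1)."""
    shifted = x[1:] + x[:1]
    return max(abs(c) for c in haar_detail(x) + haar_detail(shifted))


step = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
moved = step[1:] + step[:1]   # the same event, one sample earlier (circularly)

# Fixed pairing: the edge is missed in one case and seen in the other.
print(max(abs(c) for c in haar_detail(step)))    # edge falls between pairs
print(max(abs(c) for c in haar_detail(moved)))   # edge falls inside a pair

# Both pairings: the response no longer depends on where the edge sits.
print(max_detail_all_shifts(step), max_detail_all_shifts(moved))
```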