Data Cleaner Software
Two statistical cleaning algorithms are currently available in the URDA-EEG Data Cleaning software.
Our first algorithm simply removes very large outliers from the data, while the second maps aspects of the data into multiple ‘spaces’ for analysis, successively separating variance related to brain function from variance arising from non-brain sources. Each algorithm has advantages over the other. We are continually improving the existing algorithms and will add more to this set in the future.
Figures 1 and 2 below present the data cleaning results in terms of the improved reliability, or stability, of statistics that results from artefact removal. The plots were generated using the “Test – Retest” method. Essentially, we pulled random blocks of data from our dataset, both before and after cleaning, and compared the variances of pairs of blocks as a ratio to measure how similar they were. The basic principle behind this method is described on the futurehealth.org site under the heading “Statistical Inferences and Reliability”, and the method is also described on a related QEEG site. We extended the described method by using permutations (many, many block comparisons) to generate stable statistics from the dataset. The figures below therefore show the percentage of pair-wise comparisons that exceed a block variance similarity threshold of 0.9. All you really need to know is that when an algorithm does well, the bars in the bar graph rise towards 100%.
Both figures use the same dataset, cleaned with the Wavelet Only method (Figure 1) and with the Wavelet Subspace method (Figure 2). The dataset might generally be considered ‘reasonably clean’, and it is clear that effort was taken to record good data. However, visual inspection of the raw data shows noticeable eye-movement, blink, and cranial muscle artefact. Note also that this is a continuous recording: special physiologically constant ‘brain-states’ of the kind that might be used in QEEG classification were not segmented out from a larger dataset.
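The block comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the URDA-EEG implementation: the similarity measure is assumed to be the smaller block variance divided by the larger, blocks are assumed to be non-overlapping, and the names variance_similarity and percent_similar are hypothetical.

```python
import random
import statistics

def variance_similarity(var_a, var_b):
    """Assumed similarity measure: smaller variance over larger, in [0, 1]."""
    largest = max(var_a, var_b)
    if largest == 0:
        return 1.0  # two flat blocks are treated as identical
    return min(var_a, var_b) / largest

def percent_similar(signal, block_len, n_pairs=500, threshold=0.9, seed=0):
    """Percent of random block pairs whose variance similarity meets the threshold."""
    rng = random.Random(seed)
    n_blocks = len(signal) // block_len
    # Variance of each non-overlapping block (assumption: no overlap).
    block_vars = [
        statistics.pvariance(signal[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    hits = 0
    for _ in range(n_pairs):
        a, b = rng.sample(range(n_blocks), 2)  # two distinct blocks
        if variance_similarity(block_vars[a], block_vars[b]) >= threshold:
            hits += 1
    return 100.0 * hits / n_pairs
```

For one EEG channel, block_len would be 10 seconds multiplied by the sampling rate, which is not stated here. Running the function on the raw and on the cleaned channel gives the two bar heights for that channel.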
Figure 1. Wavelet Only Method: Red bars pertain to raw data before cleaning while Blue bars pertain to the data after they have been cleaned. The data plotted in this figure show that for most EEG channels, the percent similarity of variance among blocks (10-second blocks) of data increased when the data were put through the Wavelet Only cleaning process. In the case of channel 6, the similarity of variance decreased slightly.
The most notable recent result comes from the application of our Wavelet Subspace algorithm, illustrated in Figure 2. In contrast to the results of Figure 1, the percent similarity of variance among blocks increased for all channels, and it increased much more than in Figure 1. The next step in our development is to demonstrate that what the data cleaning software subtracts from the data is indeed artefact of non-brain origin.
Figure 2. Wavelet Subspace Method: Red bars pertain to raw data before cleaning while Blue bars pertain to the data after they have been cleaned. The data plotted in this figure show that for all EEG channels, the percent similarity of variance among blocks of data increased (10-second blocks). The increase in this case is much larger than the increase in the case illustrated in Figure 1.
To download a copy of our URDA-EEG software, which provides data cleaning, go to our product download page.
The data cleaning software also separates estimates of the statistically stationary (by time-frequency) and non-stationary components of the cleaned brain activity data. Doing so supports two types of analysis: (1) an analysis of brain function that varies statistically over the course of the recording paradigm and should, in theory, be reflected in behavioral variables obtained over the recording period, and (2) an analysis of the ‘background’ brain function against which the non-stationary component occurs. This ‘background’ brain function should be indicative of the brain function that underlies the moment-to-moment behavior during the recording session. The procedure is not unlike prior research that characterizes ‘background’ or tonic muscle activity in order to detect momentary non-stationary muscle activity (Prinz, R., Zeman, P.M., Neville, S., Livingston, N.J., Feature Extraction Through Wavelet De-Noising of Surface EMG Signals for the Purpose of Mouse Click Emulation, IEEE Xplore). The process was originally established by David Donoho and his colleagues. In this case, the background brain function component is the sum of the stationary brain function and the stationary noise of the EEG recording system; it is analogous to the sum of the noise generated by the EMG recording system and the tonic background muscle activity described in the paper cited above.
To demonstrate, the same EEG dataset used for Figures 1 and 2 was used to generate Figure 3, which compares the test-retest results for the background brain function with those for the raw unprocessed data. Interestingly, although artefact has been largely removed and the non-stationary component of the remaining brain activity has been ‘subtracted’ from the data, the blocks still do not all have matching variance: the percent similarity of variance is approximately 50% for 10-second blocks.
This could be because the wavelet methods separate the stationary and non-stationary components independently at each wavelet scale, while the test-retest measure examines the variance of the sum of all wavelet scales (in the regular time domain). Hence, differences between blocks might result from the synchronization and desynchronization of peaks and troughs in different frequency bands. (A better test-retest method would examine each frequency band or wavelet scale separately and identify the percentage of similar moments of brain function contained in the EEG.) Whatever the reason, the plot does characterize the brain activity that remains after the estimated moment-to-moment non-stationary brain function has been removed.
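To make the idea of separating components scale by scale concrete, here is a minimal sketch. It is not the URDA-EEG algorithm: it assumes a Haar wavelet, a noise estimate from the median absolute coefficient, and the Donoho-Johnstone universal threshold applied independently at each scale. Coefficients at or below the threshold are treated as the stationary ‘background’ and larger ones as the non-stationary component. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal, levels):
    """Haar transform; len(signal) must be divisible by 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(len(approx) // 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])  # finest scale first
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest scale to the finest."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = rebuilt
    return current

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    return ordered[mid] if len(ordered) % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])

def split_background(signal, levels=3):
    """Return (background, transient); the two always sum to the input signal."""
    approx, details = haar_forward(signal, levels)
    kept, removed = [], []
    for detail in details:
        # Assumed per-scale noise estimate and universal threshold.
        sigma = median([abs(c) for c in detail]) / 0.6745
        limit = sigma * math.sqrt(2.0 * math.log(len(detail)))
        kept.append([c if abs(c) <= limit else 0.0 for c in detail])
        removed.append([c if abs(c) > limit else 0.0 for c in detail])
    background = haar_inverse(approx, kept)
    transient = haar_inverse([0.0] * len(approx), removed)
    return background, transient
```

Because the threshold is chosen separately at each scale, the background returned here is stationary only scale by scale; its time-domain variance can still differ from block to block, which is the effect suggested above.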
Figure 3. Background EEG: Calculated by subtracting the non-stationary component of the EEG after removing artefacts from the EEG using the Wavelet Subspace Method. Error bars are the standard deviation of 10 trials of 500 permutations of block comparisons each, which estimate the percentage of 10-second block pairs whose variances match at 90% or above. Blue bars pertain to the background EEG calculated using wavelet methods after the data have been cleaned, while red bars pertain to the raw data before processing.
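The error bars described in this caption can be reproduced in outline as follows. This is a sketch under assumptions rather than the code used for the figure: the variance match is taken to be the smaller block variance divided by the larger, the bar height is taken to be the mean over trials, and the sample standard deviation is used. The function names are hypothetical.

```python
import random
import statistics

def percent_matching(block_vars, n_pairs, rng, threshold=0.9):
    """Percent of random block pairs whose variance ratio meets the threshold."""
    hits = 0
    for _ in range(n_pairs):
        var_a, var_b = rng.sample(block_vars, 2)  # two distinct blocks
        largest = max(var_a, var_b)
        if largest == 0 or min(var_a, var_b) / largest >= threshold:
            hits += 1
    return 100.0 * hits / n_pairs

def bar_with_error(block_vars, n_trials=10, n_pairs=500, seed=0):
    """Mean and standard deviation of the percentage across repeated trials."""
    rng = random.Random(seed)
    trials = [percent_matching(block_vars, n_pairs, rng) for _ in range(n_trials)]
    # Assumption: bar height is the trial mean, error bar is the sample SD.
    return statistics.mean(trials), statistics.stdev(trials)
```

Here block_vars is the list of variances of the 10-second blocks for one channel, so a call such as bar_with_error(block_vars) would give one bar and its error bar.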


