Feature Extraction and Cluster Analysis on Brain Imaging Data


This program, somewhat misguidedly named VV-Classifier, performs feature extraction and subsequent cluster analysis on certain kinds of EEG (electroencephalography) brain imaging data.

The Data

The data is so-called TFR (time-frequency representation) data, also known as a spectrogram. What you see above is part of a matrix from a single electrode measuring brain activation. The X axis is time in milliseconds; the whole width of the matrix above represents about one second. (As is usual in cognitive brain research, the data has already been averaged over a number of trials, so it does not represent a single physical period of time.) The Y axis is frequency, here about 0-40 Hz. Each point represents a voltage: bright red shows the highest positive values and deep blue the lowest negative values.

This program reads data files in Matlab format produced by NeuroScan software. Each Matlab file actually contains two vectors and one 3D matrix. The vectors give the scales for the X and Y axes, and the 3D matrix contains the data for all electrodes that were used, e.g. 20 or 64 channels.

Feature Extraction

The feature extraction is visualized in the animated picture above. It begins by marking the areas with the highest and lowest values, shown in white in the picture. The high and low areas are enclosed in minimum bounding rectangles, and these rectangles are then "sliced" into smaller rectangles that conform rather tightly to the shapes of the marked areas. Further processing uses only the coordinates of the small rectangles. Thus the information in a matrix with thousands of data points is reduced to a few dozen points: the coordinates of the rectangles.

Cluster Analysis

The cluster analysis is based on a distance matrix. The distance metric is based on finding the closest corresponding corners in the two matrices being compared, e.g. the closest positive top-left corner in matrix B for each positive top-left corner in matrix A.

To perform the cluster analysis, you set the desired final number of clusters and a cutoff percentage that states what portion of the data you are willing to leave out of the clusters. For example, you could require that at least 90% of the data be clustered into the 3 largest clusters. You can always cluster 100% of the data if you wish, but it is often reasonable to assume that part of the data is 'unclean' or otherwise invalid. By leaving part of it out, your main clusters contain only the data that fits them best.

The graphical result of the cluster analysis can be seen below. The cluster "Class 1" shown here is composed of 1591 individual matrices, and the picture shows what is common to all of them. The brightness of red at each point is determined by the proportion of the class's matrices that have a positive rectangle containing that point. The brightness of blue is determined the same way from the negative rectangles.

Features

- The program is a stand-alone Java application and runs on any system where Java 1.4 or newer is installed.
- It reads Matlab-format files output by NeuroScan software. Currently all files in an analysis need to have exactly the same scale and dimensions.
- It can be run in matrix-at-once or electrode-at-once mode. In matrix-at-once mode, whole 3D matrices are compared against each other, so that only electrodes at corresponding locations are compared with each other. In electrode-at-once mode, each 2D matrix from a single electrode is handled independently, without regard to the 3D matrix it belongs to or the electrode's location.
- The program is computationally efficient compared to other methods for analysing this type of data. Electrode-at-once mode is the more demanding of the two, yet calculating the distance matrix for about 4000 electrode matrices took roughly 2 hours on an Athlon64 3500+ PC. The distance matrix is the heaviest part of the computation; the cluster analysis itself completes within seconds even for large data sets.

If you're interested in trying out the program or developing it further, please contact me!
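The feature extraction step described above can be sketched roughly as follows. This is a minimal Java sketch, not VV-Classifier's actual code. Several details are assumptions: the fixed `threshold`, the grouping of marked cells into 4-connected areas, and the slicing of each bounding rectangle into vertical strips of `sliceWidth` columns. The class and method names (`FeatureExtractor`, `mark`, `extract`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the feature extraction step; not VV-Classifier's own code. */
final class FeatureExtractor {

    /** Rectangle in matrix index space (row = frequency, column = time), inclusive bounds. */
    static final class Rect {
        final int top, left, bottom, right;

        Rect(int top, int left, int bottom, int right) {
            this.top = top;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
        }

        @Override
        public String toString() {
            return "[" + top + "," + left + " - " + bottom + "," + right + "]";
        }
    }

    /** Marks cells >= threshold (positive) or <= -threshold (negative); a fixed threshold is an assumption. */
    static boolean[][] mark(double[][] tfr, double threshold, boolean positive) {
        boolean[][] m = new boolean[tfr.length][tfr[0].length];
        for (int r = 0; r < tfr.length; r++)
            for (int c = 0; c < tfr[0].length; c++)
                m[r][c] = positive ? tfr[r][c] >= threshold : tfr[r][c] <= -threshold;
        return m;
    }

    /** Groups marked cells into 4-connected areas, bounds each area, then slices the bound into tight strips. */
    static List<Rect> extract(boolean[][] mask, int sliceWidth) {
        int[][] label = new int[mask.length][mask[0].length]; // 0 = not yet assigned to an area
        List<Rect> result = new ArrayList<>();
        int areaId = 0;
        for (int r = 0; r < mask.length; r++) {
            for (int c = 0; c < mask[0].length; c++) {
                if (!mask[r][c] || label[r][c] != 0) continue;
                List<int[]> cells = flood(mask, label, r, c, ++areaId);
                // Column extent of the area's minimum bounding rectangle.
                int minC = Integer.MAX_VALUE, maxC = Integer.MIN_VALUE;
                for (int[] p : cells) {
                    minC = Math.min(minC, p[1]);
                    maxC = Math.max(maxC, p[1]);
                }
                // Slice into strips of sliceWidth columns; shrink each strip to the area's cells.
                for (int s = minC; s <= maxC; s += sliceWidth) {
                    int e = Math.min(s + sliceWidth - 1, maxC);
                    int top = Integer.MAX_VALUE, bottom = -1, left = Integer.MAX_VALUE, right = -1;
                    for (int[] p : cells) {
                        if (p[1] < s || p[1] > e) continue;
                        top = Math.min(top, p[0]);
                        bottom = Math.max(bottom, p[0]);
                        left = Math.min(left, p[1]);
                        right = Math.max(right, p[1]);
                    }
                    if (bottom >= 0) result.add(new Rect(top, left, bottom, right));
                }
            }
        }
        return result;
    }

    /** Breadth-first flood fill; labels and returns all marked cells connected to (r0, c0). */
    private static List<int[]> flood(boolean[][] mask, int[][] label, int r0, int c0, int id) {
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        List<int[]> cells = new ArrayList<>();
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        label[r0][c0] = id;
        queue.add(new int[] {r0, c0});
        while (!queue.isEmpty()) {
            int[] p = queue.poll();
            cells.add(p);
            for (int[] d : steps) {
                int r = p[0] + d[0], c = p[1] + d[1];
                if (r >= 0 && r < mask.length && c >= 0 && c < mask[0].length
                        && mask[r][c] && label[r][c] == 0) {
                    label[r][c] = id;
                    queue.add(new int[] {r, c});
                }
            }
        }
        return cells;
    }
}
```

If `sliceWidth` is at least the area's width, each area keeps its single minimum bounding rectangle. Smaller values produce more rectangles, each fitting the area more tightly, which is the trade-off the slicing makes.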
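The corner-matching distance metric could look something like the sketch below. The original specifies only that each corner is matched to the closest corresponding corner in the other matrix. The remaining choices are assumptions made here: Euclidean distance in matrix index space, summing over both directions and all four corner types, and a hypothetical `missingPenalty` charged when one matrix has no rectangles of a given sign. Rectangles are plain `{top, left, bottom, right}` arrays, and `CornerDistance` is a hypothetical name.

```java
import java.util.List;

/** Hypothetical corner-matching distance between two electrode matrices; a sketch only. */
final class CornerDistance {

    /** Corner k of {top, left, bottom, right}: 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right. */
    static int[] corner(int[] rect, int k) {
        int row = (k < 2) ? rect[0] : rect[2];
        int col = (k % 2 == 0) ? rect[1] : rect[3];
        return new int[] {row, col};
    }

    /** Sums, over every type-k corner in a, the Euclidean distance to the closest type-k corner in b. */
    static double oneWay(List<int[]> a, List<int[]> b, int k, double missingPenalty) {
        double sum = 0;
        for (int[] ra : a) {
            if (b.isEmpty()) { // no rectangle of this sign to match against
                sum += missingPenalty;
                continue;
            }
            int[] p = corner(ra, k);
            double best = Double.POSITIVE_INFINITY;
            for (int[] rb : b) {
                int[] q = corner(rb, k);
                best = Math.min(best, Math.hypot(p[0] - q[0], p[1] - q[1]));
            }
            sum += best;
        }
        return sum;
    }

    /** Symmetric distance: both directions, all four corner types, positive and negative rectangles separately. */
    static double distance(List<int[]> posA, List<int[]> negA,
                           List<int[]> posB, List<int[]> negB, double missingPenalty) {
        double d = 0;
        for (int k = 0; k < 4; k++) {
            d += oneWay(posA, posB, k, missingPenalty) + oneWay(posB, posA, k, missingPenalty);
            d += oneWay(negA, negB, k, missingPenalty) + oneWay(negB, negA, k, missingPenalty);
        }
        return d;
    }
}
```

Summing both directions makes the distance symmetric. Each pair of matrices then needs to be compared only once when filling the distance matrix.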
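The source does not name the clustering algorithm. The following sketch therefore swaps in plain single-linkage agglomerative clustering to show how a target cluster count and a cutoff percentage can work together. Clusters are merged until the `k` largest hold at least the requested share of the data, and everything else is left out. `CutoffClustering`, `cluster` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: single-linkage clustering with a target cluster count and a coverage cutoff. */
final class CutoffClustering {

    /**
     * Merges the closest pair of clusters (single linkage) until the k largest clusters
     * together hold at least `coverage` of all items, e.g. k = 3 and coverage = 0.9.
     * Returns those k clusters as lists of item indices; all other items are left out.
     */
    static List<List<Integer>> cluster(double[][] dist, int k, double coverage) {
        int n = dist.length;
        double[][] d = new double[n][];
        List<List<Integer>> members = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            d[i] = dist[i].clone(); // working copy; rows of merged clusters get overwritten
            members.add(new ArrayList<>(Collections.singletonList(i)));
        }
        while (size(largest(members, k)) < coverage * n) {
            int bi = -1, bj = -1;
            for (int i = 0; i < n; i++) {
                if (members.get(i).isEmpty()) continue;
                for (int j = i + 1; j < n; j++) {
                    if (members.get(j).isEmpty()) continue;
                    if (bi < 0 || d[i][j] < d[bi][bj]) { bi = i; bj = j; }
                }
            }
            if (bi < 0) break; // everything is already in one cluster
            members.get(bi).addAll(members.get(bj));
            members.get(bj).clear(); // an empty list marks a merged-away cluster
            // Single linkage: distance to the merged cluster is the smaller of the two.
            for (int m = 0; m < n; m++) {
                d[bi][m] = d[m][bi] = Math.min(d[bi][m], d[bj][m]);
            }
        }
        return largest(members, k);
    }

    /** The k largest non-empty clusters, biggest first (ties keep index order). */
    private static List<List<Integer>> largest(List<List<Integer>> members, int k) {
        List<List<Integer>> live = new ArrayList<>();
        for (List<Integer> c : members) if (!c.isEmpty()) live.add(c);
        live.sort((a, b) -> Integer.compare(b.size(), a.size()));
        return live.subList(0, Math.min(k, live.size()));
    }

    /** Total number of items in the given clusters. */
    private static int size(List<List<Integer>> clusters) {
        int total = 0;
        for (List<Integer> c : clusters) total += c.size();
        return total;
    }
}
```

For example, `cluster(dist, 3, 0.9)` corresponds to asking for at least 90% of the data in the 3 largest clusters. The naive pair search here is cubic in the number of items. For thousands of matrices, a priority queue or a faster linkage algorithm would be preferable.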
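The class picture described above can be computed along the lines shown below. In that picture, the brightness of red at a point is the proportion of the class's matrices that have a positive rectangle containing that point. This is a sketch under assumptions: rectangles are inclusive `{top, left, bottom, right}` index arrays, and the ratio is mapped linearly to an 8-bit intensity. The names `ClassMap`, `coverage` and `toRed` are hypothetical. Passing the negative rectangles instead gives the blue map.

```java
import java.util.List;

/** Hypothetical sketch of the per-class brightness map. */
final class ClassMap {

    /**
     * For every cell, the fraction of the class's matrices that have at least one
     * rectangle ({top, left, bottom, right}, inclusive) containing that cell.
     */
    static double[][] coverage(List<List<int[]>> classMembers, int rows, int cols) {
        double[][] ratio = new double[rows][cols];
        for (List<int[]> rects : classMembers) {
            boolean[][] hit = new boolean[rows][cols]; // counts each matrix at most once per cell
            for (int[] r : rects)
                for (int y = r[0]; y <= r[2]; y++)
                    for (int x = r[1]; x <= r[3]; x++) hit[y][x] = true;
            for (int y = 0; y < rows; y++)
                for (int x = 0; x < cols; x++)
                    if (hit[y][x]) ratio[y][x] += 1.0;
        }
        int n = classMembers.size();
        if (n > 0)
            for (double[] row : ratio)
                for (int x = 0; x < cols; x++) row[x] /= n;
        return ratio;
    }

    /** Maps a ratio in [0, 1] linearly to an 8-bit red intensity (0 = black, 255 = full red). */
    static int toRed(double ratio) {
        return (int) Math.round(255 * ratio);
    }
}
```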
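The difference between matrix-at-once and electrode-at-once mode can be sketched as two ways of feeding items into the same distance-matrix routine. The generic `Metric` interface and the other names are hypothetical. Summing the per-electrode distances in matrix-at-once mode is also an assumption; the original only says that electrodes at corresponding locations are compared with each other.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of building the distance matrix in the two analysis modes. */
final class DistanceMatrix {

    /** Any distance between two items, e.g. a corner-matching distance between electrode matrices. */
    interface Metric<T> {
        double between(T a, T b);
    }

    /** Fills a symmetric n x n matrix; each unordered pair is evaluated once, n(n-1)/2 calls in total. */
    static <T> double[][] build(List<T> items, Metric<T> metric) {
        int n = items.size();
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                d[i][j] = d[j][i] = metric.between(items.get(i), items.get(j));
        return d;
    }

    /** Matrix-at-once: compare only electrodes at the same index (location); summing them is an assumption. */
    static <T> Metric<List<T>> matrixAtOnce(Metric<T> perElectrode) {
        return (a, b) -> {
            double sum = 0;
            for (int e = 0; e < a.size(); e++) sum += perElectrode.between(a.get(e), b.get(e));
            return sum;
        };
    }

    /** Electrode-at-once: every electrode's 2D matrix becomes an independent item. */
    static <T> List<T> electrodeAtOnce(List<List<T>> recordings) {
        List<T> all = new ArrayList<>();
        for (List<T> recording : recordings) all.addAll(recording);
        return all;
    }
}
```

In electrode-at-once mode every channel becomes its own item. The number of items therefore grows accordingly, and so do the n(n-1)/2 pairwise comparisons.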
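The arithmetic below puts the reported timing in perspective. It assumes a symmetric distance, so that each unordered pair of the roughly 4000 electrode matrices is compared exactly once; the original does not say how the pairs were counted.

```latex
\[
\binom{4000}{2} = \frac{4000 \cdot 3999}{2} = 7\,998\,000 \ \text{pairs},
\qquad
\frac{2\,\text{h}}{7\,998\,000} = \frac{7200\,\text{s}}{7\,998\,000} \approx 0.9\,\text{ms per pair}.
\]
```

The pair count grows quadratically. Under the same assumption, doubling the number of electrode matrices roughly quadruples the time needed for the distance matrix.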