Non-invasive EEG based Feature Extraction framework for Major Depressive Disorder analysis

Depression and several other behavioral health disorders are serious public health concerns worldwide. Persistent behavioral health issues have a wide range of consequences that affect people personally, culturally and socially. Major depressive disorder (MDD) is a psychiatric illness that affects people of all ages worldwide, and it has grown into a major global health issue as well as an economic burden. Clinicians use several medications to limit the progression of the disease at an early stage in young people. The goal of this research is to improve the diagnosis of depression by processing electroencephalogram (EEG) signals, extracting the Differential Entropy (DE) and Power Spectral Density (PSD) features, and classifying them with machine learning and deep learning techniques. This study analyzed the EEG signals of 30 healthy people and 34 people with MDD. Among the classical machine learning algorithms, K-nearest neighbors (KNN) achieved the highest accuracy, 99.7%, while the support vector machine (SVM) achieved 95.7%. The proposed deep learning approach, a convolutional neural network (CNN), achieved 99.6% accuracy. These promising results support the viability of an EEG-based diagnosis of MDD.

Capturing signals with a reduced number of electrodes can lower the computational complexity of ML models and filter out extraneous noise in brain-computer interface (BCI) equipment. Differential entropy (DE) and Power Spectral Density (PSD) are essential features of emotion-related EEG signals that may indicate changes in regional brain activity. This research therefore proposes a DE and PSD feature recognition technique based on multiple machine learning models, with the aim of extracting prominent features and increasing classification accuracy.

RELATED WORK
Before beginning our research, we compiled a list of characteristics of previous studies in order to compare and discuss reliable methods and outcomes. In persons with depression, Berman et al. [12] found that anxious behaviors could only be identified in resting-state, task-free EEG recordings. The number and position of the electrodes used to acquire the signals, and the standards employed, differed across the investigations. The data preparation methods also varied: some studies used the typical sub-bands, others broadband signals. The raw signals were analyzed together with transform-based reduction techniques, for instance Fourier analysis, wavelet-based transforms or the cosine transform. Some authors removed artifacts manually, while others removed noise automatically with dedicated approaches.
In contrast, others performed no artifact removal and relied instead on evaluating epochs from noise-free portions of the recorded signal, potentially creating new sources of artifacts in the process. Other points of difference were the amount of filtering and preprocessing, and whether the researchers concentrated on a particular feature of the signal's spectral composition. The techniques used to extract or select features also differed from study to study.
Faust et al. [13] collected EEG samples from 30 MDD subjects and 30 MDD-free people. They used wavelet packet decomposition to extract sub-bands from the raw EEG data. The extracted sub-bands served as input for computing bispectral entropy (a higher-order spectra measure derived from Fourier analysis), Renyi entropy, approximate entropy, and sample entropy. With 10-fold cross-validation, a Gaussian mixture model, a decision tree, KNN, a probabilistic neural network (PNN), a naive Bayes classifier (NBC), a Sugeno fuzzy classifier, and SVM were implemented. The accuracy obtained was 99.5 percent, with a sensitivity of 99.2 percent and a specificity of 99.7 percent.
Liao et al. [14] developed a technique based on scalp EEG signals. It performs robust spectral-spatial feature extraction based on the kernel eigen-filter-bank common spatial pattern (KEFB-CSP). They filtered the multichannel EEG data (30 electrode traces) of each sub-band from the original electrode space into a new space in which the resulting EEG signals, i.e., the CSPs, are most appropriate for discriminating MDD subjects from healthy controls. Lastly, they employed kernel PCA to convert the CSPs from every frequency sub-band into the KEFB-CSP feature vector, which is a lower-dimensional feature vector. Using KEFB-CSP, the study achieved an accuracy of 80%.
Mumtaz et al. reported one study in 2017, in which three techniques were proposed [15], and two investigations in 2018 [16,17]; all used the same methodologies and a single sample of 33 depressed and 30 healthy people. In [15], they achieved a classification accuracy of 91.7 percent with logistic regression (LR), 98 percent with SVM, and 93.6 percent with naive Bayes (NB).
In addition to the spectral asymmetry index (SASI), Bachmann et al. [18] evaluated Higuchi's fractal dimension (HFD), detrended fluctuation analysis (DFA) and Lempel-Ziv complexity (LZC) of the EEG signals as features. A total of 26 people took part in the study: 13 with MDD and 13 healthy controls. They used leave-one-out cross-validation (LOOCV) and logistic regression to classify persons based on these features. Using nonlinear measures to characterize the resting-state EEG resulted in effective classification. The highest accuracy, 85 percent, was reached with the combinations HFD and DFA, and HFD and LZC; a single nonlinear measure alone gave an accuracy of 77 percent.
Mohammadi et al. [19] employed a genetic algorithm (GA) to determine the most relevant features after mapping the EEG features into a new feature space in the data evaluation phase. They then used decision trees (DT) to build several prediction models. The EEG signals were acquired from 53 depressed patients and 43 normal volunteers. The model they employed had an average accuracy of 80%. However, evidence that the reported accuracies generalize was missing or ambiguous, which calls the reliability and validation of the study into question. According to Cukic et al. [20], the average accuracy of ML models ranged from 90.24 percent to 97.56 percent. The Sample Entropy measure outperformed the other two measures considered. That study looked at the brain waves of 26 MDD and 20 normal people. Compared with previous research that used only resting EEG signals with eyes open or closed, it became possible to verify that the number of electrodes has a significant influence, as PCA showed that each electrode has its own effect on the outcomes [20,21].
The related work shows that EEG features carry valuable information about various brain pathologies, and that the evaluation of resting EEG signals alone has reached a bottleneck. An analysis of EEG features from non-resting EEG signals was therefore required. In this research, a complete feature-based framework is developed to analyze the Differential Entropy and Power Spectral Density of EEG signals. The features are then fed into several machine learning algorithms to classify pathological and non-pathological EEG from resting and non-resting recordings.

MATERIAL AND METHOD
Participants: This research is based on a dataset provided by Mumtaz [22], which is freely available to the public. The study procedure was approved by the ethics committee of Hospital Universiti Sains Malaysia. The depressed (MDD) patients met the diagnostic criteria for unipolar depression without psychotic symptoms. Participants were required to abstain from caffeine, nicotine, and alcohol before their brain signals were recorded. To avoid time-of-day effects, data collection for each participant was done at the same time of day. The EEG data were captured for a short period, 5 minutes, which helped the subjects avoid tiredness and disinterest. EEG signals were obtained from 30 healthy people ranging in age from 22 to 53 (mean = 38.3 ± 15.6) and 34 people with major depressive disorder ranging in age from 27 to 53 (mean = 40.3 ± 12.9).
EEG data acquisition and preprocessing: The brain waves were recorded using an EEG cap on the scalp with electrodes placed according to the international 10-20 system, as shown in figure 1. EEG signals were recorded for 5 minutes in several states: eyes open (EO), eyes closed (EC), and while executing a cognitive task (P300).
Temporal (T3, T4, T5, T6), occipital (O1, O2), frontal (Fp1, Fp2, F3, F4, F7, F8, Fpz), parietal (P3, P4, P7, P8), and central (C3, C4) electrodes are placed on the scalp and cover all four lobes. The EEG signals were recorded with a reference electrode attached to the ear (LE) and were re-referenced to the infinity reference (IR) for further analysis [23].
The reference electrode defines the zero-voltage level: the potential at every other EEG channel, at each time point, is expressed relative to it, so the choice of reference influences the amplitude measured at each channel. EEG signals are among the most complex and irregular biological signals. Muscle artifacts (EMG), cardiac signals (ECG), eye movements (EOG), or interference from other devices can all disturb them.
All raw EEG data must first be denoised in order to achieve accurate results in the feature extraction and classification stages. The noise is removed using a series of filters. The sampling frequency of the EEG signals was set to 256 Hz to help exclude these unwanted components. A notch filter was then used to remove the 50 Hz power-line noise. In addition, an amplifier was employed to boost the weak brain signals. To complete the processing of the raw EEG signals, they were band-pass filtered with cutoff frequencies of 0.1 Hz and 50 Hz for each channel (22 channels) and state (EO, EC, and TASK). The EEG data are thus first preprocessed, and two EEG features are then extracted. The extracted features are provided to various classifiers, and the performance is measured.
Feature extraction: The main principle behind feature extraction is that high-dimensional input data are turned into a smaller, representative set of features that still contains the important information. The power spectral density (PSD) method is a widely used feature extraction technique because it isolates the properties of signals that allow mental and motor tasks to be observed [24]. Several prior studies [25][26][27][28][29] employed this strategy to investigate epilepsy and hypnosis.
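The filtering chain described in the preprocessing paragraph above (a 50 Hz notch followed by a 0.1 Hz to 50 Hz band-pass at a 256 Hz sampling rate) can be sketched with SciPy as follows. This is a minimal illustration, not the pipeline used in the study: the notch quality factor, the Butterworth design, the filter order, and the zero-phase filtering are all assumptions, and `preprocess` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 256  # sampling frequency in Hz, as stated in the text


def preprocess(eeg, fs=FS, notch_hz=50.0, band=(0.1, 50.0), q=30.0, order=4):
    """Notch-filter then band-pass an array shaped (..., n_samples).

    q (notch quality factor) and order (Butterworth order) are assumed
    values; the source does not report them.
    """
    # Remove the 50 Hz power-line component (zero-phase filtering).
    b, a = iirnotch(notch_hz, q, fs=fs)
    eeg = filtfilt(b, a, eeg, axis=-1)
    # Band-pass 0.1 Hz to 50 Hz; second-order sections keep the very low
    # cutoff numerically stable.
    sos = butter(order, list(band), btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)
```

Calling `preprocess(raw)` on an array of shape `(n_channels, n_samples)` filters every channel independently along the last axis and returns an array of the same shape.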
PSD techniques have been shown to be the most consistent in terms of robustness and efficacy in extracting the spectral patterns needed to discriminate effectively among motor imagery EEG signals [30]. A novel feature extraction method based on the Power Spectral Density (PSD) and Differential Entropy (DE) of the signal's frequency distributions is presented. We first estimated the PSD using the Welch periodogram method, with a 400-point Hamming window and a 50% overlap between successive windowed segments. For each participant, we visually examined the entire frequency range. In keeping with prior research [31,32], we found that the PSD had a consistent pattern with varying amplitudes across all subjects and tasks. This physiological phenomenon enables the classification of various mental states, including depression.
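The Welch estimate described above can be sketched with `scipy.signal.welch`. The 400-point Hamming window and 50% overlap come from the text; the 256 Hz sampling rate is taken from the preprocessing step, and the `band_power` helper and its band limits are illustrative assumptions rather than part of the reported method. At 256 Hz a 400-point window spans about 1.56 s, which gives a frequency resolution of 0.64 Hz.

```python
import numpy as np
from scipy.signal import welch

FS = 256  # sampling frequency in Hz (from the preprocessing step)


def welch_psd(x, fs=FS):
    """Welch PSD with a 400-point Hamming window and 50% overlap."""
    return welch(x, fs=fs, window="hamming", nperseg=400, noverlap=200, axis=-1)


def band_power(freqs, psd, low, high):
    """Approximate the power in [low, high) Hz by summing PSD bins.

    Hypothetical helper: the source does not say how PSD values are
    reduced to features.
    """
    mask = (freqs >= low) & (freqs < high)
    df = freqs[1] - freqs[0]  # bin width in Hz
    return psd[..., mask].sum(axis=-1) * df
```

For an input of shape `(n_channels, n_samples)`, `welch_psd` returns the frequency axis (201 bins from 0 to 128 Hz) and a PSD array of shape `(n_channels, 201)`.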
For Differential Entropy (DE), the baseline signal was segmented using sliding windows of 1, 10, 30, and 60 seconds. The original signal length was 60 s; hence 60 seconds was the largest time-step window. A window of length τ divided the signal into n = 60/τ portions. The signal of each channel was extracted from the recorded data. The original signal of each segment was decomposed into the alpha, beta, gamma, theta, and delta brain waves using Butterworth filters.
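The windowing and band decomposition above can be sketched as follows. The text does not state how DE is estimated, so this sketch assumes the common closed form for a Gaussian-distributed band signal, DE = 0.5 ln(2πeσ²), where σ² is the variance of the segment. The band edges, the filter order, and the helper names are also assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256  # sampling frequency in Hz
# Assumed band edges in Hz; the source names the bands but not their limits.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}


def band_filter(x, low, high, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter along the last axis."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)


def differential_entropy(segment):
    """DE under a Gaussian assumption: 0.5 * ln(2 * pi * e * variance)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(segment, axis=-1))


def de_features(x, tau, fs=FS):
    """Split a 1-D channel into windows of tau seconds and return DE per band.

    Returns an array of shape (n_bands, n_segments), where
    n_segments = len(x) // (tau * fs), i.e. 60 / tau for a 60 s signal.
    """
    win = int(tau * fs)
    n_seg = x.shape[-1] // win
    feats = np.empty((len(BANDS), n_seg))
    for i, (low, high) in enumerate(BANDS.values()):
        xb = band_filter(x, low, high, fs)
        feats[i] = differential_entropy(xb[: n_seg * win].reshape(n_seg, win))
    return feats
```

With a 60 s channel, `de_features(x, tau=10)` yields a 5 × 6 matrix (five bands, six windows), and `tau=60` yields a single DE value per band.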
Feature extraction is followed by the classification of EEG signals into MDD and normal subjects. The models used are KNN, SVM, and CNN. Finally, performance is measured with the help of a confusion matrix, from which the classification accuracy of each model is obtained.
Classification: Figure 2 presents the block diagram of the proposed methodology and depicts the step-by-step flow of the methods followed in this study, including EEG preprocessing, feature extraction, new feature generation, and classification using several models. K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN) were the classifiers employed. Preprocessing is followed by feature extraction, which yields PSD and DE, and finally the classifiers are fed with these features to distinguish depressed from healthy subjects.

K-Nearest Neighbors (KNN):
The supervised classifier K-Nearest Neighbors is one of the most basic classification models. KNN relies on a distance function defined on pairs of observations. For a test sample, the algorithm finds the k nearest training samples and assigns the test sample to the most common class among them. The KNN method requires an integer value for k and a metric to calculate proximity [33]. In this study the n-neighbors hyperparameter was set to 5, which is the default value. An odd number is chosen to avoid ties in the vote.
Support Vector Machine (SVM):
A support vector machine (SVM) divides the feature space with decision boundaries that are linear in the transformed space defined by the kernel function and are uniquely determined by a subset of the data, the support vectors [34,35]. The maximum-margin classifier built by an SVM widens the gap between the decision boundary and the support vectors. A soft-margin classifier with a fixed regularization constant C = 1 was used in the study. SVMs are supervised by nature and, by design, maximize the classifier margin, which lowers the error rate.
Convolutional Neural Network (CNN):
Two 1D convolutional layers, one flatten layer, and two dense layers make up the proposed model. The hidden layers employ the ReLU activation function, whereas the output layer uses softmax, with a batch size of 32 for input and 64 for output. The input layer has a filter size of 32, the hidden layer has a filter size of 64, and the output layer has a filter size of 1. TensorFlow was used to create the CNN model. RMSprop is the optimizer used for the proposed model since it improved accuracy and reduced losses. Because the learning rate is one of the most essential hyperparameters, the Keras deep learning library was used to find the best learning rate.
The improvement was tracked across 20 training epochs, with the learning rate increased by a small factor at each epoch until it reached 0.1.
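The CNN and the learning-rate sweep described above can be sketched in Keras as follows. This is an illustration under stated assumptions, not the authors' model: the kernel size, the width of the first dense layer, the starting learning rate of the sweep, and the loss are not given in the text, and the output layer here has two units because a softmax over a single unit would always output 1. The names `build_cnn` and `lr_sweep` are hypothetical.

```python
import numpy as np
import tensorflow as tf


def build_cnn(n_features, n_classes=2):
    """Two Conv1D layers (32 and 64 filters), Flatten, and two Dense layers."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features, 1)),
        tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # kernel size assumed
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),  # width assumed
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",  # assumed loss for integer labels
        metrics=["accuracy"],
    )
    return model


def lr_sweep(epoch, lr=None, start=1e-4, end=0.1, epochs=20):
    """Grow the learning rate geometrically from start to end over the epochs.

    The start value is an assumption; the text only gives the end value (0.1)
    and the number of epochs (20).
    """
    return float(start * (end / start) ** (epoch / (epochs - 1)))
```

A sweep run could then look like `model.fit(X[..., None], y, epochs=20, batch_size=32, validation_split=0.2, callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_sweep)])`, after which the learning rate at which the loss drops fastest is a reasonable candidate for the final training run.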
These parameters were assessed using k-fold cross-validation on the training set. Various strategies such as batch processing, dropout, cross-validation, and balanced training were used to optimize the model and avoid overfitting. The data were split into 80% for training and 20% for validation. Figure 3 shows the working of the proposed model. Finally, a softmax classifier was used at the final layer to obtain the DE and PSD feature recognition results. Table 1 shows the parameters set for the CNN model, which include the number of epochs, batch size, optimizer, learning rate, filter size and loss function.
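For the two classical classifiers, the settings reported in this section (KNN with 5 neighbors, a soft-margin SVM with C = 1, an 80/20 split, and k-fold cross-validation on the training portion) can be sketched with scikit-learn. The feature standardization, the default RBF kernel, the stratified split, and the choice of 5 folds are assumptions; `evaluate_classifiers` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_classifiers(X, y, seed=0, folds=5):
    """Return {name: (mean CV accuracy on train, accuracy on held-out 20%)}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # Scaling is an assumed step; both models are distance/margin based.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVM": make_pipeline(StandardScaler(), SVC(C=1.0)),  # default RBF kernel assumed
    }
    results = {}
    for name, model in models.items():
        cv_acc = cross_val_score(model, X_train, y_train, cv=folds).mean()
        test_acc = model.fit(X_train, y_train).score(X_test, y_test)
        results[name] = (cv_acc, test_acc)
    return results
```

Here `X` would hold one row of PSD and DE features per sample and `y` the MDD or healthy label. Note that if segments from the same subject end up in both the training and test portions, the held-out accuracy can be optimistic; a subject-wise split is a stricter alternative.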

RESULTS
Performance evaluation is an essential component of any research project. In this research, the results were assessed in terms of classification accuracy and time complexity, both of which are established measures of a model's performance. Classification accuracy is one of the most popular and reliable metrics for assessing a model. In the proposed study, the confusion matrix was used to determine classification accuracy, which is the number of correct predictions made by the model divided by the total number of predictions. The higher the classification accuracy, the better the model has performed. The classification accuracy of all the models is shown in figure 4.
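The accuracy computation described above amounts to dividing the diagonal of the confusion matrix by the sum of all its entries. The following minimal sketch uses made-up labels purely for illustration; they are not data from the study.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy(cm):
    """Correct predictions (diagonal) divided by all predictions."""
    return float(np.trace(cm) / cm.sum())


# Illustrative labels only: 0 = healthy, 1 = MDD.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)            # [[3 1]
                     #  [1 3]]
print(accuracy(cm))  # 0.75
```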
The time complexity of an algorithm, in turn, measures how long the algorithm takes to run as a function of the length of the input. This parameter is crucial in determining how well an algorithm works. Deep neural networks need a long time and a lot of memory to run. Simpler models such as linear regression, on the other hand, produce results in a shorter amount of time, but their performance is often not up to the standard. This demonstrates a tradeoff between resource demand and algorithm performance. The runtime can be lowered by reducing the number of irrelevant features, variables, and data types, and by data optimization. The training time of all the models is summarized in figure 5.
The best accuracy, 99.7 percent, was obtained by the K-Nearest Neighbors (KNN) model, and 95.7 percent was achieved by the support vector machine (SVM); both classical ML models thus exceeded 90% classification accuracy. The brain signals of normal and MDD participants were then evaluated using a deep learning architecture. The proposed CNN model performed very well, with a 99.6% accuracy rate. However, the CNN model took the longest time to compute, whereas the SVM model was the fastest of the three. Table 2 shows the classification accuracy of the models used in the proposed methodology. Table 3 summarizes the time complexity of the algorithms used in the proposed methodology.
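The training times compared above are wall-clock measurements. The text does not say how they were taken, so the following is only one plausible sketch: it wraps any training call and measures the elapsed time with `time.perf_counter`. The helper name `timed` is hypothetical.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For a scikit-learn style model this would be used as `_, seconds = timed(model.fit, X_train, y_train)`. Repeating the measurement and reporting the median makes the comparison between models less sensitive to background load.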

DISCUSSION
The main objective of this study was to analyze non-resting EEG signals recorded during a decision-making task along with resting ones. Previous studies indicate that MDD impairs cognitive abilities, so analyzing EEG signals that carry cognitive information was required for a better understanding of this illness. The analysis of various features was also needed to understand the nature of the EEG signals of depressed and normal subjects. In this study, we analyzed resting and non-resting EEG signals of 34 depressed patients and 30 healthy subjects.

EEG signals were acquired from all five bands (alpha, beta, theta, gamma and delta) from both hemispheres of the brain. Two features, DE and PSD, were then extracted and fed into the classifiers. Three different classifiers were employed in this research. The KNN classifier outperformed the SVM and CNN classifiers in this classification task. This study also suggests that frequency-based, time-based and nonlinear aspects of EEG signals should be examined when evaluating the EEG of depressed individuals. In addition, instead of recording and analyzing EEG signals in the resting state and during decision-making tasks only, EEG could be acquired in a variety of conditions and activities, as this can help to distinguish depressed patients from normal individuals. Future research should focus on identifying the brain areas that are implicated in depression.
Lastly, more EEG data would allow researchers to test the reliability of the extracted features and the classifiers employed. The results obtained in terms of classification accuracy and training time indicate that PSD and DE are potentially useful features for evaluating EEG signals. The high classification accuracy shows that the non-resting and resting EEG signals of depressed and normal subjects are classified accurately.