Comparison of Machine Learning Algorithms for Sepsis Detection

Sepsis is a frequently fatal condition that causes many casualties worldwide; about 270,000 people die of sepsis annually. Early detection could therefore help prevent deaths and would be a great relief to the families of sepsis patients. Several researchers have worked on sepsis detection and prediction, but the need for an improved detection model remains. We compared various machine learning algorithms for sepsis detection using the dataset publicly available to researchers at Physionet.org. The dataset contains many empty (null) values; we applied backward filling and forward filling, and we calculated missing values of MAP using equation (1), which gives more precise results. We divided the 40,336 files of datasets A and B into an 80% training set and a 20% testing set. We applied the algorithms twice: once using both the vital signs and the clinical values of the patients, and once using only the vital signs. Using vital signs only, the training accuracy of KNN, Logistic Regression, Random Forest, MLP, and Decision Trees was 0.992, 0.999, 0.981, 0.981, and 0.981 respectively, while the testing accuracy was 0.987, 0.980, 0.983, 0.981, and 0.981 respectively. For Sepsis Label 0, the precision of KNN, Random Forest, Decision Trees, Logistic Regression, and MLP was 0.99, 0.98, 0.98, 0.98, and 0.98 respectively, while the recall was 1.00 for all five algorithms. The comparison showed that KNN leads its competitors in accuracy, precision, and recall.


INTRODUCTION
Sepsis is initiated when an infection enters the human body. The infection leads to disorders of internal organs, including the heart, lungs, and kidneys, and can ultimately result in the death of the patient. About 35% of septic patients die annually, and 24% of the budget of the USA is reported to be consumed annually by the diagnosis and treatment of sepsis [2]. The disease is caused by an abnormal response of internal body tissues to infection; this unbalanced response indicates a weakened immune system [3]. The sepsis lifecycle starts with the entry of infection into the human body. The infection proceeds to the lungs; the infected lungs carry it to the heart, from where it spreads throughout the body via the blood vessels. This disrupts the internal organs and ultimately causes the death of the infected person [3]. Figure 1 shows the lifecycle of sepsis.
Electronic health records are used for training and testing the proposed model. Delahanty et al. [3] used the EHR (Electronic Health Record), which comprised one of seven departments of the Cerner Millennium. Administrative credentials were obtained through the hospital's usual SOPs. The data were stored at Tenet's warehouse and then retrieved from the database using SQL. They used gradient boosting for the detection of sepsis. Most research on sepsis has focused on specific patient conditions, and each study has used a different definition of sepsis. Calvert et al. [4] proposed the Insight model for early detection of sepsis (EDOS) using S.I.R.S. (systemic inflammatory response syndrome) criteria. Reyna et al. [2] used the dataset available at Physionet [1]; they used k-means clustering for training and a decision tree to indicate the presence of sepsis in patients. They used k-means clustering because it makes the representation of non-linearly spread values more robust and aligned [5]. We applied multiple machine learning algorithms to a dataset that is publicly available on the widely used Physionet.org platform [1]. The dataset contains clinical values as well as vital signs; we applied the algorithms in two ways, first using vital signs only and then using both vital signs and clinical values. We compared the algorithms based on accuracy, precision, recall, and F1-measure.
Sepsis detection and prediction have become easier thanks to publicly available datasets, which are easy to manipulate and can be used to train and test algorithms to obtain improved models [4]. Most recent research on sepsis detection using clinical values has used the publicly available dataset provided by Physionet.org [1]. The Physionet platform provides extensive information and guidance on using the dataset: it explains how to access the dataset, shows its tabular form, and defines all of its attributes. It also states the limitations of the dataset, i.e., that it contains empty values denoted by NaN. Many clinical values were missing: hourly updates of the clinical values were not available throughout the dataset, and only a few of them are recorded frequently for some patient entries [6]. Shankar-Hari et al. [6] applied the forward filling technique to fill in the missing values of a column.

Forward Filling
Forward filling is a simple technique for filling missing values in a CSV or Excel file: a missing value is replaced by the previous value in the same column. If no previous value is found, the entry remains as it is [2].
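As a minimal sketch of this step, the snippet below forward-fills a small table with pandas. The column names, the values, and the choice to fill within each patient separately are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly records for two patients (illustrative values only).
records = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 2, 2, 2],
    "HR": [np.nan, 80.0, np.nan, 84.0, 95.0, np.nan, np.nan],
})

# Forward fill within each patient so that a value from one patient
# is never carried over into the next patient's record.
records["HR_ffill"] = records.groupby("patient_id")["HR"].ffill()

print(records)
```

The first row of patient 1 stays empty because it has no previous value to copy, which is the case forward filling alone cannot handle.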
In a comparison of machine learning approaches, Hsu et al. [7] evaluated Naïve Bayes, LLSE (Linear Least Squares), Hidden Markov Model, Support Vector Machine, LS-Gradient, and Random Forests. They used LLSE as a baseline and experimented with Online Lasso and SQR as online learners [8]. Bilal et al. [9] applied machine learning algorithms to detect initial and severe sepsis conditions. They used the patients' electronic health records for training and testing the model, selecting patient records with the initial eight vital signs for sepsis prediction. The vital signs they used include Temp, H.R, S.B.P, and M.A.P. They applied the S.I.R.S. criteria to define sepsis and used deep learning algorithms for classification. In their comparison of Adaptive CNN, RNN-LSTM, and SVM-quadratic kernel, the training accuracies of SVM-quadratic kernel, RNN-LSTM, and Adaptive CNN were 78.00%, 92.72%, and 93.84% respectively.
The testing accuracies of SVM-quadratic kernel, RNN-LSTM, and Adaptive CNN were 68.00%, 91.10%, and 93.18% respectively. Jia et al. [10] used the dataset publicly available at Physionet [1]. They applied three machine learning algorithms (random forest, decision tree, and logistic regression) in combination with several autoencoders and compared the resulting accuracy values. The autoencoders were T.A.E (Temporal Autoencoder), S.A.E (Spatial Autoencoder), S.T.A.E (Spatial-Temporal Autoencoder), and T.S.A.E (Temporal and Spatial Autoencoder). Random forest gave a maximum accuracy of 72.2% using T.S.A.E., decision trees gave a maximum accuracy of 67.9% using T.A.E., and logistic regression gave a maximum accuracy of 60.4% using T.A.E.
Waseem et al. [13] applied an LMA-based ANN to big data for controlling air conditioners. They selected a calibration network architecture, with the number of neurons chosen according to the requirements, and divided the dataset into 75% training and 25% testing data. The evaluation was based on mean square error, mean absolute error, and mean absolute percentage error. Zhenzi et al. [14] worked on power system planning and prediction; they used case studies from various countries to differentiate between maximum and minimum temperatures on certain days.
The main contributions of our work are as follows:
• We propose a novel set of integrated features, which gives a better approach to sepsis detection using machine learning.
• The proposed system is capable of successfully classifying sepsis and non-sepsis cases for early detection of sepsis.
• Our method is capable of detecting sepsis infection with high accuracy, precision, recall, and F1-score.
• We compared our system with state-of-the-art models, and the proposed model achieved the best performance.
The remainder of the paper is organized as follows. Section 2 describes the proposed methodology, including data pre-processing, filling in missing values, feature extraction, and classification. Section 3 presents the experimental results and discussion. Finally, Section 4 concludes our work.

PROPOSED METHODOLOGY
This section presents a detailed description of the proposed sepsis detection system. The main objective of the proposed framework is to differentiate between sepsis and non-sepsis cases in the input data. The proposed system comprises three stages: data pre-processing, feature extraction, and classification. The input dataset was downloaded from Physionet [1]. It contains two sets, A and B, both consisting of psv files with hourly records; Set A contains the hourly records of more than 20,000 patients and Set B the hourly records of about 20,000 patients. Initially, the dataset contained about 31% empty values. These were filled in by applying two effective data filling approaches, forward filling and backward filling. After that, the dataset still contained many empty values of M.A.P and D.B.P: in some files the M.A.P values were missing, and in others the M.A.P values were present but the D.B.P values were missing. We applied the standard formula relating D.B.P and M.A.P. Equation (1) shows the formula, which gives only a minor difference between the calculated and the measured values of M.A.P [11].

$$\mathrm{MAP} = \mathrm{DBP} + \frac{1}{3}\,(\mathrm{SBP} - \mathrm{DBP}) \tag{1}$$

After filling and pre-processing the data, we extracted the patients whose clinical values were available for frequent hours, keeping only patients with more than one observation for each clinical value. Moreover, we kept only patients with a prediction between 3 and 24 hours, and finally we sent the list of patients to the classifier for training. We then appended the 20,336 files of set A and the first 12,000 files of set B to the training.csv file, and the remaining 8,000 files of set B to the testing.csv file.
These training and testing files contained the vital signs and clinical values of the patients. We then selected only the vital signs from training.csv and testing.csv and saved them to two further CSV files, vital_training.csv and vital_testing.csv. We applied the classifiers in two ways: first on the training and testing sets with vital signs and clinical values, and then on the training and testing sets with vital signs only. We applied Random Forest, KNN, Decision Trees, MLP, and Logistic Regression and compared their results. Figure 2 shows the methodology and the detailed working mechanism of the proposed sepsis detection system.

Backward filling
Backward filling is a simple and effective technique that is required when the initial row is empty and the following rows have valid values, so that the empty row cannot be filled in using forward filling [2].

Machine Learning Algorithms
K-nearest neighbors (K.N.N.) [22] is a supervised learning algorithm that can be used for both regression and classification problems [12]. Random forest [22] is a flexible machine learning algorithm that produces results even without hyperparameter tuning [15]. A decision tree [22] is a supervised learning technique used mostly for classification problems. It has a pre-defined target variable, i.e., actual values are present in the testing data and the predicted values are compared with them [16]. Logistic Regression [22][23][24][25][26] is a well-suited algorithm that is applied when the target variable is in binary form, i.e., positive and negative [17][24].

Neural Networks
MLP (Multi-Layer Perceptron) is a feed-forward neural network: it takes a set of input values, passes them forward through its layers, and generates a set of output values [20].

Tuning Parameters
Each algorithm was tuned to the input parameters at which it reached its maximum values for the evaluation metrics. Logistic Regression was tuned at max_iteration = 1000, Random Forest at 100 estimators, Decision Trees at max_depth = 5, and MLP at alpha = 1 and max_iteration = 1000, while KNN was tuned at a value of 5 (reported as depth = 5). With these parameters, KNN achieved the highest values across the evaluation metrics.
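The reported settings can be written down as a short configuration sketch. The paper does not name its library, so the use of scikit-learn is an assumption here, as is the mapping of the reported names onto scikit-learn arguments: max_iteration is taken to mean max_iter, and the KNN setting of 5 is taken to mean n_neighbors=5, since KNN has no depth parameter.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Return the five classifiers with the settings reported in the text."""
    return {
        # Reported as max_iteration = 1000; scikit-learn calls this max_iter.
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100),
        "Decision Trees": DecisionTreeClassifier(max_depth=5),
        "MLP": MLPClassifier(alpha=1, max_iter=1000),
        # Assumption: the reported "depth = 5" is read as k = 5 neighbours.
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }


models = build_models()
for name, model in models.items():
    print(name, type(model).__name__)
```

Each model in the dictionary exposes the same fit and predict methods, so the two experiments (vital signs only, and vital signs with clinical values) could loop over the same dictionary with different input columns.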

EXPERIMENTAL RESULTS AND DISCUSSION

Dataset
We used the input dataset downloaded from Physionet [1]. It contains two sets, A and B, both consisting of psv files with hourly records; Set A contains 20,336 patients and Set B contains the hourly records of 20,000 patients. About 31% of the values in the dataset are empty, denoted by NaN (not a number); these empty values indicate that the corresponding measurements were not recorded when the dataset was organized. The first eight columns of the dataset are labelled as vital signs, the next 26 columns contain clinical values of the patients, and the last 5 columns contain demographic values of the patient, including age, gender, ICU entry time, etc.
Initially, the dataset contained empty values; after applying forward filling and backward filling, most of the empty values were filled in. Further pre-processing was still needed to make the dataset suitable for detecting sepsis accurately and precisely, so we applied the standard equation for calculating the value of M.A.P.
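The three filling steps can be sketched for a single patient record as follows. This is an illustration under stated assumptions rather than the authors' code: the function name fill_patient and the sample values are hypothetical, the column names SBP, DBP, and MAP are assumed, and the step that recovers a missing D.B.P by rearranging equation (1) to DBP = (3 MAP - SBP) / 2 is an inference from the statement that the standard formula was applied to both quantities.

```python
import numpy as np
import pandas as pd


def fill_patient(df):
    """Fill gaps in one patient's hourly record (illustrative sketch)."""
    # Forward fill first, then backward fill the leading gaps that remain.
    filled = df.ffill().bfill()
    # Equation (1): MAP = DBP + (SBP - DBP) / 3, used where MAP is still empty.
    map_from_eq = filled["DBP"] + (filled["SBP"] - filled["DBP"]) / 3.0
    filled["MAP"] = filled["MAP"].fillna(map_from_eq)
    # Assumption: equation (1) rearranged, DBP = (3 * MAP - SBP) / 2,
    # used where DBP is still empty but MAP and SBP are known.
    dbp_from_eq = (3.0 * filled["MAP"] - filled["SBP"]) / 2.0
    filled["DBP"] = filled["DBP"].fillna(dbp_from_eq)
    return filled


# Hypothetical record in which MAP was never measured.
patient = pd.DataFrame({
    "SBP": [np.nan, 120.0, np.nan, 126.0],
    "DBP": [np.nan, 80.0, np.nan, 84.0],
    "MAP": [np.nan, np.nan, np.nan, np.nan],
})
result = fill_patient(patient)
print(result)
```

In this example the first row is filled by backward filling, the third by forward filling, and the whole M.A.P column by equation (1); for SBP = 120 and DBP = 80 the formula gives 80 + 40/3, roughly 93.3.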
After filling in the missing values, we divided sets A and B into training and testing sets in a ratio of 80% to 20%. We placed all 20,336 files of set A and 12,000 files of set B in the training set, and the remaining 8,000 files of set B in the testing set. This gives a standard partition, so that the algorithms can be trained adequately and tested on suitable data. We then selected the vital signs and saved the training and testing sets in two separate CSV files named train_patient_vitals.csv and test_patient_vitals.csv respectively. Five machine learning algorithms were then applied; in the first phase, we applied the algorithms using only the vital signs and computed the accuracy of all five machine learning algorithms.
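The partition described above can be checked with a few lines of arithmetic. The sketch below uses hypothetical file names (the function split_files and the path pattern are illustrative); only the counts 20,336, 20,000, and 12,000 come from the text.

```python
def split_files(set_a_files, set_b_files, n_b_train=12000):
    """Put all of set A and the first n_b_train files of set B in training."""
    train = list(set_a_files) + list(set_b_files[:n_b_train])
    test = list(set_b_files[n_b_train:])
    return train, test


# Hypothetical file names standing in for the psv files of each set.
set_a = [f"setA/p{i:06d}.psv" for i in range(20336)]
set_b = [f"setB/p{i:06d}.psv" for i in range(20000)]

train_files, test_files = split_files(set_a, set_b)
total = len(train_files) + len(test_files)

print(len(train_files), len(test_files))  # 32336 8000
print(round(len(train_files) / total, 3), round(len(test_files) / total, 3))  # 0.802 0.198
```

Because the split is made at the level of whole files, and assuming each file holds one patient, no patient's hourly rows appear in both the training and the testing set.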

Evaluation Metrics
To evaluate the performance of the proposed system, we used accuracy, precision, recall, and F1-score. Higher values of these metrics indicate better classification performance in detecting sepsis. We compared the performance of our method with baseline methods and other existing systems based on accuracy, precision, recall, and F1-score. Table 2 shows the precision and recall of the machine learning models for Sepsis Label 0, Sepsis Label 1, and the average. The F1-score is used to handle the inverse relation between precision and recall: precision decreases as recall increases [19]. Table 3 shows the F1-score of the machine learning models using vital signs only.
In the second phase, all five algorithms were applied to the dataset containing both the vital signs and the clinical values of the patients, with all the features of the dataset. We trained the algorithms on the training set and then evaluated them on the basis of accuracy, precision, recall, and F1-score. Table 4 shows the accuracy of the algorithms: the training accuracy values for K-Nearest Neighbours, MLP, Random Forest, Decision Trees, and Logistic Regression were 1.000, 0.997, 0.981, 0.981, and 0.981 respectively, and the testing accuracy values for MLP, K-Nearest Neighbours, Random Forest, Decision Trees, and Logistic Regression were 0.995, 0.993, 0.981, 0.980, and 0.981 respectively.
Using vital signs and clinical values, the precision for Sepsis Label 0 of MLP, Random Forest, Decision Trees, Logistic Regression, and KNN was 0.99, 0.99, 0.98, 0.98, and 0.98 respectively. For Sepsis Label 1, the precision of KNN, Random Forest, Logistic Regression, MLP, and Decision Tree was 0.96, 0.95, 0.20, 0.00, and 0.00 respectively, while the average precision was 0.97, 0.97, 0.59, 0.49, and 0.49 respectively. For Sepsis Label 0, the recall of KNN, Random Forest, Logistic Regression, MLP, and Decision Tree was 1.00 for all five models; for Sepsis Label 1, it was 0.50, 0.14, 0.00, 0.00, and 0.00 respectively. The average recall values are reported in Table 5, which shows the precision and recall of the machine learning models using vital signs and clinical values. Table 6 shows the F1-score of the machine learning models using vital signs and clinical values.
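The per-label and averaged metrics can be computed as in the sketch below, which uses the usual definitions of precision, recall, and F1-score. The labels are made up for illustration (98 non-sepsis rows, 2 sepsis rows, and a model that always predicts label 0) and are not the paper's data; the helper names are hypothetical. The unweighted mean of the two labels is assumed for the average, which is consistent with the reported figures (for example, 0.98 and 0.20 average to 0.59).

```python
def per_label_metrics(y_true, y_pred, label):
    """Precision, recall and F1-score for one label; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def report(y_true, y_pred):
    """Accuracy plus per-label and unweighted-average metrics."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    m0 = per_label_metrics(y_true, y_pred, 0)
    m1 = per_label_metrics(y_true, y_pred, 1)
    average = tuple((a + b) / 2 for a, b in zip(m0, m1))
    return {"accuracy": accuracy, "label_0": m0, "label_1": m1, "average": average}


# Illustrative data only: a heavily imbalanced set and an all-zero predictor.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
result = report(y_true, y_pred)
for key, value in result.items():
    print(key, value)
```

In this toy case the accuracy is 0.98 and the recall for label 0 is 1.00, yet precision and recall for label 1 are both 0.00 and the averages are 0.49 and 0.50. High accuracy can therefore coexist with no detected sepsis cases, which is why the per-label values are worth reading alongside the accuracy.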

Comparison with other methods
We compared the results of our proposed model with the results of Pa Yo et al. [7]. Table 7 shows the accuracy values of KNN, Naïve Bayes, SQR, SVM, CNN-LSTM + transfer, and CNN-LSTM, which are 99.1%, 84%, 60%, 90%, 90%, and 95% respectively. The comparison shows that KNN has the highest accuracy among these competitors. Jaccob et al. [21] reported an accuracy of 90.9% for their proposed algorithm, Insight; Table 8 compares the proposed model with the results of Jaccob et al. [21].

Table 7. Comparison of Accuracy of proposed system with other systems.
Model Name                   Accuracy
K.N.N. [7]                   99.1%
Naïve Bayes [7]              84%
SQR [7]                      60%
SVM [7]                      90%
CNN-LSTM + transfer [7]      90%
CNN-LSTM [7]                 95%

Table 8. Comparison of Accuracy of proposed system with previous work.
Model Name                   Accuracy
K.N.N. [21]                  99.1%
Insight [21]                 90.9%
SAPS (II) [21]               50%
SIRS [21]                    20%

The original dataset contained many empty values and could not be used directly to train an algorithm to give improved results. The data pre-processing techniques, including backward filling, forward filling, and the MAP calculation formula, produced a well-filled dataset that is ready to be used. The results in Tables 1 and 4 show that the testing accuracy of KNN, MLP, and Logistic Regression increases when the input dataset contains all clinical values as well as the vital signs; KNN shows the best results among its peers in both scenarios. Overall, the results show that KNN leads its competitors in accuracy, precision, recall, and F1-score.

CONCLUSION
This paper has presented a novel and reliable sepsis detection framework based on machine learning algorithms. We evaluated performance using several machine learning algorithms, namely MLP, Decision Trees, Logistic Regression, Random Forest, and KNN, and compared their results. The experimental results demonstrated that KNN outperformed the other machine learning classifiers in terms of accuracy, precision, recall, and F1-score. In the future, we intend to extend this work to real-life use, enabling patients to check whether a sepsis infection is present with the help of Android applications as well as web applications.

Acknowledgment. The completion of this work was made possible by the cooperation, guidance, help, and sincere contribution of many people who cannot all be mentioned here; their contribution and effort in the form of meaningful guidance is appreciated and acknowledged. I thank my research supervisor for outstanding support and guidance from start to end, and my parents, teachers, colleagues, and brother for helping me in every situation. Above all, I thank Allah Almighty, the Most Beneficent, the Merciful, the author of knowledge and wisdom, for HIS countless love and care.

Author's Contribution. All the authors of this article contributed to its completion. Asad Ullah proposed the research topic and the methodology, gathered the input dataset and other material needed for the research, and also worked on the code to obtain results. Huma Qayum supervised the overall progress of the work from start to end. Farman Hassan and Auliya Ur Rahman completed the backend coding needed to obtain the results and also analyzed the results. Muhammad Khateeb suggested publishing the article in the IJIST Journal, provided the template, and contributed to checking the formatting and plagiarism of the article.