Data Mining and Machine Learning Algorithms Using IL28B Genotype and Biochemical Markers Best Predicted Advanced Liver Fibrosis in Chronic Hepatitis C.

Citation:
Shousha, H. I., A. H. Awad, D. A. H. Omran, M. M. Elnegouly, and M. Mabrouk, "Data Mining and Machine Learning Algorithms Using IL28B Genotype and Biochemical Markers Best Predicted Advanced Liver Fibrosis in Chronic Hepatitis C," Japanese Journal of Infectious Diseases, vol. 71, issue 1, pp. 51-57, 2018.

Abstract:

The IL28B single nucleotide polymorphism (rs12979860) is an etiology-independent predictor of hepatitis C virus (HCV)-related hepatic fibrosis. Data mining is a method of predictive analysis that can explore large volumes of information from health records to discover hidden patterns and relationships. The current study aims to evaluate and compare the prediction accuracy of scoring systems such as the aspartate aminotransferase-to-platelet ratio index (APRI) and the fibrosis-4 (FIB-4) index versus data mining for the prediction of HCV-related advanced fibrosis. This retrospective study included 427 patients with chronic hepatitis C. We used data mining analysis to construct a decision tree by the reduced error pruning (REP) technique, followed by the Auto-WEKA tool to select the best classifier out of 39 algorithms to predict advanced fibrosis. APRI had a sensitivity of 0.523 and a specificity of 0.831; FIB-4 had a sensitivity of 0.415 and a specificity of 0.917. The REPTree algorithm was able to predict advanced fibrosis with a sensitivity of 0.749, a specificity of 0.729, and a receiver operating characteristic (ROC) area of 0.796. Out of the 16 attributes, IL28B genotype was selected by REPTree as the best predictor of advanced fibrosis. Using Auto-WEKA, the multilayer perceptron (MLP) neural model was selected as the best predictive algorithm, with a sensitivity of 0.825, a specificity of 0.811, and an ROC area of 0.880. Thus, MLP is better than APRI, FIB-4, and REPTree for predicting advanced fibrosis in patients with chronic hepatitis C.
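
For readers unfamiliar with the two scoring systems, the following is a minimal Python sketch of how APRI and FIB-4 are commonly computed, together with the sensitivity and specificity measures reported in the abstract. The formulas are the widely published definitions of the two indices, not code from the study. The function names, parameter names, and all input values are hypothetical and chosen only for illustration; the abstract does not state the cut-offs or the upper limit of normal for AST that the authors used.

```python
import math


def apri(ast_iu_l, ast_upper_normal_iu_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal AST) / platelet count (10^9/L) * 100."""
    return (ast_iu_l / ast_upper_normal_iu_l) / platelets_10e9_l * 100.0


def fib4(age_years, ast_iu_l, alt_iu_l, platelets_10e9_l):
    """FIB-4 = (age * AST) / (platelet count (10^9/L) * sqrt(ALT))."""
    return (age_years * ast_iu_l) / (platelets_10e9_l * math.sqrt(alt_iu_l))


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of advanced-fibrosis cases detected.
    specificity = TN / (TN + FP): share of non-advanced cases correctly excluded.
    """
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical patient values, for illustration only.
    print("APRI :", apri(ast_iu_l=80, ast_upper_normal_iu_l=40, platelets_10e9_l=100))
    print("FIB-4:", fib4(age_years=50, ast_iu_l=80, alt_iu_l=64, platelets_10e9_l=100))

    # Hypothetical confusion-matrix counts, not taken from the study.
    sens, spec = sensitivity_specificity(tp=75, fn=25, tn=73, fp=27)
    print("sensitivity:", sens, "specificity:", spec)
```

Both indices rely only on routine laboratory values and age, which is why they serve as the baseline against which the REPTree and MLP classifiers are compared.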