Medical informatics

Salama, M., Data Mining for Medical Informatics, Cairo, Cairo University, 2012.

The work presented in this thesis investigates the nature of real-life data, mainly in the medical field, and the problems that conventional data mining techniques face in handling such data. Accordingly, a set of alternative techniques is proposed to handle medical data in the three stages of the data mining process. In the first stage, preprocessing, an interval-based feature evaluation technique is proposed. It rests on the hypothesis that the smaller the overlap between the intervals of values an attribute takes for the different class labels, the more important that attribute is. The technique handles continuous attributes without requiring discretization of the input, and its effectiveness is demonstrated by comparing its results with those of other attribute evaluation and selection techniques. Also in the preprocessing stage, the negative effect of applying normalization before conventional PCA is investigated, showing how skipping normalization improves the resulting classification accuracy. Finally in the preprocessing stage, an experimental analysis shows that the rough set methodology can classify data successfully without applying a feature reduction technique. The overall classification accuracy of the employed rough set approach is high compared with other machine learning techniques, including Support Vector Machine, Hidden Naive Bayesian network, Bayesian network and other techniques.
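
The interval-overlap hypothesis above can be illustrated with a minimal sketch. This is not the thesis's actual scoring formula, which the abstract does not give; the function name `interval_overlap_score`, the use of each class's [min, max] range as its interval, and the normalization by the attribute's total range are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def interval_overlap_score(values, labels):
    """Score one continuous attribute (hypothetical measure, not the thesis's).

    Returns a value in [0, 1]: 1.0 when the per-class [min, max] intervals
    are disjoint, 0.0 when every pair of class intervals spans the full range.
    """
    # Group the attribute values by class label.
    by_class = defaultdict(list)
    for value, label in zip(values, labels):
        by_class[label].append(value)

    # Assumption: each class is summarized by its [min, max] interval.
    intervals = [(min(vs), max(vs)) for vs in by_class.values()]
    total_range = max(hi for _, hi in intervals) - min(lo for lo, _ in intervals)
    if len(intervals) < 2 or total_range == 0:
        return 0.0  # a constant or single-class attribute carries no evidence

    # Average the overlap length over all pairs of class intervals.
    pairs = list(combinations(intervals, 2))
    overlap = sum(
        max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        for (a_lo, a_hi), (b_lo, b_hi) in pairs
    )
    return 1.0 - overlap / (len(pairs) * total_range)


if __name__ == "__main__":
    # Two toy attributes over the same six labelled objects.
    labels = [0, 0, 0, 1, 1, 1]
    separated = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    mixed = [1.0, 6.0, 12.0, 2.0, 7.0, 11.0]
    print(interval_overlap_score(separated, labels))
    print(interval_overlap_score(mixed, labels))
```

Because the score is computed directly on the raw values, no discretization step is needed, which is the property the abstract emphasizes. A [min, max] interval is sensitive to outliers, so a practical version might use percentile bounds instead.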
In the machine learning stage, a frequent pattern-based classification technique is proposed; it depends on detecting the variation of attributes among objects of the same class. This technique does not require preprocessing such as standardization, normalization, discretization or feature reduction, which improves running time and keeps the original data undistorted. A second contribution in the machine learning stage concerns the support vector machine and fuzzy c-means clustering techniques: the Euclidean space calculations are enhanced by applying fuzzy logic to them. This enhancement uses the ChiMerge feature evaluation technique to apply fuzzification at the level of individual features. The enhanced techniques are compared with the classical data mining techniques, and the results show that the classical models suffer from low classification accuracy because they depend on presumptions that do not hold.
Finally, in the visualization stage, a technique is proposed for visualizing continuous data using Formal Concept Analysis that avoids the complications introduced by scaling algorithms.