data mining

Salama, M., Data Mining for Medical Informatics, Cairo, Cairo University, 2012.

The work presented in this thesis investigates the nature of real-life data, mainly in the medical field, and the problems that conventional data mining techniques face in handling such data. Accordingly, a set of alternative techniques is proposed to handle medical data in the three stages of the data mining process. In the first stage, preprocessing, an interval-based feature evaluation technique is proposed. It rests on the hypothesis that the smaller the overlap between the intervals of values an attribute takes for each class label, the more important that attribute is. The technique handles continuous attributes without requiring discretization of the input, and its effectiveness is demonstrated by comparing its results with those of other attribute evaluation and selection techniques. Also in the preprocessing stage, the negative effect of applying a normalization algorithm before conventional PCA is investigated, showing that avoiding normalization improves the resulting classification accuracy. Finally in the preprocessing stage, an experimental analysis demonstrates the ability of the rough set methodology to classify data successfully without applying a feature reduction technique. It shows that the overall classification accuracy offered by the rough set approach is high compared with other machine learning techniques, including Support Vector Machine, Hidden Naive Bayesian network, Bayesian network and others.
In the machine learning stage, a frequent pattern-based classification technique is proposed; it depends on detecting the variation of attributes among objects of the same class. This technique does not require preprocessing of the data such as standardization, normalization, discretization or feature reduction, which improves running time and keeps the original data undistorted. Another contribution in the machine learning stage concerns the support vector machine and fuzzy c-means clustering techniques: the Euclidean space calculations are enhanced by applying fuzzy logic to them. This enhancement uses the ChiMerge feature evaluation technique to apply fuzzification at the level of individual features. The enhanced techniques are compared with classical data mining techniques, and the results show that the classical models suffer from low classification accuracy because they depend on presumptions that do not hold for the data.
Finally, in the visualization stage, a technique is proposed to visualize continuous data using Formal Concept Analysis while avoiding the complications that result from scaling algorithms.
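The hypothesis behind the interval-based feature evaluation in the preprocessing stage (less overlap between the per-class value intervals means a more important attribute) can be illustrated with a small sketch. This is not the scoring formula from the thesis, which the abstract does not give. The function name interval_overlap_score, the use of simple min-max ranges per class, and the averaging over class pairs are all assumptions made for illustration.

```python
from collections import defaultdict


def interval_overlap_score(values, labels):
    """Score one continuous attribute in [0, 1] (hypothetical scoring rule).

    1.0 means the per-class [min, max] intervals do not overlap at all;
    0.0 means they overlap across the attribute's whole value range.
    """
    # Track [min, max] of the attribute separately for every class label.
    ranges = defaultdict(lambda: [float("inf"), float("-inf")])
    for v, y in zip(values, labels):
        ranges[y][0] = min(ranges[y][0], v)
        ranges[y][1] = max(ranges[y][1], v)

    total = max(values) - min(values)
    if total == 0 or len(ranges) < 2:
        return 0.0  # a constant attribute or a single class carries no signal

    classes = list(ranges)
    overlap, pairs = 0.0, 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            lo = max(ranges[classes[i]][0], ranges[classes[j]][0])
            hi = min(ranges[classes[i]][1], ranges[classes[j]][1])
            overlap += max(0.0, hi - lo)  # length of the shared interval
            pairs += 1

    # Average pairwise overlap, normalised by the full value range.
    return 1.0 - (overlap / pairs) / total
```

With this rule, an attribute whose classes occupy the ranges [1, 3] and [7, 9] scores 1.0, while one where both classes span [1, 9] scores 0.0. No discretization of the attribute is needed, which matches the motivation stated in the abstract.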

I.Ghali, N., R. Wahid, and A. E. Hassanien, "Heart Sounds Human Identification and Verification Approaches using Vector Quantization and Gaussian Mixture Models", International Journal of Systems Biology and Biomedical Technologies, vol. 1, issue 4, pp. 75-88, 2012.

In this paper the possibility of using human heart sounds as a human print is investigated. To evaluate the performance and the uniqueness of the proposed approach, tests using a high-resolution digital auscultation stethoscope are performed on nearly 80 heart sound samples. The verification approach consists of robust feature extraction with a specified configuration in conjunction with Gaussian mixture modeling. The similarity of two samples is estimated by measuring the difference between the log-likelihoods of their features. The experimental results show that the overall accuracy offered by the Gaussian mixture modeling reaches up to 85%. The identification approach consists of robust feature extraction with a specified configuration in conjunction with LBG-VQ. The experimental results show that the overall accuracy offered by LBG-VQ reaches up to 88.7%.
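The verification step described above, comparing two samples through the difference of their log-likelihoods, can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes an already trained diagonal-covariance Gaussian mixture given as plain lists, and the function names gmm_avg_log_likelihood and same_person as well as the threshold parameter are hypothetical. Feature extraction and model training are not shown.

```python
import math


def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            # log(weight) + log N(x | mu, diag(var)) for one mixture component
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over components, shifted by the max for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)


def same_person(frames_a, frames_b, gmm, threshold):
    """Accept when the two samples' log-likelihoods differ by at most threshold.

    gmm is a (weights, means, variances) tuple; threshold is an assumed
    tuning parameter that would have to be chosen on validation data.
    """
    diff = abs(gmm_avg_log_likelihood(frames_a, *gmm)
               - gmm_avg_log_likelihood(frames_b, *gmm))
    return diff <= threshold
```

Here each sample is a list of feature vectors (frames). Averaging over frames keeps the score comparable between recordings of different lengths, which is a design choice of this sketch rather than something stated in the abstract.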