Data mining is an analytic process of exploring large amounts of data to extract useful information, find consistent patterns and trends among variables, and build predictive computer models from the relationships discovered, using a combination of classical statistics, machine learning and artificial intelligence. The findings are then applied to new subsets of data to test their validity. Data mining performs two essential tasks: description and prediction. Descriptive mining tasks characterize the general properties of the data in a database. Predictive mining tasks perform inference on the current data in order to make predictions. The techniques involved include classification, which identifies patterns that place data into known classes; mining associations between sets of data items; and clustering similar groups in the data. Further techniques are sequential pattern mining (for example, event A is immediately followed by event B), deviation detection, which determines significant changes in the data, and data visualization, which uses graphical methods to show patterns in data.
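Two of the tasks named above, sequential pattern mining and deviation detection, can be illustrated with a minimal sketch. The event sequences, the function names and the z-score threshold below are assumptions made for illustration only; practical systems use far more elaborate methods.

```python
from statistics import mean, pstdev

# Hypothetical event sequences (one list per case); illustrative data only.
sequences = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C", "B"],
    ["B", "A", "B"],
]


def immediate_follow_confidence(seqs, first, second):
    """Fraction of occurrences of `first` that are immediately followed by `second`."""
    followed = 0
    total = 0
    for seq in seqs:
        for i, event in enumerate(seq):
            if event == first:
                total += 1
                if i + 1 < len(seq) and seq[i + 1] == second:
                    followed += 1
    return followed / total if total else 0.0


def deviations(values, threshold=2.0):
    """Indices of values lying more than `threshold` population standard
    deviations from the mean (a simple z-score rule, chosen as an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical measurements with one obvious outlier at the end.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 40]

print(immediate_follow_confidence(sequences, "A", "B"))
print(deviations(readings))
```

Here the first function estimates how often event A is immediately followed by event B, and the second flags values that differ markedly from the rest of the data.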
Data mining in healthcare is not a new idea. Its roots can be traced to William Farr (1807-1883), a British epidemiologist and pioneer in the field of medical statistics, whose major contribution was to link causes of death to professions. Similarly, John Snow (1813-1858), a London physician, applied statistical analysis to establish the true cause of a cholera outbreak.
The healthcare industry today generates large amounts of complex data about patients, hospital resources, disease diagnoses, electronic patient records, and medical devices. These massive amounts of data are organized and analyzed using the methodology and technology of data mining to find correlations, patterns and relationships in patients’ medical data for the purposes of early detection, prevention and treatment of diseases. Data mining makes it easier to evaluate treatment effectiveness by analyzing patients’ data, profiles, histories and physical examinations, comparing and contrasting causes and symptoms, and identifying the various side effects of treatment. This opens the way to more treatment options and to better insight when choosing among them on the basis of effectiveness and cost.
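As a rough illustration of comparing courses of treatment on effectiveness, side effects and cost, the sketch below groups patient records by treatment and summarizes each group. The records, the field names and the `summarize_treatments` helper are all hypothetical, and a descriptive summary of this kind does not by itself show that a treatment caused an outcome.

```python
from collections import defaultdict

# Made-up patient records for illustration; field names are assumptions.
records = [
    {"treatment": "T1", "recovered": True, "cost": 100.0, "side_effect": False},
    {"treatment": "T1", "recovered": False, "cost": 120.0, "side_effect": True},
    {"treatment": "T1", "recovered": True, "cost": 110.0, "side_effect": False},
    {"treatment": "T2", "recovered": True, "cost": 200.0, "side_effect": False},
    {"treatment": "T2", "recovered": True, "cost": 220.0, "side_effect": True},
]


def summarize_treatments(rows):
    """Group rows by treatment and compute per-treatment summary rates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["treatment"]].append(row)

    summary = {}
    for name, items in groups.items():
        n = len(items)
        summary[name] = {
            "patients": n,
            # True counts as 1 and False as 0 when summed.
            "recovery_rate": sum(r["recovered"] for r in items) / n,
            "side_effect_rate": sum(r["side_effect"] for r in items) / n,
            "mean_cost": sum(r["cost"] for r in items) / n,
        }
    return summary


for name, stats in sorted(summarize_treatments(records).items()):
    print(name, stats)
```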
In general, the stages of data mining used in healthcare are data preparation, analysis, knowledge acquisition and prognosis identification. For data preparation, the data are stored in a database on the HIS, entered either directly by data entry staff or via Electronic Health Records (EHR). The data can then be organized into a data warehouse. Data analysis is divided into the steps of exploratory data modelling, descriptive modelling, predictive modelling, and the discovery of patterns and rules. This stage can be summarised as transforming useful data into models to identify patterns and relationships between variables, to further confirm...