Abstract—Privacy Preserving Data Mining (PPDM) is attracting the attention of researchers in different domains, especially in association rule mining. The purpose of preserving association rules is to minimize the risk of disclosing shared information to external parties. In this paper, we propose a PPDM model for XML Association Rules (XARs). The proposed model identifies the most probable item, termed the sensitive item, so that the original data source can be modified with greater accuracy and reliability. Such reliability has not been addressed before in the literature by any methodology used in the PPDM domain, and especially not in XML association rule mining. Thus, the suggested model opens a new dimension for academia in controlling sensitive information in a more robust manner.
Keywords: XARs, PPDM, K2 algorithm, Bayesian Network, Association Rules
In data mining, trends and patterns are identified in a huge set of data to discover knowledge. A variety of techniques exist for extracting such knowledge, including clustering, classification and association rule mining. Association rule mining is thus one domain for delivering knowledge from complex data. The discovered association rules are usually determined by a minimum support s% and a minimum confidence c% over the transactional items in a database D. A rule is an implication of the form A → B, where A is the antecedent and B is the consequent. The problem with presenting rules in this way is that sensitive information is disclosed to external parties when the data is shared. Hence, Privacy Preserving Data Mining (PPDM) for association rules has emerged.
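As an illustration of the support and confidence measures mentioned above, the following is a minimal sketch in Python. It is not part of the proposed model: the toy database `D`, the function names and the threshold values are assumptions chosen only for this example.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

# Hypothetical toy database D (illustrative items only, not from the paper).
D: List[Transaction] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions in db that contain every item of itemset."""
    if not db:
        return 0.0
    hits = sum(1 for transaction in db if itemset <= transaction)
    return hits / len(db)


def confidence(db: List[Transaction],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """support(A and B together) divided by support(A), for the rule A -> B."""
    support_a = support(db, antecedent)
    if support_a == 0.0:
        return 0.0
    return support(db, antecedent | consequent) / support_a


def is_strong_rule(db: List[Transaction],
                   antecedent: FrozenSet[str],
                   consequent: FrozenSet[str],
                   min_support: float,
                   min_confidence: float) -> bool:
    """True if A -> B meets both the minimum support and minimum confidence."""
    rule_support = support(db, antecedent | consequent)
    rule_confidence = confidence(db, antecedent, consequent)
    return rule_support >= min_support and rule_confidence >= min_confidence


if __name__ == "__main__":
    a, b = frozenset({"bread"}), frozenset({"milk"})
    print("support:", support(D, a | b))
    print("confidence:", confidence(D, a, b))
    print("strong rule:", is_strong_rule(D, a, b, 0.5, 0.6))
```

In this sketch the thresholds s% and c% are expressed as fractions between 0 and 1. A rule that passes both thresholds would be reported to whoever receives the mined results, which is exactly where the disclosure risk arises if the rule is sensitive.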
In PPDM, sensitive information is controlled by identifying sensitive item(s) or sensitive rules. The question is how to select or identify the sensitive item(s). Various methodologies have been proposed in the literature, such as those in [2, 3, 4, 5, 6]. Some of these techniques avoid generating sensitive rules by working on the antecedent and consequent, while others rely on assumption-based identification of the sensitive item(s). This raises a further question: is the assumed or identified item estimated reliably enough to justify modifying the original data source? All these questions need to be answered to obtain more accurate results for such an NP-hard problem.
To answer these questions in a reliable manner, we decided to use a Bayesian network [8, 10]. A further problem then arose in choosing the application area, because a great deal of work has already been proposed in most areas, especially in rule mining. We resolved this by choosing XML, which is used for the interchange of information over the web and may disclose information in the form of association rules. XML association rules have been studied in the literature, but the security issue of XML Association Rules (XARs) has been ignored....