Data quality has been defined as “an inexact science in terms of assessments and benchmarks”. Similarly, high-quality data can be described as “data that is fit for use by data consumers”.
11.2. Origin of Bad Data
Erroneous data can originate from several sources. Data may become dirty when it is entered incorrectly, when it is received from an invalid external data source, or when good data is combined with outdated data and there is no way to distinguish between the two.
11.3. Categories and Dimensions of Data Quality
In the past, data was regarded as an organization's most valuable asset and was rarely shared. Today, businesses, governments, and research organizations rely on exchanging and sharing many forms of data. As interconnectivity among data producers and data consumers grows, interest in data quality increases steadily. Managing data quality is typically a complex job, and all aspects of data quality should be observed throughout the data management process. The following table lists the categories and dimensions of data quality:
Table 11.1 Categories and Dimensions of Data Quality 
Amount of data
Ease of Understanding
11.4. Classification of data quality problems in data sources
Data quality problems are classified into two main categories: single-source problems and multi-source problems. The figure below gives an overview of this classification and its subcategories, together with some typical problems for each case.
Figure 11.1 Classification of data quality problems in data sources 
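The single-source, instance-level problems that Figure 11.1 lists (misspellings, redundancy, and contradictory values) can be made concrete with a small Python sketch. The record layout, the sample values, and the KNOWN_CITIES reference list are hypothetical, and the checks shown (exact-duplicate detection, grouping by key, and fuzzy matching with the standard library's difflib) are one simple illustration, not a prescribed cleaning method. The last function loads the same records into an in-memory SQLite table whose primary key acts as a schema-level integrity constraint.

```python
import sqlite3
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical customer records as they might arrive from a schema-less
# source such as a flat file: (customer_id, name, city).
RECORDS = [
    (1, "Alice Smith", "Berlin"),
    (2, "Bob Jones", "Hamburg"),
    (2, "Bob Jones", "Hamburg"),    # redundancy: exact duplicate of the row above
    (3, "Carol White", "Munich"),
    (3, "Carol White", "Cologne"),  # contradictory value: same id, different city
    (4, "Dan Brown", "Berln"),      # misspelling of "Berlin"
]

# Hypothetical reference vocabulary used to flag likely misspellings.
KNOWN_CITIES = ["Berlin", "Hamburg", "Munich", "Cologne"]


def find_duplicates(records):
    """Return every record that repeats an earlier record verbatim."""
    seen, dups = set(), []
    for rec in records:
        if rec in seen:
            dups.append(rec)
        seen.add(rec)
    return dups


def find_contradictions(records):
    """Return ids that map to more than one distinct (name, city) pair."""
    values = defaultdict(set)
    for cid, name, city in records:
        values[cid].add((name, city))
    return sorted(cid for cid, vals in values.items() if len(vals) > 1)


def find_misspellings(records, vocabulary):
    """Map each unknown city to its closest known spelling (or None)."""
    suggestions = {}
    for _, _, city in records:
        if city not in vocabulary:
            match = get_close_matches(city, vocabulary, n=1)
            suggestions[city] = match[0] if match else None
    return suggestions


def load_with_constraints(records):
    """Insert records into a table with a PRIMARY KEY (a schema-level
    constraint) and return the rows the database rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
    )
    rejected = []
    for rec in records:
        try:
            conn.execute("INSERT INTO customer VALUES (?, ?, ?)", rec)
        except sqlite3.IntegrityError:  # raised on a duplicate primary key
            rejected.append(rec)
    conn.close()
    return rejected


if __name__ == "__main__":
    print("duplicates:", find_duplicates(RECORDS))
    print("contradictory ids:", find_contradictions(RECORDS))
    print("misspellings:", find_misspellings(RECORDS, KNOWN_CITIES))
    print("rejected by schema:", load_with_constraints(RECORDS))
```

On this sample data the primary key rejects the duplicate row and the second row for id 3, but it cannot tell which of the two cities is correct, and the misspelled "Berln" passes every schema check. That gap is why instance-level problems still call for data cleaning even when the schema enforces integrity constraints.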
11.4.1. Single-source Problems
Single-source problems can occur at the schema level or at the instance level. Database systems usually enforce the restrictions of a specific data model along with the restrictions of the application. At the schema level, problems therefore arise from a lack of appropriate model-specific or application-specific integrity constraints. Data model limitations and poor schema design also cause data quality problems at the schema level. In addition, errors and inconsistencies are highly likely in data from sources that have no proper schema, such as files. Inaccuracies and inconsistencies that cannot be handled at the schema level are termed instance-level problems. As shown in Figure 11.1, instance-level problems arise from data entry errors such as misspellings, redundancy, and contradictory values. Data cleaning helps to overcome these issues, but it is expensive, so an appropriate design is required to avoid such problems in the first place. Also, the discovery of data cleaning rules during warehouse design can...