1 Introduction 1
2 The Simple Case Study 1
2.1 Understanding the data 2
2.2 From Excel to CSV 3
2.2.1 Separating the single column into multiple data columns or fields 3
2.2.2 Practice Activity 1 5
3 Working with the Comma Separated Values File 7
3.1 What is a CSV File? 7
3.2 From Excel to CSV 7
3.3 Watch out for the CSV Gotchas 8
4 Moving Data to the Google Cloud Platform 9
The supporting files needed for these exercise can be found in Blackboard.
The purpose of this exercise to gain experience in using some of the techniques available to database professionals, to help them move data from one place to another.
When dealing with actual data it is all too easy to forget that each row is potentially as important as the others. Our task is not only to move the data, but to move it accurately. Indeed, in some circumstances, the data we are provided with is itself unclean. We therefore need to ensure we understand the data we are dealing with and that when it completes its journey it is a valid and exact representation of the information being stored.
The Simple Case Study
This set of exercises is set in the context of a simple case study based around a fictional organisation called The North Yorkshire Classical CD Society. It is a non-profit organisation which buys CDs for use by its members. It was formed in 1991, when CDs were generally more expensive to buy, but although membership has not been increasing over the past few years, there is a solid base of loyal members who benefit from the service offered.
CD Data was formerly held only in spreadsheets, but the business case to move to an MySQL database hosted on the Google Cloud Platform has been made. The design decisions have already been taken and our task here is to populate two tables within the database with the existing Excel data.
The partial Class Diagram below describes the entities and their relationship:
Understanding the data
Each CD is assumed to have a primary COMPOSER. That is the person who wrote the piece in the TITLE field. The Title may consist of one or more pieces of music. The music is played by an ORCHESTRA which has CONDUCTOR in charge of it.
In the design phase it has been established that there is a need for a separate table which contains composers, and that every entry in the CD master file must be for a composer that exists in the COMPOSERS table.
The format in which this data is stored in the Excel is discussed later.
It is always sensible to try and understand the data you are moving around.
BEWARE: There has been very little data validation carried out up until now. The data that has been kept in the spreadsheet is far from perfect. There are quite a few records that have been incorrectly entered. For example, the Orchestra and Conductor fields are occasionally input in the wrong order.
From Excel to CSV
Many of the string handling functions within Excel can help in making sense of the data.
TIP: ALWAYS, ALWAYS,...