Big Data: A Continuing Evolution
Big Data continues to evolve and appears to be in the early stages of that evolution. It will keep growing and will require constant research initiatives to keep up. This paper looks at the definition of Big Data and how it is being used, why current DBMSs are unable to handle Big Data efficiently, what hardware and software solutions are being tested, and what challenges researchers are facing.
Big Data is a term used today to describe the vast and growing amounts of data (mainly unstructured, but also structured and semi-structured) available to be mined. Data mining attempts to derive meaningful information from data. As the amount and variety of data keep increasing, it becomes harder to extract useful information within an acceptable response time. Current software tools and hardware are failing to keep up with Big Data needs. Big Data requires the ability to process complex data at the petabyte or exabyte level.
Big Data is developing from many sources, and because storage capacity has been doubling every 14 months for the past 30 years, keeping data has become cheaper and cheaper. Sources of data include social media on the Internet, mobile sensors, astronomy, transaction logs, and many more. Companies today want to collect massive amounts of data that may not be useful now but could be later. The popular social media site Facebook collects over 500 terabytes of data a day. Big Data is defined not only by its volume but also by the ability to retrieve knowledge from it in a reasonable amount of time. For example, Netflix, a video streaming service, uses a machine-learning technique called alternating least squares to make real-time recommendations to its users based on what they have previously watched.
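To make the alternating least squares idea concrete, the following is a minimal sketch in Python. It is not Netflix's implementation, which this paper does not describe. It assumes a rank-1 model (one latent factor per user and per item) with L2 regularization, and the function names, the toy ratings, and the parameter values are all hypothetical. The method alternates between two steps: hold the item factors fixed and solve for each user factor in closed form, then hold the user factors fixed and solve for each item factor.

```python
def objective(ratings, u, v, reg):
    """Regularized squared error over the observed ratings only."""
    err = sum((r - u[i] * v[j]) ** 2 for (i, j), r in ratings.items())
    return err + reg * (sum(x * x for x in u) + sum(x * x for x in v))


def als_rank1(ratings, n_users, n_items, reg=0.1, iters=20):
    """Rank-1 alternating least squares (an illustrative simplification).

    ratings maps (user, item) -> rating for observed entries only.
    Returns the user factors, the item factors, and the objective value
    recorded after each full sweep.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    history = []
    for _ in range(iters):
        # Step 1: fix item factors, solve each user's least-squares problem.
        for i in range(n_users):
            seen = [(j, r) for (a, j), r in ratings.items() if a == i]
            u[i] = sum(r * v[j] for j, r in seen) / (
                reg + sum(v[j] ** 2 for j, _ in seen))
        # Step 2: fix user factors, solve each item's least-squares problem.
        for j in range(n_items):
            seen = [(i, r) for (i, b), r in ratings.items() if b == j]
            v[j] = sum(r * u[i] for i, r in seen) / (
                reg + sum(u[i] ** 2 for i, _ in seen))
        history.append(objective(ratings, u, v, reg))
    return u, v, history


# Hypothetical toy data: 3 users, 3 items, 6 observed ratings.
toy_ratings = {(0, 0): 5, (0, 1): 4, (1, 0): 4,
               (1, 2): 1, (2, 1): 5, (2, 2): 2}
users, items, history = als_rank1(toy_ratings, n_users=3, n_items=3)

# Predicted rating for an unobserved pair: user 0, item 2.
prediction = users[0] * items[2]
```

Because each step exactly minimizes the regularized error over one set of factors, the objective never increases from one sweep to the next. Production recommenders typically use many latent factors per user and item and distribute the per-user and per-item solves across machines, which is part of what makes the technique attractive at Big Data scale.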
Several sectors are making use of Big Data applications, including businesses, the health industry, and smart cities. Businesses use it for applications such as customer personalization (for example, Netflix movie recommendations) and predicting market changes. The health industry uses data such as DNA information to find patterns that help predict ways to improve overall health. Smart cities collect and use Big Data to find ways to improve quality of life and the management of natural resources.
The DBMS (Database Management System) has been the standard platform for data mining; however, it is falling short as Big Data comes to the forefront of today’s technology-filled world. With small-scale data, a DBMS is able to store, update, and process data on a single desktop computer. A DBMS is capable of querying large tables, and relational DBMSs have done well with structured tabular data. With the Internet constantly creating data of many different types and growing complexity, the volume of Big Data has grown to astronomical levels....