April 16, 2014
Draft 2 Paper #4
Databases in a Distributed Environment
A database is an organized collection of information that manages data and allows fast storage and retrieval of that data. Each application requires a database to hold its application-specific data, which is then accessed by users. However, each application needs a different type of database depending on its requirements. Researchers classify databases according to user-specific functionality, parameters, and application.
There has been considerable discussion and research on joins, which are a key performance indicator of any database. Some researchers favor the centralized database over the distributed database, based on an analysis of joins performed in these two kinds of database. For example, Sharma and Singh (2012) conclude that in a centralized database, data ...
Cheng, Yu and Yu (2011) show that when the HPSJ algorithm processes an R-join between two base relations, it first retrieves, using the table, all centers that have a nonempty x-labeled F-subcluster and a nonempty y-labeled T-subcluster, and maintains them. The authors describe a two-step R-join algorithm that is used to process a temporal relation containing R-join attributes. On the other hand, Carbunar and Sion (2012) explain that join algorithms return all matching tuples, which makes a parallel database faster. Different authors thus view the parameters according to their use in a specific application.
According to researchers, another category is load balancing. For example, Lubbe, Reuter and Mitschang (2012) proposed an algorithm for load balancing of partitioned data. It aims at balancing the amount of data and focuses on reducing data skew between partitions. They also showed that if the current load on a particular node rises above a certain threshold, the node checks the load on its neighboring node, and if the neighbor's load is below the threshold, the load is shared between the two. On the other hand, load balancing in a distributed environment, according to Yfoulis and Gounaris (2012), depends on a flux that does not exceed the query processing time, and such dynamics can be modeled either through transfer functions or through state-space models.
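The threshold rule described above can be illustrated with a minimal Python sketch. It is not the algorithm of Lubbe, Reuter and Mitschang (2012). The names `Node` and `share_load`, the use of an item count as the load measure, and the choice to split the combined load evenly are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: int  # hypothetical load measure: number of data items held


def share_load(node: Node, neighbor: Node, threshold: int) -> int:
    """Move load from an overloaded node to an underloaded neighbor.

    Load is shared only when `node` is above the threshold and
    `neighbor` is below it. The even split is an assumption; the
    source only says the load is "shared". Returns the amount moved.
    """
    if node.load > threshold and neighbor.load < threshold:
        total = node.load + neighbor.load
        keep = (total + 1) // 2      # overloaded node keeps the larger half
        moved = node.load - keep
        node.load -= moved
        neighbor.load += moved
        return moved
    return 0                         # nothing to do, or neighbor is busy too


# Usage: node "a" is above the threshold of 60, neighbor "b" is below it.
a, b = Node("a", 90), Node("b", 30)
share_load(a, b, threshold=60)
print(a.load, b.load)  # prints "60 60"
```

In this sketch a node whose neighbor is also above the threshold keeps its load unchanged; a fuller scheme would have to decide what happens in that case, which the passage above does not specify.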
Furthermore, some research shows that load in a hybrid database is better optimized than in a DHT data system, which is primarily a part of cloud databases. Mehta, Agrawal and Jinwala (2012) evaluated the performance of a hybrid database in a heterogeneous distributed system. They explained that the hybrid algorithm divides the nodes of the distributed system into clusters to reduce communication overhead. The authors suggested that the hybrid algorithm can overcome the scalability issues of the centralized algorithm. In contrast, Pitoura, Ntarmos and Triantafillou (2012) suggest that in a DHT-based network, peers and data are assigned unique identifiers from a circular m-bit identifier space in which each tuple is stored on the peer mapped by securely