
Improvising Data Locality And Availability In The HBase Ecosystem


HBase has a rigid master/slave architecture, and its main purpose is to serve as a scalable, efficient NoSQL database for storing data. HBase offers strongly consistent reads and writes, which makes it suitable for tasks such as high-speed counter aggregation. Automatic sharding splits a region as the volume of data in it grows. Automatic failover improves availability: when a region server fails, its regions are reassigned among the remaining region servers. HBase ultimately stores all its data in HDFS, so the data is persisted durably. HBase supports several APIs: a Java client API for programmatic access, and Thrift and REST APIs for access from other languages. HBase also supports the MapReduce framework for parallel processing across a large number of jobs.
As the architecture diagram above shows, HBase sits on top of HDFS, and HBase uses HDFS as its ...

As shown above, the replication factor in HDFS is 3, so every data block belonging to an HRegion is replicated three times across the cluster. The HBase client, HTable, is responsible for finding the Region Servers that serve the row range of interest. It does this by querying the catalog tables -ROOT- and .META., which exist as HBase tables. They are filtered out of the HBase shell's list command, but they are in fact tables just like any other. The -ROOT- table keeps track of where the .META. table is: each entry consists of a .META. region key and the corresponding values that give the location of that region.
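The lookup described above can be illustrated with a small sketch. It does not use the HBase client API; the `META` list, the host names, and `locate_region_server` are hypothetical stand-ins, and the sketch assumes each region is identified by its start key, with the empty string marking the first region of a table.

```python
import bisect

# Hypothetical in-memory stand-in for the catalog: (region start key, server)
# pairs sorted by start key. The empty string marks the table's first region.
META = [
    ("", "rs1.example.com"),
    ("g", "rs2.example.com"),
    ("p", "rs3.example.com"),
]

def locate_region_server(row_key, meta=META):
    """Return the server hosting the region whose key range contains row_key."""
    start_keys = [start for start, _ in meta]
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return meta[idx][1]
```

A real client would also cache these locations so that it does not consult the catalog tables on every request; the sketch leaves that out.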
Data flows top-down, as the architecture diagram shows. When a client sends a write request to an HRegionServer, the server first writes the change to memory and to the commit log; at some later point it decides to flush the changes to permanent storage on HDFS. This is where data locality comes into play: because the RegionServer and the DataNode run on the same server, the first HDFS block replica of the file is written to that same server. The two other replicas are written to other DataNodes. Under the default HDFS placement policy, one of them goes to a DataNode in a remote rack and the other to a different DataNode in that same remote rack, which in a co-located setup also means a different HRegionServer. As a result, the RegionServer serving the region will almost always have access to a local copy of the data.
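The placement just described can be sketched as a toy function. This is an illustration under stated assumptions, not HDFS code: `place_replicas`, the `topology` mapping, and the node names are all hypothetical, and the sketch assumes the writer runs on a DataNode and that at least one other rack has two nodes.

```python
import random

def place_replicas(writer, topology, rng=random):
    """Pick three DataNodes for one block.

    topology maps rack name -> list of node names; writer is the node
    running the RegionServer that flushes the file.
    """
    # First replica: the writer's own node, which is what gives locality.
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)
    # Second and third replicas: two different nodes on one remote rack.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]

# Hypothetical two-rack cluster used only for illustration.
example_topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
example_placement = place_replicas("n1", example_topology, random.Random(0))
```

Whatever the random choices, the first entry is always the writer itself, which is why the flushing RegionServer ends up with a local copy.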
In typical HBase setups a RegionServer is co-located with an HDFS DataNode on the same physical machine. Every write therefore goes to the local node first and then to the two other nodes mentioned above. As long as regions are not moved between RegionServers, data locality is good: a RegionServer can serve most reads from the local disk (and cache), provided short-circuit reads are enabled. Data locality is thus maintained, and performance benefits from it, until something disturbs the placement of the regions stored on a particular region server. Such violations can arise in several ways, and they are what creates the data locality problem in the HBase ecosystem. The violations are discussed in the next section.
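One way to make "good data locality" concrete is to measure, for a region, the fraction of its blocks that have a replica on the server currently hosting it. The sketch below is a minimal illustration; `locality`, the block list, and the node names are hypothetical and are not taken from HBase itself.

```python
def locality(block_replicas, server):
    """Fraction of blocks that have a replica on `server`.

    block_replicas holds one entry per block; each entry is the set of
    hosts that store a replica of that block.
    """
    if not block_replicas:
        return 0.0
    local = sum(1 for hosts in block_replicas if server in hosts)
    return local / len(block_replicas)

# Hypothetical region whose four blocks were all written by a RegionServer on n1.
blocks = [
    {"n1", "n3", "n4"},
    {"n1", "n3", "n5"},
    {"n1", "n4", "n5"},
    {"n1", "n3", "n4"},
]

on_writer = locality(blocks, "n1")       # region still served from n1
after_move_n3 = locality(blocks, "n3")   # region reassigned to n3
after_move_n2 = locality(blocks, "n2")   # region reassigned to n2
```

In this example the value is 1.0 while the region stays on n1, 0.75 if it moves to n3, and 0.0 if it moves to n2, which shows how moving a region can force reads to go over the network.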
