This website uses cookies to ensure you have the best experience. Learn more

Map Reduce: A Programming Model Essay

764 words - 4 pages

MapReduce is presented by Dean and Ghemawt [1] as a programming model for parallelizing the computations of data intensive application across a cluster. It has two primitives, map and reduce. Map computes a set of intermediate key/value pairs for each record, then reduce is applied to all the values with similar keys so it may forms a smaller set of values.
Several implementation of MapReduce have been implemented according to the hardware infrastructure. Phoenix [2], as an example, is a shared-memory implementation of MapReduce. Apache Hadoop is another example implemented for running applications on large cluster built of commodity hardware [3].
MATE-CG [4] is another map-reduce-like framework, implemented appropriately for programming the CPU+GPU clusters. MATE-CG aims the acceleration of map-reduce applications on parallel heterogeneous environments, especially CPU+GPU clusters. It enables accelerating different types of applications, supporting three schemes. Based on its dataset, an application could be accelerated by one of the CPU-only, GPU-only and CPU-n-GPU schemes, to get the best performance. Appropriate APIs allow the implementer to specify reduction functions for CPU and GPU. GPU reduction is implemented by CUDA kernels to perform on GPUs. Also the user defines application specific partitioner and splitter functions. The former is used to partition the dataset among the computing nodes and the latter splits the data blocks into smaller chunks to be processed on the CPUs and GPUs. In the runtime system, the CPU is used to execute partitioning, scheduling, etc. GPU is mainly responsible for accelerating the computation. Based on the user defined partitioning parameter, data is distributed among the nodes. After completion of data block distribution, each block is cut into two parts and both the CPU and the GPU start their execution concurrently. The data parts are split into smaller chunks to enable the load to be balanced. The CPU-GPU data distribution fraction is automatically tuned based on an analytical model. The chunk size for the CPU and the chunk size for the GPU are also defined with an analytical model, but can be provided by the developer.
In [Ravi et al.], two scheduling scheme is presented for single node of CPU/GPU and a cluster of CPU/GPU nodes. The first scheme outperforms the Blind round Robin scheme....

Find Another Essay On MapReduce: A Programming Model

Why I Want to Get a Master in Science of Computer Science

848 words - 4 pages , I worked with two other members on Development of Controlling and Stabilizing Algorithms for a Quadcopter using LPC2148 ARM Controller under the supervision of Dr. Hariprasad. This project aimed at stabilizing the quadcopter using PID control by correcting pitch and roll errors caused by external disturbances. My work in this project involved coming up with a mathematical model of a quadcopter to correct pitch and roll errors by changing motor

Programming Languages and Paradigms - Computer Science - Assignment

1092 words - 5 pages of the mathematical function while the other one is based on the use of predicate logic. The examples of declarative programming languages are Lisp, Prolog, Scheme, ML, Miranda, and Haskell. As the computer’s memory size increased, the programmers designed object-oriented programming (OOP) paradigm. It is a model organized around objects rather than “actions” and data rather than logic (Rouse, 2008). The first object-oriented language was

DataStorm Project: Developing Integrated Real-Time Monitoring Framework for Cloud-Based Systems

588 words - 2 pages model definition and its validation was a success in disease outbreak detection modeling. From the time BioSTORM was started streams of data to be analyzed have been rapidly increasing in size and amount. JADE-based deployment become a bottleneck of the framework. Last year, UiS (with and SU have started creating new framework called DataStorm to address the problem of the previous version. Two main goals were: using Hadoop as a


4117 words - 16 pages help them translate high-level language into machine code. The computer code used to write a program is called source code before being translated into machine code, and object code after it has been translated. Kinds of Compilers [B/D] -- interactive programming language for analyses of Bayes linear statistical problems. ACL2 -- programming language to model computer system and prove properties. ADAMO -- scientific programming system

Cloud Computing and IT Professionals

695 words - 3 pages ACO which is a load balancing solution set within the nodes of a cloud system[5]. However, these solutions require drastic changes to ISP backbones and hardware, and therefore are not reasonable solutions to be positioned in the short term. Our idea is to reduce the cost of bulk data transfer by applying three different approaches simultaneously to a cloud computing model i)Traffic differentiation: in which our focus is on bulk data transfers

History of Programming Languages

1667 words - 7 pages .” (“History of PHP” - According to this statement, the PHP language resembles to Perl, another programming language, but with a similar purpose. It also had produced a larger and richer implementation than what it used to be. “This new model was capable of database interaction and more, providing a framework upon which users could develop simple dynamic web applications such as guest books.” (“History of PHP”). Such features allowed

Google Group Case Study: Google Takes on the World

3176 words - 13 pages Google using the competitive forces and value chain models.There are several information systems that companies may apply their business strategy to, in hopes, of achieving a competitive advantage. Two of these models that will be examined are Porter's Competitive Forces Model and the Business Value Chain Model. The first of these systems - Porter's competitive forces model refers to the five factors that affect competitive advantage. Although

The Use and Accuracy of Holt-Winters Forecasting Method

1666 words - 7 pages Introduction Based the principle that easy and friendly to apply this model in practice to forecast C1 product of MAD Ltd, especially provide information for the people that don’t equipped with any skills on VBA and forecasting knowledge. This model using solver, Holt-Winters forecasting method and VBA to achieve automatically calculation to obtain the result of next quarter or next whole year forecast. This report will first give a guidance

Constant Coefficients Linear Prediction for Lossless Compression of Ultraspectral Sounder Data using a Graphics Processing Unit

594 words - 2 pages ultraspectral sounder data having an average compression ratio of 3.39 on publicly available NASA AIRS data. CUDA is a parallel programming architecture that is designed for data-parallel computation. CUDA does not require the programmers to explicitly manage threads. This simplifies the programming model. Our GPU implementation on Nvidia 9600 GT is experimentally compared to the native Central Processing Unit (CPU) implementation. We achieved a speed-up

Programming experience

1027 words - 5 pages binary file by using fstream.h and iostream.h library. We used structural programming methodology and subroutines were coded for various operations. The project size was more than thousand lines of code. As a part of Computer Graphics, I developed graphics programs such as Graphical Binary Search Tree and game projects using C++ and OpenGL library. Graphical Binary Search Tree program was a graphical demonstration of formation of Binary Search

Master in Programming Admission Letter

611 words - 3 pages model. The better I coded the material model, the better its approximation was. Programming skill again showed its great significance and necessity. From coursework, club activities and research, I gradually have a great interest in computer science. I am eager to improve specialized skills and master advanced techniques through high-quality courses and under supervising of excellent faculty. I am really interested in the topic related to robotics or simulation. Enrolling in a master program will enable me to be more professional and competitive in the field of robotics.

Similar Essays

Database Systems: Big Data Evolution And Efficiency

2224 words - 9 pages process of exporting Big Data causes a bottleneck of I/O processes reducing CPU time. Parallel computing is one method that is being explored to work with Big Data. This requires making use of different hardware setups and different software platforms than the traditional DBMS that is been widely used for data mining. MapReduce (or its open-source equivalent Hadoop) and Enterprise Control Language are two parallel programming tools that make

Access Control Layer On Top Of Pig Using Xacml

1366 words - 5 pages (JRE) 1.6 or higher. The standard start-up and shutdown scripts require ssh to be set up between nodes in the cluster. The Apache Hadoop framework is composed of the modules Hadoop Common which contains libraries and utilities for other Hadoop modules, Hadoop MapReduce is a programming model for large scale data processing, Hadoop Distributed File System (HDFS) is a distributed file-system which stores data that provides very high aggregate

Google Technolgy Essay

1179 words - 5 pages it comes to hardware cost reduction is necessary. “Google took good programming ideas from other languages, implementing new functions and libraries to eliminate most of the manual coding required paralleling an application across Google’s servers (Arnold 55-79).” Google uses MapReduce, a programming tool developed in C++ that implements parallel computations for large data sets. MapReduce is a programming model and an associated

Social Media's Role In Network Management In Big Data

844 words - 4 pages influence the design of the system. MapReduce provides a divide and conquer data processing model, where large workloads are split into smaller tasks, each processed by a single server in a cluster (the map phase). The results of each task are sent over the cluster network (the shuffle phase) and merged to obtain the final result (the reduce phase). The network footprint of a MapReduce job consists pre dominantly of traffic sent during the shuffle