
Fault Tolerance in GPGPU

969 words - 4 pages

The work in [paper report 5] studies three software approaches to GPGPU reliability, all of them based on redundant execution. The first approach executes the kernel twice, so its performance overhead is around 100 percent. The other two approaches interleave the execution of the main kernel with redundant threads. The paper also explores whether ECC/parity bits in memories are worthwhile, given the overhead they impose. The first approach, called R-Naive, simply executes the kernel twice. One drawback is that a permanent hardware defect affects both executions in the same way, so the resulting error cannot be detected by comparing the two outputs. This could be avoided by reorganizing the input data for redundant ...
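A minimal Python sketch of double execution follows. It also shows why reorganizing the input matters for permanent defects. The kernel, the `stuck_lanes` defect model and the `shift` remapping are hypothetical simplifications, not the paper's implementation.

A lane with a permanent defect corrupts whatever element it processes. Plain double execution (`shift=0`) therefore produces the same wrong value twice. Shifting the data-to-lane mapping in the redundant run exposes the mismatch.

```python
from typing import Callable, List, Optional, Set, Tuple


def run_kernel(data: List[int], op: Callable[[int], int],
               lane_of: Callable[[int], int],
               stuck_lanes: Optional[Set[int]] = None) -> List[int]:
    """Apply `op` to every element; element i is processed by lane lane_of(i).

    Hypothetical defect model: a lane in `stuck_lanes` has a permanent
    fault and always outputs -1, whatever its input.
    """
    stuck = stuck_lanes or set()
    return [-1 if lane_of(i) in stuck else op(x) for i, x in enumerate(data)]


def redundant_execute(data: List[int], op: Callable[[int], int],
                      num_lanes: int,
                      stuck_lanes: Optional[Set[int]] = None,
                      shift: int = 0) -> Tuple[List[int], bool]:
    """Run the kernel twice and compare the two outputs.

    shift == 0: both runs use the same data-to-lane mapping (plain double
    execution). shift != 0: the redundant run moves each element to a
    different lane, one hypothetical way to reorganize the input.
    Returns (first_run_output, error_detected).
    """
    first = run_kernel(data, op, lambda i: i % num_lanes, stuck_lanes)
    second = run_kernel(data, op, lambda i: (i + shift) % num_lanes, stuck_lanes)
    return first, first != second


if __name__ == "__main__":
    # Lane 2 is permanently defective.
    print(redundant_execute([1, 2, 3, 4], lambda x: x * x, 4, {2}, shift=0))
    # ([1, 4, -1, 16], False): same lane corrupts both runs, error missed
    print(redundant_execute([1, 2, 3, 4], lambda x: x * x, 4, {2}, shift=1))
    # ([1, 4, -1, 16], True): remapped run disagrees, error detected
```

Any `shift` that is not a multiple of `num_lanes` moves every element to a different lane in the second run. A single defective lane then cannot corrupt the same element twice.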

The results for six benchmark applications show that the benefit of the more complex approaches depends on both the application and the GPU architecture. As a result, simply executing the kernel twice is sufficient in most cases.
In [paper – report5], the reliability of GPGPUs is studied using error injection. The study considers transient errors (single-event upsets, SEUs) in the ALU and LD/ST units. Errors are injected with a purpose-built error injector. A heuristic is used to identify reliability hot spots in the code, and a suitable error detector is inserted according to the type of each hot spot.
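As an illustration of inserting a detector at a hot spot, here is a minimal Python sketch of one possible detector for a loop-condition hot spot. The loop bound is kept in two copies. After the loop, the detector checks that the number of iterations matches the untouched shadow copy. The function, the `corrupt_bound` fault hook and the shadow-copy check are illustrative assumptions, not the detector used in the paper.

```python
from typing import List, Optional


class ErrorDetected(Exception):
    """Raised when an inserted detector finds an inconsistency."""


def sum_with_loop_detector(a: List[int], n: int,
                           corrupt_bound: Optional[int] = None) -> int:
    """Sum a[0:n] with a detector guarding the loop-condition hot spot.

    Assumes 0 <= n <= len(a). `corrupt_bound` is a hypothetical hook that
    simulates a soft error overwriting the register holding the loop bound
    after it has been loaded.
    """
    bound = n             # working copy read by the loop condition
    shadow_bound = n      # independent copy kept only for the detector
    if corrupt_bound is not None:
        bound = corrupt_bound  # the fault hits the working copy only
    total = 0
    trips = 0
    # min() keeps this Python sketch in bounds; a real kernel could fault instead.
    for i in range(min(bound, len(a))):
        total += a[i]
        trips += 1
    # Detector: the loop must have executed exactly shadow_bound iterations.
    if trips != shadow_bound:
        raise ErrorDetected(f"loop ran {trips} times, expected {shadow_bound}")
    return total
```

Consider `a = [1, 2, 3, 4]` and `n = 4`. A fault that lowers the bound to 2 would silently return 3 without the check; here it raises `ErrorDetected`. A fault that raises the bound to 100 is clipped by the `min` guard in this sketch, and the call still returns the correct sum, 10.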
The error injector works at the assembly-code level. It chooses a register at random, following a normal distribution, and injects an error into it. One error is injected per execution, and a run is counted only if the injected error becomes active. Reliability hot spots are grouped into three categories: loop conditions, branches that depend on a thread or block index, and computational statements. Each category is equipped with an appropriate error detector. The paper considers only errors that cause silent data corruption (SDC), i.e., an incorrect result. In the experiments, 8 to 40 percent of the injected errors caused SDCs, which shows that SDCs make up a considerable share of error outcomes. The results show that the presented scheme covers 60 percent of SDC errors, with an overhead of between 35 and 95 percent.
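The injection procedure can be sketched in Python on a toy register machine. Everything here is hypothetical: the four-instruction `PROGRAM`, the 32-bit register file and the single-bit-flip fault model stand in for real GPU assembly. The sketch draws registers uniformly for simplicity, whereas the text describes a normal distribution.

The sketch works as follows:

- One bit flip is injected per run.
- Runs whose corrupted register is never read (inactive errors) are discarded.
- Each remaining run is an SDC if its output differs from the fault-free ("golden") output. Otherwise it is masked.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical toy "assembly" program: (op, dst, src1, src2) over registers.
Instr = Tuple[str, int, int, int]
PROGRAM: List[Instr] = [
    ("add", 2, 0, 1),  # r2 = r0 + r1
    ("mul", 3, 2, 2),  # r3 = r2 * r2
    ("and", 4, 3, 1),  # r4 = r3 & r1
    ("add", 5, 3, 4),  # r5 = r3 + r4  (program output)
]
NUM_REGS, OUTPUT_REG, WIDTH = 6, 5, 32
MASK = (1 << WIDTH) - 1


def run(inputs: Tuple[int, int],
        fault: Optional[Tuple[int, int, int]] = None) -> Tuple[int, bool]:
    """Execute PROGRAM, optionally flipping one bit just before one step.

    fault = (step, reg, bit). Returns (output, activated), where activated
    means the corrupted register was read before being overwritten.
    """
    regs = [0] * NUM_REGS
    regs[0], regs[1] = inputs
    corrupted: Optional[int] = None
    activated = False
    for step, (op, dst, a, b) in enumerate(PROGRAM):
        if fault is not None and fault[0] == step:
            _, reg, bit = fault
            regs[reg] ^= 1 << bit            # single-bit upset
            corrupted = reg
        if corrupted is not None and corrupted in (a, b):
            activated = True                 # faulty value was consumed
        x, y = regs[a], regs[b]
        if op == "add":
            regs[dst] = (x + y) & MASK
        elif op == "mul":
            regs[dst] = (x * y) & MASK
        elif op == "and":
            regs[dst] = x & y
        else:
            raise ValueError(f"unknown op {op}")
        if dst == corrupted:
            corrupted = None                 # faulty value overwritten
    return regs[OUTPUT_REG], activated


def campaign(inputs: Tuple[int, int], trials: int,
             seed: int = 0) -> Tuple[int, int]:
    """Inject one fault per run; return (sdc_count, activated_count)."""
    rng = random.Random(seed)
    golden, _ = run(inputs)
    sdc = activated_runs = 0
    for _ in range(trials):
        fault = (rng.randrange(len(PROGRAM)),  # when to inject
                 rng.randrange(NUM_REGS),      # which register (uniform here)
                 rng.randrange(WIDTH))         # which bit
        out, active = run(inputs, fault)
        if not active:
            continue                           # inactive errors are discarded
        activated_runs += 1
        if out != golden:
            sdc += 1                           # wrong result, no crash: SDC
    return sdc, activated_runs


if __name__ == "__main__":
    print(run((3, 5)))             # (64, False): golden run
    print(run((3, 5), (0, 0, 0)))  # (50, True): activated, SDC
    print(run((3, 5), (2, 1, 0)))  # (64, True): activated but masked
    sdc, act = campaign((3, 5), trials=1000)
    rate = sdc / act if act else 0.0
    print(f"SDC rate among activated errors: {rate:.2f}")
```

The third call shows why activation alone is not enough. The flipped bit in `r1` is read, but the `and` masks it out, so the output is unchanged.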

The vulnerability of GPU cores to soft errors is also studied in [paper report5]. The paper presents two low-overhead techniques for detecting soft errors and improving the reliability of streaming multiprocessors (SMs). During branch divergence and pipeline stalls, SMs are underutilized. The paper therefore proposes using this idle SM time to execute redundant threads, improving reliability and error coverage. The authors call this approach RISE. RISE is...
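The general idea stated above, filling idle SM slots with redundant threads, can be sketched as a small scheduling simulation in Python. This is not RISE itself, whose details are not given here. The per-cycle `busy` trace, the one-redundant-thread-per-idle-slot rule and the `recompute` callback are hypothetical simplifications.

Each idle slot re-executes one pending thread and compares the result with the primary output. Threads that never get an idle slot stay unchecked, so coverage depends on how much idle time the workload leaves.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Sequence, Tuple


def fill_idle_slots(busy: Sequence[Sequence[bool]],
                    primary: Dict[int, int],
                    recompute: Callable[[int], int]) -> Tuple[List[int], List[int]]:
    """Run redundant copies of threads in idle SM slots and compare.

    busy[c][s] is True if SM s does primary work in cycle c; False models
    an idle slot caused by a stall or branch divergence.
    primary maps thread id -> result of the main execution.
    Returns (checked thread ids, ids whose redundant result disagreed).
    """
    pending: Deque[int] = deque(sorted(primary))
    checked: List[int] = []
    mismatched: List[int] = []
    for cycle in busy:
        for sm_busy in cycle:
            if sm_busy or not pending:
                continue
            tid = pending.popleft()       # one redundant thread per idle slot
            checked.append(tid)
            if recompute(tid) != primary[tid]:
                mismatched.append(tid)    # primary or redundant copy was hit
    return checked, mismatched


def coverage(checked: List[int], primary: Dict[int, int]) -> float:
    """Fraction of threads that received a redundant check."""
    return len(checked) / len(primary) if primary else 1.0


if __name__ == "__main__":
    busy = [[True, False], [False, False], [True, True]]  # 3 idle slots
    primary = {0: 0, 1: 99, 2: 4, 3: 9, 4: 16}            # thread 1 corrupted
    checked, bad = fill_idle_slots(busy, primary, lambda t: t * t)
    print(checked, bad, coverage(checked, primary))        # [0, 1, 2] [1] 0.6
```

In this trace, an error in thread 3 or 4 would go unnoticed. This illustrates the trade-off: no extra execution time, but only partial coverage.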

Similar Essays

Fault Tolerance And Power System Essay

834 words - 3 pages PAGE PAGE 3 Fault Tolerance Fault Tolerance and Power SystemDeborah TuckerCIS Risk Management/CMGT 579University of PhoenixKrystal HallDecember 4, 2007Fault Tolerance and Power SystemOur culture relying on electronic information processing has produced an increasing desire for diligent operating systems. This degree of functioning in financial organizations, healthcare providers, transportation providers, and phone systems, has been connected

Raid: Redundant Array Of Inexpensive Disks

963 words - 4 pages . RAID 1 mirrors the data across multiple disks without parity or stripping; resulting in the best possible fault tolerance and good read performance. However because of the mirrored nature of RAID 0, it has the worst space efficiency, where the useable capacity of a RAID 0 array is 50% of the available drives in the set. In addition there is a slight reduction in writing performance since the data has to written and verified to multiple disks

Response Of Every Type Of Software Error

1643 words - 7 pages of error are determined which allow service to be maintained. The approach presents a classification scheme for errors and techniques for the provision of software fault tolerance in real-time systems. For Error classification, errors will be classified according to a set of definitions, internal error that can be adequately handled by the process in which the error is detected. External error, that cannot be adequately handled by the process in

Fault Tolerance Essay

1436 words - 6 pages Fault Tolerance is described as a design feature that allows a system to continue operating in spite of errors or problems that occur. "Many areas of fault tolerance are considered when designing or altering information systems. These include but are not limited to, power system, transaction journalizing, database shadowing or mirroring, raid technologies, network redundancy, and security to name a few" (UOP, 2005). Many organizations rely on