The ARM1176JZF-S (which I will refer to simply as "ARM") microprocessor belongs to the ARM11 family and uses the ARMv6 32-bit RISC architecture.
The AVR has a two-stage, single-level pipeline: a simple fetch-and-execute scheme in which the next instruction is pre-fetched while the current one executes. The ARM has a much more complex pipeline. It uses an 8-stage, dual-level pipeline: fetch 1, fetch 2, decode, register, shift, data 1, data 2 and write-back. This has several advantages over a basic two-stage pipeline. Because arithmetic and memory-access instructions follow separate pathways, execution can continue when the data cache misses, that is, when the requested data is not in the cache and must be fetched from data memory. Instructions that do not depend on the missing data keep moving through the arithmetic pathway while the memory access completes. The pipeline can also take alternate paths for different memory operations.
The ARM also predicts branches. A direct-mapped, 128-entry cache stores the target addresses of previously executed branches, which allows the pipeline to make dynamic branch predictions. The pipeline is then filled with the instructions expected to follow the branch, which reduces the chance that the whole pipeline must be flushed and refilled when a branch is taken. When no dynamic prediction is available, the ARM falls back on static prediction: a branch with a negative offset (a backward branch) is predicted taken, and a forward branch is predicted not taken. This is particularly effective for loops, because the branch at the end of a loop usually jumps back to its start. The ARM pipeline also supports branch folding, which removes a predicted branch instruction from the pipeline entirely.
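The prediction scheme described above can be modelled with a minimal C sketch of a direct-mapped, 128-entry branch target cache plus the static "backward taken, forward not taken" fallback. This is a simplified model, not the ARM1176's actual hardware: the index/tag split, the 2-bit history counter and every name in it (`btac`, `predict_taken`, `btac_update`, `count_loop_mispredictions`, the address 0x8010) are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (assumption): 128-entry, direct-mapped branch target
   address cache with a hypothetical 2-bit history counter per entry. */
#define BTAC_ENTRIES 128u

typedef struct {
    bool     valid;
    uint32_t tag;     /* upper address bits identifying the branch */
    uint32_t target;  /* most recent target address of that branch */
    uint8_t  counter; /* 0-1: predict not taken, 2-3: predict taken */
} btac_entry;

static btac_entry btac[BTAC_ENTRIES];

/* ARM instructions are word-aligned, so PC bits [8:2] select one of the
   128 entries and the remaining upper bits form the tag. */
static uint32_t btac_index(uint32_t pc) { return (pc >> 2) & (BTAC_ENTRIES - 1u); }
static uint32_t btac_tag(uint32_t pc) { return pc >> 9; }

static void btac_reset(void) {
    for (uint32_t i = 0; i < BTAC_ENTRIES; i++)
        btac[i] = (btac_entry){0};
}

/* Dynamic prediction if the branch is in the cache; otherwise static
   prediction: backward (negative offset) taken, forward not taken. */
static bool predict_taken(uint32_t pc, int32_t offset) {
    const btac_entry *e = &btac[btac_index(pc)];
    if (e->valid && e->tag == btac_tag(pc))
        return e->counter >= 2;
    return offset < 0;
}

/* Record the resolved outcome. A new branch simply overwrites whatever
   occupied its slot, as in any direct-mapped cache. */
static void btac_update(uint32_t pc, uint32_t target, bool taken) {
    btac_entry *e = &btac[btac_index(pc)];
    if (!e->valid || e->tag != btac_tag(pc)) {
        e->valid = true;
        e->tag = btac_tag(pc);
        e->counter = taken ? 2 : 1;
    } else if (taken && e->counter < 3) {
        e->counter++;
    } else if (!taken && e->counter > 0) {
        e->counter--;
    }
    e->target = target;
}

/* A loop whose closing branch (hypothetical address 0x8010) jumps back
   16 bytes: taken on every iteration except the last.
   Returns the number of mispredicted branches. */
static int count_loop_mispredictions(int iterations) {
    const uint32_t loop_pc = 0x8010u;
    const int32_t offset = -16;
    int mispredictions = 0;
    btac_reset();
    for (int i = 0; i < iterations; i++) {
        bool taken = (i < iterations - 1);
        if (predict_taken(loop_pc, offset) != taken)
            mispredictions++;
        btac_update(loop_pc, loop_pc + (uint32_t)offset, taken);
    }
    return mispredictions;
}
```

For a loop that runs ten times, `count_loop_mispredictions(10)` returns 1: the first encounter falls back on the static rule (a backward branch, so predicted taken), the cache then predicts "taken" for the remaining iterations, and only the final loop exit is mispredicted.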
Both the AVR and the ARM are based on the Harvard architecture, which separates program (instruction) memory from data memory. The main difference between them lies in the instruction set. The AVR uses mostly 16-bit instructions, with a few 32-bit instructions. The ARM uses 32-bit instructions, but the 16-bit Thumb instruction set is also available. Thumb instructions are converted to ARM instructions during execution, which uses an extra clock cycle. Although Thumb instructions take longer to process, they produce denser code, which is ideal for devices with small amounts of memory: Thumb code is typically about 65% of the size of the equivalent ARM code.
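The ~65% figure can be rationalised with a rough worked estimate. Assume, purely for illustration, that a task needing N ARM instructions needs about 1.3N Thumb instructions (the 30% overhead is an assumed figure, not one taken from ARM's documentation). Since each Thumb instruction is 2 bytes and each ARM instruction is 4 bytes:

$$
\frac{\text{Thumb code size}}{\text{ARM code size}} = \frac{1.3N \times 2\ \text{bytes}}{N \times 4\ \text{bytes}} = \frac{2.6}{4} = 0.65
$$

Halving the instruction width more than offsets the extra instructions, which is why Thumb code is smaller even though it usually needs more instructions, and therefore more cycles, to do the same work.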
On the AVR, the only conditional instructions are the conditional branch and skip instructions, whereas the ARM supports conditional execution of almost any instruction. The highest 4 bits (bits 31 to 28) of each 32-bit ARM instruction hold its execution condition. The condition is evaluated against the four condition flags (N, Z, C and V) in the Current Program Status Register (CPSR), and the instruction is executed only if the flags satisfy it. This means that an instruction whose condition fails can be discarded once it has been decoded, early in the pipeline, and that fewer branch instructions are required.
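The condition check can be sketched in C as below. It models the architectural rule (the bit positions and condition mnemonics follow the ARM architecture) rather than how the ARM1176 hardware implements it, and the names `cpsr_flags`, `flags_from_cpsr` and `condition_passed` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags, held in CPSR bits 31 (N), 30 (Z), 29 (C), 28 (V). */
typedef struct { bool n, z, c, v; } cpsr_flags;

static cpsr_flags flags_from_cpsr(uint32_t cpsr) {
    cpsr_flags f = {
        .n = (cpsr >> 31) & 1u,
        .z = (cpsr >> 30) & 1u,
        .c = (cpsr >> 29) & 1u,
        .v = (cpsr >> 28) & 1u,
    };
    return f;
}

/* Decide whether an ARM-state instruction executes, using its condition
   field (bits 31..28) and the current flags. */
static bool condition_passed(uint32_t instr, cpsr_flags f) {
    switch (instr >> 28) {
    case 0x0: return f.z;                    /* EQ: equal */
    case 0x1: return !f.z;                   /* NE: not equal */
    case 0x2: return f.c;                    /* CS/HS: unsigned >= */
    case 0x3: return !f.c;                   /* CC/LO: unsigned < */
    case 0x4: return f.n;                    /* MI: negative */
    case 0x5: return !f.n;                   /* PL: positive or zero */
    case 0x6: return f.v;                    /* VS: overflow */
    case 0x7: return !f.v;                   /* VC: no overflow */
    case 0x8: return f.c && !f.z;            /* HI: unsigned > */
    case 0x9: return !f.c || f.z;            /* LS: unsigned <= */
    case 0xA: return f.n == f.v;             /* GE: signed >= */
    case 0xB: return f.n != f.v;             /* LT: signed < */
    case 0xC: return !f.z && (f.n == f.v);   /* GT: signed > */
    case 0xD: return f.z || (f.n != f.v);    /* LE: signed <= */
    case 0xE: return true;                   /* AL: always */
    default:  return true;  /* 0xF is not a condition: in ARMv6 this
                               encoding space holds unconditional instructions */
    }
}
```

For example, a compiler may translate `if (a > b) a -= b;` into `CMP r0, r1` followed by `SUBGT r0, r0, r1` (encoded as 0xC0400001), with no branch at all. Comparing 5 with 3 leaves only the C flag set (CPSR = 0x20000000), so `condition_passed(0xC0400001, flags_from_cpsr(0x20000000))` returns true and the subtraction happens; comparing 3 with 5 sets N instead, so the SUBGT is skipped.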
The ARM's power-saving features include branch prediction, physically tagged caches that reduce the number of cache flushes, sequential cache accesses that reduce the number of RAM accesses, and gating that disables unused inputs.
The ARM uses a cache memory system to increase processing speed. It has a two-level cache, with separate caches for data...

