One of the earlier goals of CPU designers was to provide more and more instructions in the instruction set of a CPU so that the CPU supports more functions directly. This makes it easier to translate high-level language programs to machine language and ensures that the machine language programs run more effectively. Of course, each additional instruction in the instruction set of a CPU requires additional hardware circuitry to handle that instruction, adding more complexity to the CPU’s hardware circuitry.
Another goal of CPU designers was to optimize the usage of expensive memory. To achieve this, designers tried to pack more instructions in memory by introducing the concept of variable-length instructions such as half-word, one-and-half-word, etc. For example, an operand in an immediate instruction needs fewer bits, and hence, a CPU designer can design it as a half-word instruction. Additionally, designers originally designed CPUs to support a variety of addressing modes (discussed later in this chapter during the discussion of memory).
CPUs with a large instruction set, variable-length instructions, and a variety of addressing modes are called CPUs based on CISC (Complex Instruction Set Computer) architecture. Since CISC processors possess so many processing features, they make the job of machine language programmers easier. However, they are complex and expensive to produce. Most modern personal computers use CISC processors.
In the early 1980s, some CPU designers realized that many instructions supported by a CISC-based CPU are rarely used. Hence, they came up with the idea of greatly reducing the design complexity of a CPU by implementing in hardware circuitry only a bare minimum basic set of instructions and some of the more frequently used instructions. The instruction set of the CPU need not support other complex instructions because a computer can implement them in software by using the basic set of instructions. While working on simpler CPU design, the designers also came up with the idea of making all instructions of uniform length so that decoding and execution of all instructions become simple and fast.
Furthermore, to speed up computation and to reduce the complexity of handling a number of addressing modes they decided to design all instructions in such a way that they retrieve operands stored in registers in CPU rather than from memory. These design ideas resulted in producing faster and less expensive processors. CPUs with a small instruction set, fixed-length instructions, and reduced references to memory to retrieve operands are called CPUs based on RISC (Reduced Instruction Set Computer) architecture.
Since RISC processors have a small instruction set, they place extra demand on programmers who must consider how to implement complex computations by combining simple instructions. However, RISC processors are faster for most applications, less complex, and less expensive to produce than CISC processors because of their simpler design.
Popular RISC processors used in workstations are POWER (used in IBM workstations), SPARC (used in Sun workstations), and PA-RISC (used in HP workstations).
Supporters of RISC technology claim that increased processing speed and lower cost of RISC processors easily offset limitations of a reduced instruction set. However, critics of RISC technology are of the opinion that a RISC processor has to process more of these simple programmed instructions to complete a task, placing additional burden on system software. There seems to be no clear answer as to which technology is better. The answer may be that each technology lends itself best to certain applications, and so both technologies will coexist.
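The idea of building a complex operation out of simple instructions can be sketched in a few lines. The example below is only an illustration, assuming a hypothetical machine whose hardware offers addition, one-bit shifts, and a bitwise AND but no multiply instruction; the function name and the choice of operations are assumptions made for this sketch, not features of any particular RISC processor.

```python
def multiply_with_simple_ops(a: int, b: int) -> int:
    """Multiply two non-negative integers using only add, shift, and AND.

    Each Python operation below stands in for one simple instruction of a
    hypothetical reduced instruction set (an assumption of this sketch).
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    result = 0          # accumulator register
    multiplicand = a    # register holding the shifted multiplicand
    multiplier = b      # register holding the remaining multiplier bits
    while multiplier != 0:
        if multiplier & 1:                     # AND: test the lowest bit
            result = result + multiplicand     # ADD
        multiplicand = multiplicand << 1       # shift left by one bit
        multiplier = multiplier >> 1           # shift right by one bit
    return result
```

A single multiplication thus turns into a short loop of simple instructions, which is the extra work that the text says falls on programmers and system software.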
Explicitly Parallel Instruction Computing (EPIC) technology breaks through the sequential nature of conventional processor architecture by allowing the software to communicate explicitly to the processor when operations can be done in parallel. For this, it uses tighter coupling between the compiler and the processor. It enables the compiler to extract maximum parallelism in the original code and explicitly describe it to the processor.
For this, EPIC processors use the following three key techniques:
With explicit parallelism, the compiler detects at compile time which instructions the system can execute in parallel. It then reorders them and groups them in such a way that the system can execute instructions belonging to separate groups in parallel. At runtime, the processor exploits this explicit parallelism information provided by the compiler to execute the instructions faster.
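The compile-time grouping step can be illustrated with a small sketch. It assumes a simplified model in which every instruction writes one register and reads a few others, and it ignores dependences through memory. The `Instr` type and the `independent_groups` function are hypothetical names introduced for this illustration; they do not describe how any real EPIC compiler works.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str      # human-readable form of the instruction
    dest: str      # register written
    srcs: tuple    # registers read


def independent_groups(program):
    """Partition instructions so that different groups share no register
    dependence; program order is preserved inside each group."""
    parent = list(range(len(program)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def conflicts(a, b):
        # A conflict exists if either instruction writes a register that
        # the other one reads or writes.
        return a.dest in b.srcs or b.dest in a.srcs or a.dest == b.dest

    for i in range(len(program)):
        for j in range(i + 1, len(program)):
            if conflicts(program[i], program[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i, ins in enumerate(program):
        groups.setdefault(find(i), []).append(ins.text)
    return list(groups.values())


program = [
    Instr("r1 = r2 + r3", "r1", ("r2", "r3")),
    Instr("r4 = r5 * r6", "r4", ("r5", "r6")),
    Instr("r7 = r1 + r8", "r7", ("r1", "r8")),
    Instr("r9 = r4 - r5", "r9", ("r4", "r5")),
]
groups = independent_groups(program)
```

For this program the sketch produces two groups: one holding the instructions that compute `r1` and `r7`, and one holding those that compute `r4` and `r9`. Since no register flows between the two groups, a processor could work on them at the same time.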
Predication technique improves performance by reducing the number of branches and branch mispredicts. Once again, the system first takes the help of the compiler to reorder the instructions at compile time so that the number of branches is reduced as much as possible. Conventional processors use the “branch prediction” technique, in which the processor predicts which way a branch will fork and speculatively executes instructions along the predicted path. At the time of execution of the branch instruction, if the system finds the prediction to be correct, the processor gains a performance improvement because the instructions lying in the path to be executed have already been executed and the processor can use their results directly. However, if the system finds the prediction to be wrong, it discards the results of execution of the predicted path and takes up the instructions of the correct path for execution.
However, EPIC technology uses “branch predication” instead of “branch prediction”. In this technique, instead of predicting and executing one of the paths of a branch, the processor executes instructions of all the paths of the branch, exploiting as much parallelism as possible. When the processor discovers the actual branch outcome, it retains the valid results and discards the other results. Thus, branch predication effectively removes the negative effect of the branch prediction technique in cases of branch mispredicts.
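The difference between taking a branch and predicating both paths can be modeled in software. The sketch below is an analogy only: the function names are hypothetical, and Python does not control how the underlying hardware executes the code. It shows the shape of the idea, in which both paths are computed and a predicate selects the result to keep.

```python
def branchy_abs_diff(a: int, b: int) -> int:
    """Conventional form: a branch decides which single path executes."""
    if a > b:
        return a - b
    return b - a


def predicated_abs_diff(a: int, b: int) -> int:
    """Predicated form: both paths execute, the predicate keeps one result."""
    p = int(a > b)          # predicate: 1 if the "then" path is valid, else 0
    then_result = a - b     # computed regardless of the predicate
    else_result = b - a     # computed regardless of the predicate
    # Retain the valid result and discard the other, without branching.
    return p * then_result + (1 - p) * else_result
```

Both functions return the same value for every input; the predicated form simply replaces the decision about which path to run with a decision about which result to retain.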
Speculation technique improves performance by reducing the effect of memory-to-processor speed mismatch. Memory access speed is much slower than processor speed. Speculative data loading technique takes care of this by loading a piece of data and keeping it ready before the processor actually requires it. It not only allows the processor to load a piece of data from memory before a program actually needs it, but it also postpones the reporting of exceptions if the loaded data is illegal.
The speculation technique also relies on the compiler, which analyzes a program at compile time, looking for any instructions that will need data from memory. It inserts speculative load instructions in the instruction stream of the program well ahead of the instructions that need data from memory. It also inserts a matching speculative check instruction immediately before the instructions that need data from memory. It then reorders the surrounding instructions so that the processor can dispatch them in parallel.
When the processor encounters a speculative load instruction at runtime, it retrieves and loads the data from memory. If the load is invalid, the processor does not immediately report an exception. It postpones exception reporting until it encounters the check instruction that matches the speculative load. When the processor encounters the speculative check instruction, it verifies the load before allowing the program to use the loaded data in the next instruction. If the load is valid, the system behaves as if no exception ever occurred.
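The interplay of a speculative load and its matching check can be modeled in a few lines. This is a software analogy under stated assumptions: memory is a plain dictionary, an invalid load is one whose address is missing from it, and the names `DeferredFault`, `speculative_load`, `speculative_check`, and `run` are hypothetical helpers invented for this sketch.

```python
class DeferredFault:
    """Marker recorded in place of a value when a speculative load fails."""

    def __init__(self, address):
        self.address = address


def speculative_load(memory, address):
    """Load early; on failure, record the fault instead of raising it."""
    if address in memory:
        return memory[address]
    return DeferredFault(address)


def speculative_check(value):
    """Run just before the value is used; report any deferred fault now."""
    if isinstance(value, DeferredFault):
        raise MemoryError(f"invalid load from address {value.address}")
    return value


def run(memory, address, need_data):
    loaded = speculative_load(memory, address)   # hoisted ahead of its use
    # Other independent work would overlap with the load at this point.
    if need_data:
        return speculative_check(loaded) + 1     # check first, then use
    return 0                                     # fault is never reported
```

If the program never reaches the check, a failed load causes no exception at all, which is what lets the compiler move loads far ahead of the instructions that use them.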
Speculation technique combined with predication technique gives the compiler more flexibility to reorder instructions and increase parallelism.
Notice that all three techniques are based on the availability of an intelligent compiler and close coupling between compiler and processor. Hence, performance of processors based on EPIC technology comes from both hardware and software.
Processors based on EPIC architecture are simpler and more powerful than traditional CISC or RISC processors. These processors are mainly for the 64-bit, high-end server and workstation market (not for the personal computer market). Intel’s Itanium, which implements the IA-64 architecture, was the first EPIC processor.
Until recently, the approach used for building faster processors was to keep reducing the size of chips while increasing the number of transistors they contain. Although this trend has driven the computing industry for several years, it has now been realized that transistors cannot shrink forever. Current transistor technology limits the ability to continue making single-core processors more powerful due to the following reasons:
- As a transistor gets smaller, the gate, which switches the electricity ON and OFF, gets thinner and less able to block flow of electrons. Thus, small transistors tend to use electricity all the time, even when they are not switching. This wastes power.
- Increasing clock speeds causes transistors to switch faster and generate more heat and consume more power.
These and other challenges forced processor manufacturers to research new approaches for building faster processors. In response, manufacturers came out with the idea of building multicore processor chips instead of increasingly powerful (faster) single-core processor chips. That is, in the new architecture, a processor chip has multiple cooler-running, more energy-efficient processing cores instead of one increasingly powerful core. The multicore chips do not necessarily run as fast as the highest performing single-core models, but they improve overall performance by handling more work in parallel. For instance, a dual-core chip running multiple applications is about 1.5 times faster than a chip with just one comparable core.
The operating system (OS) controls the overall assignment of tasks in a multicore processor. In a multicore processor, each core has its own independent cache (though in some designs all cores share the same cache), thus providing the OS with sufficient resources to handle multiple applications in parallel. When a single-core chip runs multiple programs, the OS assigns a time slice to work on one program and then assigns different time slices for other programs.
This can cause conflicts, errors, or slowdowns when the processor must perform multiple tasks simultaneously. However, a multicore chip can run multiple programs at the same time with each core handling a separate program. The same logic holds for running multiple threads of a multithreaded application at the same time on a multicore chip with each core handling a separate thread. Based on this, either the OS or a multithreaded application parcels out work to multiple cores.
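One way to see why two cores can give less than a twofold gain is Amdahl's law, a standard model that the text itself does not use and that is brought in here only as an assumption. It states that if a fraction p of the work can be spread over n cores, the overall speedup is 1 / ((1 - p) + p / n). The sketch below evaluates this formula; the value p = 2/3 is a hypothetical figure chosen because it reproduces the 1.5 times gain quoted above, not a measured one.

```python
from fractions import Fraction


def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for the given parallel fraction."""
    serial_fraction = 1 - parallel_fraction
    return 1 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: two thirds of the work can run on both cores.
dual_core_gain = amdahl_speedup(Fraction(2, 3), 2)
```

Under this assumed workload the model gives a gain of exactly 3/2 for two cores, and the gain approaches 2 only as the parallel fraction approaches 1.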
Multicore processors have the following advantages over single-core processors:
- They enable building of computers with better overall system performance by handling more work in parallel.
- For comparable performance, multicore chips consume less power and generate less heat than single-core chips. Hence, multicore technology is also referred to as energy-efficient or power-aware processor technology.
- Because the chip’s cores are on the same die in a multicore processor architecture, they can share architectural components, such as memory elements and memory management. They thus have fewer components and lower costs than systems running multiple chips (each a single-core processor).
- Also, signaling between cores can be faster and use less electricity than on multichip systems.
Multicore processors, however, currently have the following limitations:
- To take advantage of multicore chips, we must redesign applications so that the processor can run them as multiple threads. Note that it is more challenging to create software that is multithreaded.
- To redesign applications, programmers must find good places to break up the applications, divide the work into roughly equal pieces that can run at the same time, and determine the best times for the threads to communicate with one another. All of this adds extra work for programmers.
- Software vendors often charge customers for each processor that will run the software (one software license per processor). A customer running an application on an 8-processor machine (multiprocessor computer) with single-core processors would thus pay for 8 licenses. A key issue with multicore chips is whether software vendors should consider a processor to be a single core or an entire chip. Currently, different vendors have different views regarding this issue. Some consider a processor as a unit that plugs into a single socket on the motherboard, regardless of whether it has one or more cores. Hence, a single software license is sufficient for a multicore chip. On the other hand, others charge more to use their software on multicore chips under per-processor licensing. They are of the opinion that customers get added performance benefit by running the software on a chip with multiple cores, so they should pay more. Multicore chip makers are concerned that this type of non-uniform policy will hurt their products’ sales.
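The redesign work described in the list above, dividing a job into roughly equal pieces that run at the same time and then combining their results, can be sketched as follows. The function names and the sum-of-squares workload are hypothetical choices for illustration. Note also that in standard CPython, threads show the structure of the solution more than a real speedup for CPU-bound work; a process pool would be the usual substitute there.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)   # first `extra` chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


def sum_of_squares(chunk):
    """Work done independently by one thread on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(numbers, workers=4):
    """Hand one chunk to each worker, then combine the partial results."""
    chunks = split_evenly(numbers, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)   # the only point where the threads' results meet
```

In this sketch the threads communicate only once, when the partial sums are combined; choosing such points well is part of the extra design effort the list mentions.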
Chip makers like Intel, AMD, IBM, and Sun have already introduced multicore chips for servers, desktops, and laptops. Current multicore chips have 2 cores (dual-core), 8 cores, or 16 cores per chip. Industry experts predict that multicore processors will be useful immediately in server-class machines but won’t be very useful on desktop systems until software vendors develop considerably more multithreaded software. Until this occurs, single-core chips will continue to be used. Also, since single-core chips are inexpensive to manufacture, they will continue to be popular for low-priced PCs for a while.