Microprocessor Without Interlocked Pipeline Stages Computer Science Essay
MIPS was originally an acronym for Microprocessor without Interlocked Pipeline Stages. It is a reduced instruction set computing (RISC) architecture developed by MIPS Technologies. In the mid to late 1990s, it was estimated that one in three RISC microprocessors produced was a MIPS implementation.
Table 1.1 Specifications
Core Frequency: 200 MHz
Data bus (ext.): 64 bit
Address bus: 64 bit
Transistors: 2,300,000
Voltage: 3.3 V
Introduced: 05/1994
Manufactured: week 36/1996
Made in: Japan
L1 Cache: 16+16 KB
Package Type: Ceramic, Goldcap, PGA-447
Fig.1.1 A MIPS R4400 microprocessor made by Toshiba.
The first RISC (Reduced Instruction Set Computer) processor was made in the late 1970s and, as the name says, worked on a reduced instruction set, which was intended to be considerably faster than the older CISC (Complex Instruction Set Computer) architecture. The point of RISC was to use more registers and fewer load/store instructions. A key improvement in RISC designs was pipelining. Pipelining lets the processor start one instruction while it is still executing another, and hence reduces the total execution time of a sequence of instructions.
In 1981, a team led by John L. Hennessy at Stanford University started the work on what would become the first MIPS processor. MIPS focused almost entirely on the pipeline. Although pipelining was already in use, several features of the MIPS chip made its pipeline faster.
Embedded markets
In the 1990s, the MIPS architecture was widely adopted by the embedded market, including computer networking and telecommunications equipment, home video game consoles, computer printers, digital set-top boxes, digital televisions, and DSL and cable modems.
Synthesizable Cores for Embedded Markets
In recent years, most of the technology used in the various MIPS generations has been offered as semiconductor intellectual property (IP) cores, which serve as building blocks for embedded processor designs. Both 32-bit and 64-bit basic cores are offered, known as 4K and 5K respectively, and the design itself can be licensed as MIPS32 and MIPS64. These cores can be combined with add-in units such as FPUs (Floating Point Units), various input/output devices, etc.
MIPS based Supercomputers
One of the more interesting applications of the MIPS architecture is its use in supercomputers with very large processor counts. Silicon Graphics, Inc. (SGI) refocused its business from desktop graphics workstations to the High Performance Computing (HPC) market in the early 1990s. The success of the company's first foray into server systems, the Challenge series based on the R4400 and R8000, and later the R10000, motivated SGI to create a vastly more powerful system. The introduction of the integrated R10000 allowed SGI to produce a system, the Origin 2000, eventually scalable to 1024 CPUs. The Origin 2000 was followed by the Origin 3000 series, which topped out at the same maximum of 1024 CPUs but used the R14000 and R16000 chips at up to 700 MHz. SGI's MIPS based supercomputers were withdrawn in 2005, when the company made the strategic decision to move to Intel's IA-64 architecture.
1.1 AIM OF THE PROJECT
The aim of this project is to design a 16-bit RISC processor whose architecture is based on the MIPS architecture, with a 4-stage pipeline and 32 instructions. Each instruction is implemented in Verilog, an industry-accepted hardware description language, and simulation is done using ModelSim.
1.2 PROBLEM FORMULATION AND METHODOLOGY
To obtain the 16-bit RISC processor based on the MIPS architecture, each individual functional unit has to be designed. After these units are implemented, they have to be integrated. Code is written for each individual functional unit in Verilog and simulated using ModelSim.
1.3 LITERATURE SURVEY
The project was an effort to design a 16-bit RISC processor based on MIPS architecture and implement the design using Verilog.
We needed to know the architecture and instruction set of the MIPS processor. We obtained this information from the website http://en.wikipedia.org/wiki/MIPS. While designing the processor we also needed to know the basic concepts of a microprocessor, which are explained in detail in the book “MICROPROCESSOR ARCHITECTURE AND PROGRAMMING-8085” by Ramesh Gaonkar.
The information about pipelining was obtained from the textbook “COMPUTER ORGANISATION”, fifth edition, by Carl Hamacher, Zvonko Vranesic and Safwat Zaky.
For simulation of the design, we needed to know the basics of simulating systems using hardware description languages. This information was obtained from the book “Verilog HDL” by Samir Palnitkar.
In addition to the books mentioned above, we obtained further information from different websites. With all this background information available, we could successfully execute our project.
1.4 ORGANIZATION OF REPORT
The report has been organized as follows:
Chapter 1 provides a brief overview of MIPS, the aim of the project, the literature survey and finally the organization of the project report.
Chapter 2 contains description about the processor architecture and gives the detailed explanation about each of the components of the processor architecture.
Chapter 3 provides information about various instruction formats and types of instructions present in the processor.
Chapter 4 focuses on implementation of the design.
Chapter 5 presents simulation results for a set of instructions.
Chapter 6 gives conclusion of project and future scope.
2. MIPS-16 INSIGHT
2.1 Architecture
Fig. 2.1 Block Diagram of Processor Architecture
The architecture of the processor that we have designed is shown in figure 2.1, which depicts the following components.
REGISTER FILE: The MIPS has eight general purpose registers to store the 16-bit data; these are identified as R0, R1, R2, R3, R4, R5, R6 and R7. The registers R1, R2, R3, R4, R5 and R6 are used to store and copy data. The register R0 is hardwired to zero. The register R7 is a link register that stores the current PC address in case of jump and link conditions.
SPECIAL PURPOSE REGISTER: The Program Counter (PC) is a 16-bit special purpose register which at a given time, stores the address of the next instruction to be fetched. Program Counter acts as a pointer to the next instruction. The incrementing of the Program Counter depends on the nature of the instructions.
ARITHMETIC AND LOGIC UNIT: The MIPS’s ALU performs arithmetic and logical functions on 16-bit operands. The arithmetic unit performs fundamental arithmetic operations such as addition and subtraction. The logic unit performs bitwise logical operations such as AND, OR and EX-OR.
BARREL SHIFTER: A Barrel Shifter is a digital circuit that can shift a data word by a specified number of bits in only one clock cycle.
MEMORY: Memory refers to a storage device. There are two separate memories for storing the program and the data. Program memory is a 256 x 16-bit Read Only Memory (ROM): it has 256 locations, each 16 bits wide. Data memory is a 256 x 16-bit Random Access Memory (RAM), also with 256 locations, each 16 bits wide.
ZERO/SIGN EXTENDER: The zero/sign extender extends the 5-bit immediate data to 16-bit data depending on the MSB-bit. For arithmetic operations the MSB-bit is extended to all the other bits and for logic operations the remaining bits are filled with zeros.
MULTIPLEXERS: Two 4X1 multiplexers are used to select the output data from different blocks like ALU, Barrel Shifter, and Data Memory to be written into the register file.
2.2 Functional Components
REGISTER FILE
The register file has a total of 8 registers designated as R0, R1, R2, R3, R4, R5, R6 and R7. The register R0 is hardwired to zero, so its value always remains constant. The register R7 is a link register; it is used to store the current PC address when there is a procedure call in the program. The register file is shown in figure 2.2.1.
FIG. 2.2.1: REGISTER FILE (inputs: REGOPA [2:0], REGOPB [2:0], REGDEST [2:0], REGDATA [15:0], CLK, WEN; outputs: BUS A [15:0], BUS B [15:0])
Table 2.2.1 ADDRESS OF THE REGISTERS
REGISTER | ADDRESS
Register 0 | 000
Register 1 | 001
Register 2 | 010
Register 3 | 011
Register 4 | 100
Register 5 | 101
Register 6 | 110
Register 7 | 111
The 3-bit control lines REGOPA and REGOPB provide the addresses of the registers whose contents are to be placed on Bus A and Bus B respectively. Table 2.2.1 gives the addresses of the registers.
The 3-bit REGDEST gives the address of the destination register in which the result is stored. The result from the various blocks is written into the register through the REGDATA line, which is 16 bits wide. When the control line WEN=1, the data is written into the destination register.
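The behaviour described above can be sketched as a small software model. This is only an illustration in Python, not the project's Verilog: the class and method names are hypothetical, and the clock edge is modelled as a single call to `write`.

```python
class RegisterFile:
    """Behavioural model of the 8 x 16-bit register file (illustrative only)."""

    def __init__(self):
        self.regs = [0] * 8  # R0..R7, each holding a 16-bit value

    def read(self, regopa, regopb):
        """Return the values placed on BUS A and BUS B."""
        return self.regs[regopa & 0b111], self.regs[regopb & 0b111]

    def write(self, regdest, regdata, wen):
        """Model one clock edge: store REGDATA in REGDEST when WEN is 1."""
        regdest &= 0b111
        if wen == 1 and regdest != 0:  # R0 is hardwired to zero
            self.regs[regdest] = regdata & 0xFFFF


rf = RegisterFile()
rf.write(regdest=0b011, regdata=0x1234, wen=1)  # R3 <- 0x1234
rf.write(regdest=0b000, regdata=0xFFFF, wen=1)  # ignored: R0 stays 0
rf.write(regdest=0b101, regdata=0x00AA, wen=0)  # ignored: WEN is 0
bus_a, bus_b = rf.read(regopa=0b011, regopb=0b000)
print(hex(bus_a), hex(bus_b))  # 0x1234 0x0
```

Ignoring writes to address 000 is one simple way to model a register that is hardwired to zero; the actual design may achieve this differently.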
PROGRAM COUNTER BLOCK
The Program Counter provides the address of the next instruction to be fetched from the instruction memory. The Program Counter block is as shown in the figure 2.2.2.
Figure 2.2.2: Program Counter block
The PC value is incremented once for each instruction, but when branch and jump instructions are encountered it has to be updated differently. For a branch instruction, the next address is obtained by adding the sign extended immediate value to the current PC address. For a jump instruction, the lower 7-bit immediate address specified in the instruction is concatenated with the higher 9 bits of the current PC address. For instructions involving a procedure call, the register specified in the instruction contains the 16-bit address of the new location to which control is passed.
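The four ways of updating the PC can be sketched as one function. This is a hedged illustration: the `mode` strings are hypothetical stand-ins for the 2-bit SELECT_PC signal, whose encoding the text does not give. The taken-branch formula PC+1+IMM follows Table 3.1. The jump case assumes the 11-bit target of the J-type format, which differs from the 7-bit/9-bit split quoted in the paragraph above, so `target_bits` is left as a parameter.

```python
def sign_extend5(imm):
    """Interpret a 5-bit immediate as a signed value (-16..15)."""
    imm &= 0x1F
    return imm - 0x20 if imm & 0x10 else imm


def next_pc(pc, mode, imm5=0, target=0, reg=0, target_bits=11):
    """Return the next 16-bit PC value for the given (hypothetical) mode."""
    pc &= 0xFFFF
    if mode == "sequential":      # ordinary instruction
        return (pc + 1) & 0xFFFF
    if mode == "branch":          # taken branch: PC + 1 + sign-extended IMM
        return (pc + 1 + sign_extend5(imm5)) & 0xFFFF
    if mode == "jump":            # keep upper PC bits, replace lower bits
        low_mask = (1 << target_bits) - 1
        return (pc & ~low_mask & 0xFFFF) | (target & low_mask)
    if mode == "register":        # JR / JALR: new PC comes from a register
        return reg & 0xFFFF
    raise ValueError(f"unknown mode: {mode}")


print(hex(next_pc(0x0010, "sequential")))            # 0x11
print(hex(next_pc(0x0010, "branch", imm5=7)))        # 0x18
print(hex(next_pc(0x0010, "branch", imm5=0b11011)))  # 0xc (offset of -5)
print(hex(next_pc(0x1810, "jump", target=128)))      # 0x1880
```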
ARITHMETIC AND LOGICAL UNIT
Fig. 2.2.3 Arithmetic and Logic Unit
The ALU performs arithmetic operations such as addition and subtraction, and logical operations such as AND, OR and XOR. The ALU is shown in figure 2.2.3.
The two inputs to the ALU block are given by BUS A and BUS B. ALUSRC is set to 1 if the immediate data is one of the inputs, and to 0 if both inputs come from the general purpose registers. ALUCTRL determines the operation to be performed. Table 2.2.2 lists the operations performed by the ALU for a given 3-bit control word.
Table 2.2.2 OPCODES FOR ARITHMETIC OPERATIONS
ALUCTRL | OPERATION
000 | SUBTRACT
001 | ADD
010 | NOR
011 | OR
100 | NAND
101 | AND
110 | XNOR
111 | XOR
Two flags are affected by ALUOUT: the zero flag and the sign flag.
SIGN FLAG: This flag is set if, after execution, the most significant bit of the 16-bit result is 1.
ZERO FLAG: This flag is set if the result of the ALU operation is zero.
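As a rough illustration of Table 2.2.2 and the two flags, the ALU can be modelled in Python as a lookup from ALUCTRL to an operation. The function name `alu` and the tuple it returns are assumptions made for this sketch; the second operand is taken to be whatever ALUSRC has already selected (BUS B or the extended immediate).

```python
# One entry per ALUCTRL code in Table 2.2.2.
ALU_OPS = {
    0b000: lambda a, b: a - b,     # SUBTRACT
    0b001: lambda a, b: a + b,     # ADD
    0b010: lambda a, b: ~(a | b),  # NOR
    0b011: lambda a, b: a | b,     # OR
    0b100: lambda a, b: ~(a & b),  # NAND
    0b101: lambda a, b: a & b,     # AND
    0b110: lambda a, b: ~(a ^ b),  # XNOR
    0b111: lambda a, b: a ^ b,     # XOR
}


def alu(bus_a, operand_b, aluctrl):
    """Return (ALUOUT, sign flag, zero flag) for 16-bit operands."""
    result = ALU_OPS[aluctrl](bus_a & 0xFFFF, operand_b & 0xFFFF) & 0xFFFF
    sf = (result >> 15) & 1         # most significant bit of the result
    zf = 1 if result == 0 else 0
    return result, sf, zf


print(alu(5, 7, 0b000))            # (65534, 1, 0): 5 - 7 = -2 in two's complement
print(alu(3, 3, 0b000))            # (0, 0, 1)
print(alu(0x00FF, 0x0F0F, 0b111))  # (4080, 0, 0), i.e. 0x0FF0
```

Masking with 0xFFFF after every operation keeps the result to 16 bits, which is how the negative result of 5 - 7 appears as 0xFFFE with the sign flag set.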
SHIFT BLOCK
Fig 2.2.4 BARREL SHIFTER
The barrel shifter is used to shift the bits in a register. It shifts the data by any number of bits in a single clock cycle. The shift block is shown in figure 2.2.4. The 16-bit data to be shifted is provided to the shift block by BUS A. The number of shifts is specified by the lower 4 bits of BUS B. The RL signal line decides the type of shift: if RL=0 the shifter performs a left shift, and if RL=1 it performs a right shift. SHFTSRC is the select line of a multiplexer with two inputs: the data on BUS A and the sign extended immediate data. If SHFTSRC=0, the data on BUS A is selected for the shift operation. If SHFTSRC=1, the load upper immediate operation is performed, i.e. the immediate data is stored in the uppermost 8 bits of the register specified by the instruction.
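The shift path can be sketched as follows. This minimal Python model covers only the logical left and right shifts selected by RL; the SHFTSRC multiplexer and the load upper immediate path are left out, and the function name is hypothetical.

```python
def barrel_shift(bus_a, bus_b, rl):
    """Shift the 16-bit value on BUS A by the lower 4 bits of BUS B.

    rl = 0 selects a left shift, rl = 1 a (logical) right shift.
    """
    amount = bus_b & 0xF            # only the lower 4 bits are used
    data = bus_a & 0xFFFF
    if rl == 0:
        return (data << amount) & 0xFFFF  # bits shifted past bit 15 are lost
    return data >> amount                 # zeros enter from the left


print(hex(barrel_shift(0x0001, 4, 0)))     # 0x10
print(hex(barrel_shift(0x8000, 15, 1)))    # 0x1
print(hex(barrel_shift(0x00FF, 0x13, 0)))  # 0x7f8: 0x13 & 0xF gives a shift of 3
```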
ZERO/SIGN EXTENDER
The zero/sign extender extends the 5-bit immediate data into 16-bit data. The z/s extender first checks the MSB of the 5-bit immediate data and depending on it, fills 1’s or 0’s in the remaining bits. If MSB=1, the remaining higher 11-bits are filled by 1’s. If the MSB=0, the remaining higher 11-bits are filled by 0’s.
Example: If the 5-bit immediate data 11011 is the input to the z/s extender the output is 1111111111111011.
If the 5-bit immediate data 01110 is the input to the z/s extender the output is 0000000000001110.
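The two examples can be reproduced with a few lines of Python. The `signed` argument is a hypothetical stand-in for whatever control signal chooses between sign extension (arithmetic operations) and zero extension (logic operations); the text does not name that signal.

```python
def extend5(imm5, signed):
    """Extend a 5-bit immediate to 16 bits.

    signed=True copies the MSB into the upper 11 bits (sign extension);
    signed=False fills the upper 11 bits with zeros (zero extension).
    """
    imm5 &= 0x1F
    if signed and (imm5 & 0x10):
        return imm5 | 0xFFE0        # set bits 15..5
    return imm5


print(format(extend5(0b11011, signed=True), "016b"))   # 1111111111111011
print(format(extend5(0b01110, signed=True), "016b"))   # 0000000000001110
print(format(extend5(0b11011, signed=False), "016b"))  # 0000000000011011
```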
MULTIPLEXERS
Fig. 2.2.5 MULTIPLEXER
Two 4×1 multiplexers are used to select the output data from different blocks like ALU, Barrel Shifter, Data memory to be written into the register file. The two multiplexers are shown in figure 2.2.5. The first multiplexer has a select line MTR; it selects one of the outputs and provides the input to the other multiplexer. The table 2.2.3 gives the output selected by Multiplexer for the select line MTR.
Table 2.2.3 SUMMARY OF MULTIPLEXER 1 OPERATION
The second multiplexer has a select line RSRC; it selects one of the outputs and provides the input to be written into the register. The table 2.2.4 gives the output selected by Multiplexer for the select line RSRC.
Table 2.2.4 SUMMARY OF MULTIPLEXER 2 OPERATION
MEMORY
Fig. 2.2.6 INSTRUCTION MEMORY
Fig. 2.2.7 DATA MEMORY
There are two separate memories for storing instructions and data. The instruction memory has 256 locations, each 16 bits wide. The block diagram of the instruction memory is shown in figure 2.2.6.
The instruction memory contains the instructions to be executed by the processor. The address from the PC is used to fetch the instruction. The output of the instruction memory is given to the control unit for decoding.
The data memory also has 256 locations, each 16 bits wide. The block diagram of the data memory is shown in figure 2.2.7. The address of the location to be written or read is given by the instruction. Only load and store instructions refer to the data memory. DATA-IN carries the data to be written into memory and DATA-OUT carries the data read from memory. When WE=1 the data is written into the memory, and when WE=0 the data is read from the memory.
CONTROL UNIT
To fetch and execute instructions, control signals must be generated in a specific sequence, and data selectors must have control signals as inputs. Each register has an input control line which, when activated, causes a new value to be loaded into the register. The ALU needs control signals to specify which operation it should perform, for example add or subtract. The memory needs control signals to specify when a read or write operation is to be performed. The register file needs a control signal to specify when a value should be written into it. All of these control signals come from the control unit. Suffice it to say that such a control unit can be implemented in hardware to produce a sequence of signals that controls the fetching of instructions from memory and their execution.
The control unit is responsible for setting all the control signals so that each instruction is executed properly. Its input is the 16-bit instruction word. Most of the signals can be generated from the instruction opcode alone rather than from the entire 16-bit word.
Fig. 2.2.8 CONTROL UNIT
The Control signals generated by the control unit and their corresponding functions are given below.
INST: This signal corresponds to the last 5 bits of the instruction word, which form the opcode of the instruction. It indicates which operation is to be performed on the operands in the instruction.
REGDEST: A 3-bit control signal that indicates which of the general purpose registers in the register file the data is written to.
REGOPA: A 3-bit control signal that indicates the register whose data is placed on data Bus A.
REGOPB: A 3-bit control signal that indicates the register whose data is placed on data Bus B.
ALUCTRL: A 3-bit control signal to the ALU that indicates the operation to be performed on the ALU input operands.
ALUSRC: A single-bit input to a data selector that chooses between the BUS B data and the immediate value.
RSRC: A 2-bit select line to the multiplexer block, which has four inputs.
WEN: The write enable signal, a single-bit input to the register file that indicates whether the operation to be performed is a read or a write.
SELECT_PC: A 2-bit control input to the data selector in the PC block.
MTR: The memory-to-register control signal, 2 bits wide. It acts as the select line of a multiplexer that chooses one of four inputs.
RLS: A single-bit signal whose value indicates whether the 16-bit data is to be shifted right or left.
IMMVALUE: A 5-bit signal generated by the control unit when immediate data has to be given as input to the ALU or the barrel shifter.
J_IMM: An 8-bit signal that provides the last 8 bits of the jump address.
SFTSRC: A single-bit select input to a multiplexer that chooses either the immediate data or the data on BUS A for the shift operation in the barrel shifter.
ENB: The enable signal, which enables or disables the control unit.
SF: This signal is set when the MSB of the result is high; otherwise it is reset.
ZF: This signal is set when the result is zero.
2.3 PIPELINE STAGES
The speed of execution of programs is influenced by many factors. One way to improve performance is to use faster circuit technology to build the processor and the main memory. Another possibility is to arrange the hardware so that more than one operation can be performed at the same time. In this way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
Pipelining is a particularly effective way of organizing concurrent activity in a computer system. The basic idea is very simple. It is frequently encountered in manufacturing plants, where pipelining is commonly known as an assembly-line operation.
How Pipelining Works
Pipelining, a standard feature in RISC processors, is much like an assembly line. Because the processor works on different steps of several instructions at the same time, more instructions can be executed in a shorter period of time.
A useful method of demonstrating this is the laundry analogy. Let’s say that there are four loads of dirty laundry that need to be washed, dried, and folded. We could put the first load in the washer for 30 minutes, dry it for 40 minutes, and then take 20 minutes to fold the clothes. Then pick up the second load and wash, dry, and fold, and repeat for the third and fourth loads. Suppose we started at 6 PM and worked as efficiently as possible, we would still be doing laundry until midnight.
Fig. 2.3.1 DEMONSTRATION WITHOUT PIPELINING
However, a smarter approach to the problem would be to put the second load of dirty laundry into the washer after the first was already clean and whirling happily in the dryer. Then, while the first load was being folded, the second load would dry, and a third load could be added to the pipeline of laundry. Using this method, the laundry would be finished by 9:30.
Fig. 2.3.2 DEMONSTRATION WITH PIPELINING
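The two finishing times in the laundry analogy can be checked with a short calculation. The sketch below is only an illustration of the arithmetic: the function names are made up for this example, and the pipelined case assumes each machine handles one load at a time and a load moves on as soon as the next machine is free.

```python
STAGES = (30, 40, 20)  # wash, dry, fold durations in minutes
LOADS = 4


def sequential_minutes(loads, stages=STAGES):
    """Each load is washed, dried and folded before the next one starts."""
    return loads * sum(stages)


def pipelined_minutes(loads, stages=STAGES):
    """Each stage starts a load as soon as both the load and the stage are free."""
    free_at = [0] * len(stages)  # time at which each machine is next free
    finished = 0
    for _ in range(loads):
        t = 0
        for s, duration in enumerate(stages):
            start = max(t, free_at[s])
            t = start + duration
            free_at[s] = t
        finished = t
    return finished


def clock(start_hour, minutes):
    """Format start_hour:00 plus some minutes as a 24-hour clock time."""
    total = start_hour * 60 + minutes
    return f"{(total // 60) % 24:02d}:{total % 60:02d}"


print(sequential_minutes(LOADS), clock(18, sequential_minutes(LOADS)))  # 360 00:00
print(pipelined_minutes(LOADS), clock(18, pipelined_minutes(LOADS)))    # 210 21:30
```

The pipelined total is 30 + 4 x 40 + 20 = 210 minutes because the 40-minute dryer is the slowest stage and sets the pace once the first wash is done.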
RISC Pipelines
A RISC processor pipeline operates in much the same way, although the stages in the pipeline are different. While different processors have different numbers of steps, the processor which we are designing has 4 stages of pipelining.
Pipelining is concerned with the following tasks:
Use multi-cycle methodologies to reduce the amount of computation in a single cycle.
Shorter computations per cycle allow for faster clock cycles.
Overlapping instructions allows all components of a processor to be operating on a different instruction.
Throughput is increased by having instructions complete more frequently.
This processor processes each instruction in four steps, as follows:
F Fetch: read the instruction from the memory.
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location.
4 Stage Pipeline
During the 1st stage, the Fetch operation is performed. The instruction is fetched from memory at the address supplied by the PC block.
During the 2nd stage, the Decode operation is performed. The opcode is sent to the control unit, which decodes the instruction.
During the 3rd stage, the Execute operation is performed. The processor performs the operation specified by the instruction.
During the 4th stage, the Write operation is performed. The result is finally written into the registers.
Fig. 2.3.3 INSTRUCTIONS EXECUTED IN FOUR STEPS
The sequence of events for this case is shown in Figure 2.3.3. Once the pipeline is full, four instructions are in progress at any given time.
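A small Python sketch can list which instruction occupies which stage in each clock cycle. It models an ideal pipeline with no stalls; the function names are illustrative and not part of the design.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write


def pipeline_schedule(n_instructions):
    """Return {cycle: {stage: instruction number}} for a stall-free pipeline."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i+1 is fetched in cycle i+1
            schedule.setdefault(cycle, {})[stage] = i + 1
    return schedule


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions when one starts per cycle."""
    return n_instructions + len(STAGES) - 1


sched = pipeline_schedule(4)
for cycle in sorted(sched):
    row = "  ".join(f"{st}:I{sched[cycle][st]}" for st in STAGES if st in sched[cycle])
    print(f"cycle {cycle}: {row}")
print(total_cycles(4))  # 7 cycles, compared with 4 * 4 = 16 without overlap
```

In cycle 4 all four stages are busy (F:I4, D:I3, E:I2, W:I1), which is the point at which four instructions are in progress at once.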
PIPELINING HAZARDS
In practice, however, RISC processors operate at more than one cycle per instruction. The processor might occasionally stall as a result of data dependencies and branch instructions.
A data dependency occurs when an instruction depends on the results of a previous instruction. A particular instruction might need data in a register which has not yet been stored since that is the job of a preceding instruction which has not yet reached that step in the pipeline.
For example:
ADD R3, R2, R1
ADD R5, R4, R3
In this example, the first instruction tells the processor to add the contents of registers R1 and R2 and store the result in register R3. The second instructs it to add R3 and R4 and store the sum in R5. We place this set of instructions in a pipeline. When the second instruction is in the second stage, the processor will be attempting to read R3 and R4 from the registers. Remember, though, that the first instruction is just one step ahead of the second, so the contents of R1 and R2 are being added, but the result has not yet been written into register R3. The second instruction therefore cannot read register R3, because it has not been written yet, and must wait until the data it needs is stored. Consequently, the pipeline is stalled and a number of empty instructions (known as bubbles) go into the pipeline. Data dependency affects long pipelines more than shorter ones, since it takes longer for an instruction to reach the final register-writing stage of a long pipeline.
The MIPS solution to this problem is forwarding. By adding the necessary forwarding logic and buses, the result of an ALU operation can be used immediately by the following instructions without waiting for it to be written into the register file, so the stall caused by the data dependency can be avoided.
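The dependency check behind forwarding can be sketched in Python using the two ADD instructions from the example. The tuple layout `(mnemonic, rd, rs1, rs2)`, the function name and the register values are assumptions made for this illustration; the real forwarding logic is hardware and is not described in detail here.

```python
def forwarded_operands(producer, consumer):
    """Return which source operands of `consumer` need the producer's ALU result.

    Instructions are (mnemonic, rd, rs1, rs2) tuples of register numbers.
    """
    _, rd, _, _ = producer
    _, _, rs1, rs2 = consumer
    if rd == 0:  # R0 is hardwired to zero, so it never carries a new result
        return []
    return [name for name, src in (("rs1", rs1), ("rs2", rs2)) if src == rd]


first = ("ADD", 3, 2, 1)   # ADD R3, R2, R1
second = ("ADD", 5, 4, 3)  # ADD R5, R4, R3
print(forwarded_operands(first, second))  # ['rs2']

# Effect on the computed value, with made-up register contents.
regs = {1: 10, 2: 20, 3: 0, 4: 5}
alu_result = regs[2] + regs[1]          # first ADD: 30, not yet written to R3
without_forwarding = regs[4] + regs[3]  # reads the stale R3
with_forwarding = regs[4] + alu_result  # uses the forwarded ALU result
print(without_forwarding, with_forwarding)  # 5 35
```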
Branch instructions are those that tell the processor to make a decision about what the next instruction to be executed should be based on the results of another instruction. Branch instructions can be troublesome in a pipeline if a branch is conditional on the results of an instruction which has not yet finished its path through the pipeline.
For example:
Loop: ADD R3, R2, R1
SUB R6, R5, R4
BEQ R3, R6, Loop
The example above instructs the processor to add R1 and R2 and put the result in R3, then subtract R4 from R5, storing the difference in R6. In the third instruction, BEQ stands for branch if equal. If the contents of R3 and R6 are equal, the processor should execute the instruction labeled “Loop”. Otherwise, it should continue to the next instruction. In this example, the processor cannot decide which branch to take because neither R3 nor R6 has been written into the registers yet.
A NOP (no operation) instruction is placed after branch instructions so that the processor does not execute the instructions that follow the branch before the branch is resolved.
3. INSTRUCTION SET
The processor supports the following instruction formats:
R-Type: The instruction has 5 fields, which are defined as follows:
Opcode: 5 bits wide; denotes the operation to be performed.
RS1, RS2: The source register fields, each 3 bits wide.
Rd: The destination register field, 3 bits wide.
I-Type: The instruction has 4 fields, which are defined as follows:
Opcode: 5 bits wide; denotes the operation to be performed.
RS: The source register field, 3 bits wide.
Rd: The destination register field, 3 bits wide.
Immediate: A 5-bit immediate value.
J-Type: The instruction has 2 fields, which are defined as follows:
Opcode: 5 bits wide; denotes the operation to be performed.
Target: An 11-bit immediate address to which program control is transferred.
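The field widths of the three formats can be checked, and instruction words packed and unpacked, with the sketch below. The bit positions are an assumption: the text gives the widths but not the order, so the fields are packed from the most significant bit in the order listed, and the two R-type bits left over (5 + 3 + 3 + 3 = 14) are treated as an unused fifth field.

```python
# (field name, width in bits), packed from the most significant bit down.
# The ordering and the "unused" field are assumptions for illustration.
FORMATS = {
    "R": [("opcode", 5), ("rs1", 3), ("rs2", 3), ("rd", 3), ("unused", 2)],
    "I": [("opcode", 5), ("rs", 3), ("rd", 3), ("imm", 5)],
    "J": [("opcode", 5), ("target", 11)],
}


def encode(fmt, **fields):
    """Pack named fields into a 16-bit instruction word."""
    word = 0
    for name, width in FORMATS[fmt]:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(fmt, word):
    """Unpack a 16-bit instruction word into its named fields."""
    fields, shift = {}, 16
    for name, width in FORMATS[fmt]:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# Every format fills exactly 16 bits.
print({fmt: sum(w for _, w in layout) for fmt, layout in FORMATS.items()})

add = encode("R", opcode=0b00001, rs1=2, rs2=3, rd=1)  # ADD R1,R2,R3
print(hex(add), decode("R", add))
```

Under this assumed layout, ADD R1,R2,R3 packs to 0x0A64; a different field order in the real design would give a different word.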
Table 3.1 Instruction set
OPCODE | INSTRUCTION | EXAMPLE | MEANING | COMMENTS
00000 | SUBTRACT | SUB R1,R2,R3 | R1=R2-R3 | TWO'S COMPL
00001 | ADD | ADD R1,R2,R3 | R1=R2+R3 | TWO'S COMPL
00010 | NOR | NOR R1,R2,R3 | R1=R2 NOR R3 | LOGICAL
00011 | OR | OR R1,R2,R3 | R1=R2 OR R3 | LOGICAL
00100 | NAND | NAND R1,R2,R3 | R1=R2 NAND R3 | LOGICAL
00101 | AND | AND R1,R2,R3 | R1=R2 AND R3 | LOGICAL
00110 | XNOR | XNOR R1,R2,R3 | R1=R2 XNOR R3 | LOGICAL
00111 | XOR | XOR R1,R2,R3 | R1=R2 XOR R3 | LOGICAL
01000 | SUBTRACT IMM | SUBI R1,R2,7 | R1=R2-7 | SIGN-EXTND IMM
01001 | ADD IMM | ADDI R1,R2,7 | R1=R2+7 | SIGN-EXTND IMM
01010 | NOR IMM | NORI R1,R2,7 | R1=R2 NOR 7 | ZERO-EXTND IMM
01011 | OR IMM | ORI R1,R2,7 | R1=R2 OR 7 | ZERO-EXTND IMM
01100 | NAND IMM | NANDI R1,R2,7 | R1=R2 NAND 7 | ZERO-EXTND IMM
01101 | AND IMM | ANDI R1,R2,7 | R1=R2 AND 7 | ZERO-EXTND IMM
01110 | XNOR IMM | XNORI R1,R2,7 | R1=R2 XNOR 7 | ZERO-EXTND IMM
01111 | XOR IMM | XORI R1,R2,7 | R1=R2 XOR 7 | ZERO-EXTND IMM
10000 | SHIFT LEFT LOGICAL | SLLV R1,R2,R3 | R1=R2<<R3 | ZERO FILL
10001 | SHIFT RIGHT LOGICAL | SRLV R1,R2,R3 | R1=R2>>R3 | ZERO FILL
10010 | SHIFT RIGHT ARITHMETIC | SRAV R1,R2,R3 | R1=R2>>R3 | SIGN FILL
10011 | LOAD UPPER IMMEDIATE | LUI R1,7 | R1=7 x 256 | SIGN EXTND
10100 | SET LESS THAN | SLT R1,R2,R3 | IF {R2<R3} R1=1 ELSE R1=0 | TWO'S COMPL
10101 | SET LESS THAN IMMEDIATE | SLTI R1,R2,7 | IF {R2<7} R1=1 ELSE R1=0 | UNSIGNED
10110 | BRANCH EQUAL | BEQ R1,R2,7 | IF {R1=R2} PC=PC+1+7 | SIGN-EXTND IMM
10111 | BRANCH NOT EQUAL | BNEQ R1,R2,7 | IF {R1!=R2} PC=PC+1+7 | SIGN-EXTND IMM
11000 | BRANCH GREATER THAN 0 | BGTZ R1,5 | IF {R1>R0} PC=PC+1+5 | SIGN-EXTND IMM
11001 | BRANCH LESS THAN 0 | BLTZ R1,5 | IF {R1<R0} PC=PC+1+5 | SIGN-EXTND IMM
11010 | JUMP | J 128 | PC={PC[15:11],128} | CONCATENATE
11011 | JUMP AND LINK | JAL 128 | R7=PC+1; PC={PC[15:11],128} | PROCEDURE CALL
11100 | JUMP REGISTER | JR R3 | PC=R3 | PROCEDURE RETURN
11101 | JUMP AND LINK REGISTER | JALR R3,R7 | R7=PC+1; PC=R3 | DYNAMIC PROCEDURE CALL
11110 | LOAD WORD | LW R1,7{R2} | R1=MEM[7+R2] | SIGN-EXTND IMM
11111 | STORE WORD | SW R1,7{R2} | MEM[7+R2]=R1 | SIGN-EXTND IMM
The instruction set consists of the following groups of instructions.
I. Arithmetic instructions
II. Logical instructions
III. Shift instructions
IV. Conditional instructions
V. Branch instructions
VI. Data transfer instructions
Notations used are as follows:
Rd – Destination register.
Rs1, Rs2 – Source register.
PC- Program counter.
I. Arithmetic instructions
1. SUB Rd, Rs1, Rs2
Operation: Rd ← Rs1 - Rs2
This instruction subtracts the contents of the source register Rs2 from the source register Rs1 and stores the result in the destination register Rd.
2. SUBI Rd, Rs1, IMM
Operation: Rd ← Rs1 - IMM
This instruction subtracts the 5-bit immediate data from the source register Rs1 and stores the result in the destination register Rd.
3. ADD Rd, Rs1, Rs2
Operation: Rd ← Rs1 + Rs2
This instruction adds the contents of the source register Rs2 to the source register Rs1 and stores the result in the destination register Rd.
4. ADDI Rd, Rs1, IMM
Operation: Rd ← Rs1 + IMM
This instruction adds the 5-bit immediate data to the source register Rs1 and stores the result in the destination register Rd.
II. Logical instructions
1. AND Rd, Rs1, Rs2
Operation: Rd ← Rs1.Rs2
This instruction logically ANDs the contents of the register Rs1 with the contents of the register Rs2 and stores the result in the destination register Rd. Each bit in the register Rs1 is logically ANDed with the corresponding bit in register Rs2.
2. ANDI Rd, Rs1, IMM
Operation: Rd ← Rs1.IMM
This instruction logically ANDs the contents of the register Rs1 with the immediate value and stores the result in the destination register Rd. Each bit in the register Rs1 is logically ANDed with the corresponding bit in the zero extended immediate value.
3. OR Rd, Rs1, Rs2
Operation: Rd ← Rs1+Rs2
This instruction logically ORs the contents of the register Rs1 with the contents of the register Rs2 and stores the result in the destination register Rd. Each bit in the register Rs1 is logically ORed with the corresponding bit in register Rs2.
4. ORI Rd, Rs1, IMM
Operation: Rd ← Rs1+IMM
This instruction logically ORs the contents of the register Rs1 with the immediate value and stores the result in the destination register Rd. Each bit in the register Rs1 is logically ORed with the corresponding bit in the zero extended immediate value.
5. NAND Rd, Rs1, Rs2
Operation: Rd ← (Rs1.Rs2)'
This instruction logically NANDs the contents of the register Rs1 with the contents of the register Rs2 and stores the result in the destination register Rd.
6. NANDI Rd, Rs1, IMM
Operation: Rd ← (Rs1.IMM)'
This instruction logically NANDs the contents of the register Rs1 with the immediate value and stores the result in the destination register Rd. Each bit in the register Rs1 is logically NANDed with the corresponding bit in the zero extended immediate value.
7. NOR Rd, Rs1, Rs2
Operation: Rd ← (Rs1+Rs2)'
This instruction logically NORs the contents of the register Rs1 with the contents of the register Rs2 and stores the result in the destination register Rd.
8. NORI Rd, Rs1, IMM
Operation: Rd ← (Rs1+IMM)'
This instruction logically NORs the contents of the register Rs1 with the immediate value and stores the result in the destination register Rd. Each bit in the register Rs1 is logically NORed with the corresponding bit in the zero extended immediate value.
9. XOR Rd, Rs1, Rs2
Operation: Rd ← Rs1 ⊕ Rs2
This instruction logically XORs the contents of the register Rs1 with the contents of the register Rs2 and stores the result in the destination register Rd.
10. XORI Rd, Rs1, IMM
Operation: Rd ← Rs1 ⊕ IMM
This instruction logically XORs the contents of the register Rs1 with the immediate value and stores the result in the destination register Rd. Each bit in the register Rs1 is logically XORed with the corresponding bit in the zero extended immediate value.
11. XNOR Rd, Rs1, Rs2
Operation: Rd ← (Rs1 ⊕ Rs2)'
This instruction logically XNORs the contents of the register Rs1 with the contents of the register Rs2 and stores the result in the destination register Rd.
12. XNORI Rd, Rs1, IMM
Operation: Rd ← (Rs1 ⊕ IMM)'
This instruction logically XNORs the contents of the register Rs1 with the immediate value and stores the result in the destination register Rd. Each bit in the register Rs1 is logically XNORed with the corresponding bit in the zero extended immediate value.
III. Shift instructions
1. SLLV Rd,Rs1,Rs2
Operation: Rd ƒŸ Rs1<<Rs2 [3:0]
This instruction shifts each bit in the specified source register Rs1 to its left. In each shift operation a zero is appended to the LSB & MSB bit will be lost. The value stored in register Rs2 determines how many shift operations are to be performed (only lower 4 bits are used).
2. SRLV Rd,Rs1,Rs2
Operation: Rd ƒŸ Rs1>>Rs2[3:0]
This instruction shifts each bit in the specified source register Rs1 to its right. In each shift operation a zero is appended to the MSB & LSB bit will be lost. The value stored in register Rs2 determines how many shift operations are to be performed (only lower 4 bits are used).
3. SRAV Rd,Rs1,Rs2
Operation: Rd ƒŸ Rs1>>Rs2 [3:0]
This instruction shifts each bit in the specified source register Rs1 to its right. In each shift operation LSB bit will be lost and sign bit is appended to the MSB & is shifted to its right. If sign bit=0, a zero is appended to MSB, else a ‘1’ is appended to the MSB . The value stored in register Rs2 determines how many shift operations are to be performed(only lower 4 bits are used).
4. LUI Rd, IMM
Operation: Rd ← 16-bit value whose upper byte is loaded with the immediate value.
This instruction loads the destination register with a 16-bit value whose upper byte is the immediate value and whose lower byte is all zeroes; i.e., the immediate data bits are shifted left 8 times (equivalently, multiplied by 256 decimal).
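The four shift instructions can be modelled in a few lines of Python. This is a behavioural sketch under the widths stated above (16-bit registers, a 4-bit shift amount, an 8-bit LUI immediate); the function names are chosen for the example and are not taken from the project's Verilog code.

```python
MASK16 = 0xFFFF  # 16-bit registers


def sllv(rs1, rs2):
    """Logical left shift: zeros enter at the LSB, MSBs are lost."""
    return (rs1 << (rs2 & 0xF)) & MASK16


def srlv(rs1, rs2):
    """Logical right shift: zeros enter at the MSB, LSBs are lost."""
    return (rs1 & MASK16) >> (rs2 & 0xF)


def srav(rs1, rs2):
    """Arithmetic right shift: the sign bit is replicated at the MSB."""
    value = rs1 & MASK16
    if value & 0x8000:       # sign bit set: reinterpret as a negative number
        value -= 0x10000
    return (value >> (rs2 & 0xF)) & MASK16


def lui(imm):
    """Immediate goes to the upper byte; the lower byte is zero."""
    return ((imm & 0xFF) << 8) & MASK16
```

For instance, srlv(0x8000, 4) gives 0x0800, whereas srav(0x8000, 4) gives 0xF800 because the sign bit is copied into the vacated positions.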
IV. Conditional Instructions
1. SLT Rd,Rs1,Rs2
Operation: If (Rs1 < Rs2) Rd = 1
Else Rd = 0
This instruction compares the contents of the two source registers Rs1 and Rs2. If the content of Rs1 is less than that of Rs2, the destination register Rd is set to ‘1’ (only the LSB is set); otherwise Rd is set to ‘0’.
2. SLTI Rd,Rs1,IMM
Operation: If (Rs1 < IMM) Rd = 1
Else Rd = 0
This instruction compares the contents of the source register Rs1 with the immediate value. If the content of Rs1 is less than the immediate value, the destination register Rd is set to ‘1’ (only the LSB is set); otherwise Rd is set to ‘0’.
V. Branch instructions
1. BEQ Rs1,Rs2,IMM
Operation: If (Rs1=Rs2) PC=PC+1+IMM
Else PC=PC+1
This instruction compares the contents of the two source registers Rs1 and Rs2. If they are equal, the PC value is incremented once and then added to the immediate value specified in the instruction, and execution branches to the address given by the new PC value.
2. BNEQ Rs1,Rs2,IMM
Operation: If (Rs1!=Rs2) PC=PC+1+IMM
Else PC=PC+1
This instruction compares the contents of the two source registers Rs1 and Rs2. If they are not equal, the PC value is incremented once and then added to the immediate value specified in the instruction, and execution branches to the address given by the new PC value.
3. BGTZ Rs1,IMM
Operation: If (Rs1>R0) PC=PC+1+IMM
Else PC=PC+1
This instruction compares the contents of the source register Rs1 with R0 (the content of register R0 is always zero). If Rs1 > R0, the PC value is incremented once and then added to the immediate value specified in the instruction, and execution branches to the address given by the new PC value.
4. BLTZ Rs1,IMM
Operation: If (Rs1<R0) PC=PC+1+IMM
Else PC=PC+1
This instruction compares the contents of the source register Rs1 with R0 (the content of register R0 is always zero). If Rs1 < R0, the PC value is incremented once and then added to the immediate value specified in the instruction, and execution branches to the address given by the new PC value.
5. J IMM
Operation: PC ← PC[15:11] || IMM
In this instruction, the upper bits of the PC, PC[15:11], are concatenated with the 8-bit immediate value to form the new PC. The jump is made to the memory location addressed by this updated PC value.
6. JAL IMM
Operation: R7 ← PC+1; PC ← PC[15:11] || IMM
In this instruction two operations take place. The address of the instruction following the currently executing instruction is stored in register R7, and then the PC value is updated by concatenating PC[15:11] with the immediate value.
7. JR Rs1
Operation: PC ← Rs1
The content stored in the specified source register is loaded into the PC, which then holds the address of the memory location to which the jump takes place.
8. JARL Rd, R7
Operation: R7 ← PC+1; PC ← Rd
After the execution of this instruction, the address of the instruction following the currently executing one is loaded into register R7, and the PC is then updated with a new value loaded from register Rd. Instruction fetch then continues from the memory location pointed to by the updated PC value.
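The PC updates described for the branch and jump instructions can be summarised in a short Python sketch. It is an illustration only. The function names are invented for the example, the branch immediate is treated as a plain integer (its signedness is not stated in the text), and the jump immediate is assumed to fill the 11 bits below PC[15:11]; the width is left as a parameter because the text quotes an 8-bit immediate.

```python
MASK16 = 0xFFFF  # the PC is treated as a 16-bit value


def branch_next_pc(pc, taken, imm):
    """BEQ/BNEQ/BGTZ/BLTZ: PC+1+IMM if the condition holds, else PC+1."""
    next_pc = pc + 1
    if taken:
        next_pc += imm
    return next_pc & MASK16


def jump_target(pc, imm, imm_bits=11):
    """J: keep PC[15:11] and place IMM in the low bits (assumed width)."""
    low_mask = (1 << imm_bits) - 1
    return (pc & 0xF800) | (imm & low_mask)


def jal(pc, imm, imm_bits=11):
    """JAL: returns (value written to R7, new PC)."""
    return (pc + 1) & MASK16, jump_target(pc, imm, imm_bits)
```

With pc = 5 and imm = 7, a taken branch continues at address 13 and a branch that is not taken continues at address 6.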
VI. Data Transfer Instructions
1. LW Rd , IMM {R2}
Operation: Rd ← IMM {R2}
On execution of this instruction, the content of register R2 is added to the immediate value; the sum gives the data memory address whose content is loaded into the specified destination register.
2. SW Rs, IMM {R2}
Operation: IMM {R2} ← Rs
On execution of this instruction, the content of register R2 is added to the immediate value; the sum gives the data memory address to which the content of the specified source register is stored.
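The base-plus-offset addressing used by LW and SW can be sketched as below. This is a minimal model, not the project's code: the data memory depth of 256 words and the eight-entry register file are assumptions for the example (the text only states the depth of the instruction memory), and the helper names are hypothetical.

```python
DATA_MEM_WORDS = 256  # assumed depth of the data memory


def effective_address(base_value, imm):
    """Address = base register content + immediate, wrapped to memory size."""
    return (base_value + imm) % DATA_MEM_WORDS


def lw(regs, mem, rd, base, imm):
    """Load: Rd <- Mem[base + IMM]."""
    regs[rd] = mem[effective_address(regs[base], imm)]


def sw(regs, mem, rs, base, imm):
    """Store: Mem[base + IMM] <- Rs."""
    mem[effective_address(regs[base], imm)] = regs[rs]
```

A store followed by a load with the same base register and offset returns the stored value.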
4. IMPLEMENTATION
The code for the project is written in Verilog, an industry-accepted hardware description language (HDL).
4.1 Importance of HDLs:
HDLs have many advantages compared to traditional schematic-based design.
Designs can be described at a very abstract level by use of HDLs. Designers can write their RTL description without choosing a specific fabrication technology. Logic synthesis tools can automatically convert the design to any fabrication technology. If a new technology emerges, designers do not need to redesign their circuit. They simply input the RTL description to the logic synthesis tool and create a new gate-level netlist using the new fabrication technology. The logic synthesis tool will optimize the circuit in area and timing for the new technology.
By describing designs in HDLs, functional verification of the design can be done early in the design cycle. Since designers work at the RTL level, they can optimize and modify the RTL description until it meets the desired functionality. Most design bugs are eliminated at this point. This cuts down design cycle time significantly because the probability of hitting a functional bug at a later time in the gate-level netlist or physical layout is minimized.
Designing with HDLs is analogous to computer programming. A textual description with comments is an easier way to develop and debug circuits. This also provides a concise representation of the design, compared to gate-level schematics. Gate-level schematics are almost incomprehensible for very complex designs.
HDL-based design is here to stay. With rapidly increasing complexities of digital circuits and increasingly sophisticated EDA tools, HDLs are now the dominant method for large digital designs. No digital circuit designer can afford to ignore HDL-based design.
4.2 Popularity of Verilog HDL:
Verilog HDL has evolved as a standard hardware description language. Verilog HDL offers many useful features for hardware design.
Verilog HDL is a general-purpose hardware description language that is easy to learn and easy to use. It is similar in syntax to the C programming language. Designers with C programming experience will find it easy to learn Verilog HDL.
Verilog HDL allows different levels of abstraction to be mixed in the same model. Thus, a designer can define a hardware model in terms of switches, gates, RTL, or behavioral code. Also, a designer needs to learn only one language for stimulus and hierarchical design.
Most popular logic synthesis tools support Verilog HDL. This makes it the language of choice for designers.
All fabrication vendors provide Verilog HDL libraries for post-logic-synthesis simulation. Thus, designing a chip in Verilog HDL allows the widest choice of vendors.
The Programming Language Interface (PLI) is a powerful feature that allows the user to write custom C code to interact with the internal data structures of Verilog. Designers can customize a Verilog HDL simulator to their needs with the PLI.
4.3 Algorithm for Software Implementation
1. In the design of the MIPS, the basic blocks such as the Register file, ALU, Shifter, Data memory and Instruction memory are implemented separately, each as its own Verilog module. These modules are compiled separately to check that they function properly.
2. Next, the Control Unit is designed. It consists of a case block holding the control signals required for every operation defined by the opcodes, from 00000 to 11111. The Control Unit is likewise compiled and simulated to check that it generates the proper control signals for each opcode.
3. Finally, the individual blocks, including the Control Unit, are instantiated in a file named TOPMIPS in our design. The wires are connected so as to establish the interconnections between these blocks for the flow of data.
By compiling and simulating this file with the ModelSim-Altera 6.3 simulator, we obtain the final implementation of the MIPS.
4.4 Demonstration of MIPS Processor
In this MIPS processor the instructions to be executed are written into the instruction memory. As explained before instruction memory has 256 locations each of 16-bit width. Hence the instructions are also 16-bit. They are entered one after the other in continuous memory locations.
When the TOPMIPS file is simulated, clock signals are generated and the design operates according to them. If reset is high, all the signals are inactive. When reset is low, the signals are enabled. The first instruction in the instruction memory is fetched; its upper 5 bits form the opcode, which identifies the operation to be performed. The control unit decodes this opcode and generates the required control signals, which are sent to the respective blocks for further processing.
Fig. 4.4.1 The RTL schematic of the designed Processor
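The decode step described above can be illustrated with a small Python sketch. Only the 16-bit instruction width, the 5-bit opcode in the upper bits and the reset behaviour come from the text. The opcode values, the signal names and the table contents below are hypothetical placeholders, since the actual encoding is not given here.

```python
def split_instruction(word):
    """Split a 16-bit instruction into its 5-bit opcode and remaining 11 bits."""
    opcode = (word >> 11) & 0x1F
    operand_bits = word & 0x07FF
    return opcode, operand_bits


# Hypothetical control table: opcodes and signal names are placeholders.
CONTROL_TABLE = {
    0b00001: {"reg_write": True, "mem_write": False},
    0b00010: {"reg_write": False, "mem_write": True},
}
INACTIVE = {"reg_write": False, "mem_write": False}


def control_signals(word, reset):
    """Return the control signals for an instruction; all inactive on reset."""
    if reset:
        return dict(INACTIVE)
    opcode, _ = split_instruction(word)
    return dict(CONTROL_TABLE.get(opcode, INACTIVE))
```

In the actual design this lookup corresponds to the case block of the Control Unit, with one entry per opcode from 00000 to 11111.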
5. SIMULATION RESULTS
For example, consider the execution of the following instructions:
1. ADDI R1, R0, 8 : R1 ← (R0 + 8)
2. ADDI R2, R0, 9 : R2 ← (R0 + 9)
3. XOR R3, R1, R2 : The contents of R1 and R2 are XORed and the result is stored in register R3 (R3 = 1).
4. SLLV R4, R2, R3 : Shifts the contents of R2 (= 09) left R3 times and stores the result in R4.
5. SLT R5, R1, R4 : Compares R1 and R4; if R1 < R4, sets R5 = 1.
6. BEQ R5, R1, 7 : If R5 = R1, control goes to PC + 1 + 7; here R5 != R1, hence the next instruction is executed.
7. NOP : No operation.
8. NOP : No operation.
9. SW R1, 7 {R5} : Stores the content of R1 to the data memory at location [R5] + 7.
10. LW R6, 7 {R5} : Loads the content of data memory at address (7 + [R5]) into register R6.
11. JR R6 : Jumps to the location pointed to by the content of register R6.
Table 5.1 Example of instructions
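The expected register contents for the instructions in Table 5.1 can be cross-checked with a small Python interpreter. This is a functional sketch, not the Verilog design and not the simulator output: it ignores pipelining, assumes eight registers and a 256-word data memory, treats comparisons as unsigned, and stops at the JR instruction, returning its target.

```python
def run(program, num_regs=8, mem_words=256):
    """Execute the listed instructions one at a time; stop at JR."""
    regs = [0] * num_regs
    mem = [0] * mem_words
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        next_pc = pc + 1
        if op == "ADDI":
            rd, rs1, imm = args
            regs[rd] = (regs[rs1] + imm) & 0xFFFF
        elif op == "XOR":
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] ^ regs[rs2]
        elif op == "SLLV":
            rd, rs1, rs2 = args
            regs[rd] = (regs[rs1] << (regs[rs2] & 0xF)) & 0xFFFF
        elif op == "SLT":
            rd, rs1, rs2 = args
            regs[rd] = 1 if regs[rs1] < regs[rs2] else 0
        elif op == "BEQ":
            rs1, rs2, imm = args
            if regs[rs1] == regs[rs2]:
                next_pc = pc + 1 + imm
        elif op == "SW":
            rs, imm, base = args
            mem[(regs[base] + imm) % mem_words] = regs[rs]
        elif op == "LW":
            rd, imm, base = args
            regs[rd] = mem[(regs[base] + imm) % mem_words]
        elif op == "JR":
            (rs1,) = args
            return regs, mem, regs[rs1]  # report the jump target and stop
        # "NOP" falls through without changing any state
        regs[0] = 0                      # R0 always reads as zero
        pc = next_pc
    return regs, mem, None


TABLE_5_1 = [
    ("ADDI", 1, 0, 8), ("ADDI", 2, 0, 9), ("XOR", 3, 1, 2),
    ("SLLV", 4, 2, 3), ("SLT", 5, 1, 4), ("BEQ", 5, 1, 7),
    ("NOP",), ("NOP",), ("SW", 1, 7, 5), ("LW", 6, 7, 5), ("JR", 6),
]

regs, mem, target = run(TABLE_5_1)
print(regs[1:7], mem[8], target)  # [8, 9, 1, 18, 1, 8] 8 8
```

Under these assumptions, R3 = 8 XOR 9 = 1, R4 = 9 << 1 = 18, R5 = 1 because 8 < 18, the branch is not taken, and the store and load both use data memory address 1 + 7 = 8.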
Consider the first instruction: the contents of R0 and the immediate data 8 are added, giving the sum 8, which is stored in register R1. The output for the first instruction is obtained at the 4th machine cycle. Thereafter, an output is obtained at every machine cycle, corresponding to the instruction being executed. This is made possible by the 4-stage pipelining implemented in our design.
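The timing described here follows the usual relation for an ideal pipeline, sketched below in Python. The sketch assumes that one instruction enters the pipeline per machine cycle and that no cycles are lost to stalls or taken branches; the function names are chosen for the example.

```python
def completion_cycle(n, stages=4):
    """Machine cycle in which the n-th instruction (1-based) produces its result."""
    return n + stages - 1


def total_cycles(instruction_count, stages=4):
    """Cycles needed to finish a straight-line sequence of instructions."""
    return completion_cycle(instruction_count, stages)


print(completion_cycle(1))  # 4
print(total_cycles(11))     # 14
```

For the eleven instructions of the example this gives 14 cycles, compared with 11 x 4 = 44 cycles if each instruction had to pass through all four stages before the next one started.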
Consider instructions 2 and 3. Here we can notice that register R2 is the destination register in instruction 2 and then a source register in instruction 3.
In this kind of situation, interlocking occurs in a general-purpose processor. The STALL signal halts the processor until stage 2 gets the data it needs. Only after receiving an acknowledgement does it continue with further execution.
But in MIPS, as the acronym ‘Microprocessor without Interlocked Pipeline Stages’ itself suggests, this interlocking does not occur and the processor does not need to wait. The data is instantly available for the third instruction. Snapshots of the simulation results are shown in the following figures.
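The dependency between instructions 2 and 3 is a read-after-write dependency between adjacent instructions. The Python sketch below merely locates such dependencies in the first six example instructions; it does not model how the design makes the data available. The (destination, sources) encoding and the function name are assumptions made for the example.

```python
def adjacent_raw_dependencies(program):
    """List (producer, consumer, register) for back-to-back instructions
    where the second reads the register written by the first (1-based)."""
    found = []
    for i in range(len(program) - 1):
        dest, _ = program[i]
        _, sources = program[i + 1]
        # R0 is constant zero, so writes to it never create a dependency
        if dest is not None and dest != 0 and dest in sources:
            found.append((i + 1, i + 2, dest))
    return found


# (destination register, source registers) for instructions 1 to 6
EXAMPLE = [
    (1, (0,)),       # 1. ADDI R1, R0, 8
    (2, (0,)),       # 2. ADDI R2, R0, 9
    (3, (1, 2)),     # 3. XOR  R3, R1, R2
    (4, (2, 3)),     # 4. SLLV R4, R2, R3
    (5, (1, 4)),     # 5. SLT  R5, R1, R4
    (None, (5, 1)),  # 6. BEQ  R5, R1, 7
]

print(adjacent_raw_dependencies(EXAMPLE))
# [(2, 3, 2), (3, 4, 3), (4, 5, 4), (5, 6, 5)]
```

The first entry, (2, 3, 2), is the case discussed above: instruction 3 reads R2 immediately after instruction 2 writes it.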
Fig. 5.1: Snapshot of the register block
Fig. 5.2: Snapshot of the shifter, ALU and the multiplexer block
Fig. 5.3: Snapshot of the control unit and instruction memory
Fig. 5.4: Snapshot of the program counter and data memory
Fig. 5.5: Timing analysis report for Cyclone II FPGA
Fig. 5.6: Compilation report (flow summary) for Cyclone II FPGA
6. CONCLUSION AND SCOPE FOR FUTURE WORK
MIPS is a popular processor in embedded systems such as Cisco routers, Windows CE (Embedded Compact) devices, residential devices, video games, etc. Verilog is one of the hardware description languages accepted by the industry. The block diagrams helped us to identify the signals and implement the architecture. We have successfully implemented the MIPS using Modelsim-Altera 6.3g_p1 (Quartus II 8.1) as the tool, with the code written in Verilog.
We have designed a 16-bit MIPS-based RISC processor, successfully executed all 32 instructions, and obtained the corresponding results. As future work, 32-bit and 64-bit MIPS-based RISC processors can be implemented.
From the synthesis report we can conclude that an FPGA implementation is possible. For example, on a Cyclone II device, the processor can be implemented using 12,480 logic elements at a speed of 71.74 MHz.