Cache Memory Plays a Lead Role: Information Technology Essay
Answer: Cache (pronounced “cash”) memory is extremely fast memory that is built into a computer’s central processing unit (CPU) or located next to it on a separate chip. The CPU uses cache memory to store instructions and data that are repeatedly required to run programs, improving overall system speed. It gives the CPU quick access to frequently or recently used data.
References: http://www.wisegeek.com/what-is-cache-memory.htm
Reason for Cache Memory:
There are several reasons for using a cache in a computer; some of them are described below.
RAM is slow compared with the CPU, and it is also physically far from the CPU (connected through the memory bus). A small, very fast memory is therefore placed very close to the CPU so that the CPU does not sit idle (stall) while it waits for data from main memory; this memory is known as cache memory. Cache is also a type of RAM (static RAM), but it is much faster than the dynamic RAM used for main memory. Because a modern CPU completes an operation in a nanosecond or less, even the distance that signals must travel affects performance. Cache memory is designed to supply the CPU with the most frequently requested data and instructions. Because retrieving data from cache takes a fraction of the time that it takes to access it from main memory, having cache memory can save a lot of time.
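The time saving can be made concrete with the usual average-memory-access-time formula: average access time = hit time + miss rate × miss penalty. The latencies and the 95% hit rate below are illustrative assumptions, not measurements of any particular system.

```python
def average_access_time(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assumed figures for illustration only: 1 ns cache hit, 100 ns main-memory access.
CACHE_HIT_NS = 1.0
MEMORY_NS = 100.0

no_cache = MEMORY_NS  # without a cache, every access goes to main memory
with_cache = average_access_time(CACHE_HIT_NS, miss_rate=0.05, miss_penalty_ns=MEMORY_NS)

print(f"without cache: {no_cache:.1f} ns, with cache (95% hits): {with_cache:.1f} ns")
```

Under these assumptions the average access drops from 100 ns to 6 ns, roughly 16 times faster; the real gain depends on the actual hit rate and latencies.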
When we work on more than one application, the cache keeps the recently used instructions and data of the running applications close to the processor, where they can be reached in a fraction of the time a main-memory access takes. This improves the performance of the system.
Cache memory communicates directly with the processor. It helps bridge the speed mismatch between the processor and memory, including when the user switches instantly from one application to another. It holds the instructions and data that the running applications have used most recently.
For example, a web browser stores recently visited web pages in a cache directory, so that we can return to a page quickly without requesting it again from the original server. When we press the “Reload” button, the browser checks the cached page against the current page on the server and updates our local copy if required.
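The browser behaviour can be sketched as a small software cache keyed by URL. This is a simplified sketch: `fetch_from_server`, the `SERVER` dictionary and the version numbers are hypothetical stand-ins for a real network request, and "Reload" is modelled as a full fetch followed by a version comparison (real browsers use HTTP validators such as ETag or Last-Modified instead).

```python
# Hypothetical "server": maps a URL to (version, page body).
SERVER = {"http://example.com/": (1, "<html>v1</html>")}

def fetch_from_server(url: str) -> tuple[int, str]:
    """Hypothetical stand-in for a network request."""
    return SERVER[url]

class PageCache:
    def __init__(self) -> None:
        self.pages = {}            # url -> (version, body)
        self.network_requests = 0

    def get(self, url: str) -> str:
        # Serve from the local cache if we already have the page.
        if url in self.pages:
            return self.pages[url][1]
        return self.reload(url)

    def reload(self, url: str) -> str:
        # "Reload": ask the server for the current copy and replace
        # the cached one only if its version has changed.
        self.network_requests += 1
        version, body = fetch_from_server(url)
        cached = self.pages.get(url)
        if cached is None or cached[0] != version:
            self.pages[url] = (version, body)
        return self.pages[url][1]

url = "http://example.com/"
cache = PageCache()
cache.get(url)                            # first visit: fetched from the server
cache.get(url)                            # second visit: served from the cache
SERVER[url] = (2, "<html>v2</html>")      # the page changes on the server
cache.reload(url)                         # "Reload" picks up the new version
print(cache.network_requests)             # 2
```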
References: 1. http://www.kingston.com/tools/umg/umg03.asp
2. http://www.kingston.com/frroot/tools/umg/umg03.asp
3. http://ask.yahoo.com/19990329.html
How Does Cache Work?
Answer: The cache is built (in hardware) to hold recently accessed memory locations in case they are needed again. Each instruction is therefore saved in the cache after it is loaded from memory the first time. The next time the processor needs the same instruction, it checks the cache first, finds the instruction there, and loads it from the cache instead of going to the slower system RAM. How many instructions can be held this way depends on the size and design of the cache.
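A minimal sketch of this behaviour, assuming a cache large enough to hold everything and a made-up set of instruction addresses for a five-instruction loop that runs ten times: only the first pass through the loop misses.

```python
def count_hits(addresses):
    """Count cache hits and misses for a stream of addresses (unbounded cache)."""
    cache = set()
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1          # found in cache: no trip to RAM
        else:
            misses += 1        # first use: load from RAM and keep a copy
            cache.add(addr)
    return hits, misses

# Hypothetical loop body of 5 instructions at addresses 0x100..0x110, executed 10 times.
loop_body = [0x100, 0x104, 0x108, 0x10C, 0x110]
trace = loop_body * 10
hits, misses = count_hits(trace)
print(hits, misses)  # 45 5
```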
The details of how cache memory works vary between cache controllers and processors, so I won’t describe them exactly. In general, though, cache memory works by attempting to predict which memory the processor is going to need next, loading that memory before the processor needs it, and saving the results after the processor is done with it. Whenever the byte at a given memory address needs to be read, the processor first tries to get the data from the cache. If the cache doesn’t have that data, the processor stalls while the data is loaded from main memory into the cache; at the same time, the memory around the required data is also loaded into the cache. When data is loaded from main memory into the cache, it has to replace something that is already in the cache. When this happens, the cache determines whether the memory that is going to be replaced has changed. If it has, the cache first saves the changes to main memory and then loads the new data. The cache system doesn’t deal with data structures at all, only with whether a given address in main memory is in the cache or not. In fact, if you are familiar with virtual memory, where the hard drive is used to make it appear that a computer has more RAM than it really does, cache memory works on a similar principle.
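The miss handling described above (load the surrounding block, and write a replaced block back to main memory only if it has changed) can be sketched as a tiny write-back cache. The block size, the capacity and the choice to evict the block that was loaded first are simplifying assumptions for illustration; real caches use their own placement and replacement rules.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # words per block (assumed)

class WriteBackCache:
    def __init__(self, memory: list[int], capacity_blocks: int) -> None:
        self.memory = memory
        self.capacity = capacity_blocks
        # block number -> [data words, dirty flag]; insertion order = load order
        self.blocks = OrderedDict()
        self.writebacks = 0

    def _load(self, block: int) -> list:
        if block not in self.blocks:
            if len(self.blocks) == self.capacity:
                # Evict the block that was loaded first (assumed policy).
                old_block, (old_data, dirty) = self.blocks.popitem(last=False)
                if dirty:  # changed while cached: save it to main memory first
                    start = old_block * BLOCK_SIZE
                    self.memory[start:start + BLOCK_SIZE] = old_data
                    self.writebacks += 1
            start = block * BLOCK_SIZE
            # Load the whole block: the requested word plus its neighbours.
            self.blocks[block] = [self.memory[start:start + BLOCK_SIZE], False]
        return self.blocks[block]

    def read(self, addr: int) -> int:
        data, _ = self._load(addr // BLOCK_SIZE)
        return data[addr % BLOCK_SIZE]

    def write(self, addr: int, value: int) -> None:
        entry = self._load(addr // BLOCK_SIZE)
        entry[0][addr % BLOCK_SIZE] = value
        entry[1] = True  # mark dirty; main memory is not updated yet

ram = list(range(16))
cache = WriteBackCache(ram, capacity_blocks=2)
cache.write(1, 99)              # block 0 loaded and marked dirty
print(ram[1])                   # 1: main memory not updated yet
cache.read(5)                   # block 1 loaded
cache.read(9)                   # block 2 loaded; block 0 evicted and written back
print(ram[1], cache.writebacks) # 99 1
```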
Let’s take a library as an example of how caching works. Imagine a large library with only one librarian (the standard single-CPU setup). The first person comes into the library and asks for the CSA book (by Irv Englander). The librarian follows the path to the bookshelves (the memory bus), retrieves the book and gives it to the person. The book is returned to the library once the reader has finished with it. Without a cache, the book is put back on the shelf, so when the next person arrives and asks for the same CSA book, the same process happens and takes the same amount of time.
Cache memory is like a “hot list” of instructions needed by the CPU. The cache controller saves in the cache each instruction the CPU needs; each time the CPU gets an instruction it needs from the cache, that instruction moves to the top of the “hot list.” When the cache is full and the CPU calls for a new instruction, the system overwrites the data in the cache that hasn’t been used for the longest time. This way, high-priority information that is used continuously stays in the cache, while less frequently used information drops out after an interval. It is similar to the Start menu: a program you use frequently is listed directly on the Start menu, so instead of searching for it under “All Programs” you simply open the Start menu and click it, which saves time.
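The “hot list” is a least-recently-used (LRU) policy. A minimal software sketch, using Python’s OrderedDict to keep entries in recency order; hardware caches implement LRU (or an approximation of it) in logic within each set, so this only illustrates the policy, not how a CPU builds it.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` items; evicts the least recently used one."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                      # miss
        self.items.move_to_end(key)          # move to the top of the "hot list"
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used entry

hot = LRUCache(capacity=2)
hot.put("A", "instr A")
hot.put("B", "instr B")
hot.get("A")              # A is now the most recently used
hot.put("C", "instr C")   # cache full: B (least recently used) is overwritten
print(list(hot.items))    # ['A', 'C']
```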
Pentium 4 cache organisation:
L1 cache: 8 KB, 64-byte lines, four-way set associative
L2 cache: 256 KB, 128-byte lines, eight-way set associative
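The figures above fix the shape of each Pentium 4 cache. A small sketch deriving the number of lines and sets, and how many address bits select the byte within a line (offset) and the set (index), assuming power-of-two sizes as in the listed configurations.

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> dict:
    """Derive lines, sets and address-field widths (power-of-two sizes assumed)."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": (line_bytes - 1).bit_length(),  # log2(line size)
        "index_bits": (sets - 1).bit_length(),         # log2(number of sets)
    }

l1 = cache_geometry(8 * 1024, 64, 4)
l2 = cache_geometry(256 * 1024, 128, 8)
print(l1)  # {'lines': 128, 'sets': 32, 'offset_bits': 6, 'index_bits': 5}
print(l2)  # {'lines': 2048, 'sets': 256, 'offset_bits': 7, 'index_bits': 8}
```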
References:
http://computer.howstuffworks.com/cache.htm
http://www.kingston.com/tools/umg/umg03.asp
http://www.zak.ict.pwr.wroc.pl/nikodem/ak_materialy/Cache%20organization%20by%20Stallings.pdf
Levels of Cache
Level 1 Cache (L1): The Level 1 cache, or primary cache, is on the CPU and is used for temporary storage of instructions and data organised in blocks of 32 bytes. Primary cache is the fastest form of storage after the processor’s own registers. Because it is built into the chip with a zero wait-state (delay) interface to the processor’s execution unit, it is limited in size.
Level 1 cache is implemented using static RAM (SRAM) and until recently was traditionally 16KB in size. SRAM typically uses six transistors per bit and can hold data without external assistance for as long as power is supplied to the circuit. Four of the transistors form a circuit known as a “flip-flop”, so called because it has two stable states which it can flip between; the other two control access to it. This is in contrast to dynamic RAM (DRAM), which must be refreshed many times per second in order to hold its data contents.
Intel’s P55 MMX processor, launched at the start of 1997, was noteworthy for increasing the size of its Level 1 cache to 32KB. The AMD K6 and Cyrix M2 chips launched later that year upped the ante further by providing Level 1 caches of 64KB. 64KB has remained the standard L1 cache size, though various multiple-core processors may utilise it differently.
For all L1 cache designs the control logic of the primary cache keeps the most frequently used data and code in the cache and updates external memory only when the CPU hands over control to other bus masters, or during direct memory access by peripherals such as optical drives and sound cards.
http://www.pctechguide.com/14Memory_L1_cache.htm
Level 2 Cache (L2): Most PCs are offered with a Level 2 cache to bridge the processor/memory performance gap. Level 2 cache (also referred to as secondary cache) uses the same control logic as Level 1 cache and is also implemented in SRAM.
Level 2 caches typically come in two sizes, 256KB or 512KB, and can be soldered onto the motherboard, fitted in a Card Edge Low Profile (CELP) socket or, more recently, mounted on a COAST module. The latter resembles a SIMM but is a little shorter and plugs into a COAST socket, which is normally located close to the processor and resembles a PCI expansion slot. The aim of the Level 2 cache is to supply stored information to the processor without any delay (wait state). For this purpose, the bus interface of the processor has a special transfer protocol called burst mode. A burst cycle consists of four data transfers, of which only the address of the first 64-bit transfer is output on the address bus. The most common Level 2 cache is synchronous pipeline burst. A synchronous cache requires a chipset that supports it, such as Triton. It can provide a 3-5% increase in PC performance because it is timed to a clock cycle. This is achieved by using specialised SRAM technology developed to allow zero wait-state access for consecutive burst read cycles. There is also asynchronous cache, which is cheaper and slower because it isn’t timed to a clock cycle. Asynchronous SRAM is available in speeds between 12 and 20ns.
(http://www.pctechguide.com/14Memory_L2_cache.htm)
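The burst idea can be illustrated as follows: the processor drives one address, and the remaining three 64-bit transfers fill out the rest of a 32-byte line without new addresses being placed on the bus. This is a sketch under assumptions: a 64-bit data bus, a 32-byte line and a simple linear, wrap-around order within the line; the exact transfer order used by a given processor and chipset may differ.

```python
BUS_WIDTH_BYTES = 8       # 64-bit data bus (assumed)
TRANSFERS_PER_BURST = 4

def burst_addresses(start: int) -> list[int]:
    """Addresses covered by one burst when only `start` is placed on the address bus.

    Assumes a linear order that wraps around within an aligned 32-byte line.
    """
    line_bytes = BUS_WIDTH_BYTES * TRANSFERS_PER_BURST   # 32 bytes
    line_base = start - start % line_bytes
    first = (start - line_base) // BUS_WIDTH_BYTES       # which 8-byte chunk comes first
    return [line_base + ((first + i) % TRANSFERS_PER_BURST) * BUS_WIDTH_BYTES
            for i in range(TRANSFERS_PER_BURST)]

print([hex(a) for a in burst_addresses(0x1000)])  # ['0x1000', '0x1008', '0x1010', '0x1018']
```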
(Picture: http://www.karbosguide.com/books/pcarchitecture/images/976.png)
Level 3 Cache (L3): Level 3 cache is something of a luxury item. Often only high-end workstations and servers need an L3 cache; at the time of writing, the Pentium 4 Extreme Edition was the only consumer processor that featured one. L3 cache has been both “on-die”, meaning part of the CPU, and “external”, meaning mounted near the CPU on the motherboard. It comes in many sizes and speeds.
The point of cache is to keep the processor pipeline fed with data. CPU cores are typically the fastest part in the computer. As a result cache is used to pre-read or store frequently used instructions and data for quick access. Cache acts as a high speed buffer memory to more quickly provide the CPU with data.
So, the concept of CPU cache levelling is one of performance optimization for the processor.
http://www.extremetech.com/article2/0,2845,1517372,00.asp
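How the levels cooperate can be sketched as a lookup that falls through from L1 to L2 to L3 and finally to main memory. The latencies (in CPU cycles) and cache contents are made-up assumptions, and the sketch checks each level only after the previous one misses; real designs overlap some of these steps.

```python
# Assumed per-level access latencies in CPU cycles (illustrative only).
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40)]
MEMORY_LATENCY = 200

def lookup(addr: int, contents: dict[str, set[int]]) -> tuple[str, int]:
    """Search each cache level in turn; return where the data was found and total cycles."""
    cycles = 0
    for name, latency in LEVELS:
        cycles += latency
        if addr in contents[name]:
            return name, cycles
    return "memory", cycles + MEMORY_LATENCY

# Hypothetical contents: each level also holds what the level above it holds.
contents = {"L1": {0x10}, "L2": {0x10, 0x20}, "L3": {0x10, 0x20, 0x30}}
for a in (0x10, 0x20, 0x30, 0x40):
    print(hex(a), lookup(a, contents))
# 0x10 ('L1', 4) / 0x20 ('L2', 16) / 0x30 ('L3', 56) / 0x40 ('memory', 256)
```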
The image below shows the complete cache hierarchy of the “Shanghai” processor. “Barcelona” also has a similar hierarchy except that it only has 2MB of L3 cache.
(Figure: L3 cache architecture; picture: http://developer.amd.com/PublishingImages/L3_Cache_Architecture.jpg)
Cache Memory Organisation
A modern microprocessor contains several caches. They vary not only in size and functionality but also in internal organization, which is typically different from one cache to another.
Instruction Cache
The instruction cache is used to store instructions, which reduces the cost of going to memory to fetch them. The instruction cache often holds other information as well, such as branch prediction information. In certain cases, this cache can even perform some limited operations; the instruction cache on UltraSPARC, for example, also pre-decodes incoming instructions.
Data Cache
A data cache is a fast buffer that contains the application data. Before the processor can operate on the data, it must be loaded from memory into the data cache. The element needed is then loaded from the cache line into a register and the instruction using this value can operate on it. The resultant value of the instruction is also stored in a register. The register contents are then stored back into the data cache. Eventually the cache line that this element is part of is copied back into the main memory. In some cases, the cache can be bypassed and data is stored into the registers directly.
TLB Cache
Translating a virtual page address to a valid physical address is rather costly. The TLB is a cache to store these translated addresses. Each entry in the TLB maps to an entire virtual memory page. The CPU can only operate on data and instructions that are mapped into the TLB. If this mapping is not present, the system has to re-create it, which is a relatively costly operation. The larger a page, the more effective capacity the TLB has. If an application does not make good use of the TLB (for example, random memory access) increasing the size of the page can be beneficial for performance, allowing for a bigger part of the address space to be mapped into the TLB.
Some microprocessors, including UltraSPARC, implement two TLBs: one for pages containing instructions (I-TLB) and one for data pages (D-TLB).
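To make the page-size argument concrete: a virtual address splits into a page number, which the TLB translates, and an offset within the page, which passes through unchanged. The sketch assumes 4 KB pages, a hypothetical 64-entry TLB and a made-up mapping; it shows that with 2 MB pages the same number of entries covers far more memory.

```python
from typing import Optional

PAGE_4K = 4 * 1024

def split_virtual_address(vaddr: int, page_size: int = PAGE_4K) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, offset within page)."""
    return vaddr // page_size, vaddr % page_size

def translate(vaddr: int, tlb: dict[int, int], page_size: int = PAGE_4K) -> Optional[int]:
    """Return the physical address on a TLB hit, or None on a TLB miss."""
    vpn, offset = split_virtual_address(vaddr, page_size)
    frame = tlb.get(vpn)
    if frame is None:
        return None  # miss: the mapping must be rebuilt from the page tables (costly)
    return frame * page_size + offset

def tlb_reach(entries: int, page_size: int) -> int:
    """Amount of memory the TLB can map at once."""
    return entries * page_size

tlb = {0x12345: 0x42}                    # hypothetical mapping: virtual page -> physical frame
print(hex(translate(0x12345ABC, tlb)))   # 0x42abc
print(tlb_reach(64, 4 * 1024) // 1024, "KB")            # 256 KB
print(tlb_reach(64, 2 * 1024 * 1024) // 2**20, "MB")    # 128 MB
```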
An example of a typical cache organization is shown below:
Cache Memory Principles
• Small amount of fast memory
• Placed between the processor and main memory
• Located either on the processor chip or on a separate module
Cache Operation Overview
Processor requests the contents of some memory location
The cache is checked for the requested data
If found, the requested word is delivered to the processor
If not found, a block of main memory is first read into the cache, then the requested word is delivered to the processor
When a block of data is fetched into the cache to satisfy a single memory reference, it is likely that there will be future references to that same memory location or to other words in the block; this is the principle of locality of reference. Each block has a tag added to identify it.
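The payoff of fetching a whole block can be seen by counting misses for a program that reads consecutive words. This sketch assumes a cache large enough to hold every block (so nothing is ever replaced) and illustrative block sizes; the block number plays the role of the tag.

```python
def misses_for_sequential_scan(n_words: int, words_per_block: int) -> int:
    """Misses when reading words 0..n_words-1 in order, fetching whole blocks on a miss."""
    cached_blocks = set()
    misses = 0
    for word in range(n_words):
        block = word // words_per_block      # the block (tag) this word belongs to
        if block not in cached_blocks:
            misses += 1                      # fetch the whole block into the cache
            cached_blocks.add(block)
    return misses

print(misses_for_sequential_scan(64, 1))  # 64: one miss per word
print(misses_for_sequential_scan(64, 8))  # 8: one miss per 8-word block
```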
Mapping Function
An algorithm is needed to map main memory blocks into cache lines. A method is needed to determine which main memory block occupies a cache line. There are three techniques used:
Direct
Fully Associative
Set Associative
Direct Mapping:
Direct mapping is a simple and efficient organization. The (virtual or physical) memory address of the incoming cache line controls which cache location is going to be used. This organization is straightforward to implement, and it is relatively easy to make it scale with the processor clock. In a direct-mapped organization, the replacement policy is built in, because cache line replacement is controlled by the (virtual or physical) memory address. Direct mapping assigns each memory block to a specific line in the cache. If a line is already taken up by a memory block when a new block needs to be loaded, the old block is evicted. The figure below shows how multiple blocks are mapped to the same line in the cache. This line is the only line that each of these blocks can be sent to. In the case of this figure, there are 8 bits in the block identification portion of the memory address.
Consider a simple example: a 4-kilobyte cache with a line size of 32 bytes, direct mapped on virtual addresses. Each line fill therefore moves 32 bytes. If one variable of type float takes 4 bytes on our system, each cache line holds eight (32/4 = 8) such variables.
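The arithmetic of this example, plus the line each address maps to. The arrays a and b and their start addresses are hypothetical; they are placed exactly one cache size (4 KB) apart to show two addresses that are forced to share the same line.

```python
CACHE_BYTES = 4 * 1024
LINE_BYTES = 32
FLOAT_BYTES = 4

NUM_LINES = CACHE_BYTES // LINE_BYTES          # 128 lines
FLOATS_PER_LINE = LINE_BYTES // FLOAT_BYTES    # 8 floats per line

def line_index(addr: int) -> int:
    """Direct mapping: the address alone picks the single line a block may occupy."""
    return (addr // LINE_BYTES) % NUM_LINES

a_base = 0x10000                 # hypothetical start address of array a
b_base = a_base + CACHE_BYTES    # hypothetical array b, exactly 4 KB later
print(NUM_LINES, FLOATS_PER_LINE)              # 128 8
print(line_index(a_base), line_index(b_base))  # 0 0  (a[0] and b[0] compete for line 0)
print(line_index(a_base + 8 * FLOAT_BYTES))    # 1    (a[8] starts the next line)
```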
(Figure: multiple memory blocks mapped to the same cache line; picture: http://csciwww.etsu.edu/tarnoff/labs4717/x86_sim/images/direct.gif)
The address for this is broken down as follows:
Tag | Line (8 bits, identifying the line in the cache) | Word id bits
Direct mapping is simple and inexpensive to implement, but if a program repeatedly accesses two blocks that map to the same line, the cache thrashes back and forth, reloading the line over and over again, so the miss rate is very high.
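The thrashing case can be reproduced with a small simulation of a direct-mapped cache, assuming 128 lines (as in the 4 KB / 32-byte example) and a hypothetical access pattern that alternates between two memory blocks that map to the same line.

```python
def direct_mapped_misses(block_trace: list[int], num_lines: int) -> int:
    """Count misses in a direct-mapped cache; each trace entry is a memory block number."""
    lines = [None] * num_lines          # tag stored in each line (None = empty)
    misses = 0
    for block in block_trace:
        index = block % num_lines       # the only line this block may use
        tag = block // num_lines
        if lines[index] != tag:
            misses += 1                 # miss: replace whatever was in the line
            lines[index] = tag
    return misses

# Blocks 0 and 128 both map to line 0 of a 128-line cache; alternate between them.
trace = [0, 128] * 10
print(direct_mapped_misses(trace, 128))  # 20: every access misses
```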
Fully Associative:
The fully associative cache design solves the potential problem of thrashing with a direct-mapped cache. The replacement policy is no longer a function of the memory address but considers usage instead. With this design, typically the line that has gone unused for the longest time is evicted from the cache. This policy is called least recently used (LRU). LRU prevents two blocks that are accessed alternately (for instance, elements of two arrays a and b that map to the same line in a direct-mapped cache) from evicting each other prematurely. The downside of a fully associative design is cost: additional logic is required to track the usage of lines, and the larger the cache, the higher the cost. It is therefore difficult to scale this technology to very large (data) caches. Luckily, a good alternative exists.
The address is broken into two parts: a tag (s bits) used to identify which block is stored in which line of the cache, and a fixed number of least significant bits identifying the word within the block.
Tag | Word id bits
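Running the same alternating pattern through a fully associative cache with LRU replacement shows why the thrashing disappears. This sketch assumes 128 lines and the same hypothetical trace; the list search stands in for the comparators that check every stored tag at once in hardware, which is exactly the extra cost mentioned above.

```python
def fully_associative_misses(block_trace: list[int], num_lines: int) -> int:
    """Misses in a fully associative cache with LRU replacement.

    The whole block number is the tag, and every line's tag is compared
    (hardware does these comparisons in parallel; here we search a list).
    """
    lines: list[int] = []               # tags, least recently used first
    misses = 0
    for tag in block_trace:
        if tag in lines:                # compare against every stored tag
            lines.remove(tag)
        else:
            misses += 1
            if len(lines) == num_lines:
                lines.pop(0)            # evict the least recently used line
        lines.append(tag)               # now the most recently used
    return misses

trace = [0, 128] * 10
print(fully_associative_misses(trace, 128))  # 2: only the first access to each block misses
```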
Set Associative:
Set-associative mapping addresses the problem of possible thrashing in the direct mapping method. Instead of having exactly one line that a block can map to in the cache, a few lines are grouped together to form a set. A block in memory can then map to any one of the lines of a specific set, but there is still only one set that the block can map to.
Tag | Set (identifies the set) | Word id bits
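A sketch of a set-associative cache, assuming 128 lines organised as 64 sets of 2 ways (2-way set associative), LRU replacement within each set, and the same hypothetical alternating trace as above. Setting ways=1 (with 128 sets) turns it back into a direct-mapped cache.

```python
def set_associative_misses(block_trace: list[int], num_sets: int, ways: int) -> int:
    """Misses in a k-way set-associative cache with LRU replacement within each set."""
    sets: list[list[int]] = [[] for _ in range(num_sets)]  # per-set tags, LRU first
    misses = 0
    for block in block_trace:
        s = sets[block % num_sets]      # set field: the one set this block may use
        tag = block // num_sets         # tag field: which block occupies a line of that set
        if tag in s:
            s.remove(tag)
        else:
            misses += 1
            if len(s) == ways:
                s.pop(0)                # evict the LRU line of this set only
        s.append(tag)
    return misses

trace = [0, 128] * 10  # blocks 0 and 128 fall into set 0 in both configurations below
print(set_associative_misses(trace, num_sets=64, ways=2))   # 2
print(set_associative_misses(trace, num_sets=128, ways=1))  # 20 (1-way = direct mapped)
```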