How Memory Works

SDRAM

SDRAM has a unique (more 1990s marketing spin) way of measuring speed. The "15 ns" rating of an SDRAM is the length of one complete clock cycle (66 MHz); the speed no longer refers directly to the access time. In burst mode it takes one clock cycle to access each memory location (in addition to the setup commands you must send to the SIMM), so a 15 ns SDRAM is roughly equivalent to, or slower than, a 60 ns DRAM that is interleaved (see more about interleaving below), as it is with most standard DRAM controllers. (Yeah, the marketing people got hold of the SDRAM specifications.)
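The 66 MHz figure follows from the fact that cycle time and clock frequency are reciprocals (f = 1/t). Here is a small Python sketch of that conversion. The function names are just for illustration.

```python
def clock_mhz(cycle_ns):
    """Clock frequency in MHz for a cycle time in nanoseconds (f = 1/t)."""
    return 1000.0 / cycle_ns


def cycle_time_ns(freq_mhz):
    """Cycle time in nanoseconds for a clock frequency in MHz (t = 1/f)."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    print(f"15 ns cycle  -> {clock_mhz(15):.2f} MHz")       # 66.67 MHz
    print(f"66 MHz clock -> {cycle_time_ns(66):.2f} ns")    # 15.15 ns
```

The "15 ns" label is a rounded number. An exact 66 MHz clock has a period of about 15.15 ns.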

An SDRAM has most of the memory-controller circuitry on chip, so the system has to talk to it just as it would to a memory controller. This helps clean up timing problems we saw in the past. If you are fetching just one memory word, however, it can be slower than comparable DRAM, because there are two dead cycles every time you set up to get data from the memory. This setup time is needed to initiate every memory access cycle, no matter what size is being transferred. Thus a 1-, 2-, 8-, or 512-byte transfer all carry the same 3-cycle setup overhead, whether in burst mode or not. The popular burst-mode page sizes are 512 and 256 (pretty small), so there is additional setup time between pages. The small page size may have to do with keeping the capacitance of the memory lines down, so as to squeak out more speed in future versions.
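To see how a fixed setup cost behaves, here is a rough timing model in Python. It assumes the 15 ns clock and 3-cycle setup from the text, one clock per word once a burst is running, and a fresh setup at every page boundary. The 512-word page and the choice to count it in words are assumptions, and so is the premise that transfers start on a page boundary. The names are hypothetical.

```python
import math

# Illustrative figures from the text; the page-size unit (words) is assumed.
CYCLE_NS = 15.0    # one 66 MHz clock
SETUP_CYCLES = 3   # overhead paid before a transfer can start
PAGE_WORDS = 512   # burst page size, assumed to be counted in words


def burst_time_ns(words, setup=SETUP_CYCLES, page=PAGE_WORDS, cycle_ns=CYCLE_NS):
    """Rough time to read `words` consecutive words from an SDRAM.

    Model: a full setup before the first word and again at every page
    boundary (transfer assumed to start on a boundary), then one clock
    per word in burst mode.
    """
    if words <= 0:
        return 0.0
    pages = math.ceil(words / page)   # pages touched, each needing a setup
    cycles = pages * setup + words
    return cycles * cycle_ns


if __name__ == "__main__":
    for n in (1, 2, 8, 512, 513):
        t = burst_time_ns(n)
        print(f"{n:4d} words: {t:7.1f} ns total, {t / n:6.2f} ns per word")
```

Under this model a single word costs 60 ns, the same as a 60 ns DRAM. A 512-word burst averages just over 15 ns per word, and word 513 pays for a second setup.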

Interleaving

Interleaving is a scheme in which the system spreads its memory accesses across two or more banks of memory. While the first bank is in the second half of its read cycle, the second bank is already in the first half of its own. With two banks, we thus have odd and even banks of memory. This cuts the effective access time in half IF the memory accesses are sequential, as in a burst-mode request. If you are going for just a single word, you still have to wait the full access time or more (more because of setup time, and because you may have to wait for the right bank of memory to become available). Double data out SDRAM does interleaving on chip.
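Below is a simplified Python timeline model of interleaving. It assumes that word `a` lives in bank `a % banks`, that a bank cannot start a new access until its previous one finishes, and that the controller staggers starts by `access_ns / banks`. The 60 ns access time and the function name are assumptions made for illustration. Setup overhead is ignored.

```python
def interleaved_read_ns(addresses, banks=2, access_ns=60.0):
    """Time until the last requested word is available (simplified model)."""
    bank_free = [0.0] * banks     # time at which each bank becomes idle
    stagger = access_ns / banks   # minimum spacing between access starts
    next_start = 0.0
    finish = 0.0
    for a in addresses:
        b = a % banks                        # odd/even bank for banks=2
        start = max(next_start, bank_free[b])
        finish = start + access_ns
        bank_free[b] = finish
        next_start = start + stagger
    return finish


if __name__ == "__main__":
    print(interleaved_read_ns(range(8), banks=1))        # 480.0, no interleaving
    print(interleaved_read_ns(range(8), banks=2))        # 270.0, sequential burst
    print(interleaved_read_ns([0], banks=2))             # 60.0, single word
    print(interleaved_read_ns([0, 2, 4, 6], banks=2))    # 240.0, all in one bank
```

Sequential words settle to one every 30 ns, half the access time. A lone word still takes the full 60 ns. Words that keep landing in the same bank get no benefit, because each one waits for that bank.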

Now, the sharp-eyed reader might notice that the speed of the actual DRAM cell hasn't changed much of late. The true speed of a random word access may actually be slower with the new systems. These limitations are the real reason that fast new processors don't seem that much faster, unless you are running code that always hits inside the processor's on-chip cache and there is no cache thrashing. (Cache thrashing happens when the cache is filled with one block of memory only to be rewritten with another, back and forth.) Multitasking also causes cache thrashing: if we switch back and forth between programs, we guarantee cache misses once enough tasks are running. The workaround is to make each task's time slot long enough that refilling the cache takes only an insignificant fraction of it.
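Thrashing is easy to reproduce with a toy direct-mapped cache, where each memory block can occupy only one cache line. The Python sketch below uses a made-up 64-byte cache (4 lines of 16 bytes) so the effect is obvious. It also includes a one-line estimate of the time-slot overhead. The refill and slot times in the usage note are hypothetical.

```python
def count_misses(addresses, num_lines=4, line_bytes=16):
    """Count misses in a tiny direct-mapped cache (hypothetical geometry)."""
    tags = [None] * num_lines          # which block each line currently holds
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing addr
        index = block % num_lines      # the only line this block may use
        tag = block // num_lines       # distinguishes blocks sharing a line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag          # evict whatever was there
    return misses


def refill_overhead(refill_us, slice_us):
    """Fraction of a task's time slot spent refilling the cache."""
    return refill_us / slice_us


if __name__ == "__main__":
    # 0 and 64 map to the same line and evict each other every time.
    print(count_misses([0, 64] * 5))       # 10
    # 0 and 16 map to different lines, so only the first touches miss.
    print(count_misses([0, 16] * 5))       # 2
    # Assumed 100 us refill: 1 ms slots lose 10%, 10 ms slots lose 1%.
    print(refill_overhead(100, 1000), refill_overhead(100, 10000))
```

Lengthening the slot does not remove the misses after a task switch. It only shrinks their share of the time.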

Caching and the future

Caching schemes usually grab the whole page when there is a cache miss (when the memory byte(s) needed are not in the cache). Thus the new computers no longer really operate as Random Access Memory (RAM): we end up pulling a whole page into cache every time we fetch anything from memory. All the new computer specifications are for sequential memory operations; in a way this is more of a serial memory interface than what is traditionally thought of as RAM. (It reminds me of the very old drum memory.)
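One way to see this is to count the bytes actually moved from memory. In the Python sketch below, every miss brings in a whole block (the "page" in the text, usually called a cache line). It assumes a 32-byte block and a cache large enough that nothing is ever evicted. Both are illustrative assumptions.

```python
def bytes_fetched(addresses, line_bytes=32):
    """Bytes pulled from main memory to serve `addresses`.

    Assumes nothing is ever evicted, so each distinct line is fetched
    exactly once, always in full.
    """
    lines = {addr // line_bytes for addr in addresses}
    return len(lines) * line_bytes


if __name__ == "__main__":
    sequential = range(512)                      # 512 neighbouring bytes
    scattered = [i * 4096 for i in range(16)]    # 16 bytes, 4 KB apart
    print(bytes_fetched(sequential))   # 512: every fetched byte gets used
    print(bytes_fetched(scattered))    # 512: moved just to read 16 bytes
```

Both patterns move the same 512 bytes, but the scattered one uses only 16 of them. That waste is why sequential access is what the hardware is built for.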

These trends are even more evident with the new RAMBUS standard. RAMBUS gets its speed increase in two ways:

Voltage level: The voltage swing is much lower, 0-3.3 V on SDRAM versus 0-2.2 V on RAMBUS. It takes less time to charge (or discharge) the DRAM cell's data-storage capacitor to 2.2 V than to 3.3 V.
Proprietary caching scheme: RAMBUS uses a proprietary caching scheme that tries to anticipate which memory location(s) will be needed next. This does not always work, and both the software and the hardware must be tuned for it to give a real increase in speed. It is still a workaround trying to avoid the speed limits of the DRAM cell. RAMBUS also comes with even more overhead operations to set up a memory-to-cache transfer than SDRAM does.
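The RAMBUS scheme itself is proprietary, so the Python sketch below swaps in the simplest generic form of anticipation instead: a next-block prefetcher that fetches block b+1 whenever block b is touched. It is an illustration of the idea, not RAMBUS's method. The 32-byte block and the never-evicting cache are assumptions.

```python
def demand_misses(addresses, block_bytes=32, prefetch=True):
    """Count misses the program actually waits for, with optional prefetch.

    Generic next-block prefetcher (not RAMBUS's scheme); the cache is
    assumed large enough that nothing is evicted.
    """
    cached = set()
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        if block not in cached:
            misses += 1              # program stalls for this block
            cached.add(block)
        if prefetch:
            cached.add(block + 1)    # guess that the next block is wanted soon
    return misses


if __name__ == "__main__":
    sequential = range(0, 256, 8)           # walks blocks 0..7 in order
    strided = [0, 4096, 8192, 12288]        # jumps far past each guess
    print(demand_misses(sequential, prefetch=False))  # 8
    print(demand_misses(sequential, prefetch=True))   # 1
    print(demand_misses(strided, prefetch=True))      # 4, every guess wasted
```

When the guess is right, the stalls nearly vanish. When the access pattern jumps around, the prefetches only add traffic, which matches the text's point that the scheme "does not always work".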

RAMBUS is an example of how interleaving can be carried on to more levels. Four banks of interleaved memory instead of two will once again double the speed of a sequential memory access, and there is no reason not to go up to 8, 16, or even 32 banks.
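In the idealized case, with no setup overhead and a bus fast enough to keep up, k interleaved banks deliver one sequential word every t_access / k. A lone random word still costs the full t_access. The 60 ns access time below is an assumed example value.

```latex
t_{\text{word}} \approx \frac{t_{\text{access}}}{k},
\qquad t_{\text{access}} = 60\ \text{ns}:\quad
k = 2 \Rightarrow 30\ \text{ns},\;
k = 4 \Rightarrow 15\ \text{ns},\;
k = 8 \Rightarrow 7.5\ \text{ns},\;
k = 32 \Rightarrow 1.875\ \text{ns}
```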

Another limiting factor, besides the memory cell itself, is stray capacitance in the data lines, which slows the bus down. So look for a trend toward system modules in which the RAM die and the processor are packaged together in a cartridge, as the Pentium II is with its cache memory.
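The link between stray capacitance and bus speed is the RC time constant. A line driven through resistance R into capacitance C has a 10-90% rise time of ln 9 · RC ≈ 2.2 RC, so the rise time scales directly with C. The 50 Ω and 20 pF values below are hypothetical examples.

```latex
\tau = R\,C,
\qquad t_{r,\;10\text{-}90\%} = \tau \ln 9 \approx 2.2\,R\,C
\qquad R = 50\ \Omega,\; C = 20\ \text{pF}
\;\Rightarrow\; \tau = 1\ \text{ns},\; t_r \approx 2.2\ \text{ns}
```

Halving the line capacitance, for example by putting the RAM right next to the processor, halves the rise time to about 1.1 ns.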

For users who want more and more speed, it turns out that the one thing that can make a huge difference is the software's machine code itself. Newer compilers need to be more memory-system aware. The trend toward huge, sloppy C programs has made this a bigger problem than it needs to be: C programs can be 4 to 10 times larger and slower than hand-crafted, tightly packed assembly code. I know of one outfit that streams video off hard drives and out as HDTV at 80 MB/s on an ordinary computer. They do this by careful pairing of instructions and hand packing of the assembly code. If Windows were similarly crafted, today's hardware would be much more than we need for most desktop applications. Given the perishable nature of software today, I doubt we will see such a trend; the best we can hope for is the development of more sophisticated compilers that take much more of this into account.
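A classic example of memory-system-aware code is loop order. In the Python sketch below, the same 64 x 64 array of 8-byte elements, stored row by row, is walked in two orders through a hypothetical 4 KB direct-mapped cache (64 lines of 64 bytes), and the misses are counted. The cache geometry and array size are assumptions chosen to make the effect clear. The sketch counts simulated misses. It does not time real hardware.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Misses in a direct-mapped cache of num_lines * line_bytes bytes."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        tag = block // num_lines
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


def traversal(n, elem_bytes=8, row_major=True):
    """Byte addresses visited when walking an n x n array stored row by row."""
    for i in range(n):
        for j in range(n):
            r, c = (i, j) if row_major else (j, i)
            yield (r * n + c) * elem_bytes


if __name__ == "__main__":
    print(count_misses(traversal(64, row_major=True)))    # 512
    print(count_misses(traversal(64, row_major=False)))   # 4096
```

Walking along the rows uses all 8 elements of each fetched line. Walking down the columns jumps 512 bytes per step, so lines collide and are evicted before they are reused, and every access misses. The work is identical, but the column walk misses 8 times as often.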

Other Helpful Hints

Do not mix speeds when you don't have to. Be sure that all your SIMMs are the same speed to within 10 ns. If you have SIMMs that test at 80 ns and one that tests at 100 ns, try replacing that one with an 80 ns SIMM. Use the slower SIMM in a slower computer.
Let memory testers run for at least 3 passes. This allows the RAM to warm up internally and helps find possible refresh problems. Better yet, warm the SIMMs up to their specified high temperature before testing.
The speed of a SIMM is determined by the slowest chip on the strip. If all of the chips on a strip pass at 80 ns but one only passes at 95 ns, then the strip must be treated as a 100 ns part, not 80 ns or 90 ns.
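The two rules above (rate a strip by its slowest chip, and keep SIMMs within 10 ns of each other) can be written as a small Python check. The list of speed grades is a hypothetical example, so substitute the grades your parts actually use. The function names are also illustrative.

```python
# Hypothetical speed grades in ns; replace with the grades your parts use.
SPEED_GRADES_NS = (60, 70, 80, 90, 100, 120)


def simm_rating(chip_times_ns, grades=SPEED_GRADES_NS):
    """Rate a SIMM by its slowest chip, rounded up to the next speed grade."""
    worst = max(chip_times_ns)
    for grade in sorted(grades):
        if worst <= grade:
            return grade
    raise ValueError(f"slowest chip ({worst} ns) is slower than every grade")


def speeds_compatible(simm_speeds_ns, tolerance_ns=10):
    """True if all SIMMs are within tolerance_ns of each other."""
    return max(simm_speeds_ns) - min(simm_speeds_ns) <= tolerance_ns


if __name__ == "__main__":
    print(simm_rating([80] * 7 + [95]))        # 100: one slow chip sets the rating
    print(speeds_compatible([80, 80, 100]))    # False: replace the 100 ns SIMM
```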

About the author: Dr. Ah Clem lives in the subterranean labs at Transtronics.

Disclaimer

This information may have errors; it is not permissible to be read by anyone who has ever met a lawyer.
Use is confined to Engineers with more than 370 course hours of electronic engineering for theoretical studies.
ph +1(785) 841-3089

Email inform@xtronics.com