High-end K6-2 processors and overclocking

Written by ludicrous (Aaron Vienot)

Contents: Introduction | AMD in Time | The K6-Series | K6-2 Clocking Considerations | Processor-Specific Notes | The Test Setup | The Testing Methodology | The Testing | In Closing | Further Reading

* This article has been updated in June and July of 2001.

Introduction

The title may predispose the reader to consider this article outdated, even absurd, given that the K6-2 has been relegated to ultra-low-end status by AMD's own Athlon/Duron products. The K6 line's only successors, the K6-2+ and K6-III+ CPUs - both based on the design of the discontinued K6-III but manufactured at 0.18µ - are targeted for mobile applications although they employ a standard Socket 7 interface and some units have found their way into OEM sales. However, a while ago I posted a brief response on the messageboard to a K6-2/500 overclocking question, and was startled to receive four e-mail inquiries regarding the same in the two following days. The questions were fundamentally the same: "What were your experiences with the K6-2/500 and what should I try?" Apparently, many people still use these but are finding a dearth of material that addresses their hardware.

First, the standard disclaimer: The contents herein include information on overclocking the AMD K6-2 processor. AMD does not condone running your processor beyond its rated speed, and doing so voids any processor warranty and possibly the warranty of your entire computer system if such exists. As author, I emphasize that while I have overclocked K6-2 processors, such action can damage your processor, other portions of your computer system, and possibly cause other unspecified harm to person or property. The data herein are given solely for informational purposes and the reader bears full responsibility for any results, including damages whether incidental or consequential, resulting from applying this information.
Second, I owe a word of thanks to a few CPU-Central messageboard participants who, in various ways, made this article possible. First I owe thanks to baldy, who this past spring sold me a K6-2/500 processor and Alpha heatsink at a reasonable price; second to LED, who supplied the DIMM presently in my Super 7 system; and third especially to MS, the well-versed proprietor of LostCircuits, who took time out of a busy life to evaluate this article prior to publication. And finally, I owe thanks to webmaster Tony Dao for printing this.

This article aims to provide some CPU history, a background on K6-series processors, and then a focus on the higher-speed K6-2 models. This article is NOT a strict performance evaluation of the K6-2/500 processor or the K6-2 series. Hardware sites with far more resources than I have did that to death long ago; I cannot compete with them, and if I were trying, this article would be more than a year too late. The testing in the eighth section is performed for the sake of observing general trends. If you are eager to get straight to the clocking specifics, skip to the middle of the third section; but you will miss out on some intriguing information :-)

AMD in Time

Sunnyvale, California-based AMD has been in business since 1969, and it along with Cyrix and a few other companies hotly contested Intel's market dominance during the era of 386 and 486 processors with x86-compatible products that, in performance, matched or surpassed the Intel offerings. However, Intel almost instantly relegated all competitors to low-end status with the release of the Pentium processor. The Pentium was the first superscalar x86 CPU architecture (meaning it could process multiple instructions simultaneously) and it also offered excellent floating point performance along with several proprietary enhancements.
This sent Cyrix into a technology tailspin from which it never fully recovered up through its recent acquisition by chipset maker VIA Technologies, and AMD was likewise set back severely. While Cyrix was attempting to regain lost market share with the 6x86, AMD was developing the K5, its first in-house design for the Pentium-class market. The K5 was decent in its own right, but ran hot, under-performed, and came late to the market, thus failing to make a significant place for itself. Like Cyrix's 6x86, though, it did garner lower-end sales by virtue of costing less than Intel's offerings.

AMD, realizing they would see very little mileage from the K5, purchased NexGen - a CPU company that designed the Nx586 CPU and had just finished the core design for the next-generation Nx686. The Nx686 core became the heart of the long-lived K6-series processor family, with AMD enhancing it to include Intel's MMX, and later AMD's own 3DNow! SIMD enhancements, which allow two single-precision floating-point operations to be processed per clock cycle; such operations make up the majority of 3D triangulations. The K6-series sold well and continues to do so, comprising roughly one-third of AMD's production as of this writing. Meanwhile, AMD developed a spectacular seventh-generation core, the K7, which came to market as the Athlon in late 1999 and began a successful AMD effort to regain market share from Intel.

Those wanting more CPU history than what I have provided should read the PC Processors Guide at x86.org. It is an older article and only covers through the Pentium II and Mendocino-core Celeron, but is still good reading.

The K6-series

The K6 series includes the K6-MMX, the K6-2, the K6-III, and most recently the K6-2+ and K6-III+. These span three process levels (0.35µ, 0.25µ, and 0.18µ) and five major core versions. These five include the K6, the K6-2, the CXT-core K6-2, the K6-III, and the K6-x+.
The K6 processors were first released at 0.35µ and speed grades of 166, 200, and 233MHz; 233MHz was achieved mostly by raising the core voltage to 3.1v, with the nasty side effect of generating excessive heat. Later the core was produced at 0.25µ, allowing for cooler operation, lower core voltages and higher operating frequencies, and released at 233, 266, and 300MHz. The K6-2, which added 3DNow!, came next, and was ultimately offered to the desktop market in speed grades of 266, 300, 333, and 350MHz. The CXT-core K6-2 followed with small enhancements and comprises speed grades of 380, 400, 450, 475, 500, 533, and 550MHz (although some CXT cores were later down-binned to lower grades). The K6-III core added 256kB of on-die Level 2 cache memory. The K6-x+ processors appear very similar to the K6-III, but are manufactured on 0.18µ technology and add AMD's PowerNow! power-saving technology (already integrated in the CXT core but not enabled according to AMD).

As we are focusing on higher-end K6-2 processors, it would be helpful to view a table of CXT-core models. Note that this information is available on processor data sheets available on AMD's web site, but in this case I obtained the information from a Tom's Hardware article which conveniently lists data for most current processors.

PROCESSOR | BUS SPEED (MHz) | MULTIPLIER | CORE V (v)
K6-2/380 AFR | 95 | 4.0 | 2.20
K6-2/400 AFQ | 100 | 4.0 | 2.20
K6-2/400 AFR | 100 | 4.0 | 2.20
K6-2/450 AHX | 100 | 4.5 | 2.40
K6-2/450 AFX | 100 | 4.5 | 2.20
K6-2/475 AHX | 95 | 5.0 | 2.40
K6-2/475 AFX | 95 | 5.0 | 2.20
K6-2/500 AFX | 100 | 5.0 | 2.20
K6-2/533 AFX | 97 | 5.5 | 2.20
K6-2/550 AGR | 100 | 5.5 | 2.30

A | Socket 7 format
F, G, H | Specified core voltage (2.20v, 2.30v, 2.40v)
Q, R, X | Maximum operating temperature (60, 70, 65°C)

Information included in this article may be broadly applicable to the entire line of K6-2 processors, but will specifically refer to the listed CXT-core models.
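The three-letter suffix legend above lends itself to a small lookup. The following is only a minimal sketch in Python, using nothing beyond the letter codes listed in the legend; the function and dictionary names are hypothetical and chosen for illustration.

```python
# Decode the three-letter suffix of a CXT-core K6-2 (e.g. "AFX").
# All values come from the legend under the processor table;
# the names below are illustrative, not from any AMD tool.
PACKAGE = {"A": "Socket 7"}
CORE_VOLTAGE = {"F": 2.20, "G": 2.30, "H": 2.40}   # volts
MAX_TEMP_C = {"Q": 60, "R": 70, "X": 65}           # degrees Celsius


def decode_suffix(suffix):
    """Return package, core voltage and maximum temperature for a suffix."""
    suffix = suffix.strip().upper()
    if len(suffix) != 3:
        raise ValueError("expected a three-letter suffix such as 'AFX'")
    package_code, voltage_code, temp_code = suffix
    try:
        return {
            "package": PACKAGE[package_code],
            "core_voltage": CORE_VOLTAGE[voltage_code],
            "max_temp_c": MAX_TEMP_C[temp_code],
        }
    except KeyError as unknown:
        # Letters outside the legend are not guessed at.
        raise ValueError(f"unknown suffix letter: {unknown}") from None
```

For example, `decode_suffix("AGR")` reports a Socket 7 part specified for 2.30v and 70°C, which matches the K6-2/550 row in the table. Suffix letters not covered by the legend are rejected rather than guessed.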
K6-2 clocking considerations

Before moving on to the test setup and some charts, we need to cover some important notes that should be considered before clocking is attempted.

(a) CPU cooling is critical. Higher-speed K6-2 processors tend to run hot, and anything beyond the most conservative overclock will require better cooling than what a basic Socket 7 fan/heatsink can offer. GlobalWin and Alpha-branded heatsinks are perennial favorites; check websites such as StepThermodynamics, CoolerGuys, and 3DfxCool to view some options. Be prepared to spend $15-30.

(b) Case cooling is very important. Heat dissipation occurs most rapidly when there is a large temperature difference between the CPU heatsink and surrounding air. It is pointless to invest in a good CPU cooler and then allow heat to build up in the case. Heat also shortens the life of other components, in particular the video card and hard disk drives, so keep the air well circulated. If your only ventilation fan is that in the power supply, the addition of a second fan is a first priority - many cases are designed to accept a 60mm intake fan at the bottom front.

(c) Avoid clocking proprietary machines. Proprietary systems, e.g. Compaq, Gateway, etc., tend to be poor clocking candidates. It sometimes works, but such systems frequently don't offer the necessary adjustments. When they do offer them, stability issues - a common overclocking artifact - can be magnified by the fact that many major components may be integrated on the motherboard.

(d) Understand the importance of bus speed. High-end K6-2 performance doesn't scale very well with clock speed; much of the reason is that the Level 2 cache SRAM runs at the motherboard bus speed regardless of the processor clock. General memory bandwidth is also important. This points to an obvious answer - clock the bus speed UPWARD whenever possible. We will come back to this in the testing section.

(e) Monitor the CPU temperature.
Many motherboards can report the temperature in the BIOS. Some also include interface software for reporting vital statistics in Windows, allowing you to actively monitor the temperature. If your motherboard includes a BIOS readout but no Windows utility, try using Motherboard Monitor. Avoid running a CXT-core K6-2 hotter than roughly 45-49°C; keep it cooler if possible.

(f) Be aware of the CXT-core multiplier remap. The CXT-core K6-2 processors have a multiplier remap of 2.0x = 6.0x. What this means is that while most Super 7 motherboards peak at a 5.5x multiplier, 6.0x is in fact available; just set the motherboard to 2.0x and the CPU will respond with 6.0x.

(g) Know your core voltage limitations. A K6-2 can usually be run safely in the range of 2.20-2.60v with adequate cooling. 2.70-2.90v is thin ice, and 3.00v+ is NOT recommended. Some people have run them at 3.00v and higher, but this places extreme stress on the CPU and requires a well-designed cooling solution.

Now let us look at some model-specific notes for the CXT core processors.

Processor-specific notes

This section will examine each CXT speed grade and offer notes on different core revisions. The previous CPU table should be referenced as necessary. Suggested speeds are exactly that; your mileage may vary.

K6-2/380: This CPU is designated AFR and requires a 2.20v supply to the core. 400 (100 bus) is an easy target, 428 (95 bus) is possible. A good unit may even attain 448 (112 bus) or 450 (100 bus).

K6-2/400: There are two core revisions for this processor, AFQ and AFR. Both require 2.20v to the core, but the AFR is rated to a higher temperature. 500 (100 bus) and 504 (112 bus) are about the highest these can go; a few folks have run them higher, but 448 (112 bus), 450 (100 bus), and 475 (95 bus) are more attainable goals.

K6-2/450: There are also two versions of this processor, AHX and AFX. The AHX requires 2.40v; the AFX, a standard 2.20v.
An AHX may prove a poor overclocker, as it was released earlier than the AFX and the higher voltage indicates that the core was reaching its limits. Regardless, 475 (95 bus) is an easy goal and 500 (100 bus) is always a possibility. The AFX bears the possibility of a 475 (95 bus), 500 (100 bus), 504 (112 bus), and sometimes even higher.

K6-2/475: This processor also shares the same two core revisions as the 450, AHX (2.40v) and AFX (2.20v). The same general notations apply; the AHX version probably will not go much beyond stock, while the AFX may more easily hit 500 (100 bus), 504 (112 bus), and possibly higher.

K6-2/500: This core comes in the 2.20v AFX flavor only. At roughly this point the series starts to run noticeably on the warm side. 504 (112 bus) is a nice speed for this CPU if it won't clock higher; other possibilities include 533 (97 bus), 550 (100 bus), 560 (112 bus), and 570 (95 bus). A really good CXT core peaks around 600MHz, so 600 (100 bus) and 617 (112 bus) are long shots, and excellent cooling will be required.

K6-2/533: This core also comes exclusively in 2.20v AFX form. Its 97MHz bus rating exists because the processor was intended only for OEM applications. Boards not offering 97 bus can be set to 95 for a 525MHz clock. Possibilities include the 550, 560, and 570 options listed for the K6-2/500, although a 533 CPU that simply will not clock (or else overheats) will perform nicely at 504 (112 bus).

K6-2/550: This CPU is offered only as the 2.30v AGR. Note, as with the AHX 450 and 475 models, the higher core voltage indicates the CPU series is reaching its limits and overclocking will be limited. Again as with the K6-2/500, 560 and 570 are possibilities; 600 and 617 are long shots.

Let us move on to the testing stage and see what kind of trends a K6-2/500 will show.
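The target speeds listed in the notes above are simply bus speed times multiplier, with the CXT remap turning a 2.0x board setting into 6.0x. The following is a rough sketch in Python of that arithmetic; the function names are hypothetical, and the results are nominal values, so they can differ by a MHz or so from the rounded figures quoted above, because real boards rarely run at exactly the nominal bus frequency.

```python
# Nominal core clock = bus speed x effective multiplier.
# The only remap modelled is the CXT-core one described in the article:
# a board setting of 2.0x is interpreted by the CPU as 6.0x.
CXT_MULTIPLIER_REMAP = {2.0: 6.0}


def effective_multiplier(board_setting):
    """Multiplier the CXT core actually applies for a given board setting."""
    return CXT_MULTIPLIER_REMAP.get(board_setting, board_setting)


def core_clock_mhz(bus_mhz, board_setting):
    """Nominal core clock in MHz for a bus speed and board multiplier setting."""
    return bus_mhz * effective_multiplier(board_setting)


def candidate_clocks(bus_speeds, board_settings, low_mhz, high_mhz):
    """List (clock, bus, board setting) combinations inside a target range."""
    found = []
    for bus in bus_speeds:
        for setting in board_settings:
            clock = core_clock_mhz(bus, setting)
            if low_mhz <= clock <= high_mhz:
                found.append((clock, bus, setting))
    return sorted(found)
```

Calling `candidate_clocks([95, 100, 112], [2.0, 5.0, 5.5], 550, 570)` lists the 550 (100 bus), 560 (112 bus), and 570 (95 bus, board set to 2.0x) options mentioned for the K6-2/500. Whether a given CPU is stable at any of them is a separate question that only testing can answer.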
The test setup

The system used in testing consists of the following hardware configuration:

AMD K6-2/500 AFX
Alpha Socket7 fan/heatsink
EPoX EP-MVP3C (VIA MVP3 chipset, Award BIOS, 512kB L2 cache, AT form factor with both AT and ATX power supply connectors)
64MB PC-100 single-bank SDRAM DIMM (112MHz capable @ CL2)
Diamond Viper V550 AGP (NVidia RivaTNT, 90MHz clock, 16MB SDRAM)
Creative Labs Ensoniq AudioPCI sound
Maxtor DiamondMax 8.4GB (5400RPM/512kB buffer, primary master drive)
AT mid-tower case
300W ATX power supply
Basic keyboard and PS/2 two-button mouse

This combination is far from stellar and experienced users will quickly see memory, sound, and graphics upgrade possibilities. However, I believe the given hardware is a reasonably "fair" representation of what an average K6-2 user might have.

As for software and drivers, the following were installed at test time:

VIA Technologies 4-in-1 drivers version 4.24a
NVidia Detonator 5.22 reference drivers
Creative Labs Retail-box CD driver for Ensoniq AudioPCI
Microsoft Windows 95 OSR 2.5 (4.00.950C)
Microsoft DirectX 7.0a

The 8.4GB system hard drive is partitioned into four 2.1GB FAT16 partitions. Drive C: hosts Windows 95 and a 512MB fixed-size swap file. DMA transfers are enabled for the hard drive. The AGP aperture BIOS setting is at the default 64MB. The SDRAM is set to CAS latency 2 in the BIOS.

The testing methodology

All tests performed are courtesy of SiSoft Sandra 2000 Standard, Distributed.net's RC5 64-bit encryption cracking client, and id Software Quake2 timedemos using 3Fingers' Crusher demo. The RC5 release used here was version v2.8009-460-CTR-00060723; Quake2 was updated to version 3.20 and the AMD 3DNow!-optimized patch was used; timedemos were run using 512 x 384 resolution, the 3DNow!-optimized default OpenGL driver, full screen ON, maximum screen size, highest texture quality, 8-bit textures ON, sound disabled, and sync every frame turned OFF.

These benchmark choices deserve further comment.
First, Sandra is purely synthetic and attempts to test optimal performance. Actual performance may vary significantly between applications. Sandra primarily yields comparison numbers, with the benefit of being very convenient. RC5 likewise provides nice comparison numbers, and the Crusher demo for Quake2 has traditionally provided a standard for measuring gaming performance, bolstered by highly repeatable results. Quake3: Arena timedemos are NOT being used because the program chokes when only offered 64MB of RAM, adversely affecting the results; also the K6-2 in tandem with an old RivaTNT is not a happy Quake3 combination. The situation would change with a Voodoo3 or TNT2 and another 32MB of RAM.

Testing at each speed grade was performed as follows: First, the system was booted, unnecessary background applications were killed via the Task Manager, and Sandra was launched. Two consecutive runs of the CPU benchmark were made. Next RC5 was launched, allowed to run a cycle and report its results, then closed. After that Quake2 Crusher timedemos were performed, eight times per speed grade since typically at least four runs were required for the system to stabilize and return consistent numbers. Finally, the system was rebooted, following the same methodology as before, and Sandra memory benchmarking was performed, again twice.

I should note that a fresh OS install was NOT performed for this test due to time constraints. However, the OS was freshly installed mere weeks ago and has been used little since then, so it should be reasonably "clean".

Update: I was remiss in not pointing this out originally, but the reader is advised to note that all graphs that follow in the testing section have been cropped in the x-axis to show relative differences, and hence do not show actual scale.

The testing

Since these tests are only intended to show general trends in clock speed, the CPU has been set to every possible speed from 400 to 570MHz. Only the 380 setting was not tested.
A table of the clocking speeds follows:

CLOCK (MHz) | FSB (MHz) | MEMORY (MHz) | MULTIPLIER
400 | 66 | 66 | 6.0
400 | 100 | 66 | 4.0
400 | 100 | 100 | 4.0
428 | 95 | 95 | 4.5
448 | 112 | 112 | 4.0
450 | 100 | 100 | 4.5
475 | 95 | 95 | 5.0
500 | 100 | 100 | 5.0
504 | 112 | 112 | 4.5
525 | 95 | 95 | 5.5
550 | 100 | 100 | 5.5
560 | 112 | 112 | 5.0
570 | 95 | 95 | 6.0

Note that the 400MHz grade was tested thrice, once at 66/66 (FSB/Memory), once at 100/66 via the DIMM=AGP jumper, and once at 100/100. More on this later.

The following table summarizes data acquired in all tests. Please be aware that the Quake2 Crusher score for 570MHz is estimated, from an "initial run", as the system was very unstable at that speed and I could not complete eight timedemos.

CPU FREQ. (MHz) | SANDRA ALU (MIPS) | SANDRA FPU (MFLOPS) | D-NET RC5 RATE (kkeys/sec) | QUAKE2 CRUSHER (fps) | SANDRA ALU/MEM. (MB/sec) | SANDRA FPU/MEM. (MB/sec)
400 | 829 | 498 | 687.79 | 20.7 | 92 | 93
400 | 880 | 499 | 688.31 | 21.6 | 91 | 92
400 | 880 | 499 | 688.31 | 24.3 | 126 | 129
428 | 927 | 534 | 730.60 | 23.9 | 120 | 122
448 | 985 | 559 | 770.96 | 26.3 | 134 | 137
450 | 961 | 561 | 774.50 | 25.1 | 126 | 128
475 | 1009 | 594 | 819.54 | 25.0 | 121 | 123
500 | 1059 | 624 | 860.35 | 26.3 | 127 | 129
504 | 1090 | 629 | 867.35 | 27.5 | 134 | 136
525 | 1090 | 653 | 900.65 | 25.8 | 121 | 123
550 | 1146 | 686 | 946.69 | 27.3 | 127 | 129
560 | 1187 | 699 | 961.31 | 28.3 | 134 | 137
570 | 1187 | 713 | 984.05 | 26.9 | 122 | 123

The following graph shows Sandra scores according to clock speed. Bear in mind that the three 400MHz entries are portrayed here, and elsewhere in this section, in the same order that they are listed on the earlier table - 400 (66/66), 400 (100/66), and 400 (100/100). What trends do we see in these? Consider the following:

This chart shows scatter plots of integer (ALU) and FPU performance.
Note that the integer performance is dependent on bus speed, whereas the FPU relies solely on the actual clock. Regardless, both the ALU and FPU scaled by identical amounts, 43.2% from the base 400MHz to the peak 570MHz. For now these serve primarily as interesting observations. We will return to them later. Note that for the above chart, as well as several that follow, the two lower 400MHz settings are omitted.

RC5 Performance:

What this indicates is that RC5 performance is scaling linearly with processor speed. Since RC5 operates primarily 'inside' the processor, this is logical. What happens in situations heavily dependent on resources outside of the processor? For this we refer to the Quake2 Crusher benchmarks:

Now things get interesting, and the importance of bus speed shows up in a big way. Remember on the first chart how integer and FPU performance scaled upward with speed? That performance increase is worthless if the system cannot get data back and forth fast enough. In this case, assuming the 100MHz bus settings are 'normal', we then see spikes at the three speeds tested on 112MHz bus and dips at the four speeds tested on 95MHz bus. Even the 100MHz bus settings tended to outperform faster settings based on 95MHz bus: 400 slightly outperformed 428, 450 just barely crawled past 475, 500 handily surpassed 525, and 550 just barely beat out 570. Does this mean that 95MHz bus users are getting a raw deal? Not necessarily; this is just one benchmark at one point in time on one system, with a maximum framerate difference of 4.4 FPS. It does illustrate, though, that bus speed is important.

To make that point more clear, let us bring back the two 400MHz settings that I have been ignoring and look at all three of the settings in isolation. First, a look at measured processor performance for each setting:

Integer performance is a bit lower at the 66/66 setting; FPU performance is lower by only one point. The other two settings yield identical processor benchmarks.
However, now that the discrepancy in bus speed has become large enough, RC5 begins to show some differences, even between the 100/66 and the 100/100 settings:

Update: In preparing my data for the K6-3+ overclocking article, I discovered that in multiple runs RC5 results can vary by several tenths at any given speed, while here I only reported first-run results. However, after re-testing the K6-2 on a different platform, I found that while the average may vary, the spread of numbers shown in the above graph is still a fair representation.

The most convincing results, however, come from the Quake2 benchmark.

Once again the benefits of bus speed come through. The system performs with comparative sluggishness at the 66/66 setting. The 100/66 setting, which allows the FSB to run at 100MHz while the memory bus stays at 66MHz (available on some boards and intended as a performance improvement for users who are running PC-66 DIMMs on a 100MHz system), shows a frame-rate increase of 4.3% over 66/66. However, at 100/100 we get a very nice jump of 12.5% over the 100/66 setting and a full 17.4% over the 66/66 setting. Considering that all three settings employ a 400MHz processor clock with a similar theoretical performance at each setting, a Crusher frame rate difference of 3.6 FPS should be adequate proof that bus speed is a very important factor in system performance. To get a better feel for why this is, consider the average bandwidth measured for each of the four bus speeds utilized in our testing:

Not surprisingly, 66MHz falls very low while 112MHz rounds off very high. Bandwidth is analogous to a shipping conveyor; a faster conveyor can move more product in the same amount of time. It does not matter how fast the system processor can handle tasks if it cannot communicate with the memory and other system components at a high enough rate.

Does this mean 66MHz bus should be avoided in favor of higher speeds? Yes, if at all possible.
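The frame-rate percentages quoted for the three 400MHz settings can be re-derived from the Crusher scores in the results table. The snippet below is just an illustrative sketch in Python; the dictionary and function names are made up for this example, and the only data used are the three Crusher values already reported in the table.

```python
# Quake2 Crusher scores (fps) for the three 400MHz settings,
# copied from the results table; keys are FSB/memory speeds in MHz.
CRUSHER_FPS_AT_400MHZ = {"66/66": 20.7, "100/66": 21.6, "100/100": 24.3}


def percent_gain(baseline, improved):
    """Relative improvement of 'improved' over 'baseline', in percent."""
    return (improved - baseline) / baseline * 100.0


def report_gains(scores):
    """Print the gain of every later setting over every earlier one."""
    settings = list(scores)
    for i, slower in enumerate(settings):
        for faster in settings[i + 1:]:
            gain = percent_gain(scores[slower], scores[faster])
            print(f"{faster} over {slower}: {gain:.1f}%")


report_gains(CRUSHER_FPS_AT_400MHZ)
```

The same `percent_gain` helper applied to the Sandra scores at the first 400MHz setting and at 570MHz (829 to 1187 MIPS, 498 to 713 MFLOPS) gives the 43.2% scaling figure mentioned earlier for both ALU and FPU.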
If your situation is such that you MUST work with PC-66 DIMMs, by all means use a split-setting for different FSB/Memory speeds if one is available. It is not the ideal solution, but it helps.

Should 95MHz bus be ignored entirely? Not necessarily; if your system does not offer 112MHz bus OR your PC-100 memory will not run stable at that speed, there may be situations where a 95MHz-bus-based overclock will still give you a speed boost over a nearby 100MHz-bus-based setting. To quote a colloquialism, "The proof is in the pudding"; test your system with benchmarks that approximate your 'typical' use of the machine (gaming, office applications, etc.) and proceed from there. Usually, though, if you can get a nice 112MHz-bus-based overclock it will be a preferable setting.

In Closing

Overclocking remains a chief entertainment for many, and while I currently play with an Athlon-based machine I certainly had my fun with the K6-series. With a modest cooling investment and the patience to tweak and test, most K6-2 users will be able to increase the performance of their system. If you have any further questions, feel free to drop by the messageboard where a number of folks who enjoy overclocking will be happy to offer suggestions. For any questions, comments, or concerns specific to the article, please send me an email.

Further Reading

This section includes articles referenced in the text, and other relevant sites:

AMD -- The place to find technical documents and product information for AMD's K6-2 processor.

DDJ Microprocessor Resource Center (x86.org) -- This site contains a number of processor resources including Robert Collins' PC Processors Guide.

LostCircuits -- The LostCircuits website is the handiwork of MS and Maus, with numerous articles written by MS and other LC community members.
In particular, page 4 of the Shuttle HOT597 review contains interesting bus speed benchmarks, page 5 of the FIC PA-2013 review thoroughly demonstrates the performance difference of DIMM=CPU vs. DIMM=AGP, and for the more technically minded, page 4 of the AMD K6-2/400 review shows the performance advantage of the write-combining buffers that were added as part of the CXT core. More AMD processor reviews can be found in the LostCircuits CPU archive.

Tom's Hardware -- The processor tables article that I referred to earlier can be found here.

Copyright Notice. All original information and presentation thereof - as well as the presentation of non-original information - contained herein is (c) September 2000 and July 2001 by Aaron Vienot and may not be reprinted without the express permission of the same except in the form of brief quotations for citation purposes, news briefings, or reviews of this article. AMD, 3DNow!, Athlon, Duron, K6, K6-2, K6-III, K6-2+, and K6-III+ are trademarks and/or product names belonging to Advanced Micro Devices. MMX is a trademark of Intel Corp. Cyrix is a trademark of VIA Technologies. All other names and trademarks are the property of their respective owners. CPU-Central has been granted permission to freely reprint this article on the CPU-Central website.