I think he’s referring to Bulldozer’s unique approach to SMT. Sadly, empirical evidence shows that the approach in Bulldozer isn’t anywhere near as efficient as traditional designs on a per-core (module in Bulldozer parlance) basis. Whether that’s bugginess in the code or a fundamental flaw, it’s not for me to say, but what we know is this:
Bulldozer’s performance is quite a mixed bag. The 4-module, 8-thread 3.6GHz FX-8150 (“Zambezi”) generally leads the 3.3GHz 6-core, 6-thread Phenom II X6 1100T (“Thuban”) K10, but it generally trails the 3.3GHz 4-core, 4-thread Core i5-2500 (“Sandy Bridge”).
Certain tests pick up specific differences. For example, Anandtech’s N-Queens test—though of little value as a general benchmark—is extremely branch-heavy and will stress branch predictors and highlight penalties imposed by long pipelines. Bulldozer shows a significant reduction in single-threaded performance relative to the Phenom II. So great is the drop that even when run in multithreaded mode, the eight concurrent threads on Bulldozer can’t keep up with the Phenom II’s six threads, or even the Intel chip’s four.
Moving on to more realistic tests, we see something similar in floating point-intensive multithreaded Cinebench rendering, run by both Anandtech and Tech Report. Bulldozer’s single-threaded score trails Phenom II’s slightly, and is at a huge disadvantage relative to Intel’s processors. The eight threads make up for the discrepancy this time, as Bulldozer pulls ahead of Thuban in the multithreaded benchmark.
These benchmarks in many ways show the two extremes of Bulldozer’s performance. In single-threaded workloads, it struggles to keep pace with either its predecessor or Intel’s competitor. But when all eight threads run simultaneously, Bulldozer can more than make up for this weaker single-threaded performance. Games tend to lie between the two extremes—they spawn some threads, but rarely as many as eight—with Bulldozer sometimes beating the Phenom II (for example, in Anandtech’s Civilization V test or Tech Report’s F1 2010 benchmark), other times falling behind (as seen in Anandtech’s tests of Dawn of War II and Crysis: Warhead).
These are bad results for AMD. The FX-8150 is more expensive to buy than the Phenom II X6 1100T, yet in typical desktop workloads its performance is no better, and sometimes even worse. The scores are not even altogether surprising: the limited per-thread execution resources, longer pipelines, and slower memory subsystem made inferior performance in these workloads almost inevitable.
The newer Trinity line has improved things somewhat, but even the top-of-the-line Trinity only makes 6th place on PCMark 7 when compared to the competing i7s and i5s, scoring around 3325 against the “king of the hill” i7-3720’s 6695. In the computation sub-test, it really, really gets trounced: the i7-3720 scores 27261 while the Trinity comes in at 4323.
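To put those PCMark 7 numbers in perspective, here is a rough sketch in Python that works out how far ahead the Intel chip is in each test. The scores are just the ones quoted above (the Trinity overall score is "around 3325", so treat the ratios as approximate), and the `scores` dictionary and `intel_lead` helper are names I made up for illustration.

```python
# PCMark 7 scores as quoted in this post; not independently verified.
# The Trinity overall score is approximate ("around 3325").
scores = {
    "overall": {"i7-3720": 6695, "Trinity": 3325},
    "computation": {"i7-3720": 27261, "Trinity": 4323},
}


def intel_lead(test):
    """Return how many times higher the i7-3720 scores than the Trinity."""
    s = scores[test]
    return s["i7-3720"] / s["Trinity"]


for test in scores:
    print(f"{test}: Intel leads by about {intel_lead(test):.2f}x")
```

That works out to roughly 2x overall and a bit over 6x on the computation sub-test, which is the gap that matters most for CPU rendering.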
As an Austinite I love AMD, and historically I’ve owned dozens of their chips, but they just aren’t up to the muscle of the Intels, and muscle is what you’re after for CPU rendering. Keep in mind, none of this means that the AMDs are bad. In fact, they’re still quite solid, and for what you pay they are quite efficient on a price-to-performance basis. The point of this was only that if ultimate performance is the goal and money is no object, then Intel currently holds the crown.
For what it’s worth, DS will see all 12 of the cores on my system (24 if you count the hyperthreaded logical cores).