Niagara signals shift for business computing

Posted on 2005-12-12 01:48
I constantly tell myself I’m not nuts. And yes, it’s disconcerting when I answer myself, but when some seemingly inconceivable theory I’ve concocted turns into reality years later, I become grounded again. It took almost 10 years, but Sun Microsystems made the first movements of my most insane vision real.


The vision? That the ideal computer for business apps would be built not with one or two or 32 blazingly fast CPUs in an SMP arrangement, but with a large array of very simple, comparatively slow processors.


By way of illustration, I proposed 64 Zilog Z80 microprocessors. The key to the design would be that memory and I/O buses would match the master CPU clock speed as closely as possible. The ideal implementation -- it seems unreachable, but who knows? -- would create a system in which, under typical load, RAM would operate at the processors’ clock speed. Nirvana would be achieved by synchronising the CPUs like pistons on a crankshaft, with no two attempting to access the same bank of RAM at the same time. With that structure in place, access to external memory would be fast enough to shrink the size of on-chip L2 (Level 2) cache to the point of eventual elimination.
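The piston-and-crankshaft idea can be sketched as a toy model. This is only an illustration under assumptions that are not in the column: one memory bank per CPU, one access per CPU per cycle, and a rotating schedule in which CPU i is allowed to touch bank (i + cycle) mod N. The function names `bank_for` and `conflict_free` are hypothetical.

```python
def bank_for(cpu: int, cycle: int, n_banks: int) -> int:
    """Bank that `cpu` may touch on `cycle` under a rotating schedule (assumed)."""
    return (cpu + cycle) % n_banks


def conflict_free(n_cpus: int, n_banks: int, cycles: int) -> bool:
    """True if no two CPUs are ever scheduled onto the same bank in one cycle."""
    for cycle in range(cycles):
        banks = [bank_for(cpu, cycle, n_banks) for cpu in range(n_cpus)]
        if len(set(banks)) != n_cpus:
            return False  # at least two CPUs collided on a bank
    return True


if __name__ == "__main__":
    # 64 CPUs over 64 banks: each CPU gets its own bank every cycle.
    print(conflict_free(64, 64, 128))
    # 64 CPUs over 32 banks: collisions are unavoidable.
    print(conflict_free(64, 32, 128))
```

In this model the schedule only stays collision-free when there are at least as many banks as CPUs, which hints at why the ideal is hard to reach with real workloads that do not access memory on such a tidy rotation.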


What non-gearhead knows or cares about L2 cache and CPU complexity? I liken it to passenger space. A third to a half of the interior of an AMD Opteron CPU is not available for computing. In a car, L2 cache and deep CPU pipelines are the bucket seats, design contours, armrests, air bags, centre console, cup holders, and over-sized boot. If you were to gut your four-door saloon down to its outer shell and rebuild it with nothing but bench seats, you could carpool seven comfortably, nine if everyone in the cabin practises good hygiene. Everybody gets to work on time, and the energy savings are enormous.


A massively parallel system built with slow processors on a fast bus could carry several throughput-constrained tasks to completion, simultaneously, without most of the round-robin stop-and-go that slows entry-level to midlevel SMP systems. Oodles of slow CPUs that never wait for RAM? I want that. Peripherals that work asynchronously, managing queues of requests, and move data directly to and from memory -- I want that, too.


Although my ideal remains distant, I see more than a silhouette in Sun’s Niagara. Instead of a cluster of discrete CPUs, Niagara burns eight SPARC cores onto one chip, each capable of executing four threads simultaneously. In ideal operation -- again, unreachable, but who knows? -- 32 execution engines can all pump at once with very fast pathways to RAM and peripherals. It’s the culmination of years of design around Sun’s “throughput computing,” a brilliant concept hampered by a necessarily unglamorous execution. If the cores run too fast or get too fancy trying to predict what a thread will do next, parallelism suffers, and the ideal is lost.
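A back-of-the-envelope model shows why four threads per core can matter more than raw clock speed. It is a rough sketch, not a description of how Niagara actually schedules threads: it assumes each thread alternates between a fixed burst of compute cycles and a fixed memory stall, and that a core can switch to another ready thread at no cost. The cycle counts (10 compute, 30 stall) are made-up illustrative numbers, and the function names are hypothetical.

```python
def core_utilisation(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles a core spends computing, assuming free thread switches."""
    busy = threads * compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, busy)  # a core cannot be more than 100% busy


def chip_throughput(cores: int, threads_per_core: int,
                    compute_cycles: int, stall_cycles: int) -> float:
    """Useful work per cycle for the whole chip, in 'busy core' units."""
    return cores * core_utilisation(threads_per_core, compute_cycles, stall_cycles)


if __name__ == "__main__":
    # Illustrative numbers only: 10 cycles of work, then a 30-cycle wait on RAM.
    print(chip_throughput(8, 1, 10, 30))  # one thread per core
    print(chip_throughput(8, 4, 10, 30))  # four threads per core, 32 in total
```

Under these assumptions a single-threaded core idles three quarters of the time, while four threads per core keep all eight cores busy. That is the sense in which simple cores with many threads trade per-thread speed for throughput.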


But Sun’s got the right idea. And if you want confirmation from someone other than yours truly, ask Intel; Niagara foreshadows Intel’s strategy. When Intel reminds us that dual core is just its first take on multi-core, it’s telling us that lots of pokey cores on one CPU may not look all that thrilling on paper, but we’ll lose no ground in net performance with current apps. And as developers get serious about multi-threading apps, the parallelism benefits of such architectures will take off.


An armada of slow CPUs? I knew I wasn’t hallucinating.