Friday, October 19, 2007

Behind on Reading Papers

I promised to read several papers and make comments here (and in person) for a couple of people. I am waaay behind on this. Seth, David, this is a public apology. I am going to point my readers to their work, however, as I try to cram into my puny "Tween" brain the works of people who are quite bright.

Seth is a CS professor at CMU. The papers I have been reading are about the Claytronics project. I am about 50% done, but I still haven't touched the most recent of them, so any opinions I have so far are not chronologically up to date as far as their research goes. However, one item that keeps bouncing around my brain is that the programming model they are going to face reminds me - though it is not a directly applicable one - of what we are starting to think we are going to face in HPC.

We are starting to look at a future where the chips being made are going to have many cores. Well, duh, many of you say, we're going multicore already. Well, yes, but how many cores do most chips have these days? 4? 8? Bah. That's nada. We're discussing in our HPC discussions something on the order of 1024 cores per chip. 1024. Perhaps as many as 16,384. They're simplified, with instruction sets not as extensive as those of current chip cores. And consider that we're not giving up the paradigm of massively parallel CPUs either. Now consider the complexity of trying to efficiently use all those cores on a single chip plus the massively parallel nature of HPC codes (some, but not many, scaling now to 8,000 CPUs or more): that could be as many as 131,072,000 cores you'd have to code for.
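Just so nobody thinks I pulled that number out of a hat, here's the back-of-envelope arithmetic in a few lines of Python. Both inputs are the speculative figures from the paragraph above, not anything off a vendor roadmap:

    # Back-of-envelope only: both figures are the speculative guesses above,
    # not anything from a real roadmap.
    cores_per_chip = 16384        # the aggressive end of the manycore guesses
    chips_per_system = 8000       # roughly where a few heroic codes scale today
    total_cores = cores_per_chip * chips_per_system
    print(total_cores)            # 131072000 - the 131-odd million cores to keep busy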

If you state, "well, the compiler will..." I'll start giggling, snorking, and probably repeat the grape-juice-out-the-nose incident of 4th grade right after moving to Los Alamos. During the 1990s there was a lot of "We'll hand off responsibility for optimizing X to the compiler because otherwise we need very good coders to handle X," and this, frankly, failed more often than not. Magitech compilers are not here. There are improvements, to be fair. Compilers have truly come a long way. Yet they still have serious issues optimizing, especially on HPC platforms, and here we are talking 16k CPUs/cores - maybe a lot more if you have a Blue Spleen, ahem, Gene - but exceedingly few codes scale to that level. I know of, honestly, only one, and that was a hero effort to get it to use all of LLNL's BG. Even so, that's a "mere" 130-odd kilocores, not 131-odd megacores. Truthfully, we don't even have a functional model of how to code for such a beast; honestly, all we have are some viewgraphs and a few ideas.

So the first question after reading the above is "Why do that if it's going to be such a pain?" The reason is that you can reduce the power requirements for CPUs by doing so. VASTLY. People mumble and babble about the Coming Singularity and the ever increasing amount of 'matter devoted to computing,' but they sorely neglect the problem of where you get the energy for that. LLNL is going to put in, for their near-term HPC center, more electrical power than is used by the entire East SF Bay city utilities combined. This is supporting their near-term petaflop systems. Now think about that. What happens when we are talking about exaflop systems? Uh huh. Dedicated power plants? Sorry, I don't think so! So, we're looking for technologies that reduce our energy expenditures. I was just in a vendor brief this week about the amount of electricity consumed by this vendor's next set of chips: it's NDA, so I can't reveal very much, but let's say even the vendor is distressed and is looking for alternate technologies for the future. Can't say more, but I think you can understand this much: vendors and centers now look at flops/watt as a VERY important measure for HPC tech. If we don't find a way around this, the world HPC community is going to top out at an exaflop or so, with only a very small handful of centers able to do even that.
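To make the flops/watt worry concrete, here's a toy scaling calculation. The 5 MW figure is entirely made up for illustration (nothing from the NDA'd brief); the only point is that at constant flops/watt, an exaflop machine needs a thousand times the power of a petaflop one:

    # Toy scaling only: the 5 MW is an invented illustrative number, not any
    # real system's draw and not anything from the vendor brief.
    petaflop = 1.0e15                       # flops
    exaflop = 1.0e18                        # flops
    assumed_petaflop_power_watts = 5.0e6    # pretend a petaflop system draws 5 MW
    flops_per_watt = petaflop / assumed_petaflop_power_watts
    exaflop_power_watts = exaflop / flops_per_watt
    print(exaflop_power_watts / 1.0e6)      # 5000 MW at the same efficiency

Five thousand megawatts is dedicated-powerplant territory. Unless flops/watt improves by orders of magnitude, that math doesn't change.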

Now be warned: when I first came back to HPC in 2001 after doing my stint playing with mongo death rays, the 'in thing' was PIMs: processors in memory. The idea was that you stop separating CPUs from memory in the silicon, because the CPU often sits idle waiting on memory subsystems that are much too slow compared to the CPUs. However, very little of that research effort here, at Stanford, and elsewhere came to, well, anything other than white papers and simulations. Very little silicon was even, erm, bent? Almost nothing went into a commercial product. It was, and probably still is, a good idea, yet it remains of only academic interest and is apparently passe. Massively manycored CPUs may go the same way.

Now that I have had to digress (or wanted to, too much) about manycore tech, how does any of it apply to Seth's Claytronics project? Consider that each of their catoms is going to be vaguely similar to one core out of that gobsmacking number in the theoretical HPC platform I outlined above. Both would need simple, limited instruction sets. Both would need some very involved, and possibly similar, algorithms to make them work. The languages and compilers are going to have some common themes. At least from the 10k ft level. At least so far. I'll see more as I read.

Switching topics, David is someone I met via my wife. David works for Sun. His wife met my wife at a party we went to at a mutual friend's. They came over for dinner and we all started talking. David's Russian, like his wife Sasha. Lyuda and Sasha ended up talking in Russian a lot to each other, and David and I talked shop: he was part of Sun's HPCS team. We talked the politics of HPC and it shook him up a bit. I was really surprised by that, but anyhow. He pointed me to his papers, here. I'm just starting on them, but I thought some of you would find them interesting. 17 patents. hrmph. Wish I could say anything like that. Ah well.
