Saturday, November 14, 2009

8 cores, 8 times as slow

We all know that most workloads don't scale linearly as you add more cores. But you know you've really screwed something up when your times go UP as you add more cores!

Here is how I managed it:
In Albion's new renderer, I split the view up into tiles of, say, 256x256 pixels, and render the tiles on separate threads - regular graphics stuff. When all is going perfectly, the threads don't need to communicate with each other at all, and you get pretty much linear speedup. But threads do need to communicate when they use shared resources - and fonts are one of those.
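To make the tiling idea concrete, here is a minimal sketch in C++. It is not Albion's actual code: `Tile`, `MakeTiles`, `RenderTile` and `RenderView` are hypothetical names, the "rendering" is just a pixel fill, and it spawns one thread per tile for brevity where a real renderer would more likely use a fixed pool of workers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical tile rectangle, in pixels.
struct Tile { int x, y, width, height; };

// Split a view into tiles of at most tileSize x tileSize pixels.
// Tiles on the right and bottom edges are clipped to the view.
std::vector<Tile> MakeTiles(int viewWidth, int viewHeight, int tileSize) {
    std::vector<Tile> tiles;
    for (int y = 0; y < viewHeight; y += tileSize)
        for (int x = 0; x < viewWidth; x += tileSize)
            tiles.push_back({x, y,
                             std::min(tileSize, viewWidth - x),
                             std::min(tileSize, viewHeight - y)});
    return tiles;
}

// Stand-in for real rendering: write a pattern into the tile's pixels.
// Each tile touches a disjoint region of the buffer, so no locking is needed.
void RenderTile(const Tile& t, std::vector<uint32_t>& pixels, int viewWidth) {
    for (int y = t.y; y < t.y + t.height; y++)
        for (int x = t.x; x < t.x + t.width; x++)
            pixels[std::size_t(y) * viewWidth + x] = 0xFF000000u | uint32_t(x ^ y);
}

// Render every tile on its own thread, then wait for all of them.
void RenderView(std::vector<uint32_t>& pixels, int viewWidth, int viewHeight, int tileSize) {
    std::vector<std::thread> workers;
    for (const Tile& t : MakeTiles(viewWidth, viewHeight, tileSize))
        workers.emplace_back([&pixels, t, viewWidth] { RenderTile(t, pixels, viewWidth); });
    for (auto& w : workers)
        w.join();
}
```

As long as each tile only writes to its own pixels, the threads never touch the same memory, which is where the near-linear speedup comes from.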
It's simplest for me just to have one font cache for the entire process, but obviously you need to synchronize access to this cache. When I originally created this font cache, I wasn't really thinking of synchronization, and when it came time to make it multithreaded, I just added a huge lock around every entry point, and thought I'd make the locking finer-grained when I needed to. Yesterday I definitely needed to.
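The post doesn't say what the finer-grained locking ended up looking like, so the following is only one plausible shape for it, sketched in C++17. `GlyphCache`, `Glyph` and `Rasterize` are made-up names, and the rasterizer is a placeholder. The idea is that cache hits take a shared (reader) lock and can proceed in parallel, the expensive rasterization on a miss happens outside any lock, and the exclusive lock is held only for the brief insert.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical cached glyph; a real one would also hold bitmap data.
struct Glyph { int width = 0; int height = 0; };

class GlyphCache {
public:
    // Returns a copy of the glyph for codepoint, rasterizing it on a miss.
    Glyph Get(uint32_t codepoint) {
        {
            // Fast path: many threads can hold the shared lock at once.
            std::shared_lock<std::shared_mutex> read(lock_);
            auto it = glyphs_.find(codepoint);
            if (it != glyphs_.end())
                return it->second;
        }
        // Slow path: do the expensive work without holding any lock.
        Glyph g = Rasterize(codepoint);
        // Exclusive lock only for the insert. If another thread inserted
        // the same codepoint first, emplace keeps the existing entry.
        std::unique_lock<std::shared_mutex> write(lock_);
        return glyphs_.emplace(codepoint, g).first->second;
    }

    std::size_t Size() const {
        std::shared_lock<std::shared_mutex> read(lock_);
        return glyphs_.size();
    }

private:
    // Placeholder for real font rasterization.
    static Glyph Rasterize(uint32_t codepoint) {
        return Glyph{8 + int(codepoint % 8), 16};
    }

    mutable std::shared_mutex lock_;
    std::unordered_map<uint32_t, Glyph> glyphs_;
};
```

The trade-off in this sketch is that two threads missing on the same glyph at the same moment may both rasterize it, and one result is thrown away. That wastes a little work occasionally, but no thread ever waits on another thread's rasterization.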
In the particular scene I was looking at, I was zoomed in close, so spatial culling left basically every tile rendering the same objects. As it turns out, I had 8 threads doing something approaching 8 times as much total work, all waiting on the same font cache lock and so unable to run in parallel. I don't think it gets worse than that!
This was really easy to find - you just hit pause in the debugger while it's running, and you see all your threads stalled at the same wait point.