Saturday, February 13, 2010

Write the inner loop first

This seems so obvious now, I feel stupid saying it:

If you're profiling a possible solution, write the inner loop first. If you can't get the inner loop to perform, you've saved yourself a lot of work.

I was enamoured with the idea of using this method to do antialiased lines. Basically, you use four edge functions (max[d1..d4]) and look up the resulting distance in a table, which gives you the coverage for each pixel. It's beautiful because it's an implicit function, and it looks as though it maps well onto SSE2 instructions: there should be no branches, the vector width is a perfect match, etc.
Alas, the inner loop winds up being a lot more instructions than I expected. Maybe I just didn't try hard enough at the assembly, but the principle holds. My biggest mistake, though, was that the inner loop was the last thing I wrote. I wrote the entire algorithm setup in assembly first, only to find out at the very end that I couldn't get the inner loop fast enough.
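
To make "the inner loop" concrete, here is a minimal scalar C sketch of the per-pixel work described above. It is not the SSE2 assembly from this post. It assumes each of the four edges has already been set up as d(x, y) = a*x + b*y + c, with (a, b) of unit length and d negative inside the line's rectangle, so that max(d1..d4) approximates the signed distance to the line. The names `EdgeFn`, `build_coverage_table` and `shade_span`, the 64-entry table and the linear coverage ramp are all hypothetical choices.

```c
#define COVERAGE_TABLE_SIZE 64   /* hypothetical table resolution */
#define FILTER_RADIUS 1.0f       /* signed distances in [-1, 1] pixel map onto the table */

/* d(x, y) = a*x + b*y + c, with (a, b) unit length and d < 0 inside the line's rectangle */
typedef struct { float a, b, c; } EdgeFn;

static unsigned char coverage_table[COVERAGE_TABLE_SIZE];

/* Precompute coverage (0..255) for each bucket of signed distance. */
static void build_coverage_table(void)
{
    for (int i = 0; i < COVERAGE_TABLE_SIZE; i++) {
        /* centre of bucket i, mapped back to a distance in [-R, R] */
        float d = ((i + 0.5f) / COVERAGE_TABLE_SIZE) * 2.0f * FILTER_RADIUS - FILTER_RADIUS;
        float cov = 0.5f - d;   /* linear ramp; exact for a box filter across an axis-aligned edge */
        if (cov < 0.0f) cov = 0.0f;
        if (cov > 1.0f) cov = 1.0f;
        coverage_table[i] = (unsigned char)(cov * 255.0f + 0.5f);
    }
}

/* The inner loop: one scanline of pixels [x_start, x_end) at row y. */
static void shade_span(const EdgeFn e[4], int y, int x_start, int x_end, unsigned char *out)
{
    float py = (float)y + 0.5f;                 /* sample at pixel centres */
    for (int x = x_start; x < x_end; x++) {
        float px = (float)x + 0.5f;
        float d = e[0].a * px + e[0].b * py + e[0].c;
        for (int i = 1; i < 4; i++) {           /* max of the four edge functions */
            float di = e[i].a * px + e[i].b * py + e[i].c;
            d = di > d ? di : d;                /* a SIMD version would use maxps */
        }
        /* map d to a table index, clamping before the float-to-int conversion */
        float t = (d + FILTER_RADIUS) / (2.0f * FILTER_RADIUS) * COVERAGE_TABLE_SIZE;
        if (t < 0.0f) t = 0.0f;
        if (t > COVERAGE_TABLE_SIZE - 1) t = (float)(COVERAGE_TABLE_SIZE - 1);
        out[x - x_start] = coverage_table[(int)t];
    }
}
```

Even in scalar form the cost is visible: four edge evaluations, three maxes, a scale, two clamps, a conversion and a table load per pixel. The table load is an indexed per-pixel read, and SSE2 has no gather instruction, so a vectorized version has to extract each index and load it separately, which may account for some of the extra instructions.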

Wednesday, December 2, 2009

When in doubt, use 0 instead of -1 for null

I use 16 bits in Albion to represent linetypes, which are looked up in a table. Since the dawn of time, I've had one special value, -1, for the 'continuous' linetype. While unable to sleep because of the summer heat and the mosquitoes, I realized that it would be perfect if I could repurpose the top 2 bits to encode the line cap style (butt, square, round, ?). But -1 has all 16 bits set, so there is just no clean way to do this and still have -1 represent 'continuous', aka the default linetype.
I think the reason this problem exists is that in C, -1 is just two characters to type (and read), so it is a concise way to represent null. If we were writing integer constants in binary, we wouldn't do this. Bits tend to become useful as fields at particular positions, if only because CPUs are so good at masking and shifting, and the idea is so universal and easy to implement and understand.
So, long story short: all other things being equal, use 0 for null, not -1. If a set bit represents information, then zero, with no bits set, is the natural encoding of 'nothing'.
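
A minimal sketch of the packing described above, assuming the low 14 bits hold the linetype table index and the top 2 bits hold the cap style. `lt_pack`, `lt_index`, `lt_cap`, the `cap_style` names and `CAP_SPARE` (standing in for the '?') are hypothetical, not Albion's actual code.

```c
#include <stdint.h>

/* Hypothetical layout: bits 14-15 = cap style, bits 0-13 = linetype table index. */
#define LT_CAP_SHIFT   14
#define LT_INDEX_MASK  0x3FFFu
#define LT_CONTINUOUS  0u          /* table slot 0 reserved for 'continuous' */

enum cap_style { CAP_BUTT = 0, CAP_SQUARE = 1, CAP_ROUND = 2, CAP_SPARE = 3 };

static inline uint16_t lt_pack(unsigned index, enum cap_style cap)
{
    return (uint16_t)(((unsigned)cap << LT_CAP_SHIFT) | (index & LT_INDEX_MASK));
}

static inline unsigned lt_index(uint16_t v)       { return v & LT_INDEX_MASK; }
static inline enum cap_style lt_cap(uint16_t v)   { return (enum cap_style)(v >> LT_CAP_SHIFT); }

/* With 0 as null, 'continuous' survives any cap style in the top bits. */
static inline int lt_is_continuous(uint16_t v)    { return lt_index(v) == LT_CONTINUOUS; }

/* With the old -1 sentinel, 0xFFFF already claims cap bits 11, and giving the
   continuous linetype a round cap (0xBFFF) means it no longer equals -1. */
static inline int old_is_continuous(uint16_t v)   { return v == (uint16_t)-1; }
```

The test for "is this the default?" becomes a mask against the index bits, and it keeps working no matter what is later stored in the upper bits.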

Saturday, November 14, 2009

8 cores, 8 times as slow

We all know that most workloads don't scale linearly as you add more cores. But you know you've really screwed something up when your times go UP as you add more cores!

This is how I managed to achieve this:
In Albion's new renderer, I split the view up into tiles of, say, 256x256 pixels, and render the tiles on separate threads: regular graphics stuff. When all is going perfectly, the threads don't need to communicate with each other at all, and you get pretty much linear speedup. But one time when threads do need to communicate is when they're using shared resources, and fonts are one of those.
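
The post doesn't say how tiles are handed out to threads. A common approach, assumed here, is a shared atomic counter that each render thread bumps to claim the next tile, so handing out work needs no lock at all. `TileQueue` and `claim_tile` are hypothetical names, and the tiles on the right and bottom edges are clipped to the view.

```c
#include <stdatomic.h>

#define TILE_SIZE 256   /* the "say 256x256" from the text */

typedef struct { int x0, y0, x1, y1; } Rect;   /* half-open [x0,x1) x [y0,y1) */

typedef struct {
    int view_w, view_h, tiles_x, tiles_y;
    atomic_int next_tile;                      /* shared work counter */
} TileQueue;

static void tile_queue_init(TileQueue *q, int w, int h)
{
    q->view_w = w;
    q->view_h = h;
    q->tiles_x = (w + TILE_SIZE - 1) / TILE_SIZE;   /* round up so edge pixels are covered */
    q->tiles_y = (h + TILE_SIZE - 1) / TILE_SIZE;
    atomic_init(&q->next_tile, 0);
}

/* Each render thread calls this in a loop; returns 0 when no tiles remain. */
static int claim_tile(TileQueue *q, Rect *out)
{
    int i = atomic_fetch_add(&q->next_tile, 1);
    if (i >= q->tiles_x * q->tiles_y)
        return 0;
    int tx = i % q->tiles_x, ty = i / q->tiles_x;
    out->x0 = tx * TILE_SIZE;
    out->y0 = ty * TILE_SIZE;
    out->x1 = out->x0 + TILE_SIZE < q->view_w ? out->x0 + TILE_SIZE : q->view_w;
    out->y1 = out->y0 + TILE_SIZE < q->view_h ? out->y0 + TILE_SIZE : q->view_h;
    return 1;
}
```

Each render thread would loop on `while (claim_tile(&q, &r)) { ... }`, drawing tile `r` in the body. As long as the tiles draw independently, the counter is the only shared state.
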
It's simplest for me just to have one font cache for the entire process, but obviously you need to synchronize access to this cache. When I originally created this font cache, I wasn't really thinking about synchronization, and when it came time to make it multithreaded, I just added a huge lock around every entry point, thinking I'd make those finer when I needed to. Yesterday I definitely needed to.
In the particular scene I was looking at, I was zoomed in close, so the spatial culling was basically making all tiles render the same objects. As it turns out, I had 8 threads each doing nearly the whole job, so something approaching 8 times as much work in total, all serialized on that one lock and unable to run in parallel. I don't think it gets worse than that!
This was really easy to find - you just hit pause on the debugger while it's running, and you see all your threads stalled at the same wait point.
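
The post doesn't say how the lock was made finer. One possible approach, sketched here under stated assumptions, is a read-write lock: cache hits (the common case when every tile draws the same text) take a shared lock and run concurrently, and only misses take the exclusive lock, with the expensive rasterization done outside any lock. `FontCache`, `font_cache_get` and `rasterize_glyph` are hypothetical, and returning a glyph pointer after unlocking assumes glyphs are never evicted.

```c
#define _POSIX_C_SOURCE 200809L   /* expose pthread_rwlock_t under strict C modes */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define GLYPH_BUCKETS 256

typedef struct Glyph {
    unsigned font_id, codepoint;
    int width;                 /* placeholder for the rasterized glyph data */
    struct Glyph *next;        /* bucket chain */
} Glyph;

typedef struct {
    pthread_rwlock_t lock;     /* shared for lookups, exclusive for inserts */
    Glyph *buckets[GLYPH_BUCKETS];
} FontCache;

static void font_cache_init(FontCache *fc)
{
    pthread_rwlock_init(&fc->lock, NULL);
    memset(fc->buckets, 0, sizeof fc->buckets);
}

static unsigned bucket_of(unsigned font_id, unsigned cp)
{
    return (font_id * 31u + cp) % GLYPH_BUCKETS;
}

/* Caller must hold fc->lock (either mode). */
static Glyph *find_locked(FontCache *fc, unsigned b, unsigned font_id, unsigned cp)
{
    for (Glyph *g = fc->buckets[b]; g; g = g->next)
        if (g->font_id == font_id && g->codepoint == cp)
            return g;
    return NULL;
}

/* Hypothetical stand-in for the expensive part: rasterizing a glyph. */
static Glyph *rasterize_glyph(unsigned font_id, unsigned cp)
{
    Glyph *g = calloc(1, sizeof *g);
    if (!g) abort();
    g->font_id = font_id;
    g->codepoint = cp;
    g->width = (int)(cp % 16u) + 1;
    return g;
}

static Glyph *font_cache_get(FontCache *fc, unsigned font_id, unsigned cp)
{
    unsigned b = bucket_of(font_id, cp);

    /* Fast path: a hit only needs the shared lock, so tiles drawing the
       same text no longer queue up behind each other. */
    pthread_rwlock_rdlock(&fc->lock);
    Glyph *g = find_locked(fc, b, font_id, cp);
    pthread_rwlock_unlock(&fc->lock);
    if (g)
        return g;

    /* Slow path: rasterize outside any lock, then insert under the exclusive lock. */
    Glyph *fresh = rasterize_glyph(font_id, cp);
    pthread_rwlock_wrlock(&fc->lock);
    g = find_locked(fc, b, font_id, cp);   /* another thread may have inserted it first */
    if (!g) {
        fresh->next = fc->buckets[b];
        fc->buckets[b] = fresh;
        g = fresh;
        fresh = NULL;
    }
    pthread_rwlock_unlock(&fc->lock);
    free(fresh);                           /* no-op unless we lost the race */
    return g;
}
```

The re-check under the exclusive lock matters: two tiles can miss on the same glyph at once, and without it both copies would be inserted.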

Tuesday, April 14, 2009

WPF rendering on Vista is ugly and blurry

UPDATE: I found the problem: the NVidia drivers for my 8600 GT. The previous drivers were 78.13 (7813); the new drivers are 82.50 (8250). I'm running Vista x64 with .NET 3.5 SP1.

On my Vista machine, WPF rendering is nasty. I don't know what the engine is doing. My DPI setting is the default (96 DPI), and Aero is turned on.

On Vista, witness the nastiness:

On XP, the kind of quality we've come to expect:

I can't puzzle it out. It's incredibly odd that I can't find any mention of this on the WWW.

Thursday, October 2, 2008

VC 2008 needs /Ox

Some of my database tests showed that a field read went from about 500 clocks to 1700 clocks when I upgraded from VS 2005 to VS 2008. I don't have time to investigate exactly what causes this, but changing the optimization setting from /O2 to /Ox fixes it (and actually makes 2008's code a bit faster than 2005's, if I'm not mistaken).

Bottom line: use /Ox for all release builds. The executable also seems to be smaller. Maybe it's the omission of frame pointers; I don't know.

Friday, September 5, 2008

Shameless plug

This is a shameless plug for the GF's artwork. I slapped the website together. Apparently if I say the words Gina Heyer it works better. viva la vida.

Friday, April 18, 2008

A nickel for every for loop

I was reading this short essay by Mike Vanier. At the end he says "if I had a nickel for every time I've written "for (i = 0; i != N; i++)" in C I'd be a millionaire". Hmm... it looks like some people make a lot more off their for loops. The word 'for' appears 13527 times in the Quake3 source. At one nickel for every for loop, that would earn JC a cool $676.35. woot! He can finally afford that Voodoo 6.
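
The post doesn't say how the 13527 figure was obtained. Here is a minimal sketch of one way to take such a count, assuming "appears" means whole-word matches in the source text; `count_word` and `nickels_in_cents` are hypothetical helpers. A plain word count also picks up 'for' in comments and strings, so it can overstate the number of actual loops.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Is c part of a C identifier? */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Count whole-word occurrences of `word` (must be non-empty) in `text`,
   so "format" and "before" don't count as "for". Comments and string
   literals are not skipped. */
static size_t count_word(const char *text, const char *word)
{
    size_t n = 0, len = strlen(word);
    for (const char *p = strstr(text, word); p != NULL; p = strstr(p + len, word)) {
        int starts = (p == text) || !is_ident_char(p[-1]);
        int ends = !is_ident_char(p[len]);
        if (starts && ends)
            n++;
    }
    return n;
}

/* The post's arithmetic: one nickel (5 cents) per loop. */
static long nickels_in_cents(size_t count)
{
    return (long)count * 5;
}
```

For 13527 loops that is 67635 cents, i.e. the $676.35 above.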