Andrew

July 9, 2008

The id Tech 4 Script Compiler Sucks Hard!

Filed under: Games, Optimization, Programming — floodyberry @ 3:23 pm

Whoever did most of the work on the id Tech 4 Script Compiler, I’m calling you out! I’ll grant that you managed to write a major component of a successful commercial engine, but… it’s just so bad. What confounds me even more is that all of the engines after DooM III did almost nothing (effective) to fix it: if you check out the Quake IV, Prey, and ET:QW SDKs, they all have the same basic compiler with a couple of things bolted on. The ET:QW guys did try to speed it up, but “glacially slow” doesn’t seem like much of an improvement on “geologically slow”.

I first noticed how bad it was when I was doing the ET:QW -> Tribes stuff and started playing around with the scripts. Being so used to Tribes-style scripting, two things hit me right off:

  1. You have to exit the mission and recompile every script if you want to update a script you just edited. Ok, pretty annoying, but I’m just playing around so it should get easier.
  2. On my AMD64 3200+, recompiling was “damn slow” (~20 seconds) in Release mode and “I’m going to read a book” (~60 seconds) in Debug. Issue #1 just got a lot more annoying.

How did they manage to develop on this for longer than 10 minutes before going crazy, let alone create an entire game? Apparently I was the first to get fed up enough with it, so I went to check out the compiler and see if I could find any hotspots.

Putting Out Dead Fires

What is most depressing about looking at the source is that you can tell someone tried to optimize it, but failed miserably.

  • There are static lists for some elements (strings, functions, globals, statements, objects). idList grows linearly, not exponentially, so I assume someone noticed adding XXXX elements was really slow, knew it must be idList resizing itself far too many times, but didn’t know of an actual solution (see the sketch after this list). A static list also means the code will break if you add too many items.
  • A Hash is used for idVarDefs (a catch-all struct to hold scope/info for variables, functions, objects, and constants), but it is keyed on name only, i.e. every time it looks up a variable named “i”, it gets a linked list of every variable named “i” and has to check each one to see if it matches the scope of the “i” you want.

    Even better, the list of constants is kept in the idVarDef list named “<Immediate>”, which means a lookup to see if a constant value exists requires that you iterate over every string constant, numeric constant, stack return constant, etc.

  • Splash Damage did add a Hash for the idTypeDef list in ET:QW, but only for the user-defined types, i.e. objects. The default types were still checked for with a cascaded if statement.

    They also added blockAllocators for a couple elements, but allocations weren’t the main problem.
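
To make the granularity problem concrete, here’s a minimal sketch of the two growth policies. GrowingList is a made-up stand-in, not the actual idList code, and it only handles trivially copyable element types:

    #include <cstdlib>
    #include <cstring>

    // Fixed-step growth copies the whole buffer every `granularity`
    // appends, so appending n elements does O(n^2) total copying.
    // Doubling the capacity instead makes each append amortized O(1).
    template<typename T>
    class GrowingList {
    public:
        GrowingList() : data(nullptr), num(0), capacity(0) {}
        ~GrowingList() { std::free(data); }

        void Append(const T& value) {
            if (num == capacity) {
                // idList-style fixed step: capacity += granularity;
                // Exponential growth instead:
                capacity = (capacity == 0) ? 16 : capacity * 2;
                T* grown = static_cast<T*>(std::malloc(capacity * sizeof(T)));
                if (num > 0) {
                    std::memcpy(grown, data, num * sizeof(T));
                }
                std::free(data);
                data = grown;
            }
            data[num++] = value;
        }

    private:
        T* data;
        int num;
        int capacity;
    };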

Active Volcanoes

So what were the (major) problems?

Script_Compiler.cpp:

  • idCompiler::FindImmediate lumped all constants under a single name (<Immediate>), leading to a linear search every time one was looked up.
  • idCompiler::CheckType checked for default types (string, boolean, virtual, etc.) with a cascaded if statement.
  • idCompiler::GetExpression did a linear search through the opcode list.
  • idCompiler::ParseReturnStatement did a linear search through the opcode list to find the “=” opcode (see the sketch after this list).
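
The fix is the same pattern everywhere: build a name-keyed hash once, then probe it, instead of running a cascaded if or a linear scan on every lookup. Below is a hypothetical sketch for the opcode case; Opcode and OpcodeTable are simplified stand-ins for the SDK structures, and std::unordered_map stands in for a generic hash map like sdHashMapGeneric:

    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Opcode {
        const char* name;   // operator text, e.g. "=" or "+"
        int op;             // opcode number; the real entries also
                            // carry operand types, priority, etc.
    };

    class OpcodeTable {
    public:
        // The stored pointers reference the caller's opcode array,
        // which in the engine would be a static table that outlives
        // this object.
        explicit OpcodeTable(const std::vector<Opcode>& opcodes) {
            for (const Opcode& op : opcodes) {
                // Several opcodes share a name (one per operand-type
                // pairing), so keep a small candidate list per name.
                byName[op.name].push_back(&op);
            }
        }

        // O(1) average probe; only the few same-named candidates still
        // need their operand types checked.
        const std::vector<const Opcode*>* Find(const std::string& name) const {
            auto it = byName.find(name);
            return it != byName.end() ? &it->second : nullptr;
        }

    private:
        std::unordered_map<std::string, std::vector<const Opcode*>> byName;
    };

The same one-time table works for CheckType’s default types: map “string”, “boolean”, and friends to their idTypeDefs once, and the cascaded if disappears.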

Script_Program.cpp:

  • idVarDef was stored in a hash by name only. Each idVarDef is a unique pair of name & scope, so the entire list of idVarDefs would have to be searched for the proper scope member.
  • The idVarDefs were also stored in a weird linked list thing that required constant maintenance.
  • idProgram::FindType used a linear search even when the hash table was available.
  • idProgram::MatchesVirtualFunction used a linear search through the virtual functions.

The generous explanation is that nobody got around to storing things in hash tables properly because the idHashTable implementations suck. They’re hardcoded to be keyed on a string, so you can’t construct custom keys (such as an object containing a name and a scope for idVarDefs).

What is especially ironic (maybe only to me) is that Carmack ran into performance problems due to linear searches before with qcc! I have no idea if he contributed at all to the compiler, but it still tickles me.

Postmature Optimization

The main fix for this type of problem should be obvious: more hash tables! It took me around a week and a half to figure out what exactly was going on, identify the hotspots, and fix them. This involved lookups for the compiler default types, the opcode table, and the global virtuals table, as well as creating an idScopeName object to key the list of idVarDefs on.
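
The core of that last piece is a composite key. Here’s a hypothetical sketch of the idScopeName idea; the actual type in the patch differs, and std::unordered_map again stands in for sdHashMapGeneric:

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <unordered_map>

    struct VarDef { /* type, value, scope info, etc. */ };

    // Key on (name, scope) instead of name alone, so a lookup for "i"
    // lands directly on the "i" in the scope you want instead of
    // walking a chain of every "i" in the program.
    struct ScopeName {
        std::string name;
        const void* scope;   // e.g. the owning function or object

        bool operator==(const ScopeName& other) const {
            return scope == other.scope && name == other.name;
        }
    };

    struct ScopeNameHash {
        std::size_t operator()(const ScopeName& key) const {
            // Mix the name hash with the scope pointer.
            return std::hash<std::string>()(key.name) ^
                   (std::hash<const void*>()(key.scope) << 1);
        }
    };

    using VarDefMap = std::unordered_map<ScopeName, VarDef*, ScopeNameHash>;

    // VarDefMap vars;
    // vars[{ "i", ownerFunction }] = def;   // one probe, right scope
    // ("ownerFunction" and "def" are placeholders for real objects.)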

To cope with the bloat the hash tables might introduce, I also used an idStrPool for all the strings inside idProgram.cpp. This had the added bonus of allowing pointer hashing and comparisons on the idPoolStr* since they are guaranteed to be unique.
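
And a minimal sketch of the interning idea, assuming nothing about the real idStrPool beyond what’s described above:

    #include <string>
    #include <unordered_set>

    // Each distinct string is stored exactly once, so two equal
    // strings always share one pointer and can be hashed and compared
    // by address instead of by contents.
    class StrPool {
    public:
        const std::string* Intern(const std::string& s) {
            // insert() hands back the existing element if s is already
            // pooled; element addresses in an unordered_set are stable.
            return &*pool.insert(s).first;
        }

    private:
        std::unordered_set<std::string> pool;
    };

    // StrPool pool;
    // const std::string* a = pool.Intern("self");
    // const std::string* b = pool.Intern("self");
    // assert(a == b);   // identity check instead of strcmp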

Remember when I said the idHashTable implementations were awful? They are! Luckily the Splash Damage guys created sdHashMapGeneric, so I didn’t have to make my own or figure out what exactly the weird id stuff is doing (I still don’t understand it).

There were some structural changes that had to be made to accommodate all of this, but nothing earth-shattering. I chopped out some useless classes and created new ones such as idTypeDef_Static (idTypeDefs need an idPoolStr* name now, but you can’t statically create one), etc.

Did It Get Faster?

Yes! I don’t really have a wide assortment of test systems, but these are the improvements I saw:

                           Default    Optimized     Speedup
FeaRog's Laptop(Debug):  ~180-300s         ~12s      15-25x
AMD64 3200+(Release):         ~20s          ~1s         20x
AMD64 3200+(Debug):           ~60s          ~4s         15x
E8400@3.6ghz(Release):       ~1.5s        ~0.3s          5x
E8400@3.6ghz(Debug):        ~32.5s        ~2.3s         14x

Memory usage did go up by ~0.5 MB, but that’s really nothing when the compiler is using 6-7 MB as it is. I also spiffed up idProgram::CompileStats to produce more detailed stats so you can see exactly where the memory is going (and the overhead on the various containers). I would have fixed idList to resize exponentially and gotten rid of the awful .setGranularity calls, but editing anything in idLib always forced nearly the entire solution to recompile, so I let it go.

Fin

Download the source (for ET:QW SDK 1.5) and try it out! Simply unzip it to your ET:QW SDK source directory and recompile; it should work flawlessly unless you’ve made conflicting modifications to the 1.5 SDK.

2 Comments

  1. maybe they do read a book, you never know.

    Comment by makc3d — February 17, 2009 @ 1:28 am

  2. […] My goal is more difficult. I want to not only dump the profiling results at any time, I want to be able to reset the results at any time. When I’m profiling games, I often want to profile a specific action or sequence of actions. I don’t want to include the startup time and whatever is required to get to that specific thing I want to profile, e.g. running a timedemo or analyzing a horrible script compiler. […]

    Pingback by High Performance C++ Profiling « Andrew — October 7, 2009 @ 10:40 am

