When Good AI Goes Bad: Tools, Techniques, and Strategies for Testing and Debugging AI

Phil, John, Alex

A panel chaired by Alex J. Champandard, with Phil Carlisle and John Abercrombie, describing how to debug: early, often and with the right set of tools, from basic logs to more complex options such as automated testing. Examples were worked through for Worms and BioShock, and other games were described besides. Good information for developers and QA testers.


Alex – The early stages, how to start the project?

Phil – Get everyone on early, especially QA, so they know what features they will be testing.
John – Start on day 1 listing things, and making tools.

Alex – Starting in the design phase, like the Big Daddies and Little Sisters, how do you think of debugging tools?

John – I wasn’t thinking of testing and debugging to start with, just getting it working, so I had to go back and write documentation and tools to debug it. Had to ask QA what they thought would happen. Having that shared understanding is highly important for debugging the AI.

Alex – We’ve not got to the maturity in the design level, Phil?

Phil – If we’re doing iterative design, we have to do iterative testing too. Need the designers to have testing included in the documentation.
John – And that’s the difficult thing with iterative design: the documentation can’t keep up, so it is sometimes better to wait until a feature is complete and then document it for testing.
Phil – With 140 weapons we just threw things in there. It is something of a responsibility to tell people what should happen.

Alex – What about merging of teams?

Phil – Having QA integrated was a huge fight when trying to get them into the same room as the designers in some capacity.
John – Agreed, need to get QA on the team.
Phil – Throwing it to a publisher gets bugs back you just can’t comprehend.
John – Get a few internal testers in that situation if possible, it is very important.

Alex – How about the Matrix idea from Crysis designer?

Phil – It’s a good idea to have a list of matrix entries which are updated when a designer changes something or an engineer adds something.
John – It’s a living document as you make your game that gets updated.

Alex – What about the high level marketing and production, such as stating the number of lines of code?

Phil – I bet producers really like matrix boxes to have checklists to tick off.

Alex – What about test plans, do you use them?
Both – Yes

Alex – When do you use them?

John – Too late, need to get them done early.
Phil – Used to get 500 page design documents then throw them away.
John – With more than 5 people you have too many people contributing.
Phil – Works better in scrum.

Alex – Scaling up is a problem. How big was your team Phil?
Phil – 4 people in the first game, then it got much larger in later projects.
Alex – How did you manage the scaling up?
Phil – Painfully. Lots of mistakes, just had to work through them.

Alex – Adding scripting languages to a game, the tools are far behind. John?
John – Use a lot of logs to start with. There is somewhat of an art to using them. If you don’t use a debugger you are in a dangerous position with the code.
Phil – You are winning a lot with scripting since you can open the code to people so they can work on it.

Alex – It takes a long time to build up a scripting language to a usable point to get that level.
Phil – There is some kind of binary switch between designers who can script and those that can’t. You don’t want to be giving designers who don’t know what they are doing access, like in games like Call of Duty.
John – You give designers enough rope, but not enough to hang themselves.

Alex – On Call of Duty the scripting bar is very high with so many lines of code.
Phil – We had a scripting language written by one person on the team, without documentation, and designers actually did get stuff done with it.
John – With BioShock there was a scripting set with little documentation or consistency. It would have been better off with standards and documentation.

Alex – Have you worked on writing tools for verifying scripts?
Phil – We had a Lua debugger built into the game. I think building solid debugging tools for scripting is worthwhile because we use them ourselves.
John – There is a documentation thing there too. We have the programmers review the changes and check what is done (and vice versa, with designers seeing the programmers’ code) rather than have the designers just go off and only come back with problems.

Alex – So how good is building tools?
Phil – Building a tool is very worthwhile. Building lots of tools for other people to use is very productive, such as getting a behaviour tree editor made.

Alex – Yes, the tree view tool is very common and allows you to see the designers’ changes to the tree.
John – You also allow designers to see the behaviour trees even if they can’t edit them.
Alex – Gets everyone on the same page.
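
A read-only tree view does not need a full editor behind it. As a rough sketch only (the panel did not describe how their tools were built), the following Python dumps a behaviour tree as indented text. The node layout (`type`, `name`, `children`) and the node names are assumptions made up for this illustration.

```python
def render_tree(node, depth=0):
    """Return the tree as indented text lines, one node per line."""
    lines = ["  " * depth + f"{node['type']}: {node['name']}"]
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines

# Hypothetical tree; the names are illustrative, not taken from any game.
tree = {"type": "Selector", "name": "root", "children": [
    {"type": "Sequence", "name": "protect_little_sister", "children": [
        {"type": "Condition", "name": "sister_threatened"},
        {"type": "Action", "name": "attack_threat"}]},
    {"type": "Action", "name": "follow_sister"}]}

print("\n".join(render_tree(tree)))
```

Because the output is plain text, two versions of a tree can be compared with an ordinary diff tool to see what a designer changed.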

Alex – On the visual editing, is it a possibility to have in game editors like the Unreal Engine has for shaders?
John – For scripting I’m not sure. It is sometimes better to have a visual editor for it. It is a lot easier to do prototypes – but it makes it more difficult from the programmer side. Debugging through the visual representation is difficult.

Alex – You get a clash between the C++ debugger and script debuggers.
John – You tend to learn where to add the breakpoints in the script debuggers.

Alex – On automated testing, you did a lot of this, Phil?
Phil – Yes, we made an effort to have a way to play the game with no graphics, just the gameplay and AI, with no platform dependency. We did a huge amount of game testing using AI vs AI code, testing the AI but also the landscape destruction and so forth. A headless version of the game that can churn through games is very useful.
John – On BioShock it was a little more low tech than automated, but we can run warmaps: every AI hates every other AI, tons of things get shown, the player jumps around and it checks for crashes.
Phil – Substituting AI for testers, good value I think.
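
The headless AI vs AI idea can be sketched roughly as follows. This is a minimal illustration, not the Worms or BioShock code: `simulate_match`, `random_bot`, `soak_test` and the toy damage rules are hypothetical stand-ins for a real game loop running with rendering disabled. Each match is driven by a seed, so a failing match can be replayed exactly.

```python
import random

def random_bot(rng, my_hp, enemy_hp):
    """Hypothetical AI: pick a damage value for this turn."""
    return rng.randint(0, 30)

def simulate_match(seed, bot_a, bot_b, max_turns=200):
    """Run one AI-vs-AI match with no graphics; return the winner's index."""
    rng = random.Random(seed)          # seeded so a failing match can be replayed
    hp = [100, 100]
    bots = [bot_a, bot_b]
    for turn in range(max_turns):
        attacker = turn % 2
        defender = 1 - attacker
        damage = bots[attacker](rng, hp[attacker], hp[defender])
        # Invariant check: stands in for the crashes a soak test hunts for.
        if not 0 <= damage <= 30:
            raise ValueError(f"seed {seed}: bot {attacker} returned damage {damage}")
        hp[defender] -= damage
        if hp[defender] <= 0:
            return attacker
    return None                        # draw: nobody died within max_turns

def soak_test(num_matches, bot_a=random_bot, bot_b=random_bot):
    """Churn through many matches and collect the seeds that failed."""
    failures = []
    for seed in range(num_matches):
        try:
            simulate_match(seed, bot_a, bot_b)
        except Exception as exc:       # record and keep going, like an overnight run
            failures.append((seed, repr(exc)))
    return failures

failures = soak_test(500)
print(f"{len(failures)} failing seeds out of 500")
```

Keeping the list of failing seeds means a programmer can rerun exactly the failing match under a debugger the next morning.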

BioShock Debug Log (he luckily could filter it)

Alex – Debugging logs (example shown from BioShock). John?
John – You can log specific areas and get logs just from those sections. The example shows everything cranked up to 11, where you can see everything that is going on.

Alex – It seems the first place you go is the logs, Phil?
Phil – I think logs are fair enough, but I think there are better ways to check for problems than using the logs. It is hard to pick out an individual element in logs relative to a crash.
John – What never got done was a C# tool to filter logs by object.

Alex – At an entity level, logging is good. John, you start checking logs with CTRL + F, right?
John – Yes, just about, which is why that tool would have been good. Would have been easier with that kind of tool.
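
Filtering by area and by object can be sketched with Python’s standard `logging` module. This is not the BioShock logger or the C# tool mentioned above; the `area` and `entity` fields, the `AreaEntityFilter` class and the entity names are assumptions for illustration.

```python
import io
import logging

class AreaEntityFilter(logging.Filter):
    """Pass only records from the chosen areas and entities."""
    def __init__(self, areas, entity_ids):
        super().__init__()
        self.areas = set(areas)
        self.entity_ids = set(entity_ids)

    def filter(self, record):
        return (getattr(record, "area", None) in self.areas
                and getattr(record, "entity", None) in self.entity_ids)

def make_ai_logger(stream, areas, entity_ids):
    """Build a logger whose handler keeps only the wanted records."""
    logger = logging.getLogger("ai_demo")
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(area)s [%(entity)s] %(message)s"))
    handler.addFilter(AreaEntityFilter(areas, entity_ids))
    logger.addHandler(handler)
    return logger

stream = io.StringIO()
log = make_ai_logger(stream, areas={"pathfinding"}, entity_ids={"big_daddy_1"})
log.debug("path found, 12 nodes", extra={"area": "pathfinding", "entity": "big_daddy_1"})
log.debug("path blocked", extra={"area": "pathfinding", "entity": "splicer_7"})
log.debug("playing idle", extra={"area": "animation", "entity": "big_daddy_1"})
print(stream.getvalue(), end="")
```

Only the first message survives the filter, which is the effect a CTRL + F search tries to approximate by hand.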

Alex – How do you start debugging?
John – Can start with logs if included, or check the screenshot provided if any, or get the tester to show at my desk.

Alex – What about asserts?
John – An assert, if the problem isn’t fatal, isn’t as good as providing a warning. For AI it is really a content error, and it won’t be fatal.
Phil – The problem with asserts in general, if you assert then continue you get into a pattern of ignoring the asserts since it doesn’t stop the game.
John – That is a huge problem since designers just click through. Hopefully your bug system can track it and so there is feedback even if it is being skipped over.

Alex – There is a broken window theory with asserts. There are some teams who change the levels of alerts over time – just to get them visible.
Phil – Shoving numbers in asserts is terrible, need some descriptive text.
John – Putting someone’s name in is a good joke, but I get my ass kicked for it.
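
One way to act on these points is to record non-fatal content errors with descriptive text and count them, so that a warning which gets clicked through still shows up in a report. The sketch below is only an illustration of that idea: the `ContentWarnings` class, the NPC data and the message text are invented for the example.

```python
from collections import Counter

class ContentWarnings:
    """Collects non-fatal AI content errors instead of halting the game."""
    def __init__(self):
        self.counts = Counter()

    def check(self, condition, message):
        """Record `message` when `condition` is false; return the condition."""
        if not condition:
            self.counts[message] += 1
        return bool(condition)

    def report(self):
        """Most frequent problems first, so skipped warnings stay visible."""
        return self.counts.most_common()

content_warnings = ContentWarnings()
npcs = [{"name": "splicer_7", "patrol_path": None},
        {"name": "splicer_8", "patrol_path": "hall_loop"},
        {"name": "splicer_9", "patrol_path": None}]

for npc in npcs:
    ok = content_warnings.check(npc["patrol_path"] is not None,
                                "NPC has no patrol path assigned; falling back to idle")
    if not ok:
        npc["patrol_path"] = "idle"    # keep the game running with a safe default

print(content_warnings.report())
```

The report could then be attached to a bug entry, which gives the feedback John describes even when designers skip past the warning itself.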

BioShock throwing visual debug

Alex – Not all AI bugs are crash bugs. How do we fix behaviour bugs? What in game controls do you have for checking?
Phil – We had all sorts of visualisations. The AI checking the direction and shots of weapons can be visualised so you can check to see why the AI is deciding to do something.
John – Debugging in BioShock visually, there are some lines connecting the Big Daddy and Little Sister, with the pathfinding and some animation information displayed. There are views of locomotion and lines, debug cylinders around a guy, and projected weapon tests and geometry testing, making sure the AI’s test of the grenade throw is the same as the game’s throw. There was also the ability to jump and watch a player; you can also pause the game, or move the player relative to the camera.
Phil – Being able to pause the AI is very useful.
John – Especially for screenshots 🙂

Phil – We implemented views as the features got added.
John – You also want to see multiple frames and see it play out.

Alex – It takes time to build them, do you choose your battles?
John – Certainly. There are expectations on the designers to see some debugging – such as view cones. Doing debugging to prove a problem is solved, especially visual things like animation is a good idea too.

Alex – On tuning the game…
Phil – Checking why the AI doesn’t choose a weapon, and capturing hard data on that, is a good thing to do (the weapon may be too bad, it may not be selectable, or the AI may be broken).
John – You have to design for it.
Phil – Yes, you need to design it into the code early on. You can build a lot of the systems in to be automatic. Access functions for variables and so forth.
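
Capturing why a weapon was or was not chosen can be designed in by having the selection function return a trace alongside its answer. The following is a sketch under invented assumptions: the scoring formula, the weapon fields (`damage`, `range`, `ammo`) and the numbers are made up and are not the Worms weapon logic.

```python
def choose_weapon(weapons, distance):
    """Score every weapon and return (choice, trace) so rejections are explained."""
    trace = []
    best_name, best_score = None, float("-inf")
    for w in weapons:
        if w["ammo"] <= 0:
            trace.append((w["name"], None, "rejected: no ammo"))
            continue
        if distance > w["range"]:
            trace.append((w["name"], None, "rejected: target out of range"))
            continue
        # Hypothetical scoring: damage scaled down as the target gets farther away.
        score = w["damage"] * (1.0 - distance / w["range"])
        trace.append((w["name"], score, "scored"))
        if score > best_score:
            best_name, best_score = w["name"], score
    return best_name, trace

weapons = [
    {"name": "bazooka", "damage": 50, "range": 100, "ammo": 3},
    {"name": "grenade", "damage": 40, "range": 40, "ammo": 2},
    {"name": "shotgun", "damage": 60, "range": 20, "ammo": 0},
]

choice, trace = choose_weapon(weapons, distance=50)
print("chosen:", choice)
for name, score, reason in trace:
    print(f"  {name}: {reason} (score={score})")
```

With the trace saved per decision, a weapon that is never selected can be separated into the three cases Phil lists: scored but always too low, always rejected, or never considered at all.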

Alex – On Build and deployment. How do you make it easier to identify as you’re building?
John – Had feedback from the editor from the content adders, so the warnings are brought back to the programmers and the errors are fixed by them.
Phil – You build some verification tools before building. Some automatic test runs on the overnight builds too. CPU time is essentially free, making automated test systems is a good engineering solution. I really buy into the journaling – I try and architect things.

Alex – Goes nicely into QA tools and finding bugs?
John – If it wasn’t a crash bug: a log, a screenshot and a save game. For save games, the AI might load after the graphics passes, losing the AI bug if it skips a frame. Pausing the game and then saving, so it can be unpaused once the shaders are loaded, allows the AI bugs to be captured.
Phil – To get around that problem we used timers and events, and so there can be a replay file.
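
A replay file works when the simulation depends only on recorded inputs. The sketch below assumes a toy simulation where the state is a single number, events are keyed by frame, and randomness comes from one seeded generator. All names here (`run`, `save_replay`, `load_and_run`, the `"push"` event) are hypothetical, and the panel did not describe the actual file format.

```python
import json
import random

def run(seed, events_by_frame, frames):
    """Advance a toy simulation; the result depends only on seed and events."""
    rng = random.Random(seed)
    x = 0
    for frame in range(frames):
        x += rng.choice([-1, 0, 1])            # stand-in for AI decisions
        for event in events_by_frame.get(frame, []):
            if event == "push":
                x += 10
    return x

def save_replay(seed, events_by_frame, frames):
    """Serialise everything needed to reproduce the session."""
    return json.dumps({"seed": seed, "frames": frames,
                       "events": {str(f): e for f, e in events_by_frame.items()}})

def load_and_run(replay_text):
    """Rebuild the inputs from a replay and rerun the simulation."""
    data = json.loads(replay_text)
    events = {int(f): e for f, e in data["events"].items()}
    return run(data["seed"], events, data["frames"])

live_result = run(42, {3: ["push"], 60: ["push"]}, frames=100)
replay_text = save_replay(42, {3: ["push"], 60: ["push"]}, frames=100)
print(load_and_run(replay_text) == live_result)
```

The replay stays valid only while nothing outside the seed and the event list influences the simulation, which is why this has to be architected in early.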

Q – Do you suppose test driven development can help?
Alex – For unit testing, it provides good code modularity but doesn’t catch a lot of bugs. You can’t automatically test for subjective bugs, and it needs a lot of setup to provide test cases.
Phil – I buy into the unit test idea.
John – Because the AI reaches into every part of the game, unit testing it can find a related bug immediately rather than later.

Q – Suggestion – format the log so it can be parsed by Excel so things can be filtered and sorted in columns.
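
A log that Excel can open is most simply written as CSV. The sketch below uses Python’s standard `csv` module; the column names (`frame`, `entity`, `area`, `message`) and the sample rows are assumptions for illustration.

```python
import csv
import io

FIELDS = ["frame", "entity", "area", "message"]

def write_log(stream, rows):
    """Write log rows as CSV with a header, quoting fields where needed."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

rows = [
    {"frame": 118, "entity": "splicer_7", "area": "perception", "message": "target seen"},
    {"frame": 120, "entity": "splicer_7", "area": "perception",
     "message": "target lost, returning to patrol"},
]

# For a real file, open it with newline="" as the csv module documentation advises.
buffer = io.StringIO(newline="")
write_log(buffer, rows)
buffer.seek(0)
rows_back = list(csv.DictReader(buffer))
print(rows_back[1]["message"])
```

The `csv` module quotes the message that contains a comma, so it still lands in a single column when filtered or sorted.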

Q – We have emails being sent by exception handlers, so the designers and the entire company send bug reports. It is greatly helpful, and you can even have specific handlers so AI asserts go to specific people.
John – So you get the AI asserts?
Q asker – There is a small config file to use for that for people to add their name and email next to a certain type of event.
Phil – You can post bugs by asserts directly into the bug system by recording the call stack.
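
The routing config described by the questioner might look something like the sketch below. The file format, section name, event types and addresses are all invented for this example; the actual config was not shown. The sketch only looks up a recipient and does not send mail.

```python
import configparser

# Hypothetical config: event type on the left, owner on the right.
CONFIG_TEXT = """
[routes]
ai_assert = Alice <alice@example.com>
render_assert = Bob <bob@example.com>
default = QA Inbox <qa@example.com>
"""

def load_routes(text):
    """Parse the config text into a plain dict of event type -> recipient."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["routes"])

def recipient_for(routes, event_type):
    """Return the owner for an event type, falling back to the default inbox."""
    return routes.get(event_type, routes["default"])

routes = load_routes(CONFIG_TEXT)
print(recipient_for(routes, "ai_assert"))
print(recipient_for(routes, "audio_assert"))
```

A fallback entry matters here, since an unrouted assert that reaches nobody is no better than one that was clicked through.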

Q – What about documentation of things?
Phil – For small teams of 3-5 guys, having no documentation can work: have daily meetings and perhaps just a small amount of general documentation.
John – Documentation is proportional to distance – across countries. So for the AI Strike team in the same room
Q – What about this visual tool – do you think it removes the need for some documentation?
Phil – If you’re in a small enough team you’re working on the same problem generally. Small team in the same office is important.
Alex – Depends on the visual tool as well.

Q – Logging the data directly to a SQL database with a huge history which works really well.
Phil – Like for MMOs, which need to log online data which gets dumped offline too.
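
Logging to a database can be tried out with SQLite from Python’s standard library. This is a sketch, with an assumed table layout (`build`, `frame`, `entity`, `message`) and made-up build names; the questioner’s actual schema and database were not described.

```python
import sqlite3

def open_log_db(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS ai_log (
                        build TEXT, frame INTEGER, entity TEXT, message TEXT)""")
    return conn

def log_event(conn, build, frame, entity, message):
    """Insert one log row, using parameters rather than string formatting."""
    conn.execute("INSERT INTO ai_log VALUES (?, ?, ?, ?)",
                 (build, frame, entity, message))

def history_for(conn, entity):
    """Return every logged message for one entity across all builds."""
    cur = conn.execute("SELECT build, frame, message FROM ai_log "
                       "WHERE entity = ? ORDER BY build, frame", (entity,))
    return cur.fetchall()

conn = open_log_db()
log_event(conn, "build_101", 240, "big_daddy_1", "lost path to sister")
log_event(conn, "build_102", 238, "big_daddy_1", "lost path to sister")
log_event(conn, "build_102", 300, "splicer_7", "target seen")
conn.commit()

for row in history_for(conn, "big_daddy_1"):
    print(row)
```

With the history kept across builds, a query can show when a particular message first started appearing.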

Q – Similar to that question, you can dump to trees of logs which help work.

Website and journal of Andrew Armstrong