July 22, 2014

How I deal with data, part one

A Maze of Stupid Little Languages, All Alike

Advance notice: Nothing here is terribly new or revolutionary, but someone out there may find it useful. This is the sort of stuff that often comes up in game development but that people rarely write about, at least in my experience.

I'm in the process of designing and building a game with attention-management, simulation, and Sim elements. In it, the player will directly and indirectly affect the happiness of Little Computer People through decisions made in the attention management part of the game, backed by a richer simulation of an audience for that performance. That's all kind of hopelessly abstract, I realize, but it's kind of the level at which I'm comfortable discussing it in a public Internet forum at the moment. (I do discuss more specifics in person with people. So ask me when you see me.)

Lately¹ I've been moving my ugly, rough, hard-coded test data prototype over to a series of data files. These help me flesh out the game play more fully and create content that better reflects how I expect the final game to actually work and feel, both in terms of quantity and quality. My intent is to have a "campaign" of fully authored data, with a procedural approach for people who want to continue playing beyond that. These articles address the hand-authored content specifically, though I expect the description of procedural stuff to follow along similar lines. Once I feel like I have the data thing nailed down enough to build, balance, and play a few (again, super-ugly) levels that have the right feedback loops to the player, I'll be looking into the actual visuals that will support the final product (characters, environment, etc.).

Anyway, because I'm a programmer, I tend to go straight to text for data. It's easy to manipulate (by hand if need be, which is where I'm starting from right now) and easy to read. In so doing, I tend to come up with very simple little context-specific languages to deal with my data. But for performance and sanity reasons, I work with binary when I'm actually loading this stuff into my game², mostly because I hate parsing stuff in C++. Granted, I could use a well-known format like JSON or what-have-you, but at that point I haven't really solved anything. I've just offloaded parsing the text to a library (with its own run-time performance concerns, usually having to do with memory allocation), and I still need to walk the built-up data structure to look for stuff and handle failure conditions. Bleh. Instead, I prefer to build a custom format that maps directly onto data structures in memory: a single allocation to read in the whole file, then a little walking over that data to fix up structure pointers (typically by converting offsets to pointers). Later, the data can be freed in a single deallocation and the structure cleared out.
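As a rough illustration of the offset idea (not the actual format used in the game), here is a minimal Python sketch. The layout, the `LCP0` magic value, and the names `HEADER`, `RECORD`, `pack_people`, and `unpack_people` are all assumptions made up for this example. Strings live in one blob at the end of the file, and each fixed-size record refers to its name by byte offset, which is the value a C++ loader could turn into a pointer after reading the file in one go.

```python
import struct

# Hypothetical layout, for illustration only:
#   header:  4-byte magic, uint32 record count
#   records: uint32 name offset, float32 arrival time, int32 initial happiness
#   strings: NUL-terminated UTF-8, referenced by absolute byte offset
HEADER = struct.Struct("<4sI")
RECORD = struct.Struct("<Ifi")
MAGIC = b"LCP0"

def pack_people(people):
    """Pack (name, arrives, initial_happy) tuples into a single bytes blob."""
    strings = bytearray()
    records = bytearray()
    strings_base = HEADER.size + RECORD.size * len(people)
    for name, arrives, happy in people:
        offset = strings_base + len(strings)  # where this name will live
        strings += name.encode("utf-8") + b"\0"
        records += RECORD.pack(offset, arrives, happy)
    return HEADER.pack(MAGIC, len(people)) + bytes(records) + bytes(strings)

def unpack_people(blob):
    """Read the blob back by following offsets; no text parsing involved."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("unexpected file magic")
    people = []
    for i in range(count):
        offset, arrives, happy = RECORD.unpack_from(
            blob, HEADER.size + i * RECORD.size)
        end = blob.index(b"\0", offset)  # names are NUL-terminated
        people.append((blob[offset:end].decode("utf-8"), arrives, happy))
    return people
```

The point of the design is that the reader never parses anything: every field sits at a known offset, so the whole file can be loaded with one read and released with one free.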

It's also worth noting that I could, at a later date, write tools that sit on top of that simple format, and I may well do so to support post-ship modding of the game or the Steam workshop (should I be on Steam), or whatever. For some of the data I expect to author, it'll be useful to be able to visualize the data in terms of frequency of certain occurrences, but I could certainly do that by writing additional simple tools that spit out different data (images, comma-delimited text files, whatever) from

I would almost certainly choose differently if I were working on data that I expected many people to have their hands in at any given time. At that point, I'd probably want a database of some kind and things like that. Solo development makes this sort of thing much easier. In this case, I'm designing development tools for myself -- a person who is super-comfortable with text files and can actually visualize stuff from text files pretty well in his head. Whenever you're developing, you should think first of the client for your tools. Also: if I worked against a database, later steps of this process would be largely the same; I just wouldn't be parsing the data in the same way.

Here's what my languages tend to look like, with some real text that I've moderately fudged to obscure actual subject matter (in other words, not a game about outlaws). Just to give the flavor. There are no lawyers, guns, or butter in my game. I may add butter. ;)

## Josie Wails
:littlecomputerperson JWails "Josie Wails"
:lcpshort JWails "Former outlaw, now reformed"
:lcplong JWails "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et"
:lcplong JWails " dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
:lcparrives JWails 3.3
:lcpskill JWails lawyers 16
:lcpskill JWails guns 32
:lcpskill JWails money 16
:lcpskill JWails butter -8
:lcpappears JWails initial
:lcpinitialHappy JWails 32
:lcpHappy JWails and gte ammunition .3 lte butter 0 3
:lcpHappy JWails gte ammunition .6 3
:lcparrives default 0.0

Stuff to note:


  • Simple command structure. Basically, every line is a command, and they all follow a pretty simple structure that adds a specific bit of data to a particular main container item. I declare the LCPs with friendly names (which will be displayed in-game) but then use identifiers from that point on to prevent the need for global search and replace later.

    This has some great benefits, in particular that I can separate out data however I like -- I might want to keep all the skills descriptions in one place so that I can see at a glance that I don't have anyone who has a positive "butter" skill or I might put all the descriptions in one place at the bottom since I tend to write them once and then not think about them again. If I were working with a format like JSON, my natural inclination would be to keep all the data for a particular LCP together in one place. (It would probably also be significantly more verbose.)

  • Minimal data per line. As I'm never going to ship this stuff, I just go ahead and put only one bit of data per line. It's all "this command tells you this about this data object".
  • No multi-line stuff. As you can see in the :lcplong command, it would sometimes be easier to deal with this stuff on multiple lines, but that just makes parsing complicated. Instead, whenever I see a long description, I look for the LCP id in the appropriate Python dictionary, and if it's there, I append.
  • Prepended : for commands. On occasion I need to rename a command because I've changed what it means and already have data that uses it. It's helpful for find-and-replace to have the : there, and it doesn't add appreciably to typing. Originally I had named the "lcplong" command "long", but I ended up using the same command in multiple data files. In case I want to merge them all together later, it's helpful to have distinct commands, and searching and replacing ":long" with ":lcplong" meant I didn't have to worry about stepping on the word "long" in descriptive text.
  • Not shown: documentation in the file. At the top of each file I explain the syntax to myself for each command, because I expect I'll forget things. Often, it's enough to see the commands already in use, but it's helpful to have something to refer back to in case it's not clear. In a pinch, I can always examine the Python parser, but I'd prefer not to have to.
  • Also not shown: extensibility. Adding an "include" directive is super easy and doesn't complicate the resulting Python all that much. It just means encapsulating the parser loop in a function that can be called in a re-entrant fashion. Because commands are all basically additive, the "state" is just held in Python data structures, and anything that needs to be shared and included in multiple files can be easily handled.
  • Handling default behavior. Ultimately, I'm going to want to have reasonable defaults for things to fall back on, so there's a dedicated keyword that can't be used as the id of any specific LCP, which is "default". All of my parsing commands will notice this, as you'll see tomorrow.
  • Comments. As in Python, comments begin with the octothorpe³ and are single-line only.
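
To make the notes above a bit more concrete, here is a minimal sketch of how a parser for lines like these might look in Python. It is not the parser from this project; the function names `parse_lines` and `arrival_for`, the dictionary layout, and the choice of the standard `shlex` module for tokenizing are all assumptions for illustration. It handles only a few of the commands shown: one command per line, `#` comments, appended `:lcplong` text, and the reserved `default` id.

```python
import shlex

RESERVED_DEFAULT = "default"  # id that no specific LCP may use

def parse_lines(lines, people=None):
    """Parse command lines into a dict keyed by LCP id (hypothetical layout)."""
    people = {} if people is None else people
    for lineno, raw in enumerate(lines, start=1):
        # shlex handles quoted strings and drops '#' comments for us.
        tokens = shlex.split(raw, comments=True)
        if not tokens:
            continue  # blank line or comment-only line
        command, args = tokens[0], tokens[1:]
        if not command.startswith(":"):
            raise ValueError(f"line {lineno}: expected ':' command, got {command!r}")
        if command == ":littlecomputerperson":
            lcp_id, name = args
            if lcp_id == RESERVED_DEFAULT:
                raise ValueError(f"line {lineno}: 'default' is reserved")
            people.setdefault(lcp_id, {})["name"] = name
        elif command == ":lcplong":
            # No multi-line strings: repeated commands append to the text.
            lcp_id, text = args
            entry = people.setdefault(lcp_id, {})
            entry["long"] = entry.get("long", "") + text
        elif command == ":lcpskill":
            lcp_id, skill, value = args
            people.setdefault(lcp_id, {}).setdefault("skills", {})[skill] = int(value)
        elif command == ":lcparrives":
            lcp_id, when = args
            people.setdefault(lcp_id, {})["arrives"] = float(when)
        else:
            raise ValueError(f"line {lineno}: unknown command {command!r}")
    return people

def arrival_for(people, lcp_id):
    """Look up an arrival time, falling back to the 'default' entry."""
    entry = people.get(lcp_id, {})
    if "arrives" in entry:
        return entry["arrives"]
    return people[RESERVED_DEFAULT]["arrives"]
```

Because every command is additive, the order of lines in the file does not matter in this sketch, and passing an existing `people` dictionary back into `parse_lines` is one way several files could build up the same state.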

Tomorrow I'll come back to this and show how simple languages like these are easily handled in Python and output to a binary.




¹Since shipping Sixty Second Shooter about a month ago, which had occasioned a break of six months or so from my game.

²Note that I say game and not game engine. My game is currently hand-coded C++ on top of some cross-platform libraries.

³How great is that word.

Posted by Brett Douville at 02:30 PM