Saturday, December 31, 2011

Replicating Game States - revised

The last time I talked about my Undo system, which should also allow multiplayer through its structure of storing actions that can be transmitted over the network, was a long time ago. I don't want to directly talk about this today; you'll see.

Especially one thought has gone through my mind in the last weeks: Replicating a state across the network properly is nothing specific to Magic - so why do I implement it in the Core of Laterna Magica? I thought about it, and there doesn't seem to be a specific reason for doing so. Another result of that thought was that there are probably complete libraries out there which do exactly what I want. Well, that's unfortunately not the case, so I'm stuck/blessed with implementing it myself.

I haven't really gotten into development yet. I have a rough skeleton, but no really fancy functionality yet. But I'm having my thoughts. I think stepping away from the application's needs and looking at the general requirements of such a library help in developing a better piece of software. I haven't done this for my current undo system, which I find kind of disappointing. Normally, I don't have problems with spotting such generalizations.

Okay, back to topic: what does a library for state replication need?
  • There needs to a notion of what has happened on a higher, application level: Transactions
  • Transactions consist of atomic modifications, which are the core of the state
    • Optionally, they may also consist of other sub-transactions
  • Transactions and modifications must be transmittable over the network. If the whole state is represented as a tree of subtransactions under a root-transaction, then there's of course the need to transmit unfinished transactions. Otherwise, it might be enough to transmit finished transactions. I think that the first is generally more flexible. It might be a good idea to give the user a chance to mark transactions as atomic so that it's only transmitted when it is finished.
  • Probably the most interesting requirement is that, of course, after transmitting the state, both/all states have to be the same!
  • The system should be able to detect that there are state conflicts and that a transmitted transaction can't be carried out locally.
    • When we're already speaking of it, it's probably nice (but also probably impossible, so see that as an utopic remark) to resolve/merge such conflicts.
Now guess what I have an example for! Of course for the "most interesting" one in my list! It might seem at first that by defining modification, and making them granular enough to catch all actions in a system, there is no way that states may get out of sync. That's unfortunately not true. Let's take a simple case. This case actually only has a theoretical problem, but it's easier to illustrate and serves  the purpose. Our state consists of two variables, foo and bar. The user may only set foo, and when he does so, the program automatically changes bar. Now consider this:

When the state replication system transmits these two modifications, the receiver does its own set operation on bar, because foo has changed. Then, the original set operation is transmitted, setting bar another time (fortunately to the same value).

Until now, I thought that this particular case is just me wanting to be clean. Now that I wrote this, I even see more problems, real ones: If we're able to detect conflicts, then this shouldn't even run through, because the last modification should be the second but really is the third. And even if we don't: this second modification in State 2 needs to be transmitted to State 1. Even if that doesn't result in errors, which is likely, it will waste network bandwidth at least.

Now for the real problem I spotted here before: Let's replace "bar = 1" with "put a token onto the battlefield". Setting a number multiple times is redundant, but creating an Object is not. There is a very definite difference between one and two tokens, you know!

Now, let's solve these problems. There are basically two ways that can lead to this problem: a complex setter, and events.

In the first case, the setter for foo includes code that calls the setter of bar. The problem here is that our replication library can't know that setting foo doesn't really do this: it actually sets foo and bar. So, refactor foo's setter so that it does only what it says, and not more. Hide this from the user by adding another method doSomething() that does what he previously did by setting foo.

Second is events. In this case, the change to foo fires an event that leads to bar being changed as well. In this case, the simple answer that is not so simple to implement is: don't fire events while replicating the state. The more complex answer that is simpler to implement is that this is similar to the first case: Don't write events that fire on state changes and perform state changes. Instead, fire a new Event in doSomething() that triggers the change. As the state replication won't call doSomething(), everything is fine.

Both solutions shown here require changes in the application, not in the library. This might seem bad, but it's actually logical: The library may sanely assume that variables change and objects are created, and that it may do that by itself as well. If the state does not fulfill the contract that it just contains values, but instead employs complex logic that the library can't predict, the task is impossible. The library needs some way to change values without triggering application logic.

Thanks for staying wiht me, and a happy new year to all of you!

    No comments: