Tuesday, January 10, 2012

Entities and values

As so often in programming, one big question arose when I worked on my Undo/Replication framework: What is an entity, and what is a value?

I use the word entity to make clear that I mean something different from objects. An object is something identified by an address pointer, whereas for a primitive value, its bytes in memory directly represent the value instead of an address. What I mean with entity/value is best explained by an example from databases:

Suppose you have a database table storing the data of a company's employees. There is a surname, a prename, a birth date, and so on. Besides others, there's an address. The simplest approach is to handle each of these items as a table column: Surname and prename are strings, the birthdate uses a special date column type. The address can also be simply taken as a string, but there are other ways:

An address is composed of a street name, a number, probably also an appartment number, country, state etc. It could be beneficial to store all these data in a table on its own, give it an ID (the database way of saying address), and only reference this ID in the employee table.

Don't forget the third way! You could use all those columns, but put them directly into the employee table, so there would be prename, surname, birth date, street, stree no., appartment no., etc.

What's the best? Of course, it depends! Variant one is easy to use, but lacks intrinsic validity checks: When the database doesn't know that something should be an (appartment) number, how should it check it's even a number? Variant two can handle a very specific case "better": two workers have the same address. In this case, the same ID can be referenced multiple times, so that both employees not only have equal addresses but also have the same one. In this particular case, the benefit is not too big, but there are use cases where it's important that something can be referenced. About the third, it really combines the downsides of the both above... I personally see no benefit in using it. It there are, I'm sorry I didn't see them.

Now, what should have triggered your attention was the word "ID". As soon as something has an identity, it's an entity. Note how the prename is a String, which is an Object type in Java, but only a value. On the other hand, the address in case 2 has an identifier and is an entity.

And now we're back: What I do is to assign IDs to the objects I want to replicate. The objects have values which can be changed, and I transmit changes like "property XY of object 01 changed from 'ab' to 'cd'."

And now the big question: what about collections? Should it be "element 'ab' was added to list 01" (i.e. they are entities) or "element 'ab' was added to list XY of object 01?"

No comments: