Controlling Complexity in the API

What is complexity?

The natural result of organically grown systems

Interconnectedness.

Mushrooms

Mushrooms?!

Yes. Mushrooms.

Tasty

Poisonous if it's the wrong kind

Grows on rotting stuff

Actually grows beneath the surface, popping up just when it's ripe

An example

class User {
    function getName() { /* ... */ }
    function getAddress() { /* ... */ }
    function getMap() { /* ... */ }
}

An example

class User {
    static function loadFromDatabase($id) { /* ... */ }
    function saveToDatabase() { /* ... */ }
    function getName() { /* ... */ }
    function getAddress() { /* ... */ }
    function getMap() { /* ... */ }
}

An example

class User {
    static function loadFromDatabase($id) { /* ... */ }
    function saveToDatabase() { /* some conversion */ }
    function saveToMemcache() { /* ... */ }
    function getName() { /* ... */ }
    function setName($name) { /* validation here */ }
    function setNameByAdmin($name) { /* not validating here */ }
    function getAddress() { /* some formatting logic */ }
    function getMap() { /* ... */ }
}

Mushrooms.

If you never have to care what's in those methods, great...

... but you do have to reason about the system as a whole

Granularity

Do you operate on a given type of object field-by-field?

class User {
    static function loadForDisplay($id) { /* ... */ }
    static function loadForEdit($id) { /* ... */ }
    function setName($name) { /* update the database */ }
    function setAddress($address) { /* update the database */ }
}

Lots of micro-optimizations. Small mutations are the key actions.

Or do you operate on it as a whole?

class User {
    static function load($id) { /* ... */ }
    function validate() { /* ... */ }
    function save() { /* update the database just once */ }
}

Bigger pieces. You work with the whole thing all at once.

Granularity

Field-by-field

try {
  $user->setName('bob');
} catch (ValidationException $e) {
  // ...
}

Whole object

$user->name = 'bob';
if (!$user->validate()) { /* ... */ }

Granularity

Field-by-field

try {
  $user->setName('bob');
} catch (ValidationException $e) {
  // ...
}

try {
  $user->setAddress('123 Main St.');
} catch (ValidationException $e) {
  // ...
}

Whole object

$user->name = 'bob';
$user->address = '123 Main St.';
if (!$user->validate()) { /* ... */ }

Representation

Field-by-field

function toJSON() {
  return json_encode(array(
    'name' => $this->getName(),
    'address' => $this->getAddress(),
  ));
}

Everything in the object is locked up tight. Everything takes explicit conversion.

Whole object

function toJSON() {
  return json_encode($this);
}

We're working with the canonical representation.
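A minimal sketch of what that looks like in practice, assuming a hypothetical User whose fields are public (json_encode() only serializes an object's public properties):

```php
<?php
// Hypothetical whole-object User with public fields, as in the slides.
class User {
    public $name;
    public $address;

    function toJSON() {
        // Public properties are encoded as-is; no per-field conversion.
        return json_encode($this);
    }
}

$user = new User();
$user->name = 'bob';
$user->address = '123 Main St.';
echo $user->toJSON(), "\n";
// {"name":"bob","address":"123 Main St."}
```

Adding a field to the class adds it to the JSON with no further changes.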

Representation

Field-by-field

function loadFromDatabase($row) {
  $this->setNameNoValidate($row['name']);
  $this->setAddressNoValidate($row['address']);
}

You have to extend the API with non-validating setters, because rows from the database may hold invalid data. What you've got inside your object is (assuming no bugs) good, but only because everything has to be validated before it goes in.

Whole object

function loadFromDatabase($row) {
  foreach ($row as $key => $value) {
    $this->$key = $value;
  }
}

You can actually load invalid data and work with it in its own domain. No need to have an intermediate representation for not-yet-validated data.

if ($user->validate()) { /* ... */ }
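Put together, the flow might look like the sketch below. The field names follow the slides; the validation rule (non-empty strings) is an assumption made only so the example runs.

```php
<?php
// Hypothetical whole-object User. The rule in validate() is assumed,
// not taken from the slides.
class User {
    public $name = '';
    public $address = '';

    function loadFromDatabase(array $row) {
        // No validation on the way in: bad rows load like good ones.
        foreach ($row as $key => $value) {
            $this->$key = $value;
        }
    }

    function validate() {
        return is_string($this->name) && $this->name !== ''
            && is_string($this->address) && $this->address !== '';
    }
}

$user = new User();
$user->loadFromDatabase(array('name' => 'bob', 'address' => ''));
var_dump($user->validate()); // bool(false): loaded fine, reported invalid

$user->address = '123 Main St.';
var_dump($user->validate()); // bool(true)
```

The invalid row never needs a separate "unvalidated" type. It is just a User that does not pass validate() yet.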

Lifecycle

Does our User object last milliseconds? Hours? Days?

In PHP, the answer is almost always 'milliseconds'. The entire object structure is torn down at the end of each request, so any state that must survive has to be saved explicitly.

In a system like a router's firmware, where the same code will have long-lived instances for objects like route table entries, the lifecycle is very different. Objects may live for years. Those are state objects, and any change to their internal representation affects the system as a whole.

State Transitions

Bugs happen any place the state of the system can change.

To reduce the amount of code that has to be analyzed to see a change's effect, we define points where the state can change, and enforce those.

In our user object example, the system as a whole is affected only when the object is persisted. Our PHP doesn't talk much to external systems.

State Transitions

1
2
3
4
5
6
7
8
9
10
class User {
    static function loadFromDatabase($id) { /* ... */ }
    function saveToDatabase() { /* some conversion */ }
    function saveToMemcache() { /* ... */ }
    function getName() { /* ... */ }
    function setName($name) { /* validation here */ }
    function setNameByAdmin($name) { /* not validating here */ }
    function getAddress() { /* some formatting logic */ }
    function getMap() { /* ... */ }
}

State changes in line 3 (saveToDatabase). And 4 (saveToMemcache). And 6 (setName). And 7 (setNameByAdmin). And maybe in 8: getAddress looks like a fantastic place to update our stored copy if we discover it's in an old format.

There be dragons.

State Transitions

class User {
    public $name;
    public $address;
    function validate() { /* ... */ }
    function save() { /* ... */ }
}

State changes only when we save the object.
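A rough sketch of save() as the single transition point. The static $store array stands in for the database, and having save() refuse invalid objects is an assumption for illustration; the slides do not say where that check lives.

```php
<?php
class User {
    public $id;
    public $name = '';
    public $address = '';

    // Stand-in for the database; hypothetical, for illustration only.
    public static $store = array();

    // Assumed rule: both fields must be non-empty.
    function validate() {
        return $this->name !== '' && $this->address !== '';
    }

    // The only method that changes state outside this object.
    function save() {
        if (!$this->validate()) {
            return false;
        }
        self::$store[$this->id] = array(
            'name' => $this->name,
            'address' => $this->address,
        );
        return true;
    }
}

$user = new User();
$user->id = 1;
$user->name = 'bob';
var_dump($user->save());         // bool(false): no address, nothing stored
$user->address = '123 Main St.';
var_dump($user->save());         // bool(true): written once, in one place
```

Assigning to $user->name or $user->address touches nothing but the object itself, so the only code to audit for system-wide effects is save().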

But how are you sure?

If this were Erlang, we'd match only valid objects. If this were OCaml, validated objects would be a different type, and we could declare that most of our functions only work with the persisted, validated state.

This is PHP. We'll have to assert things are in the right state.

function frobnicateWithMyUser($user) {
  assert($user->getName());
  assert($user->getAddress());
  // ... the list goes on ...

  // I sure hope I didn't miss one. I hope someone changing this
  // code didn't add a new one elsewhere.

  // I can't be sure. I'll add more validation here.
}

function frobnicateWithMyUser($user) {
  assert($user->validate());

  // This function is extra picky. I'll go add more validation
  // in the user's validate method.
}

The state we really care about is "is it valid?". That's it. Better to define that in one place.
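As a sketch of that idea, two hypothetical consumers below share one definition of "valid". The rules inside validate() and the mailLabelFor() helper are assumptions, not from the slides. Note that assert() is only evaluated when the zend.assertions setting enables it, so this is a development-time check rather than a production guard.

```php
<?php
class User {
    public $name = '';
    public $address = '';

    // The single definition of "valid". The rules here are assumed.
    function validate() {
        return $this->name !== '' && $this->address !== '';
    }
}

function frobnicateWithMyUser(User $user) {
    assert($user->validate());
    return strtoupper($user->name);
}

// Hypothetical second consumer: it relies on the same check, so a rule
// added to validate() is picked up here with no changes.
function mailLabelFor(User $user) {
    assert($user->validate());
    return $user->name . "\n" . $user->address;
}

$user = new User();
$user->name = 'bob';
$user->address = '123 Main St.';
echo frobnicateWithMyUser($user), "\n"; // BOB
echo mailLabelFor($user), "\n";
```

When a new function turns out to be pickier, the extra rule goes into validate(), not into the function.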