The UML class diagram: you are probably using it wrong

As I have mentioned before I learned VB6 with the help of a book. The book introduced me to the basic control flow structures and then to databases as a form of storage. It touched a little on the relational theory and how the structure of the database would affect the structure of the data as represented in the software and therefore the need to start modeling the database before anything else. So, for years I followed that pattern. The data model became my de facto starting point. Even when I learned C++ and Java, 2 OOP languages, my analysis approach remained the same. For a long time…

The confusion

A common approach to database design is to use the entity relationship diagram. It shows tables and the way these are related to each other. It’s pretty simple actually and easy to reason about. I believe that’s the reason for their popularity. So when I was starting into OOP (as opposed to only learning OOP languages) and I was introduced to the Class diagram I immediately mapped it to the ER diagram and hence the thinking behind it. So for a long time I would still doing data modeling, but instead of using an ER diagram I was now using a Class diagram.

The UML diagrams classification

Suppose you are tasked with creating a robotic arm to crack eggs open. Where would you start?

Probably you would start by trying to figure out the movements used to crack an egg open. You would do this by looking an expert and maybe trying to learn the trick yourself. Then you would start thinking how to replicate that movement. This may lead you to identify a key components and the interactions between them. You may make some sketches. And from here you probably would start experimenting with different configurations and maybe tweaking the pieces a little bit until your experiment is successful. And then you would create some blueprints so some one else can build your piece of art anytime.

Software development is not different.

Static diagrams

Just like the blueprints for creating a robotic arm, the code we write is the blueprint used by the computer to create the software artifacts of our application. This static aspect of software defines the structure of the components that are going to be created and the UML static diagrams such as class diagrams are used to model this.

Dynamic diagrams

Like the interaction between the different parts of a robotic arm, the UML dynamic diagrams describe the interaction between parts of the software. This is dictated by the messages send back and forth between objects.

Relational vs OOP

So while the ER diagram it’s used to model the database structure, the class diagram it’s used to model the application structure. One is for data storage and the other for the application code. More often than not, the structure used to store data in relational databases is not the best for an OOP application. The opposite is also true. This is due to the approach taken for each paradigm: the relational school makes emphasis on avoiding data repetition using a technique called normalization, while the OOP side stands for avoiding code repetition using inheritance, composition and other techniques. This difference is known as object-relational impedance mismatch.

Spotting the differences

While at a glance the ER and Class diagram look similar the process by which each it’s created it’s very different.

The ER construction process

The process to the ER diagram is like creating the blueprints for construction. Since it’s not so easy to change a relational database once you start putting data on it, it’s better to make it right from the start. To this end we identify the entities on our domain space (the business industry) and then try to identify the relevant information for each from the data flow of the application. An a lot of time we guess some of the information that may be used in the future (at least I have).

The Class diagram construction process

On the other side the Class diagram construction process is more like the construction of a robotic arm: it takes a lot of iterations and you start by modeling (sketching) the mechanisms (dynamic aspects) and then do a lot of experimentation until you get it right. I often use an interaction diagram to model how the objects (not classes) are going to interact. When you have a set of objects and a set of messages send between those objects, the Class diagram is almost a natural step: you already know how those pieces interact, now just have to figure out how they are to be assembled together.

Using the class diagram: a better way

As Allen Hollub points out, we do object oriented programming, no class oriented programming. Classes are just that, classes of objects, or in other words, groups of objects that share some traits. You can only start to classify objects when you already have a bunch of them.

Start with the dynamic side of things: find your key objects and the messages they send to each other.
Experiment using a testing framework so you can see if it’s feasible in the shortest possibly time. Adjust as necessary.
Create a Class diagram and start sketching everything you got so far, trying to identify and remove any duplicated code and to identify and decouple the parts that may change often than the rest. Don’t let the worries about how you are going to store your data affect your Class diagram. Try to delay dealing with persistence as much as possible.
Only after you have a set of stable objects use the class diagram to aid you in the creation of your ER diagram. This is not a 1:1 mapping, don’t be afraid to optimize your ER diagram to your RDBMS.
Use your class diagram to explain the structure of your system to other developers, and the interaction diagram to explain, well, the interactions between your objects. They can serve you to demonstrate the implementation of certain patterns in your application. After all, diagrams are all about communicating ideas.