I'm Writing an OGM part 2

Entities, Nodes and Relationships

Nodes and relationships are what make a graph a graph -- simply put, nodes are the things that you're concerned with and relationships describe how and why those things are connected. In this OGM, as in any Object *Data Mapper, we'll need an easy way to define and use the objects that represent our data, our things and our connections. Luckily, I've already come up with an object structure and this post is here to walk you through my thought process.

An Object Graph Mapper's (OGM) main purpose is to make your life as a developer a bit easier. This is achieved by putting the common plumbing needed to CRUD a datasource behind some easy-to-use, easy-to-understand objects. In most cases this will be a boon for productivity and will allow for faster iterations from the prototyping stage to production-ready code. A good OGM, which is what we're building, of course, will help with code reuse, allowing you to define business rules once and use them in different contexts within your applications.

The Entity

Objects, the O in OGM, are what make working with the database a bit more bearable. The O is a collection of objects working in concert to fulfill the dream. Some of these objects represent the data in your datasource, while others have responsibilities like managing db connections or mapping the data objects back and forth with the datasource. Some could even be tasked with communicating state to the outside world. There is a lot of possibility in that "O," but the objects that this post will focus on are the dumbest part of the OGM, the Entity object.

Entities have one job, and that is to be a simple container in which we store data. But being dumb does not exclude them from being complex. Our entities -- Relationship and Node -- will allow for both structured and unstructured parameter and relationship definitions. Let me slow down before we get too deep into how everything works and explain some things.

Let's Get Started

Building a small OGM is a big task. There are a lot of components to consider, a lot of thought will go into how they all work together, and we have to think about how the OGM will fit into existing and new applications. So after deciding to build a new OGM, the next step was to determine what additional tech to use. Our goal is a simple one: get the data in and out of the Neo4j database via an easy-to-use and beautiful (this is a must) interface.

One of the most important pieces of the puzzle may be how the connection is made to the graph and how we utilize that connection. I found two options: a very cool Python 3.6 async-only solution that would work well for asyncio-based apps, and the official Neo4j Python package. Luckily for us, both of these packages offer great interfaces with their own comprehensive object structures. Since we already decided that async is out back in part one of this series, the official Neo4j Python package will do the heavy lifting for us.

Exploring Neo4j-Driver

Using the official driver is pretty simple. I won't go into how to set it up and how to send queries to the database; just believe me when I say that it is a good piece of code. So let's go ahead and talk about the responses we get when querying the graph with it. After a successful query we're given a list of Record objects. Each Record will have metadata associated with the return statement and could contain Node and Relationship objects. Those objects contain many useful members: id, properties, and type (for relationships) or labels (for nodes). That is great news for us as we'll be able to take that information and build out our own Node and Relationship entities. This finna be a breeze!
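
To make those members concrete, here is a minimal sketch of reading them. It does not talk to a database: FakeNode and FakeRelationship are hypothetical stand-ins that only mirror the members named above (id, labels or type, properties), and summarize is an illustrative helper of my own, not something the driver provides.

```python
class FakeNode(object):
    """Hypothetical stand-in exposing the node members named above."""

    def __init__(self, id, labels, properties):
        self.id = id
        self.labels = set(labels)
        self.properties = dict(properties)


class FakeRelationship(object):
    """Hypothetical stand-in exposing the relationship members named above."""

    def __init__(self, id, type, properties):
        self.id = id
        self.type = type
        self.properties = dict(properties)


def summarize(item):
    """Flatten a node or relationship into a plain dict."""
    summary = {'id': item.id, 'properties': dict(item.properties)}

    if hasattr(item, 'labels'):
        # nodes are classified by one or more labels
        summary['labels'] = sorted(item.labels)
    else:
        # relationships are classified by a single type
        summary['type'] = item.type

    return summary


node = FakeNode(1, ['User'], {'username': 'mark'})
knows = FakeRelationship(7, 'KNOWS', {'since': 2017})
node_summary = summarize(node)
knows_summary = summarize(knows)
```

Everything our own Node and Relationship entities need (an id, a classifier, and a bag of properties) is available in that flattened form.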

Entity Structure

Entities, both Node and Relationship objects, contain basically the same information: an id, a label/type to classify it, and properties to store its data. With that in mind we can create a base object for both types to subclass and we can define any unique behavior in the respective class definitions.

We know that we'll need a way to map specific nodes and relationships returned from queries to our defined entities. To accomplish this we will create a mapping of the entity's label to the Entity class. Let's use a metaclass for this.

Metaclasses define how classes are created; the code in their methods runs when the interpreter executes a class definition, before any instance exists. This will give us the opportunity to read the LABELS attribute of the entity and create that labels => Entity link that we were talking about.

ENTITY_MAP = {}

class _Entity(type):
    def __new__(cls, name, bases, attrs):
        labels = attrs.get('LABELS', 'NO_LABEL_DEFINED')
        labels = ':'.join(labels) if isinstance(labels, list) else labels
        cls = super(_Entity, cls).__new__(cls, name, bases, attrs)
        ENTITY_MAP[labels] = cls
        return cls


class Entity(metaclass=_Entity):
    # Python 3 passes the metaclass as a keyword; a __metaclass__ attribute has no effect there
    pass


class Node(Entity):
    pass


class User(Node):
    LABELS = 'User'

The code in the __new__ method accomplishes what we want to do: tie the labels to the class. This will be useful when we get to creating objects from query responses, which we'll talk about in the next part of this series. If you were to print out ENTITY_MAP you would see an entry like 'User': <class '__main__.User'> (plus a 'NO_LABEL_DEFINED' entry for the base classes, which define no LABELS). That is what we're looking for; this magic is the basis of our whole OGM!
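
As a rough sketch of where this is headed, the map can be used to go from the labels on a returned node to the class that should represent it. This assumes Python 3, where the metaclass is passed as a keyword in the class statement. entity_class_for is a hypothetical helper name, and falling back to Node for unknown labels is my assumption, not a decision the OGM has made.

```python
ENTITY_MAP = {}


class _Entity(type):
    def __new__(cls, name, bases, attrs):
        labels = attrs.get('LABELS', 'NO_LABEL_DEFINED')
        labels = ':'.join(labels) if isinstance(labels, list) else labels
        new_cls = super(_Entity, cls).__new__(cls, name, bases, attrs)
        # runs once per class definition, long before any instance exists
        ENTITY_MAP[labels] = new_cls
        return new_cls


class Entity(metaclass=_Entity):  # Python 3 metaclass syntax
    pass


class Node(Entity):
    pass


class User(Node):
    LABELS = 'User'


def entity_class_for(labels, default=Node):
    """Hypothetical helper: find the class registered for some labels."""
    key = ':'.join(labels) if isinstance(labels, (list, tuple)) else labels
    return ENTITY_MAP.get(key, default)


user_cls = entity_class_for(['User'])      # a label we defined
unknown_cls = entity_class_for(['Movie'])  # a label nobody registered
```

Here user_cls is the User class, while the unregistered 'Movie' label falls back to the plain Node class.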

Entity Properties

By default our Entity objects should allow arbitrary properties to be set and retrieved, but sometimes we'd want some structure in our data definitions. We must account for both of these scenarios. We'll start with the unstructured Entity and use some magic in order to get this done.

If our goal is to have the ability to do this:

class User(Node):
    PROPERTIES = {
        'username': String(default='Mark'),
        'password': String()
    }

me = User()
me['username'] = 'mark.was.here'

then we'll need to do two things: update that __new__ method to find and process the PROPERTIES member, and define Property objects to hold the actual data for our entities.

class Property(object):
    def __init__(self, value=None):
        self._value = value

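    def get_value(self):
        # read the stored _value directly; reading self.value here would call get_value again and recurse forever
        return self.convert_value(self._value)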

    def set_value(self, value):
        self._value = value

    value = property(get_value, set_value)

    def convert_value(self, value):
        return value

Our Property objects will be simple ways to convert a value from one type to another. We can, and will, do a lot with this as the application grows, but this is the core of what we need it to do.
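
The String type used in the User example is never spelled out in this post, so here is a minimal sketch of how a typed property could hook into convert_value. The String and Integer bodies below are my assumptions for illustration, not the project's actual classes, and the getter reads the stored _value directly.

```python
class Property(object):
    def __init__(self, value=None):
        self._value = value

    def get_value(self):
        # read the raw value and convert it on the way out
        return self.convert_value(self._value)

    def set_value(self, value):
        self._value = value

    value = property(get_value, set_value)

    def convert_value(self, value):
        return value


class String(Property):
    """Assumed subclass: always hands back text (or None when unset)."""

    def convert_value(self, value):
        return None if value is None else str(value)


class Integer(Property):
    """Hypothetical second type, showing the same hook converting to int."""

    def convert_value(self, value):
        return None if value is None else int(value)


name = String('mark')
age = Integer()
age.value = '34'  # stored as given, converted when read
```

Keeping the conversion in a single overridable method means each new datatype is only a few lines long.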

class PropertyManager(object):
    def __init__(self, properties=None):
        self.properties = properties or {}

The PropertyManager will be a way for our Entity objects to keep track of which properties they own and their current values. But how do we get from the PROPERTIES dictionary defined in the entity to the PropertyManager class? We'll do that in the __new__ method.

import copy


class _Entity(type):
    def __new__(cls, name, bases, attrs):
        labels = attrs.get('LABELS', 'NO_LABEL_DEFINED')
        labels = ':'.join(labels) if isinstance(labels, list) else labels
        cls = super(_Entity, cls).__new__(cls, name, bases, attrs)
        ENTITY_MAP[labels] = cls

        # add the properties to the entity instance
        properties = attrs.get('PROPERTIES', {})

        def _define_properties(self):
            # deep-copy the class-level definitions so each instance gets its own Property objects
            props = {k: copy.deepcopy(v) for k, v in properties.items()}
            self.properties = PropertyManager(props)

        setattr(cls, '_define_properties', _define_properties)

        return cls

class Entity(metaclass=_Entity):
    # Python 3 passes the metaclass as a keyword; a __metaclass__ attribute has no effect there

    def __init__(self):
        self._define_properties()

As you can see, when an Entity class is created, we add a method called _define_properties to it. That method simply uses the class's PROPERTIES attribute to create a new PropertyManager, stored as the instance's properties attribute. _define_properties is then called in the Entity.__init__ method. This illustrates how we can use Python's metaclasses to define desired behavior in our resulting objects. Be responsible, even if I'm not.
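
One detail worth seeing in action is the deepcopy: because PROPERTIES lives on the class, handing the same Property objects to every instance would make all users share one username. The sketch below assumes Python 3 and uses a trimmed-down Property (just a value attribute) as a stand-in, so only the copying behavior is on display.

```python
import copy


class Property(object):
    """Trimmed-down stand-in: just holds a value."""

    def __init__(self, value=None):
        self.value = value


class PropertyManager(object):
    def __init__(self, properties=None):
        self.properties = properties or {}


class _Entity(type):
    def __new__(cls, name, bases, attrs):
        new_cls = super(_Entity, cls).__new__(cls, name, bases, attrs)
        properties = attrs.get('PROPERTIES', {})

        def _define_properties(self):
            # copy the class-level definitions so instances never share state
            props = {k: copy.deepcopy(v) for k, v in properties.items()}
            self.properties = PropertyManager(props)

        setattr(new_cls, '_define_properties', _define_properties)
        return new_cls


class Entity(metaclass=_Entity):  # Python 3 metaclass syntax
    def __init__(self):
        self._define_properties()


class User(Entity):
    PROPERTIES = {'username': Property('Mark')}


first = User()
second = User()
first.properties.properties['username'].value = 'changed'
```

After the last line, only first sees 'changed'; second and the class-level definition still hold 'Mark'.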

PropertyManager Expanded

Let's take a few minutes and give our Entity objects the ability to set their properties during instantiation. What should happen is that the Entity gets a dictionary of key => value pairs, where the keys are strings and the values are Python primitives, and then passes it to its PropertyManager member to be processed.

class Entity(metaclass=_Entity):
    # Python 3 passes the metaclass as a keyword; a __metaclass__ attribute has no effect there

    def __init__(self, properties=None):
        self._define_properties()
        self.hydrate(properties)

    def hydrate(self, properties=None):
        self.properties.hydrate(properties)

Now we'll add a hydrate method to the PropertyManager.

class PropertyManager(object):
    def __init__(self, properties=None):
        self.properties = properties or {}

    def hydrate(self, properties=None):
        properties = properties or {}

        for name, value in properties.items():
            self.properties[name].value = value

        return self

This hydrate method will set the .value attribute on any Property objects that it manages, but what if we attempt to set a property that it doesn't manage? Well, let's check to see if it exists, and if it doesn't, we need to figure out what type of Property should be added.

def hydrate(self, properties=None):
    properties = properties or {}

    for name, value in properties.items():
        if name not in self.properties:
            if isinstance(value, str):
                self.properties[name] = String()

        # set the value whether the property was just created or was already managed
        self.properties[name].value = value

    return self

That was simple, but note that it only checks for str types and creates String Property objects. We can flesh that out later as we figure out what datatypes this OGM will allow to be passed back and forth.
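
One way that fleshing-out could go is a small lookup from Python types to Property classes. This is only a sketch: Integer, Boolean, TYPE_TO_PROPERTY and property_for are hypothetical names, the Property classes are bare stand-ins, and falling back to a plain Property for unknown types is my assumption.

```python
class Property(object):
    """Bare stand-in that only stores a value."""

    def __init__(self, value=None):
        self.value = value


class String(Property):
    pass


class Integer(Property):
    pass


class Boolean(Property):
    pass


# Hypothetical lookup table; bool comes before int because
# isinstance(True, int) is also true in Python.
TYPE_TO_PROPERTY = [
    (bool, Boolean),
    (int, Integer),
    (str, String),
]


def property_for(value):
    """Pick a Property class for a raw value, falling back to Property."""
    for python_type, property_class in TYPE_TO_PROPERTY:
        if isinstance(value, python_type):
            return property_class()

    return Property()


class PropertyManager(object):
    def __init__(self, properties=None):
        self.properties = properties or {}

    def hydrate(self, properties=None):
        properties = properties or {}

        for name, value in properties.items():
            if name not in self.properties:
                self.properties[name] = property_for(value)

            self.properties[name].value = value

        return self


manager = PropertyManager().hydrate({'name': 'mark', 'age': 34, 'admin': True})
kinds = {k: type(v).__name__ for k, v in manager.properties.items()}
```

With a table like this, supporting a new datatype means adding one row instead of another isinstance branch.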

One More Thing

We want the Entity instances to access data via a dict-like interface. That means that we must define custom __getitem__ and __setitem__ methods on the Entity, which would simply proxy to its PropertyManager object.

class Entity(metaclass=_Entity):
    # Python 3 passes the metaclass as a keyword; a __metaclass__ attribute has no effect there

    def __init__(self, properties=None):
        self._define_properties()
        self.hydrate(properties)

    def hydrate(self, properties=None):
        self.properties.hydrate(properties)

    def __getitem__(self, property):
        return self.properties[property]

    def __setitem__(self, property, value):
        self.properties[property] = value
        return self

class PropertyManager(object):
    def __init__(self, properties=None):
        self.properties = properties or {}

    def hydrate(self, properties=None):
        properties = properties or {}

        for name, value in properties.items():
            self.__setitem__(name, value)

        return self

    def __getitem__(self, property):
        return self.properties.get(property, None)

    def __setitem__(self, property, value):
        if property not in self.properties:
            if isinstance(value, str):
                self.properties[property] = String()

        self.properties[property].value = value

        return self

Check out our PropertyManager now, so much SRP (single responsibility principle) going on. This is going to make unit testing easier for us, which in turn will let us manage and keep developing the whole package with a certain level of confidence.
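
To show what that testability could look like, here is a small sketch using the standard library's unittest. The test names and cases are mine, not from the project's test suite, and Property and String are bare stand-ins so the example runs on its own.

```python
import unittest


class Property(object):
    """Bare stand-in that only stores a value."""

    def __init__(self, value=None):
        self.value = value


class String(Property):
    pass


class PropertyManager(object):
    def __init__(self, properties=None):
        self.properties = properties or {}

    def hydrate(self, properties=None):
        for name, value in (properties or {}).items():
            self[name] = value
        return self

    def __getitem__(self, name):
        return self.properties.get(name, None)

    def __setitem__(self, name, value):
        if name not in self.properties and isinstance(value, str):
            self.properties[name] = String()
        self.properties[name].value = value


class PropertyManagerTests(unittest.TestCase):
    def test_hydrate_updates_a_managed_property(self):
        manager = PropertyManager({'username': String('Mark')})
        manager.hydrate({'username': 'mark.was.here'})
        self.assertEqual('mark.was.here', manager['username'].value)

    def test_unknown_string_gets_a_string_property(self):
        manager = PropertyManager()
        manager['nickname'] = 'em'
        self.assertIsInstance(manager['nickname'], String)

    def test_missing_property_reads_as_none(self):
        self.assertIsNone(PropertyManager()['missing'])


# run the tests programmatically so the outcome can be inspected
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PropertyManagerTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the manager has no knowledge of entities, metaclasses or the database, each test needs nothing more than a dictionary of properties.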

Wrapping This Up

In this post we took an abstract idea -- I'm going to build an OGM -- and broke down how we'll tackle it. We took a look at a single piece, the Entity, and discovered what it needed to support the interface that we desired. That led to the creation of testable, independent objects that are each allowed to perform one job.

Thanks to some of the meta-magic in Python, our Entity objects can do a lot of things with very little code. This will make for a great foundation to build the rest of the OGM on top of and you will see some of these patterns repeated in the codebase moving forward.

There is still a lot more for us to do with regard to Entity objects in this OGM. We still need to:

  • Allow for multiple labels for Node objects
  • Set the id for an Entity
  • Set the start and end nodes for Relationship objects
  • Allow Entity.PROPERTIES to inherit from parent classes
  • Define a way for us to get all of the data out of our Entity objects
  • Limit data setting to pre-defined properties on Entity definitions
  • Write unit tests for every object and integration tests where those objects interact with one another
  • Define an object that will store data that isn't either a Node or a Relationship, maybe Generic or Response, we'll see

All of those things are worthwhile additions to the library and they will get done, but I will leave them as things to do outside of this blog post; you can read about them in the source code. I really wanted to show my thought process as I attempt to tackle this problem and build something from scratch. And I hope that the still-unnamed project will be a bit easier to use based on these build-with-me posts.

What's Next?

The next, and possibly final, post in this series will cover the Mapper and Query objects -- basically how we turn these Entity objects into Cypher queries and how the OGM will convert the returned values into Node and Relationship and Generic or Response objects.

That next post may take a while because I am writing the OGM while penning these posts. That has proven to be a good approach as I changed quite a bit of code for the better while writing about how entities will work in this system and that was even after the unit tests were completed. So keep an eye out for that one, it will be a pretty lengthy post.

Tweet me @emehrkay your comments and suggestions.