Take This Journey With Me

If you know anything about me, you know that I am currently enjoying graph databases. I employ them to solve just about any problem that I have, and I even try to force them onto my friends.

I have a pretty storied history with graph DBs. In the past I've used OrientDB, Neo4j, and TinkerPop's Gremlin, and I even looked into offerings like Cayley and Twitter's now-defunct FlockDB. I wasn't only looking for the right tool to solve my problem; I was also looking for one that was the right fit for me and the tech I use to get things done.

Each stack has its own set of strengths and weaknesses; that's a given. What isn't a foregone conclusion are things like community support, tooling, training, product awareness, or even the job opportunities available to those who choose to invest time and effort into learning that tech. These are a few of the areas in which Neo4j shone above the rest.

I'm Building Tools for Neo4j

I recently released Pypher, a Cypher query builder for Python. This was a crucial project for me, as it gave me an opportunity to better understand the Cypher query language and get familiar with the Neo4j community, which is the richest of all the graph systems I listed above. I even wrote an introductory article about Pypher for the Neo4j community blog; check it out.

I like writing tools that make my job easier. I am a programmer, and trying to do less work while feeling creative is roughly 90% of what we do to feel alive. The next thing I am building to simplify my workflow (once it is done, of course) will be a Neo4j Object Graph Mapper (OGM) in Python. This should be interesting.

Why Tho?

Well, because I want to. And, if I'm being honest with y'all, there is a slight tinge of N.I.H. (Not Invented Here) in my hands and fingers, which I'm open enough to admit. I also want to do things a bit differently than what is currently offered, you know, do them my way. Let's go over some of my initial thoughts and ideas on how I want this OGM to work.

Idea 1 -- Async

My initial approach was to go fully async via Python's asyncio module. That would give us non-blocking queries, just in case your queries take a while to process. That's a dumb reason, because I don't plan on using this library to write long-running queries, and you shouldn't either; refactor your data model and thought process, man. Plus, async makes things messy: some calls need to be awaited and some don't, and the API would evolve into a confusing mess. Best of all, for the short, simple queries I have in mind, async database calls offer little real benefit over synchronous ones.

Async is OUT

Idea 2 -- Query First

People dislike O.R.M.s partly because they make it difficult to express complex queries with the given set of objects and the interfaces those objects provide. Let's stand that on its head: write the queries first and figure out how to map the results to our objects later, instead of the inverse. This is made a bit easier because Neo4j's responses are structured pretty well and describe what kind of content the data represents. Thanks for that, Neo4j devs.
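Here is a minimal sketch of the query-first idea. It assumes results arrive as plain dicts shaped roughly like a node payload (an `id`, a list of `labels`, a `properties` map); that record shape, the `Node` class, and `map_record` are all hypothetical illustrations, not a real driver API. The point is only the order of operations: the Cypher is written freely, and mapping happens afterward based on what the response describes.

```python
from dataclasses import dataclass, field


# Hypothetical entity class for illustration only.
@dataclass
class Node:
    id: int
    labels: list
    properties: dict = field(default_factory=dict)


# Step 1: write whatever Cypher you need, no object API in the way.
QUERY = "MATCH (u:User)-[:FOLLOWS]->(f:User) WHERE u.name = $name RETURN f"

# Step 2: map the results. These records stand in for a server response
# and use an ASSUMED shape, not real driver output.
records = [
    {"f": {"id": 1, "labels": ["User"], "properties": {"name": "Ada"}}},
    {"f": {"id": 2, "labels": ["User"], "properties": {"name": "Grace"}}},
]


def map_record(record):
    """Turn every node-shaped value in a record into a Node object."""
    mapped = {}
    for key, value in record.items():
        if isinstance(value, dict) and "labels" in value:
            mapped[key] = Node(value["id"], value["labels"], value["properties"])
        else:
            mapped[key] = value  # scalars and anything else pass through
    return mapped


followers = [map_record(r)["f"] for r in records]
print([f.properties["name"] for f in followers])  # ['Ada', 'Grace']
```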

Querying first IINNN

Idea 3 -- Data Mapper

I love Love LOVE when I write code that is portable and usable in multiple situations. I may need to employ a set of business rules in a web application and have the same data models and rules available in a CLI app. You think I want to write and maintain business rules in two locations? Answer: I don't.

A lot of O[G|R]Ms are created using the Active Record pattern, where each instance maps directly to a record in the underlying data source and handles all of its own CRUD operations. This makes it difficult to write custom and, more importantly, reusable rules around your entities. It is definitely possible, but I find that the Data Mapper pattern gives me a bit more control over how things work and how I model my rule sets. It's like when you ask someone to explain why driving stick (manual, for those outside the United States) is better and they give you some wishy-washy answer about more control. This is like that.
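To make the distinction concrete, here is a toy contrast in which an in-memory dict stands in for Neo4j; every class name is hypothetical. In the Active Record style the entity saves itself. In the Data Mapper style the entity only carries data, and a separate mapper owns persistence and the business rules, so a web app and a CLI app can share the same mapper.

```python
# A plain dict stands in for the database in this toy example.
STORE = {}


class ActiveRecordUser:
    """Active Record: the object knows how to persist itself."""

    def __init__(self, name):
        self.name = name

    def save(self):
        STORE[("User", self.name)] = {"name": self.name}


class User:
    """Data Mapper: the entity only carries data."""

    def __init__(self, name):
        self.name = name


class UserMapper:
    """Data Mapper: persistence and business rules live here instead."""

    def __init__(self, store):
        self.store = store

    def save(self, user):
        # A business rule enforced in one place, reusable by any front end.
        if not user.name.strip():
            raise ValueError("a User needs a name")
        self.store[("User", user.name)] = {"name": user.name}


ActiveRecordUser("Mark").save()
UserMapper(STORE).save(User("Ada"))
print(sorted(STORE))  # [('User', 'Ada'), ('User', 'Mark')]
```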

My goal with this OGM is to create a simple system where each object has a defined singular role. Here are some of the objects that I will be building:

  • Entity -- This will be the root of both Node and Relationship objects. These are kinda dumb; think of them as pure POPOs (plain old Python objects) whose job is simply to house data. You, as the developer, can add functionality here to make your life easier; I won't stop you. I might even join in.
  • Property -- These will be value objects that Entity objects reference. Their responsibility will be to convert values from Python to Neo4j and back. This will also house special Relationship properties that allow entities to do graph things.
  • Mapper -- This is the main workhorse in the system. It will be used to translate entity objects to Cypher queries that are eventually passed to the Neo4j server.
  • EntityMapper -- These are custom mappers for Entity objects. This is where your business rules will live, this is the magic. The main Mapper object will do all of the work of determining which EntityMapper to use for which Entity, put ya feet up.
  • Query -- This object will do the actual conversion of Entity objects to Cypher queries.
  • Connection -- This will run the query and return:
  • Result -- This will be a result from the Neo4j server. It will use a Collection object to build Entity objects for use in your application. (The jury is still out on whether Collection objects should be customized based on the entities they contain; probably not, though.)
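The roles above can be sketched as a small skeleton. The class names come from the list, but every method name, the registry lookup, and the generated Cypher are assumptions made for illustration; this is not the finished OGM. Connection and Result are left out because they would need a live server. The design choice it shows is the one described: entities stay dumb, a per-entity EntityMapper holds the rules, and the main Mapper picks the right EntityMapper for you.

```python
class Property:
    """Converts values between Python and Neo4j (hypothetical interface)."""

    def to_neo(self, value):
        return value

    def from_neo(self, value):
        return value


class BoolProperty(Property):
    # Hypothetical converter; it only shows where coercion logic would sit.
    def to_neo(self, value):
        return bool(value)


class Entity:
    """Root of Node and Relationship: a dumb bag of data."""

    properties = {}  # name -> Property; read-only default, subclasses override

    def __init__(self, **data):
        self.data = dict(data)


class Node(Entity):
    pass


class Relationship(Entity):
    pass


class Query:
    """Converts an entity into a Cypher string plus parameters."""

    def create(self, label, entity):
        props = {}
        for name, value in entity.data.items():
            prop = entity.properties.get(name, Property())
            props[name] = prop.to_neo(value)
        return f"CREATE (n:`{label}` $props) RETURN n", {"props": props}


class EntityMapper:
    """Per-entity business rules; subclasses set `entity` and `label`."""

    entity = Entity
    label = "Entity"

    def before_create(self, entity):
        pass


class Mapper:
    """Finds the EntityMapper for an entity and builds its query."""

    def __init__(self):
        self._registry = {}

    def register(self, entity_mapper):
        self._registry[entity_mapper.entity] = entity_mapper

    def get_mapper(self, entity):
        # Walk the MRO so subclasses fall back to a parent's mapper.
        for cls in type(entity).__mro__:
            if cls in self._registry:
                return self._registry[cls]
        raise LookupError(f"no EntityMapper for {type(entity).__name__}")

    def create(self, entity):
        entity_mapper = self.get_mapper(entity)
        entity_mapper.before_create(entity)
        return Query().create(entity_mapper.label, entity)


class User(Node):
    properties = {"active": BoolProperty()}


class UserMapper(EntityMapper):
    entity = User
    label = "User"

    def before_create(self, entity):
        entity.data.setdefault("active", True)  # example business rule


mapper = Mapper()
mapper.register(UserMapper())
cypher, params = mapper.create(User(name="Mark"))
print(cypher)  # CREATE (n:`User` $props) RETURN n
print(params)  # {'props': {'name': 'Mark', 'active': True}}
```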

Data Mapper is a Major GO

Idea 4 -- Entities are Dict-like

Python has some pretty cool things built into the language, and one of them is descriptors. Descriptors let you define managed attributes on a class, where a single descriptor instance is shared by every object of that class. If you look at the source code of some popular Python ORMs, you'll see descriptors used to manage the data in models. I can't use this magic protocol, though, because Neo4j allows property names to be just about any string, and most of those can't be expressed as normal Python attribute names. The solution is to make the Entity objects' properties accessible the same way you get members from a dict.
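For context, here is a small sketch of a descriptor-managed attribute; `StringProperty` is a hypothetical name, not taken from any particular ORM. One descriptor instance sits on the class and handles reads and writes for every instance, which is exactly the convenience that disappears when a property name isn't a valid identifier.

```python
class StringProperty:
    """Data descriptor: one instance on the class serves every object."""

    def __set_name__(self, owner, name):
        # Called when the class is created; remembers the attribute name.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class itself
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        # Store the converted value in the instance's own __dict__.
        obj.__dict__[self.name] = str(value)


class User:
    name = StringProperty()  # a single descriptor shared by all User objects


a, b = User(), User()
a.name = "Mark"
b.name = 42  # converted to "42" by the descriptor
print(a.name, b.name)  # Mark 42

# A Neo4j key like "some crazy property!!!" is not a valid identifier, so it
# can't be declared as a class attribute or reached with dot syntax.
print("some crazy property!!!".isidentifier())  # False
```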

The goal is to write code that looks like this:

user = User()
user['name'] = 'Mark'
user['some crazy property!!!'] = 'some value'
...
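One way to make that goal code run is to build Entity on `collections.abc.MutableMapping`. This is a sketch under that assumption, not necessarily how the OGM will end up doing it. Implementing five methods gets you `get`, `update`, `items`, `in`, and friends for free, and any string works as a key.

```python
from collections.abc import MutableMapping


class Entity(MutableMapping):
    """Hypothetical dict-like entity: any string is a valid property key."""

    def __init__(self, **properties):
        self._data = dict(properties)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        return f"{type(self).__name__}({self._data!r})"


class User(Entity):
    pass


user = User()
user['name'] = 'Mark'
user['some crazy property!!!'] = 'some value'
print(user)  # User({'name': 'Mark', 'some crazy property!!!': 'some value'})
```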

Dict-like Entities are a Must

Follow Along

I will be documenting this whole process. I don't expect it to be more than three posts, so it won't be too big of an investment for you. I just want to share some of my thoughts as I build something cool for myself and others to use. I also hope that this process will help some people see how the magic is made, because I be writing magic code.

What should I call this project? Holla at me on the bird @emehrkay.