Getting Started with TinkerPop's Gremlin Server & Gizmo (Python)

I recently became a huge fan of (read: obsessed with) graph databases. I've dabbled in quite a few of the existing solutions out there, but ultimately settled on TinkerPop's Gremlin for use in my personal and professional projects. With Python being the language that I go to first to solve web-related problems, I naturally created a Python-based O.G.M. (Object Graph Mapper) for use with the Gremlin Server called Gizmo.

In this post I will show you how to quickly configure the Gremlin Server and get it running, and how I would use Gizmo and Tornadoweb to create a blogging engine.

The Gremlin Server is A LOT to take in; if you are not familiar with how things work, you will quickly be in over your head. However, the community is helpful and the documentation does a good job of covering the core concepts.

Setting up the Gremlin Server

TinkerPop Gremlin is a Java application that relies on Java 8, so make sure you have that installed before moving forward. The next step is to go to TinkerPop's homepage and download the Gremlin Server archive.

Installation is simply unzipping the files and running the executables.

When you unzip the Gremlin Server archive you will be presented with a structure that will look something like this:

directory structure of the unzipped Gremlin Server archive

We will be editing three files located in the conf and scripts directories. However, since TinkerPop covers a wide range of graph databases, configuration may be slightly different depending on the vendor. Please check their manuals.

Step 1:

Add a configuration file for your graph. I like to follow the naming convention in the config files, which seems to be $GRAPH_VENDOR-$GRAPH_NAME.properties, so let's make a tinkergraph-blog.properties file in the conf directory and fill it with:

gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
gremlin.tinkergraph.vertexIdManager=UUID
gremlin.tinkergraph.edgeIdManager=UUID

These are very simple settings. The first line tells the Gremlin Server that we will be using an in-memory TinkerGraph instance. The second and third lines define how the graph should store the ids for its vertices and edges respectively (we chose UUID, but there are other options like LONG or INTEGER -- see the other configuration settings).

Step 2:

When you start the Gremlin Server it will load gremlin-server.yaml by default, so we'll edit that file to expose our newly configured graph to the world. Edit the graphs section to look like this:

graphs: {
  graph: conf/tinkergraph-empty.properties,
  blog_graph: conf/tinkergraph-blog.properties}

We added blog_graph with a path to our configuration file. This names our graph blog_graph so that if you were to connect to the server, any action taken against our graph would have to start with blog_graph.

Step 3:

Our final Gremlin Server configuration step is to expose the Traversal for our blog_graph so that Gizmo can use it. If you were to look at the gremlin-server.yaml file, you'd see a gremlin-groovy.scripts section that points to the file scripts/empty-sample.groovy. That file is loaded every time the Gremlin Server starts, and it is useful for adding pre/post loading hooks and defining global variables that will be available for every request (this is what we're modifying it for!).

Our plan is to simply expose the blog_graph's Traversal object as blog. Open scripts/empty-sample.groovy and add this line at the very bottom:

globals << [blog : blog_graph.traversal()]

Starting the Server

Now you are ready to start the Gremlin Server. Open up your terminal and navigate to the folder where you unzipped the Gremlin Server archive and type:

./bin/gremlin-server.sh

console output from starting the gremlin server

You can see from the output that blog_graph has been loaded, the blog Traversal object has been set as a global, and the server is running on port 8182.
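
If you want to confirm from Python that the server is accepting connections before moving on, a quick socket check is enough. This is a minimal sketch using only the standard library; the helper name is_port_open is made up for this post, and localhost:8182 assumes the default settings used above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError when nothing is listening
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # assumes the Gremlin Server defaults used in this post
    print('Gremlin Server reachable:', is_port_open('localhost', 8182))
```

This only proves that something is listening on the port; it does not check that blog_graph or the blog global were loaded.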

Understanding Gizmo

Gizmo is a full-fledged O.G.M. that relies heavily on Gremlin-Groovy and incorporates a Data Mapper pattern instead of Active Record. There is a lot to unpack in that sentence: what is a Data Mapper, how does it benefit me, and wtf is Gremlin-Groovy?

Gremlin-Groovy

Well, Gremlin-Groovy is a language used to describe and traverse the graph. Gizmo simply records actions taken against Python objects and converts them into Gremlin-Groovy scripts that will be executed on the Gremlin Server.
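
To make "records actions and converts them into scripts" a little more concrete, here is a toy recorder. It is a sketch of the general idea only, not how Gizmo or Gremlinpy are actually implemented; the ScriptRecorder class and the P0, P1 parameter names are invented for illustration.

```python
class ScriptRecorder:
    """Toy recorder: turns chained Python calls into a Gremlin-Groovy string."""

    def __init__(self, graph_name):
        self.graph_name = graph_name
        self.steps = []
        self.bindings = {}

    def __getattr__(self, step_name):
        # only called for attributes that do not exist on the object
        if step_name.startswith('__'):
            raise AttributeError(step_name)

        def add_step(*args):
            # every argument is sent as a bound parameter, not inlined
            names = [self.bind(arg) for arg in args]
            self.steps.append('{}({})'.format(step_name, ', '.join(names)))
            return self  # returning self is what allows chaining

        return add_step

    def bind(self, value):
        name = 'P{}'.format(len(self.bindings))
        self.bindings[name] = value
        return name

    def __str__(self):
        return '.'.join([self.graph_name] + self.steps)


g = ScriptRecorder('blog').V().has('name', 'emehrkay')
script = str(g)          # 'blog.V().has(P0, P1)'
params = g.bindings      # {'P0': 'name', 'P1': 'emehrkay'}
```

The script string and the bindings would then travel to the server together, which keeps user-supplied values out of the script text itself.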

Data Mapper

I chose the Data Mapper pattern because of the expressiveness of Gremlin-Groovy. Gremlin-Groovy allows you to write full Groovy programs with conditionals, imports, and exceptions -- basically whatever you want to do in a JVM language. I wanted the Gizmo O.G.M. to be as flexible as possible, so it doesn't try to mirror the interfaces found in the Gremlin Server as Python methods, which would force you to learn the "Gizmo" way and switch context between it and the Gremlin-Groovy way. Gizmo has a small interface and if you need to go deeper, you write Gremlin-Groovy.

The Data Mapper pattern also provides a lot of flexibility when it comes to managing your entities (things like Users or Posts) by not only giving them custom methods, but also allowing for overriding of common actions like create, delete, or save.
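
For anyone who has not run into the pattern before, here is a bare-bones Data Mapper in plain Python. Nothing in it is Gizmo's API; Entity, InMemoryMapper, and SluggingMapper are hypothetical names, and a dict stands in for the graph. The point is that the entity only holds data while the mapper owns persistence, so overriding save changes behavior without touching the entity.

```python
class Entity:
    """Plain data holder; it knows nothing about storage."""

    def __init__(self, **fields):
        self.id = None
        self.fields = dict(fields)


class InMemoryMapper:
    """Owns persistence; a dict stands in for the real database."""

    def __init__(self):
        self.store = {}
        self.next_id = 1

    def save(self, entity):
        if entity.id is None:
            entity.id = self.next_id
            self.next_id += 1
        self.store[entity.id] = dict(entity.fields)
        return entity

    def delete(self, entity):
        self.store.pop(entity.id, None)


class SluggingMapper(InMemoryMapper):
    """Custom mapper: derives a slug from the title before saving."""

    def save(self, entity):
        title = entity.fields.get('title', '')
        entity.fields.setdefault('slug', title.lower().replace(' ', '-'))
        return super().save(entity)


mapper = SluggingMapper()
post = mapper.save(Entity(title='Hello World'))
```

With Active Record the save logic would live on the entity class itself; here it can be swapped per mapper.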

Gizmo works its magic via a few easy-to-understand objects that work in a very linear fashion.

Let's See Some Code

This is why you're here; this is the good stuff. We'll employ Gizmo and Tornadoweb along with the Gremlin Server to create a very simple blogging engine. Let's start with the models.

Models.py

In this models.py file we will define the connection to the graph, the entities that we'll use, and any custom mappers (in a bigger project you'll want to separate these things into their own modules).

import asyncio
from gizmo import Request, Vertex, Edge, Mapper, String
from gremlinpy import Gremlin

# setup the connection
request = Request('localhost', 8182)
gremlin = Gremlin('blog')
mapper = Mapper(request=request, gremlin=gremlin)

Gizmo has a dependency on a project that I author called Gremlinpy. Gremlinpy simply allows you to express Gremlin-Groovy with Python syntax.

In the snippet above we created a Request object which points to the Gremlin Server, a Gremlin object that stores our graph name blog, and our main mapper which will facilitate any conversions from Python objects to graph and back.

Thinking about the schema for a simple blogging engine, we'll need a User, a Post, and a Tag object. Let's define them.

from gizmo.entity import Vertex, Edge
from gizmo.field import String, Boolean, DateTime

class User(Vertex):
    name = String()

class Post(Vertex):
    title = String()
    slug = String()
    content = String()
    published = Boolean(values=False)
    date = DateTime()

class Tag(Vertex):
    tag = String()

This is very straightforward stuff. You may notice that we have a weird values argument for the Boolean field; that is done intentionally to match the structure of the data that comes back from the Gremlin Server.

This looks like a good start, but we are not utilizing the power of the graph, which lies in the connections (the Edges!). Let's define some of those edge objects and get our graph graphin'.

class Author(Edge):
    pass

class HasTag(Edge):
    pass

Edge entities can have properties, but we don't need them in this example.

Cool. Cool, we're getting somewhere. Now it's time to start digging into some of Gizmo's cooler features. We'll start with something simple, like limiting each Tag to a single connection per entity it is attached to, by defining a custom mapper for the edge.

from gizmo.mapper import EntityMapper

class HasTagMapper(EntityMapper):
    entity = HasTag  # the edge entity; the Tag vertex gets its own mapper below
    unique = 'both'

This HasTagMapper simply says that any time a HasTag edge is sent through the Mapper we defined above, it will take over and use its own settings and methods to handle the object. The unique 'both' property says that any time this edge is saved, it will check both directions and ensure that there is no existing connection between the entity and the Tag object before saving. If there is an existing edge, it will be returned; if there isn't, one will be created and then returned.
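
The "check both directions, then return or create" behavior is easy to picture with a toy in-memory version. This is a sketch of the idea only and not Gizmo's implementation; EdgeStore, find_both, and save_unique are hypothetical names, and edges are plain tuples.

```python
class EdgeStore:
    """Toy in-memory edge list; each edge is a (label, out_id, in_id) tuple."""

    def __init__(self):
        self.edges = []

    def find_both(self, label, a, b):
        # look for the edge in either direction: a -> b or b -> a
        for edge in self.edges:
            if edge in ((label, a, b), (label, b, a)):
                return edge
        return None

    def save_unique(self, label, out_id, in_id):
        existing = self.find_both(label, out_id, in_id)
        if existing is not None:
            return existing  # reuse the edge that is already there
        edge = (label, out_id, in_id)
        self.edges.append(edge)
        return edge


store = EdgeStore()
first = store.save_unique('has_tag', 'post-1', 'tag-python')
# saving the reversed pair finds the first edge instead of adding a second
again = store.save_unique('has_tag', 'tag-python', 'post-1')
```

After both calls the store still holds a single has_tag edge between the post and the tag.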

class HasAuthorMapper(EntityMapper):
    entity = Author
    unique = 'both'

The HasAuthorMapper object will do the exact same as the HasTagMapper with regard to uniqueness and entity management. We're done with our edges.

class UserMapper(EntityMapper):
    entity = User
    unique = ('name')

class PostMapper(EntityMapper):
    entity = Post
    unique = ('slug')

class TagMapper(EntityMapper):
    entity = Tag
    unique = ('tag')

The remaining custom mappers all use the unique field too, but since they map Vertex objects, the mapper will ensure that whatever is defined as unique is unique in the graph, i.e. you cannot have two User entities with the name 'emehrkay'.

Gremlin Server Data Structure

Gremlin Server can be configured in a way where each property can have multiple values and each value can have key:value pairs. So imagine having a User entity; its data would look something like this:

{
    'name': [
        {
            'value': 'some name',
            'properties': {'some property': 'some value for some name'}
        },
        {
            'value': 'second name',
            'properties': {'some property for second name only': 'some value for second name'}
        }
    ]
}

I don't use this feature, but Gizmo by default will manipulate and return data in this way. To make things easier for myself, and hopefully you, I wrote a little code that you can mix into the Entity and Mapper classes that will make your vertices and edges behave as if their fields and values were a one-to-one mapping.

class BaseBlogEntity(object):
    @property
    def data(self):
        data = super().data
        fixed = {}

        for field, value in data.items():
            if (isinstance(value, list) and len(value)
                and isinstance(value[-1], dict)
                and 'value' in value[-1]):
                fixed[field] = value[-1]['value']
            else:
                fixed[field] = value

        return fixed

class BaseBlogMapper(object):
    async def data(self, entity):
        fixed = {}
        # defer to the next class in the MRO (the Gizmo mapper this is mixed into)
        data = await super().data(entity=entity)

        if not entity:
            return fixed

        for field, value in data.items():
            if (isinstance(value, list) and len(value)
                and isinstance(value[-1], dict)
                and 'value' in value[-1]):
                fixed[field] = value[-1]['value']
            else:
                fixed[field] = value

        return fixed

While Gizmo will allow you to do user['name'] = 'some name', utilizing the code above will return 'some name' instead of something like [{'value': 'some name', 'properties': {}}] when accessing the name attribute.
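
Stripped of the Gizmo classes, the flattening logic in both mixins boils down to one small function. Here it is on its own, run against data shaped like the example above; the function name flatten and the sample values are mine. Note that when a property has several values, the last one wins.

```python
def flatten(data):
    """Collapse [{'value': ..., 'properties': ...}, ...] lists to one value."""
    fixed = {}

    for field, value in data.items():
        if (isinstance(value, list) and value
                and isinstance(value[-1], dict)
                and 'value' in value[-1]):
            # multi-valued property: keep only the last value
            fixed[field] = value[-1]['value']
        else:
            # anything else passes through untouched
            fixed[field] = value

    return fixed


raw = {
    'name': [
        {'value': 'some name', 'properties': {}},
        {'value': 'second name', 'properties': {}},
    ],
    'label': 'user',
}
flat = flatten(raw)  # {'name': 'second name', 'label': 'user'}
```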

Setup a Simple Web Server

The next step in this journey is to utilize the power of Tornadoweb to create a very simple web server that will allow us to CRUD blog entries.

All of this code is on GitHub with instructions on how to run it locally.

import asyncio

from tornado.web import Application, HTTPError, RequestHandler
from tornado import httpserver, platform
import tornado.platform.asyncio  # load the submodule so platform.asyncio resolves below

from gremlinpy import Param

# models.py is the file we wrote in the previous section
from models import mapper, Post, User, Author


UUID_RE = '[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}'
PORT = 9999
USER_ID = '0c8cfdaa-a42c-4fdb-8f62-e7d54a259c7c'


class BlogHandler(RequestHandler):

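    async def get_by_id(self, id):
        try:
            if not id:
                raise ValueError('an id is required')

            # bind the id as a parameter instead of inlining it in the script
            _id = Param('GET_BY_ID', id)
            g = mapper.gremlin.V().hasId(_id)
            res = await mapper.query(gremlin=g)
            entity = res.first()

            if not entity:
                raise LookupError('no entity found for id {}'.format(id))

            return entity
        except Exception:
            # any failure (missing id, no match, query error) becomes a 404
            raise HTTPError(404)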

    @property
    def data(self):
        return {
            'title': self.get_argument('title'),
            'content': self.get_argument('content'),
            'published': self.get_argument('published', False),
        }

    async def get(self, id):
        entry = await self.get_by_id(id)
        return self.write(entry.data)

    async def post(self, id=None):
        # we'll create a new blog post and connect it to the user
        entry = mapper.create(self.data, entity=Post)
        user = await self.get_by_id(USER_ID)
        author = mapper.connect(user, entry, edge_entity=Author)
        await mapper.save(author).send()

        return self.write(entry.data)

    async def put(self, id):
        entry = await self.get_by_id(id)
        entry.hydrate(self.data)
        await mapper.save(entry).send()

        return self.write(entry.data)

    async def delete(self, id):
        entry = await self.get_by_id(id)
        await mapper.delete(entry).send()

        return self.write('Blog {} deleted'.format(entry['title']))


class PostsHandler(RequestHandler):

    async def get(self):
        g = mapper.start(Post)
        res = await mapper.query(gremlin=g)
        data = {'data': res.data}

        return self.write(data)


def make_app():
    routes = [
        (r'/blog(?:/(' + UUID_RE + ')?)?/?', BlogHandler),
        (r'/posts/?', PostsHandler),
    ]
    settings = {
        'debug': True,
    }
    return Application(routes, **settings)


if __name__ == "__main__":
    platform.asyncio.AsyncIOMainLoop().install()

    ioloop = asyncio.get_event_loop()
    app = make_app()
    server = httpserver.HTTPServer(app)
    server.listen(PORT)
    print('Server Running on Port: {}'.format(PORT))
    ioloop.run_forever()

As you can see, this is a very simple web server that has two endpoints: /blog/[UUID]/ and /posts/. The /blog/ endpoint allows for full CRUD of blog posts while the /posts/ endpoint will return all posts in the graph.
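
The /blog route pattern is dense, so it is worth checking which paths it accepts. The sketch below compiles the same pattern with the standard re module and uses fullmatch to approximate how Tornado anchors routes; match_blog_path is a helper name I made up, and it returns a (matched, id) pair.

```python
import re

UUID_RE = '[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}'
BLOG_ROUTE = re.compile(r'/blog(?:/(' + UUID_RE + ')?)?/?')


def match_blog_path(path):
    """Return (matched, captured_id) for a request path."""
    match = BLOG_ROUTE.fullmatch(path)
    if match is None:
        return (False, None)
    # group 1 is None when the path carries no UUID
    return (True, match.group(1))


# no id: this is the shape used when creating a post
print(match_blog_path('/blog'))
# lowercase UUID: used for get, put, and delete
print(match_blog_path('/blog/0c8cfdaa-a42c-4fdb-8f62-e7d54a259c7c'))
# anything that is not a lowercase UUID is rejected
print(match_blog_path('/blog/not-a-uuid'))
```

One thing this makes visible: the pattern only accepts lowercase hex, so an uppercase UUID would not match the route.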

Clone the project from here and we can begin to consume this tiny API.

First we will add a new blog entry (I am using the Python project HTTPie, it's pretty dope).

adding blog post via api

Now we can take that id and get the post back from the API.

retrieving a blog post via api

We can add more blog entries and then see them all (at this point I realized that I had messed up with naming my routes, but I am too deep into this to change things up now).

retrieving all blog posts via api

Let us remove one of the posts.

deleting a blog post via api
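
The same four calls can also be scripted with nothing but the standard library. This is a rough sketch, assuming the server from this post is running on localhost:9999 and that form-encoded fields are what the handler's get_argument calls pick up; build_request, send, and demo are names I invented, and POST_ID is a placeholder you would replace with an id returned by the API.

```python
from urllib import parse, request

BASE_URL = 'http://localhost:9999'
POST_ID = '00000000-0000-0000-0000-000000000000'  # placeholder, not a real id


def build_request(method, path, fields=None):
    """Build (but do not send) a request with optional form-encoded fields."""
    data = parse.urlencode(fields).encode() if fields else None
    return request.Request(BASE_URL + path, data=data, method=method)


def send(req):
    """Send a prepared request and return the response body as text."""
    with request.urlopen(req) as resp:
        return resp.read().decode()


def demo():
    # requires the blog server to be running; not called automatically
    print(send(build_request('POST', '/blog',
                             {'title': 'Hello', 'content': 'World'})))
    print(send(build_request('GET', '/blog/' + POST_ID)))
    print(send(build_request('GET', '/posts')))
    print(send(build_request('DELETE', '/blog/' + POST_ID)))
```

Building the request separately from sending it makes it easy to inspect the method, URL, and body before anything hits the server.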

Conclusion

As you can see, Gremlin via Gizmo is pretty easy. I used Tornadoweb in these examples because it had asyncio support out of the box, but just about any newer web framework can get the job done.

Keep an eye on the Gizmo project. Documentation and official releases will be coming soon.

Have fun and if you have any questions, hit me up on Twitter @emehrkay.