Getting Started with TinkerPop's Gremlin Server & Gizmo (Python) Part 2

In part one of this series we did a quick setup of the Gremlin Server and put together a very simple blogging API that covered simple CRUD of blog posts. In part two we will go a little deeper with Gizmo, covering some of its cooler features that will help you quickly and easily get data in and out of the graph.

Adding Functionality to our Blog API

One thing that we mentioned, but did not implement, is tags for the blog posts. We added a Tag vertex and a HasTag edge and both of their mappers, but there is no functionality in the API to take advantage of any of that. We'll add that now.

Let's think through some requirements for tags and tagging items in our small system:

  1. Each Tag entity should be unique. We should not have multiple tag vertices with the same value, e.g. there should only be one Python tag. Having a single tag vertex would make walking the graph from that point pretty damn easy.
  2. A Post should only connect to a Tag once. This is crucial for data integrity, and it also helps to ensure proper management and reporting.
  3. Whatever list of tags we provide will be the only tags associated with the Post -- PUT-based logic.
  4. Return a list of tags with each blog post from the API.

We can accomplish these requirements pretty easily; we actually have the first requirement covered already. Let's take a look at how our entities and mappers currently exist.

Requirement 1 -- Tag entities are unique:

class Tag(BaseBlogEntity, Vertex):
    tag = String()

class TagMapper(BaseBlogMapper, EntityMapper):
    entity = Tag
    unique_fields = ('tag',)

That unique_fields property in the TagMapper is a list of fields that should be checked for uniqueness whenever a tag is saved.
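
One Python detail is worth keeping in mind when declaring unique_fields with a single entry: parentheses alone do not create a tuple, the trailing comma does. The short check below is plain Python and does not touch Gizmo; it only shows what each spelling evaluates to, on the assumption that the mapper iterates over unique_fields as a list of field names.

```python
# Parentheses without a comma are just grouping, so this is a plain string.
not_a_tuple = ('tag')
# The trailing comma is what makes a one-element tuple.
one_field = ('tag',)

# Iterating the string walks its characters, not field names.
fields_from_string = list(not_a_tuple)
fields_from_tuple = list(one_field)

print(type(not_a_tuple).__name__, fields_from_string)
print(type(one_field).__name__, fields_from_tuple)
```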

# example
python_tag = mapper.create({'tag': 'Python'}, entity=Tag)
await mapper.save(python_tag).send()

When you save the python_tag entity, Gizmo will run a query that looks something like this:

blog.V().hasLabel('blog').has('tag', 'Python').tryNext().orElseGet{
    blog.addV('blog', 'tag', 'Python').next()
}

A very simple Get-or-Create. And if you had multiple values in the unique_fields property, Gizmo would just append more has('field', $ENTITY_FIELD_VALUE) checks onto the query.
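
To make the "append more has() checks" idea concrete, here is a rough sketch that builds a get-or-create script as a string from a dict of unique values. It is not Gizmo's code: build_get_or_create and the 'lang' field are made up for illustration, the output only mirrors the shape of the query above, and a real implementation should bind values as parameters rather than format them into the script.

```python
def build_get_or_create(label, unique_values):
    """Return a Gremlin/Groovy style get-or-create script as a string.

    Illustrative only: this is not how Gizmo builds its queries, and real
    code should bind values as parameters instead of formatting them in.
    """
    # one has('field', 'value') step per unique field
    has_steps = ''.join(
        ".has('{}', '{}')".format(field, value)
        for field, value in unique_values.items()
    )
    # property arguments for the create branch: name, value, name, value...
    props = ', '.join(
        "'{}', '{}'".format(field, value)
        for field, value in unique_values.items()
    )
    return (
        "blog.V().hasLabel('{label}'){has}.tryNext().orElseGet{{\n"
        "    blog.addV('{label}', {props}).next()\n"
        "}}"
    ).format(label=label, has=has_steps, props=props)


# 'lang' is a hypothetical second unique field
script = build_get_or_create('tag', {'tag': 'Python', 'lang': 'en'})
print(script)
```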

Groovy is pretty cool, and it is even cooler that the guys and gals over at TinkerPop decided to use it as a means to traverse the graph.

Requirement 2 -- one connection between Post and Tag entities:

To fulfill this requirement we'll have to look at the HasTag edge and create a new HasTagMapper.

# nothing needs to be updated here
class HasTag(BaseBlogEntity, Edge):
    pass

# our new mapper for the HasTag edge
class HasTagMapper(BaseBlogMapper, EntityMapper):
    entity = HasTag
    unique = 'both'

The unique property on the HasTagMapper is used to check for uniqueness on the out entity, in entity, and the label of the edge (since the label isn't explicitly defined on the HasTag object, Gizmo converts it to 'has_tag').

# example
has_tag = mapper.connect(some_post, python_tag, edge_entity=HasTag)
await mapper.save(has_tag).send()

When this code is executed Gizmo will do a few things: first it will check to see if either vertex needs to be saved and save them, then it will check if there is an edge between the two entities a lot like the query above. It looks something like this:

first = blog.V('some_id').next();
second = blog.V('some_other_id').next();
edge = blog.V(first).both('has_tag').as('out').hasId('some_other_id').select('out').tryNext().orElseGet{
    first.addEdge('has_tag', second).next()
}

Here we are again with Gremlin/Groovy being a damn cool language, and a great example that justifies Gizmo's decision to implement the Data Mapper pattern.

The first two lines in this script simply set variables for each entity involved, to be used in the edge creation portion. This example assumes the vertices already have an id, so they are simple retrievals (if the entities did not exist, they would have been created here). Now we can take a look at the edge section of the script. The first part of this query checks whether there is an edge, in either direction, with the label 'has_tag' between our some_post and python_tag vertices. The rest of the script will create and return the edge if it doesn't currently exist.
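
If it helps to see the same get-or-create idea without a Gremlin server, here is a toy in-memory version. TinyGraph and its methods are invented for this sketch and have nothing to do with Gizmo's or TinkerPop's internals; the only point is that a lookup that ignores direction keeps a Post and a Tag from being connected twice.

```python
class TinyGraph:
    """A toy in-memory stand-in for the graph; not Gizmo or TinkerPop."""

    def __init__(self):
        self.edges = []  # list of (out_id, label, in_id) tuples

    def find_edge(self, a, b, label):
        # 'both' semantics: match the edge regardless of its direction
        for out_id, edge_label, in_id in self.edges:
            if edge_label == label and {out_id, in_id} == {a, b}:
                return (out_id, edge_label, in_id)
        return None

    def get_or_create_edge(self, out_id, in_id, label):
        # return the existing edge when there is one, otherwise add it
        existing = self.find_edge(out_id, in_id, label)
        if existing is not None:
            return existing
        edge = (out_id, label, in_id)
        self.edges.append(edge)
        return edge


graph = TinyGraph()
first = graph.get_or_create_edge('post-1', 'tag-python', 'has_tag')
second = graph.get_or_create_edge('post-1', 'tag-python', 'has_tag')
# asking in the opposite direction still finds the same edge
reverse = graph.get_or_create_edge('tag-python', 'post-1', 'has_tag')
print(len(graph.edges))
```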

Requirement 3 -- update the tags using only what is provided

To satisfy this requirement we can simply add a couple of methods to the PostMapper that will handle adding new tags and removing tags that were not specified.

class PostMapper(BaseBlogMapper, EntityMapper):
    entity = Post
    unique_fields = ('slug',)

    async def get_tags(self, entity):
        gremlin = self.mapper.start(entity)

        gremlin.func('out', Param('has_tag', 'has_tag'))

        res = await self.mapper.query(gremlin=gremlin)

        return res

    async def add_tags(self, entity, tags=None):
        tags = tags or []

        if not isinstance(tags, (list, set)):
            tags = [tags]

        existing = await self.get_tags(entity)
        existing_names = {t['tag'] for t in existing}
        to_remove = [t for t in existing if t['tag'] not in tags]
        # only tags that are not already connected need to be added
        to_add = set(tags) - existing_names

        for name in to_add:
            tag = self.mapper.create({'tag': name}, Tag)
            tag_edge = self.mapper.connect(entity, tag,
                edge_entity=HasTag)

            await self.mapper.save(tag_edge).send()

        # here we want to get the edge between the entity and
        # the tag to be removed and remove that
        for tag in to_remove:
            # don't put this import here in real code
            from gremlinpy.statement import GetEdge
            try:
                get_edge = GetEdge(entity['id'], tag['id'], 'has_tag',
                    'both')
                self.mapper.gremlin.apply_statement(get_edge)
                res = await self.mapper.query(gremlin=self.mapper.gremlin)
                has_tag = res.first()
                await self.mapper.delete(has_tag).send()
            except Exception:
                # skip tags whose edge could not be found or deleted
                pass

Typically I would put the get_tags and add_tags methods in a class called TagMixin and mix that into all of the mappers whose entities should have tags. This makes your code a bit more flexible by utilizing composition, but this post isn't about that.

The get_tags method is pretty straightforward; it gets the tags for the entity that was passed in. Things get interesting in the add_tags method.

We first retrieve all of the existing tags and create a list of tag entities that will no longer be used for this entity and a list of newly added tags. The code loops through the new tags and adds them. Notice here that we are executing the query after every iteration. This is done to ensure that we create a simple template on the Gremlin server that will allow for faster code execution with every subsequent request.

Next we want to delete all of the edges between the Tag and the Post that were not passed in, but existed before. In this code we imported the GetEdge statement from gremlinpy.statement. Statements are simple templates that we can apply to our Gremlin scripts, and this one simply gets the edge between two ids based on direction and label. Once we have the edge, we can delete it.
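
Stripped of the graph calls, the PUT-style bookkeeping comes down to two set differences. The helper below is a standalone sketch (plan_tag_changes is a made-up name, not part of Gizmo) that works on plain tag names, so you can check the logic without a server.

```python
def plan_tag_changes(existing_names, requested_names):
    """Return (to_add, to_remove) so that only the requested tags remain."""
    existing = set(existing_names)
    requested = set(requested_names)
    # requested but not yet connected
    to_add = requested - existing
    # connected but no longer requested
    to_remove = existing - requested
    return to_add, to_remove


to_add, to_remove = plan_tag_changes(
    ['python', 'example'],  # tags currently on the post
    ['python', 'graph'],    # tags sent with the request
)
print(sorted(to_add), sorted(to_remove))
```

Tags that appear in both lists, 'python' here, are left alone.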

Requirement 4 -- adding tags to the Post entity's data

Each mapper has a data method that is asynchronous and accepts either a single entity or a collection of entities. This allows us to add more data to the representation of the entity, to further map more data onto it. Let's add that to our PostMapper.

class PostMapper(BaseBlogMapper, EntityMapper):
    entity = Post
    unique_fields = ('slug',)

    async def data(self, entity):
        data = await super(PostMapper, self).data(entity=entity)
        data['Tags'] = await self.get_tags(entity)

        return data

Pretty simple. Now we have the option of augmenting a Post entity's data to include any Tag objects that it is connected to. We will need to make a few updates to the handlers to expect 'tags' passed in with the request, to utilize the add_tags method on the mapper, and to use the PostMapper.data method instead of the Post.data method:

class BlogHandler(RequestHandler):

    @property
    def data(self):
        return {
            'title': self.get_argument('title'),
            'content': self.get_argument('content'),
            'published': self.get_argument('published', False),
            'tags': self.get_arguments('tags'),
        }

    async def get(self, id):
        ...
        data = await mapper.data(entry)

        return self.write(data)

    async def post(self, id):
        ...
        await mapper.add_tags(entry, self.data['tags'])
        data = await mapper.data(entry)

        return self.write(data)

    async def put(self, id):
        ...
        await mapper.add_tags(entry, self.data['tags'])
        data = await mapper.data(entry)

        return self.write(data)

# PostsHandler already uses a method that returns data through the PostMapper.data method

Using the API

Refer back to part 1, or the README in the associated files, for instructions on how to run the API server.

Let's use our handy HTTPie tool (you can use whatever you want, hopefully it is as cool as HTTPie) to add a post with some tags:

Adding a new post with tags via the command line

Now we'll check our business rules by updating the post, but removing one of the tags:

Updating the last post with a single tag removed via the command line

As you can see, it is functioning the way we expect it to. The 'example' tag was removed from our post while keeping the 'python' one intact. Now let's add a brand new tag:

Updating the last post via the command line and adding new tags

So now we know that we are only adding one connection to a Tag per Post, but are we sure that we are not adding multiple Tag vertices? Let's check:

I am using a little project that I wrote called GrimREPL. Yeah I know, it should be "Grem" like Gremlin. I messed up. I'm sorry, but still check it out.

Using the cli tool GrimREPL to show that only three tags have been added to the graph

One More Thing </Steve Jobs Filter>

What is the point of tags if we cannot query the API with them? We shall update the /posts/ endpoint to listen for a comma-separated list of tags passed in with a query string like: ?tags=python,example.

class PostsHandler(RequestHandler):

    async def get(self):
        tags = self.get_argument('tags', None)

        if tags:
            post_mapper = mapper.get_mapper(Post)
            res = await post_mapper.get_by_tags(tags)
        else:
            g = mapper.start(Post)

            res = await mapper.query(gremlin=g)

        data = {'data': await res.mapper_data}

        return self.write(data)

We also need to add a new method to the PostMapper to retrieve posts by tag.

class PostMapper(BaseBlogMapper, EntityMapper):
    ...

    async def get_by_tags(self, tags=None):
        from gremlinpy.gremlin import within

        if not tags:
            return None

        tags = tags.split(',')
        g = self.mapper.start(self.entity)
        alias = Param('post', 'post')
        has_tag = Param('has_tag', 'has_tag')
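        g.AS(alias).both(has_tag).has('"tag"', within(*tags))
        # select the post first so dedup() works on posts, not on tags
        g.select(alias).dedup()

        res = await self.mapper.query(gremlin=g)

        return res

In this new method we write a query that starts at the Post vertices, walks their 'has_tag' edges, and keeps only the posts connected to a Tag whose 'tag' value is in the list of tags. Selecting the post before calling dedup() matters: dedup() works on whatever the traverser is currently sitting on, so calling it while still on the Tag vertices would drop every post after the first one that shares a tag. The resulting Gremlin/Groovy script looks like this:

blog.V().hasLabel('post').as('post').both('has_tag').has('tag', within('python', 'example')).select('post').dedup()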
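
For anyone who wants to sanity-check the intent of that traversal without a graph, here is a small pure-Python model of it. Everything in it is an assumption made for illustration: parse_tags and posts_by_tags are hypothetical helpers, and the list of (post id, tag value) pairs is a toy stand-in for the 'has_tag' edges. The whitespace trimming in parse_tags is an extra that the handler above does not do.

```python
def parse_tags(raw):
    """Split a comma-separated query-string value into clean tag names."""
    return [part.strip() for part in raw.split(',') if part.strip()]


def posts_by_tags(has_tag_edges, wanted):
    """Return post ids connected to any wanted tag, each listed once.

    has_tag_edges is a list of (post_id, tag_value) pairs, a toy stand-in
    for the 'has_tag' edges in the graph.
    """
    wanted = set(wanted)
    seen = set()
    result = []
    for post_id, tag_value in has_tag_edges:
        # has('tag', within(...)): keep only edges that reach a wanted tag
        if tag_value not in wanted:
            continue
        # report each matching post a single time, in first-seen order
        if post_id not in seen:
            seen.add(post_id)
            result.append(post_id)
    return result


edges = [
    ('post-1', 'python'),
    ('post-1', 'example'),
    ('post-2', 'python'),
    ('post-3', 'graph'),
]
matches = posts_by_tags(edges, parse_tags('python, example'))
print(matches)
```

Note that post-1 matches both tags but shows up once, and post-2 is still returned even though it shares the 'python' tag with post-1.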

Let's query the API with this new query string:

Using the cli to query the API for posts by tag

Conclusion

Gizmo makes it pretty easy to add some business rules to your graph. In the upcoming weeks I will write about Gizmo's other cool features that will allow you to fully express your business needs using Python against the Gremlin Server.

Have fun and if you have any questions hit me up on Twitter @emehrkay.