GoSineSim -- Cosine Similarity in Golang

A few days ago I was chatting it up with my internet homegirl Ann@Afrolicious about one of her many upcoming World Domination™ projects, and the topic moved to relevancy between bodies of text. I mentioned that in the past I had used a fairly easy-to-implement algorithm called Cosine Similarity with much success. At the very same time I happened to be looking at some posts that sing the praises of Golang.

Go has been something that I've been wanting to learn/build something with for a while and it hit me -- I could build a simple command line app to get my feet wet. I would make a simple way to calculate the cosine similarity between two or more JSON objects. Easy enough. I wrote up some requirements in my "Things I'd like to do" notes while at work and started coding as soon as I got home.

Go gives a great first impression. It is concise, easy to understand, and its interfaces make a lot of sense for a newcomer. I was able to code this first, and very rough, version of GoSineSim in only a few hours.

My goal with the project is to have a tool that can process large bodies of JSON objects and use Go's built-in concurrency to quickly get results that would otherwise take a lot of time, CPU, and memory with the other langs that I use.

None of that cool stuff is implemented yet, so don't go sweating me about it if you read the source code.

Usage

Simply pass the executable a source JSON object and a pool (a JSON array of one or more objects), each object in the format of {"id": String, "data": {String: Float}}, and get the scores.

./gosinesim -source='{"id": "15", "data": {"cars": 30, "money": 99}}' --pool='[{"id": "44", "data": {"cars": 87, "money": 40}}]'

Result:

[{"Similarity":0.6632728204403626,"Id":"44","Data":{"cars":87,"money":40}}]

Future

I will keep updating the project, but probably won't write about it here unless I manage to create a young Skynet based on it. You should still keep up with the project over at GitHub though.

Have fun, and if you have any questions, hit me up on Twitter @emehrkay.