Category Archives: Graph Database

Demystifying Supply Chain Management with Neo4j

Supply chain management is a very complex topic in itself. Rather than focusing on its details, this blog post targets modelling a supply chain in Neo4j (a graph database) to answer questions like:
1) What is the distance between a retailer and a wholesaler?
2) What is the best route to get a product, involving raw manufacturer, supplier, wholesaler and retailer?

It’s natural to visualize the entire supply chain as a big graph, with different nodes/vertices representing sellers and products.
We have different suppliers, products, wholesalers and retailers in our supply chain, each represented by a vertex/node in the graph (refer to the image above):
1) All Suppliers are connected to the Products they supply.
2) All Products are connected to Wholesalers.
3) All Retailers are connected to Wholesalers for the specific products they sell.

Graph databases are a very natural choice for this kind of data, because all the data is interrelated and graph DBs store these relationships as they are, like pointers on disk, so following a relationship does not need an index lookup for each hop. Modelling this kind of data in a relational database or in most NoSQL stores will result in either joins or lookups being performed while traversing. For this blog post we will model the data in Neo4j, a graph database.
Neo4j supports the property graph model: a graph with any number of vertices and edges, each of which can have any number of properties (key/value pairs).
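
To make the property graph idea concrete, here is a minimal sketch in plain Python. It is not Neo4j's API or storage format, just an illustration of nodes and relationships that both carry key/value properties. The class names, the `connect` helper and all property values are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    label: str                                      # e.g. "Supplier" or "Product"
    properties: dict = field(default_factory=dict)  # arbitrary key/value pairs
    outgoing: list = field(default_factory=list)    # relationships that start at this node


@dataclass(eq=False)
class Relationship:
    type: str                                       # e.g. "Delivers"
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)  # relationships carry properties too


def connect(start, rel_type, end, **properties):
    """Create a directed relationship and attach it to its start node."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel


# Hypothetical data, for illustration only.
supplier = Node("Supplier", {"name": "SupplierA", "latitude": 12.97, "longitude": 77.59})
product = Node("Product", {"name": "ProductX"})
delivers = connect(supplier, "Delivers", product, km=120.0)
```

Following `supplier.outgoing[0].end` reaches the product directly through the stored reference, which is the "pointer" idea described above.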

In our example we have Supplier with properties

and relationship Delivers with property

It’s natural to have suppliers across different geographic locations. Each location is represented by latitude and longitude in the graph, and suppliers are connected to wholesalers through the products they supply.

For this blog post, what matters is the distance between supplier, product and wholesaler, and then on to the retailer.
In the diagram above, each supplier is connected to the products it supplies, which are in turn connected to Wholesalers and Retailers by the relationship Delivers. One of its properties is km, the distance between their geographic locations, calculated using the Haversine formula.
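
For readers who have not met the Haversine formula, a small Python sketch of how the km property could be computed from two latitude/longitude pairs follows. The function name and the Earth radius constant (6371 km, the commonly used mean radius) are choices made for this example, not something taken from the original model.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The result is a straight "as the crow flies" distance over the Earth's surface, so it is only an approximation of real road or shipping distance.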

Now we can easily traverse the graph to calculate the shortest path, or the time, between different suppliers and retailers for different products.
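
In Neo4j this kind of traversal would normally be written as a query against the database. To show the idea without a running database, here is a rough sketch of Dijkstra's algorithm in plain Python over the km values. The graph contents and node names are hypothetical, and this is only one of several ways to compute a shortest path.

```python
import heapq


def shortest_path_km(graph, source, target):
    """Dijkstra's algorithm over {node: {neighbour: km}}; returns (total_km, path)."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, km in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + km, neighbour, path + [neighbour]))
    return float("inf"), []  # target not reachable from source


# Hypothetical Delivers relationships with their km property.
graph = {
    "SupplierA": {"ProductX": 120.0},
    "SupplierB": {"ProductX": 45.0},
    "ProductX": {"Wholesaler1": 80.0, "Wholesaler2": 300.0},
    "Wholesaler1": {"Retailer1": 15.0},
    "Wholesaler2": {"Retailer1": 5.0},
}

total_km, route = shortest_path_km(graph, "SupplierB", "Retailer1")
```

With this sample data the cheapest route from SupplierB to Retailer1 goes through Wholesaler1, for a total of 140 km.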

If a Wholesaler wants to know the nearest supplier for a specific product, we start with that product, find all suppliers supplying it, and then sort by the distance between supplier and Wholesaler, which is the sum of
Distance between Supplier and Product + Distance between Product and Wholesaler
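
The lookup described above can be sketched in a few lines of Python. This assumes the Delivers relationships have been copied into a dictionary keyed by (from, to) pairs; the names and distances are invented for the example and it does not talk to Neo4j.

```python
# Hypothetical in-memory copy of the Delivers relationships: (from, to) -> km.
delivers_km = {
    ("SupplierA", "ProductX"): 120.0,
    ("SupplierB", "ProductX"): 45.0,
    ("SupplierC", "ProductY"): 10.0,
    ("ProductX", "Wholesaler1"): 80.0,
    ("ProductX", "Wholesaler2"): 300.0,
    ("ProductY", "Wholesaler1"): 25.0,
}


def suppliers_by_distance(product, wholesaler, edges):
    """Return (supplier, total_km) pairs for `product`, nearest supplier first."""
    product_to_wholesaler = edges[(product, wholesaler)]
    totals = [
        (src, km + product_to_wholesaler)   # supplier->product plus product->wholesaler
        for (src, dst), km in edges.items()
        if dst == product                   # only edges that end at the product
    ]
    return sorted(totals, key=lambda pair: pair[1])


nearest = suppliers_by_distance("ProductX", "Wholesaler1", delivers_km)
```

Here `nearest[0]` is SupplierB at 125 km, ahead of SupplierA at 200 km.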

Happy reading … ☺

What is Database Sharding?

Sharding is a partitioning mechanism that divides a very large DB into smaller, faster, more manageable parts called shards. The shards are independent of each other and share nothing, and thus can be distributed across different servers while enjoying all the benefits of horizontal scaling.

Sharding is just another name for “horizontal partitioning” of a database.
Horizontal partitioning is a design principle whereby rows of a database table are held separately, rather than splitting by columns. Each partition consists of some number of rows and forms part of a shard.

As the DB grows over time, the time taken to query it increases. Sharding helps by scaling the DB horizontally, so that each query only has to work against a smaller part of the data.
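
As a rough illustration of how rows can be spread across shards, the sketch below routes each row to a shard by hashing its key. Hash-based routing is just one common strategy (range-based and directory-based schemes also exist), and the shard count, key format and in-memory dictionaries standing in for servers are all assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only

# Each dict stands in for an independent database server.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a row key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(key, row):
    """Store the row on the one shard that owns this key."""
    shards[shard_for(key)][key] = row


def get(key):
    """Read the row back from the shard that owns this key."""
    return shards[shard_for(key)].get(key)


put("customer:42", {"name": "Alice"})
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, because the built-in hash of a string changes between interpreter runs and would send the same key to different shards.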


What are the Differences Between Relational and Graph Databases?

Relations in relational databases are handled by joins between tables. These can be managed in two ways: either in a permanent, declarative fashion by the use of foreign keys, or in a temporary fashion by ad-hoc joins in our queries.

Hierarchical data, on the other hand, is often modelled with self joins.

In graph databases, by contrast, the relationships physically exist on disk; they are named, directed, and can themselves be decorated with properties. This means that, if you chose to, you could look at the relationships on disk and see how they “join” entities. Relationships are therefore first-class entities in a graph database and are semantically far stronger than the implied relationships reified at runtime in a relational store.

When to use Graph Databases?

Graph databases are good for storing relationships between things where the relationships are at least as important as the items themselves. They are very good at analyzing how things are related and how many steps are required to get from one point to another. Analyzing relationships between people in social media is a typical use case for graph databases.

Sometimes the entities in the analysis are not known, for example the real person behind a tweet on Twitter or a post on Facebook, but the relationships may matter more for identifying groups that interact or how central an individual is to a group.

Semantic analysis, or trying to understand the meaning of things, favors graph database solutions. Graph databases are frequently implemented using the triple store concept of subject, predicate, and object, a basic concept in semantic analysis.
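
To give a feel for the triple store idea, here is a toy sketch in Python that keeps facts as three-part statements in the usual (subject, predicate, object) order and matches them against a pattern. The data, the predicate names and the `match` helper are all hypothetical; a real triple store adds indexing and a query language on top of this.

```python
# Hypothetical facts in (subject, predicate, object) order.
triples = [
    ("SupplierA", "delivers", "ProductX"),
    ("SupplierB", "delivers", "ProductX"),
    ("ProductX", "delivers", "Wholesaler1"),
    ("Retailer1", "buys_from", "Wholesaler1"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Who delivers ProductX?
suppliers_of_x = [s for (s, _, _) in match(triples, predicate="delivers", obj="ProductX")]
```

Each triple is effectively one directed, named edge of a graph, which is why this representation maps so naturally onto graph databases.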