SQL Graph Database – the new feature in SQL Server 2017

SQL Graph Databases - header

SQL Server 2017 introduced a lot of new features. One of the most surprising is Graph Database capabilities (aka SQL Graph and Graph Extensions). This new functionality enables users to model complex relationships using new database objects and queries. Let’s see what it is and how it works. 

What are Graph Databases?

In general, Graph Databases are one of the NoSQL database types. They are mainly focused on storing and analyzing relationships between objects (entities). This kind of databases use graph structures with nodesedges, and properties to store all kind of information. In graphs, nodes may be also called vertices or points and edges often are called relationships.

Nodes are main data elements that can have one or more properties. Nodes are connected to other nodes via edges. Edges are directional and describe a relationship between two data elements. Nodes can have multiple, even recursive relationships.

Graph Databases allow users to traverse relationships and entities in a much easier way than in classic Relational Database. They are great for analyzing interconnected data for finding non-obvious relationships. That makes them a perfect solution for the following use cases:

  • Fraud Detection
  • Recommendation Systems
  • Social Networks
  • Knowledge Graphs
  • Network Operations
Why were Graph Database capabilities introduced to SQL Server 2017?

This is a question that many of you probably ask. SQL Server is and always was a Relational Database Management System. Why Microsoft decided to make a move and change it slightly? Please find the answer from Shreya Verma (t), a PM for Graph DB in SQL Server.

Over last couple of years, we have added many features to SQL Server that attempt to bring intelligence closer to the database, where the data lives. Once such feature is Graph Extensions in SQL Server 2017. The feature allows you to describe your schema as a graph and model relationships between different data points naturally. It allows many real-world scenarios like recommendation systems, fraud detection, CRM to be implemented more efficiently using SQL technologies. With this feature added to SQL Server, our customers do not need to turn to a new platform just for their evolving graph database needs. They can now use the same ecosystem and tools to grow their application.

Thanks to such attitude Microsoft is named a leader of Operational DBMS in Gartner report three years in a row.

Gartner 2017 ODBMS

New Database Objects in SQL Server 2017

To enable Graph Database storage and processing Microsoft introduced two new table types: Node and Edge. These tables can be created under any schema, but they all belong to one logical graph. Node table is a collection of the similar type of nodes (nodes with the same label and set of properties). Edge table is a collection of similar type of edges (describing the same type of relationship). The architecture of SQL Server 2017 Graph looks like the following:

SQL Graph Databases - architecture

Node table

In SQL Server 2017 we have a new syntax available:

CREATE TABLE 
    [ database_name . [ schema_name ] . | schema_name . ] table_name 
    ( { <column_definition> } [ ,...n ] ) 
    AS [ NODE | EDGE ] 
[ ; ]

To create NODE table you have to add AS NODE to the CREATE TABLE statement. Example:

CREATE TABLE dbo.Person (
       ID INTEGER PRIMARY KEY, 
       name VARCHAR(100), 
       email VARCHAR(100)
) AS NODE;

As a result, you can find a new table in SSMS Object Explorer. To do it, you have to use version 17.0 or newer. You can download the latest version here.

SQL Graph Databases - Node table

There are few interesting things worth noticing. Firstly, the NODE table got a new icon. It’s a regular table icon with the additional blue solid dot. Secondly, there are two additional columns that weren’t included in our table definition script, but they were created by SQL Server automatically.

First column graph_id_<hex_string> is a hidden, internal column and cannot be queried. It’s probably used internally by SQL Server to manage graph data in the proper way.

SQL Graph Databases - Node table - select

Second $node_id_<hex_string> column is an implicit column always created for every node table. This column uniquely identifies a given node in the database. Values in the $node_id column are automatically generated as a combination of object_id of the node table and internally generated bigint value. When this column is selected, its value is displayed as a JSON string. Fortunately, you don’t have to use full column name to reference it in queries. You can reference $node_id_<hex_string> using $node_id pseudo column name.

SQL Graph Databases - Node table - select2

Edge table

To create EDGE table you have to add AS EDGE to the CREATE TABLE statement. Here is an example:

CREATE TABLE dbo.friends (
       start_date date
) AS EDGE;

SQL Graph Databases - Edge table

Similarly to the NODE table, the EDGE table also got a new icon. It’s a regular table icon with the additional two blue connected circles. It also has additional implicitly created columns. This time there are eight of them. Five of them are internal, hidden columns:

  • graph_id_<hex_string>
  • from_obj_id_<hex_string>
  • from_id_<hex_string>
  • to_obj_id_<hex_string>
  • to_id_<hex_string>

These columns are used internally by SQL Server and cannot be queried.

SQL Graph Databases - Edge table - select

Remaining three columns are available for the user.

First column $edge_id_<hex_string> uniquely identifies a given edge in the database.  Values in the $edge_id column are automatically generated as a combination of object_id of the edge table and internally generated bigint value. When this column is selected, its value is displayed as a JSON string. You can reference $edge_id_<hex_string> using $edge_id pseudo column name.

Second column $from_id_<hex_string> stores the $node_id of the node, from where the edge originates. Third column $to_id_<hex_string> stores $node_id of the node, at which the edge terminates. Both columns can be accessed with pseudo column names:  $from_id  and $to_id.

SQL Graph Databases - Edge table - select2

How to insert data?

As we already know how to create node and edge tables, now we need to insert some data. This can be done with using regular INSERT statements, there are just a few details we should remember.

Inserting into a NODE table is same as inserting into any relational table. Values for the $node_id column are automatically generated.

Example:

INSERT INTO dbo.Person VALUES (1, 'Marek Masko', 'marek@sqlterritory.com');
INSERT INTO dbo.Person VALUES (2, 'Bob', 'bob@sqlterritory.com');

Inserting into EDGE table is also nothing exceptional. Values for the $edge_id column are automatically generated, but values for $from_id and $to_id columns must be provided by a user.

INSERT INTO dbo.friends VALUES (
        (SELECT $node_id FROM dbo.Person WHERE name = 'Marek Masko'),
        (SELECT $node_id FROM dbo.Person WHERE name = 'Bob'), 
        '01/01/2018'
);

Remember that edges are directional. If you want to create a bidirectional relationship, then you have to insert two rows.

How to SELECT data?

Together with two new table types, we got a new search condition: MATCH. This comes from Cypher Query Language, special graph query language developed by Neo4j. In 2015 it was made open source and since then it has become the default language for virtually all graph databases. MATCH can be used in the SELECT statement as part of WHERE clause. It can be used only with graph NODE and EDGE tables.

MATCH (<graph_search_pattern>) 

<graph_search_pattern>::= 
    {<node_alias> { 
                    { <-( <edge_alias> )- } 
                  | { -( <edge_alias> )-> } 
                <node_alias> 
                } 
    } 
    [ { AND } { ( <graph_search_pattern> ) } ] 
    [ ,...n ] 

<node_alias> ::= 
    node_table_name | node_alias 
<edge_alias> ::= 
    edge_table_name | edge_alias

It’s not really visible from the syntax definition, but MATCH condition uses new, special notation. It’s ASCII-art style syntax for pattern matching.

SQL Graph Databases - Match

Relationship ‘friends is presented with the ASCII arrow -(friends)->with the label. This is a completely new convention in T-SQL. The direction of the arrow is important and describes the direction of the relationship. In one MATCH condition, you can include several relationships in either direction.

Some examples:

-- use MATCH in SELECT to display all relationships
SELECT Person1.name AS SourceName, Person2.name AS TargetName
FROM Person Person1, friends, Person Person2
WHERE MATCH(Person1-(friends)->Person2)

-- use MATCH in SELECT to find who Marek likes
SELECT Person2.name AS FriendName
FROM Person Person1, friends, Person Person2
WHERE MATCH(Person1-(friends)->Person2)
AND Person1.name = 'Marek Masko';


-- use MATCH in SELECT to find who is liked by Marek
SELECT Person2.name AS FriendName
FROM Person Person1, friends, Person Person2
WHERE MATCH(Person1<-(friends)-Person2)
AND Person1.name = 'Marek Masko';


-- Different table order
SELECT Person2.name AS FriendName
FROM Person Person1, Person Person2, friends
WHERE MATCH(Person1-(friends)->Person2)
AND Person1.name = 'Marek Masko';


-- Second level friends
SELECT Person3.name AS FriendName
FROM Person Person1, Person Person2, friends, friends friends2, Person Person3
WHERE MATCH(Person1-(friends)->Person2-(friends2)->Person3)
AND Person1.name = 'Marek Masko';

To make these examples working you need to populate your tables with additional data. I’m not creating a ready script for you, because if you create relationships on your own, then you will better understand how it works and how to write queries to discover certain relationships.

Node names inside MATCH can be repeated. In such case, the same node can be traversed an arbitrary number of times in the same query. If you are interested in traversing different nodes of the same type you have to create many different aliases for the same table (node type) and use them in the query. An edge name cannot be repeated inside MATCH condition. For each usage, you need to create a new alias. An edge can point in either direction, but it must have explicit direction.

You can combine MATCH condition with other expressions only by using AND in WHERE clause. Usage of NOT and OR operators is not supported.

Indexes

Storage of graph data in SQL Server 2017 is based on tables. To speed up query processing these tables should be indexed. Exactly the same way how it is done with regular tables. Microsoft recommends creating unique constraint or index on the $node_id column at the time of creation of node table. If this is not performed by the user, a default unique, non-clustered index is automatically created.

Worth noticing is that any index on a graph pseudo column is created on the underlying internal columns. For example, index created on $node_id column appears on the internal graph_id<hex_string> column.

SQL Graph Databases - Node table - indexSQL Graph Databases - Edge table - index

Known limitations

At this very early stage Graph Database has a lot of limitations as well as missing features. Below I want to present a short list of the more important ones:

  • Local or global temporary tables cannot be node or edge tables.
  • Table types and table variables cannot be declared as a node or edge table.
  • Node and edge tables cannot be created as system-versioned temporal tables.
  • Node and edge tables cannot be memory-optimized tables.
  • Users cannot update the $from_id and $to_id columns of an edge using UPDATE statement. To update the nodes that an edge connects, users will have to insert the new edge pointing to new nodes and delete the previous one.
  • No OR and NOT operators in MATCH condition.
  • Cross-database queries on graph objects are not supported.
  • Polymorphism is not supported (The ability to find any type of node connected to a given node).
  • SQL supports only unidirectional mapping.
  • Transitive closure is not supported (The ability to recurse through a combination of nodes and edges, an arbitrary number of times).
  • There is no direct way or a wizard available to convert existing traditional database tables to a graph.
  • There is no GUI, so we have to rely on Power BI to plot and view the graph.
  • Missing many of graph functions like „Shortest Path”.
Usage from application

To use SQL Graph databases from an application you can use GraphView DLL delivered by Microsoft. It’s free and open-source library available on GitHub: https://github.com/Microsoft/GraphView. It enables users to use SQL Server or Azure SQL Database to efficiently manage graphs.

Summary

SQL Graph Databases in SQL Server 2017 is a very interesting feature opening up many new data processing capabilities. In its first release, it provides only basic functionalities and capabilities and has a long list of known issues. Fortunately for some of them, various workarounds can be implemented (with the usage of CTEs, triggers, and unique indexes). Despite its disadvantages, it may turn out that it is one of best options to store certain complex relationships.

-Marek

Share it:
FacebooktwitterredditpinterestlinkedintumblrmailFacebooktwitterredditpinterestlinkedintumblrmail

2 thoughts on “SQL Graph Database – the new feature in SQL Server 2017”

  1. Interesting article, thank you Marek 🙂
    With regards to GraphView, do you know if Microsoft are still supporting and developing this?
    Except for the readme file, the last commits are 2015\2016. It doesn’t seem to be receiving much love.

Leave a Reply

Your email address will not be published. Required fields are marked *

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close