## Graph Database Basics
A graph database is built on graph structures. A graph is composed of nodes, edges, and properties. Each key
object or component in a graph database is stored as a node, and nodes are connected by edges that represent
relationships. For example, you may represent people as nodes and use edges to record who is friends with whom.
Both nodes and edges can carry properties: a person (node) may have name and age properties, while a
friendship (edge) may have a start date property.
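To make this vocabulary concrete, here is a toy, in-memory property graph in plain Java (no TinkerPop involved) that encodes the example above: two people with name and age properties, joined by a "friends" edge that has a start date. The class, field, and method names are made up for illustration; real graph databases store and index this data very differently.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A toy property graph for illustration only: vertices and edges both carry key/value properties.
class ToyGraph {
    static class Vertex {
        final long id;      // auto-assigned, like a graph database's element id
        final String label; // e.g. "person"
        final Map<String, Object> properties = new LinkedHashMap<>();

        Vertex(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    static class Edge {
        final String label; // e.g. "friends"
        final Vertex from;  // edges are directed: from -> to
        final Vertex to;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Edge(String label, Vertex from, Vertex to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    final List<Vertex> vertices = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();
    private long nextId = 1;

    Vertex addVertex(String label) {
        Vertex v = new Vertex(nextId++, label);
        vertices.add(v);
        return v;
    }

    Edge addEdge(String label, Vertex from, Vertex to) {
        Edge e = new Edge(label, from, to);
        edges.add(e);
        return e;
    }

    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph();
        Vertex alice = graph.addVertex("person");
        alice.properties.put("name", "Alice");
        alice.properties.put("age", 30);
        Vertex bob = graph.addVertex("person");
        bob.properties.put("name", "Bob");
        bob.properties.put("age", 28);
        Edge friendship = graph.addEdge("friends", alice, bob);
        friendship.properties.put("startDate", LocalDate.of(2015, 9, 1));
        // Prints: Alice -[friends since 2015-09-01]-> Bob
        System.out.println(alice.properties.get("name") + " -[" + friendship.label + " since "
                + friendship.properties.get("startDate") + "]-> " + bob.properties.get("name"));
    }
}
```

The point of the sketch is that a relationship is a first-class object with its own properties, not a foreign key in a table.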
#### Why Graph Databases?
Graph databases are great for modeling data whose value lies in the shape of the graph (the relationships between
items), or data that would be difficult to model in a traditional table-based database.
#### What is Gremlin?
Gremlin is a graph traversal language; think of Gremlin as SQL for graph databases. Gremlin is not
a graph database server, it is a language. However, there is a Gremlin Server and a Gremlin Console available for
interacting with graph databases using Gremlin. Gremlin can also be used on large database platforms
like [Titan](https://www.digitalocean.com/community/tutorials/how-to-set-up-the-titan-graph-database-with-cassandra-and-elasticsearch-on-ubuntu-16-04)
and JanusGraph backed by [HBase](https://docs.janusgraph.org/latest/hbase.html).
## Gremlin Installation
Download and extract the following:
- [Gremlin Console](https://www.apache.org/dyn/closer.lua/tinkerpop/3.3.3/apache-tinkerpop-gremlin-console-3.3.3-bin.zip)
- [Gremlin Server](https://www.apache.org/dyn/closer.lua/tinkerpop/3.3.3/apache-tinkerpop-gremlin-server-3.3.3-bin.zip)

Start the Gremlin Server by running the start script in the server's bin folder.
```
./gremlin-server.sh
```
Start the Gremlin Console by running the gremlin.sh script (gremlin.bat on Windows) in the bin folder of the extracted console folder.
```
./gremlin.sh
```
Next, connect the console to the graph that the server hosts. To do that, execute the following commands.
```gremlin
// Creates an empty local graph; the real data lives on the server
gremlin> graph = EmptyGraph.instance()
==>emptygraph[empty]
// Opens a connection to the server; connects to localhost by default
gremlin> cluster = Cluster.open()
==>localhost/127.0.0.1:8182
// Creates a traversal source g that sends its traversals to the server's traversal source named "g"
gremlin> g = graph.traversal().withRemote(DriverRemoteConnection.using(cluster, "g"))
==>graphtraversalsource[emptygraph[empty], standard]
```
## Gremlin Syntax
Now that you have your Gremlin Server and Console set up, you are ready to start executing Gremlin queries.
#### Add a Vertex
In Gremlin, nodes are referred to as "vertices". To add a node/vertex to the graph, use the
addV('node label') step on your graph traversal source. For consistency, most people and documentation
use "g" as the name of their default graph traversal source. To attach properties to your node, chain a series of
.property('property_name', 'property_value') steps onto the query. Property keys are case-sensitive, so
use the same spelling when you query them later.
For example:
```gremlin
g.addV('student').property('name', 'Jeffery').property('gpa', 4.0);
```
#### Update a Property
Unlike a SQL database, a graph database does not limit you to a fixed schema. If you want to add or change
a property of a vertex or edge, simply call .property('property_name', 'property_value') on it.
The g.V(1) in the example below selects the vertex whose id is 1; these ids are assigned automatically by the graph database.
You can replace g.V(1) with any traversal that selects the node you want to update.
```gremlin
g.V(1).property('name', 'Jeffery R');
```
#### Selection
Selecting nodes and edges is the most complicated part of Gremlin. The concept is not particularly hard, but there
are dozens of ways to do traversals and selections. I will cover the most common and helpful ways to traverse a
graph.
This example selects all vertices that have the label "student". The .valueMap() appended to the end means
that the query returns a map of all the properties of each vertex it selects.
```gremlin
g.V().hasLabel('student').valueMap();
```
In this example, instead of returning a value map, we return just the names of the students
in the graph.
```gremlin
g.V().hasLabel('student').values('name');
```
This example returns the GPA of the student with the name "Jeffery R".
```gremlin
g.V().hasLabel('student').has('name', 'Jeffery R').values('gpa');
```
This command lists the names of all the students in descending order of GPA.
```gremlin
g.V().hasLabel('student').order().by('gpa', decr).values('name');
```
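If it helps to see what these steps do, here is a rough plain-Java analogue of the last two queries, with each vertex modeled as a hypothetical Node class. hasLabel() and has() behave like filters, order().by('gpa', decr) like a descending sort, and values(...) like a map to one property. This only illustrates the semantics; Gremlin evaluates traversals against the graph, not against a Java list.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogue of Gremlin selection steps (illustration only, not how Gremlin executes).
class SelectionSketch {
    // Hypothetical stand-in for a vertex: a label plus two properties.
    static class Node {
        final String label;
        final String name;
        final double gpa;

        Node(String label, String name, double gpa) {
            this.label = label;
            this.name = name;
            this.gpa = gpa;
        }
    }

    // Roughly g.V().hasLabel('student').order().by('gpa', decr).values('name')
    static List<String> studentNamesByGpaDesc(List<Node> vertices) {
        return vertices.stream()
                .filter(v -> v.label.equals("student"))                           // hasLabel('student')
                .sorted(Comparator.comparingDouble((Node v) -> v.gpa).reversed()) // order().by('gpa', decr)
                .map(v -> v.name)                                                 // values('name')
                .collect(Collectors.toList());
    }

    // Roughly g.V().hasLabel('student').has('name', name).values('gpa')
    static List<Double> gpaOf(List<Node> vertices, String name) {
        return vertices.stream()
                .filter(v -> v.label.equals("student")) // hasLabel('student')
                .filter(v -> v.name.equals(name))       // has('name', name)
                .map(v -> v.gpa)                        // values('gpa')
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Node> vertices = List.of(
                new Node("student", "Jeffery R", 4.0),
                new Node("student", "Sam", 3.2),
                new Node("teacher", "Dr. Smith", 0.0), // not a student: dropped by hasLabel
                new Node("student", "Ana", 3.7));
        System.out.println(studentNamesByGpaDesc(vertices)); // [Jeffery R, Ana, Sam]
        System.out.println(gpaOf(vertices, "Jeffery R"));    // [4.0]
    }
}
```

Note that both queries return a list: a traversal yields a stream of results even when only one vertex matches.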
#### Adding Edges
If you want to add an edge (relationship/connection) between two nodes, the easiest way (in my opinion) to do it in Gremlin is
with aliasing. In this example we select two nodes and give them names, in this case "a" and "b".
After selecting the two nodes, we can add an edge between them using the addE('Relation_Name') step. This syntax is
nice because it reads naturally: "a" knows "b", so the direction of the edge is easy to tell. Keep the whole query on one
line when typing it into the console, since the console runs each complete line as soon as you press enter.
```gremlin
g.V(0).as('a').V(1).as('b').addE('knows').from('a').to('b');
```
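Direction matters once you traverse. In Gremlin, out() follows edges from a vertex to their targets, in() follows them backwards, and both() ignores direction, which is why the Java examples later in this post use both() to find friends regardless of who added whom. The sketch below models those three steps over a plain list of directed edges; the Edge class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Gremlin's out()/in()/both() steps over directed edges (illustration only).
class DirectionSketch {
    static class Edge {
        final String label;
        final int from; // vertex the edge starts at, like .from('a')
        final int to;   // vertex the edge ends at, like .to('b')

        Edge(String label, int from, int to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    // Roughly g.V(id).out(label): follow matching edges forward
    static List<Integer> out(List<Edge> edges, int id, String label) {
        List<Integer> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from == id && e.label.equals(label)) result.add(e.to);
        }
        return result;
    }

    // Roughly g.V(id).in(label): follow matching edges backward
    static List<Integer> in(List<Edge> edges, int id, String label) {
        List<Integer> result = new ArrayList<>();
        for (Edge e : edges) {
            if (e.to == id && e.label.equals(label)) result.add(e.from);
        }
        return result;
    }

    // Roughly g.V(id).both(label): neighbours in either direction (outgoing listed first here)
    static List<Integer> both(List<Edge> edges, int id, String label) {
        List<Integer> result = out(edges, id, label);
        result.addAll(in(edges, id, label));
        return result;
    }

    public static void main(String[] args) {
        // Vertex 0 knows vertex 1, and vertex 2 knows vertex 0
        List<Edge> edges = List.of(new Edge("knows", 0, 1), new Edge("knows", 2, 0));
        System.out.println(out(edges, 0, "knows"));  // [1]
        System.out.println(in(edges, 0, "knows"));   // [2]
        System.out.println(both(edges, 0, "knows")); // [1, 2]
    }
}
```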
## Using Gremlin With Java
Now that you know the syntax of Gremlin, you are ready to use it somewhere other than the Gremlin Console. If you
are using Gremlin with Java, TinkerPop publishes Maven dependencies for Gremlin. If you want to quickly
connect to your server with Java, make sure your server is set up exactly as it was before this tutorial started discussing
Gremlin syntax. Keep all TinkerPop artifacts on the same version.
```xml
<!-- https://mvnrepository.com/artifact/org.apache.tinkerpop/gremlin-core -->
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-core</artifactId>
    <version>3.3.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.tinkerpop/gremlin-driver -->
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>3.3.3</version>
</dependency>
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>tinkergraph-gremlin</artifactId>
    <version>3.3.3</version>
</dependency>
```
It is helpful to wrap everything related to the graph database connection in a single Java class. This is roughly
the code that I usually use to interact with a Gremlin Server; anybody is free to use it.
```java
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

public class GraphConnection
{
    /** Stores/manages client connections */
    private Cluster cluster;

    /** Connection to the graph db */
    private Client client;

    public GraphConnection()
    {
        Cluster.Builder b = Cluster.build();
        b.addContactPoint("localhost");
        b.port(8182);
        this.cluster = b.create();
        this.client = cluster.connect();
    }

    /** Submits a Gremlin script to the server and returns its results */
    public synchronized ResultSet queryGraph(String q)
    {
        return this.client.submit(q);
    }

    /** Closes the client first, then the cluster */
    public void closeConnection()
    {
        this.client.close();
        this.cluster.close();
    }
}
```
Example GraphConnection usage:
```java
// p1 and p2 hold the ids of the two players to connect
GraphConnection con = new GraphConnection();
String query = "g.V().hasLabel('player')" +
        ".has('id', '" + p1 + "')" +
        ".as('p1')" +
        ".V().hasLabel('player')" +
        ".has('id', '" + p2 + "')" +
        ".as('p2')" +
        ".addE('friends')" +
        ".from('p1').to('p2')";
//System.out.println(query);
con.queryGraph(query);
```
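Splicing p1 and p2 into the script works, but the query breaks if an id contains a quote, and it lets untrusted input inject Gremlin code. The gremlin-driver Client class also has a submit(String, Map<String, Object>) overload that sends parameter bindings alongside the script. A minimal sketch, assuming you add a hypothetical two-argument queryGraph(String, Map) method to GraphConnection that forwards to that overload, would build the script and bindings like this. The binding names id1 and id2 are illustrative choices, and the driver call is left as a comment so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the "add friends edge" script using bindings instead of string concatenation.
class FriendEdgeQuery {
    // id1 and id2 are binding names resolved on the server, not text spliced into the script.
    static final String SCRIPT =
            "g.V().hasLabel('player').has('id', id1).as('p1')" +
            ".V().hasLabel('player').has('id', id2).as('p2')" +
            ".addE('friends').from('p1').to('p2')";

    // The ids travel separately, so a quote or Gremlin code inside an id cannot change the script.
    static Map<String, Object> bindings(String firstId, String secondId) {
        Map<String, Object> params = new HashMap<>();
        params.put("id1", firstId);
        params.put("id2", secondId);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = bindings("12345", "678'90");
        // With the gremlin-driver Client this would be sent as: client.submit(SCRIPT, params);
        System.out.println(SCRIPT);
        System.out.println(params);
    }
}
```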
An overly complex usage example with a lambda:
```java
/**
 * Fetches a list of friends from the graph database
 *
 * @param id steam id
 * @return list of friends
 */
private List<Player> getFriendsFromGraph(String id)
{
    List<Player> friends = new ArrayList<>();
    String query = "g.V().hasLabel('player')" +
            ".has('id', '" + id + "')" +
            ".both().valueMap()";
    this.con.queryGraph(query).stream().forEach(r ->
            friends.add(new Player(
                    ((ArrayList) (((HashMap<String, Object>) (r.getObject()))
                            .get("name"))).get(0).toString(),
                    ((ArrayList) (((HashMap<String, Object>) (r.getObject()))
                            .get("id"))).get(0).toString()))
    );
    return friends;
}
```
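The casts are needed because valueMap() returns, for each vertex, a map from property key to a list of values (a vertex property can hold more than one value), which is why the code above reads .get(0). A small helper can hide that shape. firstValue is a hypothetical name, it assumes the result object is such a map, and the demo map is built by hand to mimic one valueMap() result rather than coming from a real server.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for reading valueMap() results, whose values are lists of property values.
class ValueMaps {
    // Returns the first value stored under key as a String, or null if the key is missing or empty.
    static String firstValue(Object valueMapResult, String key) {
        Map<?, ?> map = (Map<?, ?>) valueMapResult; // assumes the result really is a map
        Object values = map.get(key);
        if (!(values instanceof List) || ((List<?>) values).isEmpty()) {
            return null;
        }
        return String.valueOf(((List<?>) values).get(0));
    }

    public static void main(String[] args) {
        // Hand-built stand-in for one valueMap() result, i.e. what r.getObject() would hold
        Map<String, Object> row = Map.of("name", List.of("Jeffery"), "id", List.of("12345"));
        System.out.println(firstValue(row, "name")); // Jeffery
        System.out.println(firstValue(row, "age"));  // null
    }
}
```

With it, the lambda body shrinks to new Player(ValueMaps.firstValue(r.getObject(), "name"), ValueMaps.firstValue(r.getObject(), "id")).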
The most important thing to do while playing around with Gremlin in Java is to keep an eye on the
return type. From experience, I can say that it is often easier to return the vertices/edges from your
query rather than doing a valueMap.
Without valueMap()/values() at the end of the query, you can directly access the vertex or edge in the result rather than
doing some voodoo witchcraft and casting between ArrayLists and HashMaps.
The previous example could be rewritten like this:
```java
List<Player> friends = new ArrayList<>();
String query = "g.V().hasLabel('player')" +
        ".has('id', '" + id + "')" +
        ".both()";
for (Result r : this.con.queryGraph(query))
{
    // Each result is a vertex, so its properties can be read directly
    Vertex v = r.getVertex();
    friends.add(new Player(v.value("name").toString(),
            v.value("id").toString()));
}
```
Now you know enough to be dangerous with Gremlin. Yay! If you want to do more than basic things with Gremlin,
I highly suggest that you take a look at the tutorial [SQL 2 Gremlin](http://sql2gremlin.com/).
If you plan on deploying this to production, it is recommended to use JanusGraph with HBase as a persistent storage
back end.
## Resources
- [SQL 2 Gremlin](http://sql2gremlin.com/)
- [Practical Gremlin](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html)
- [Apache TinkerPop](http://tinkerpop.apache.org/)
- [Steam Friends Graph (Personal Gremlin Project)](https://github.com/jrtechs/SteamFriendsGraph)