# What is Gremlin?

Gremlin is a graph traversal language: think of it as the SQL of
graph databases. Gremlin itself is a language, not a graph database
server, but a Gremlin Server and a Gremlin Console are available for
interacting with graph databases. You can use Gremlin on large
database platforms like
[Titan](https://www.digitalocean.com/community/tutorials/how-to-set-up-the-titan-graph-database-with-cassandra-and-elasticsearch-on-ubuntu-16-04)
and [HBase](https://docs.janusgraph.org/latest/hbase.html).

# Graph Database Basics

A graph database is based on graph theory. A graph is composed of
nodes, edges, and properties. Each key object in a graph database is
stored as a node. Nodes are connected by edges, which represent
relationships. For example, you may represent people as nodes and
friendships as edges. You can assign properties to both nodes and
edges. A person (node) may have the properties age and name, while a
friendship (edge) may have a start date property.
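
To make the node/edge/property vocabulary concrete, here is a minimal sketch of a property graph built from plain Java collections. It is only an illustration of the data model described above, not how Gremlin or any graph database stores data, and every name in it (`PropertyGraphSketch`, `Node`, `Edge`, `neighbors`) is made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory property graph; illustration only. */
final class PropertyGraphSketch
{
    /** A node with a label and free-form properties. */
    static final class Node
    {
        final String label;
        final Map<String, Object> properties = new HashMap<>();

        Node(String label)
        {
            this.label = label;
        }
    }

    /** A directed, labeled edge between two nodes with its own properties. */
    static final class Edge
    {
        final String label;
        final Node from;
        final Node to;
        final Map<String, Object> properties = new HashMap<>();

        Edge(String label, Node from, Node to)
        {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    /** Creates a node with the given label and adds it to the graph. */
    Node addNode(String label)
    {
        Node n = new Node(label);
        nodes.add(n);
        return n;
    }

    /** Creates an edge pointing from one node to another. */
    Edge addEdge(String label, Node from, Node to)
    {
        Edge e = new Edge(label, from, to);
        edges.add(e);
        return e;
    }

    /** Returns the nodes connected to n by edges with the given label, in either direction. */
    List<Node> neighbors(Node n, String edgeLabel)
    {
        List<Node> result = new ArrayList<>();
        for (Edge e : edges)
        {
            if (!e.label.equals(edgeLabel))
            {
                continue;
            }
            if (e.from == n)
            {
                result.add(e.to);
            }
            else if (e.to == n)
            {
                result.add(e.from);
            }
        }
        return result;
    }
}
```

With this sketch, two "person" nodes joined by a "friendship" edge that carries a start date property mirror the example in the paragraph above, and `neighbors(person, "friendship")` returns that person's friends.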

## Why Graph Databases?

Graph databases are great for modeling data where the value lies in
the shape of the graph. Graph databases also allow you to model more
complex relationships which would be difficult to model in a normal
table-based database.

## Gremlin Installation

Download and extract the following:

- [Gremlin Console](https://www.apache.org/dyn/closer.lua/tinkerpop/3.3.3/apache-tinkerpop-gremlin-console-3.3.3-bin.zip)
- [Gremlin Server](https://www.apache.org/dyn/closer.lua/tinkerpop/3.3.3/apache-tinkerpop-gremlin-server-3.3.3-bin.zip)

Gremlin requires Java, so make sure it is installed on your computer
first. Start the Gremlin Server by running the start script in its bin
folder.

```bash
./gremlin-server.sh
```

Start the Gremlin Console by running the gremlin.sh or gremlin.bat
script in its bin folder.

```bash
./gremlin.sh
```

Next, connect the console to the server and set up a graph traversal
source to use. To do that, execute the following commands in the
Gremlin Console.

```java
// Creates an empty local graph to act as a placeholder
gremlin> graph = EmptyGraph.instance()
==>emptygraph[empty]
// Opens a connection to the server; connects to localhost by default
gremlin> cluster = Cluster.open()
==>localhost/127.0.0.1:8182
// Binds g to the server's graph traversal source named "g"
gremlin> g = graph.traversal().withRemote(DriverRemoteConnection.using(cluster, "g"))
==>graphtraversalsource[emptygraph[empty], standard]
```

# Gremlin Syntax

Now that you have your Gremlin Server and Console set up, you are
ready to start executing Gremlin queries.

## Adding a Vertex

In Gremlin, nodes are referred to as "vertices". To add a vertex to
the graph, you simply use the addV() step on your graph traversal
source. For consistency, most people use "g" as their default graph
traversal source. To attach properties to your vertex, you append a
series of ".property('property_name', 'property_value')" steps to the
add vertex query.

Example:

```java
g.addV('student').property('name', 'Jeffery').property('gpa', 4.0);
```

## Updating a Property

Unlike SQL, you are not limited to a specific schema in a graph
database. If you want to add or change a property on a vertex or
edge, you simply use the property step again. The "g.V(1)" in the
following example refers to the specific vertex with the id 1; the
graph database assigns these ids automatically. You can replace
"g.V(1)" with any query that selects a specific vertex or edge.

```java
g.V(1).property('name', 'Jeffery R');
```

## Selection

Selecting nodes and edges is the most complicated part of Gremlin. The
concept is not particularly hard, but there are dozens of ways to do
graph traversals and selections. I will cover the most common ways to
traverse a graph.

This example selects all vertices which have the label "student". The
".valueMap()" step appended to the end of the query makes Gremlin
return, for each vertex it selects, a map of that vertex's properties.

```java
g.V().hasLabel('student').valueMap();
```

In the following example, instead of returning a map of all the
properties, we return just the names of the students in the graph.

```java
g.V().hasLabel('student').values('name');
```

This example returns the GPA of the student named "Jeffery R".

```java
g.V().hasLabel('student').has('name', 'Jeffery R').values('gpa');
```

This query returns the names of all the students ordered by GPA,
highest first.

```java
g.V().hasLabel('student').order().by('gpa', decr).values('name')
```

## Adding Edges

The easiest way (in my opinion) to add edges in Gremlin is by using
aliasing. In this example we select two vertices and give each of them
a name, in this case "a" and "b". After we have selected the two
vertices, we can add an edge between them using the "addE()" step.
This syntax is nice because it reads as "a" knows "b", so it is easy
to tell the direction of the edge.

```java
g.V(0).as('a').V(1).as('b').addE('knows').
    from('a').to('b');
```

# Using Gremlin with Java

Now that you know the basic syntax of Gremlin, you are ready to use it
somewhere other than the Gremlin Console. If you are using Gremlin
with Java, TinkerPop and Gremlin are available as Maven dependencies.
To quickly connect to your Gremlin Server from Java, make sure the
server is running as it was set up in the installation section, before
this tutorial started discussing Gremlin syntax.

## Maven Dependencies for Java

```xml
<!-- https://mvnrepository.com/artifact/org.apache.tinkerpop/gremlin-core -->
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-core</artifactId>
    <version>3.3.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.tinkerpop/gremlin-driver -->
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>gremlin-driver</artifactId>
    <version>3.3.3</version>
</dependency>
<dependency>
    <groupId>org.apache.tinkerpop</groupId>
    <artifactId>tinkergraph-gremlin</artifactId>
    <version>3.3.3</version>
</dependency>
```

It is helpful to wrap everything relating to the graph database
connection into a single Java class. This is roughly the code that I
usually use to interact with a Gremlin Server; anybody is free to use
it.

```java
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import org.apache.tinkerpop.gremlin.driver.ResultSet;

public class RemoteConnection
{
    /** Stores/manages client connections */
    private Cluster cluster;

    /** Connection to the graph db */
    private Client client;

    /** Connects to a Gremlin Server on localhost:8182 */
    public RemoteConnection()
    {
        Cluster.Builder b = Cluster.build();
        b.addContactPoint("localhost");
        b.port(8182);
        this.cluster = b.create();
        this.client = cluster.connect();
    }

    /** Submits a Gremlin query string to the server */
    public synchronized ResultSet queryGraph(String q)
    {
        return this.client.submit(q);
    }

    /** Closes the cluster along with its client connections */
    public void closeConnection()
    {
        this.cluster.close();
    }
}
```

## Basic RemoteConnection Usage

```java
// p1 and p2 hold the 'id' property values of the two players to connect
RemoteConnection con = new RemoteConnection();
String query = "g.V().hasLabel('player')" +
        ".has('id', '" + p1 + "')" +
        ".as('p1')" +
        ".V().hasLabel('player')" +
        ".has('id', '" + p2 + "')" +
        ".as('p2')" +
        ".addE('friends')" +
        ".from('p1').to('p2')";
con.queryGraph(query);
```
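
Concatenating ids straight into the query string works for a quick demo, but it breaks as soon as an id contains a single quote, and it lets untrusted input alter the query. One alternative is to keep the script fixed and send the values separately as bindings. The sketch below uses only the standard library to show the idea; the class `FriendQuery` and the variable names `p1Id` and `p2Id` are hypothetical and chosen just for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: pairs a fixed Gremlin script with its parameter bindings. */
final class FriendQuery
{
    /** The script refers to the ids by variable name instead of embedding them. */
    static final String SCRIPT =
        "g.V().hasLabel('player').has('id', p1Id).as('p1')" +
        ".V().hasLabel('player').has('id', p2Id).as('p2')" +
        ".addE('friends').from('p1').to('p2')";

    /** Builds the bindings map that supplies the values for p1Id and p2Id. */
    static Map<String, Object> bindings(String p1, String p2)
    {
        Map<String, Object> params = new HashMap<>();
        params.put("p1Id", p1);
        params.put("p2Id", p2);
        return params;
    }
}
```

The gremlin-driver `Client` has a `submit(String, Map<String, Object>)` overload that accepts a script together with such a map, so the connection wrapper would need an extra method that forwards both, for example one that calls `client.submit(FriendQuery.SCRIPT, FriendQuery.bindings(p1, p2))`.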

## Overly complex usage with a lambda statement

```java
/**
 * Fetches the list of a player's friends.
 *
 * @param id steam id
 * @return list of friends
 */
private List<Player> getFriendsFromGraph(String id)
{
    List<Player> friends = new ArrayList<>();
    String query = "g.V().hasLabel('player')" +
            ".has('id', '" + id + "')" +
            ".both().valueMap()";
    this.con.queryGraph(query).stream().forEach(r ->
        friends.add(new Player(
            ((ArrayList) (((HashMap<String, Object>) (r.getObject()))
                .get("name"))).get(0).toString(),
            ((ArrayList) (((HashMap<String, Object>) (r.getObject()))
                .get("id"))).get(0).toString()))
    );
    return friends;
}
```

The most important thing to do while playing around with Gremlin in
Java is to keep an eye on the return type. From experience, I can say
that it is often easier to return the vertex from your query rather
than returning the valueMap.

Without the valueMap in the query, you can directly access the vertex
in the result rather than doing some voodoo witchcraft and casting
between ArrayLists and HashMaps.

The previous example could be re-written as this:

```java
List<Player> friends = new ArrayList<>();
String query = "g.V().hasLabel('player')" +
        ".has('id', '" + id + "')" +
        ".both()";
for (Result r : this.con.queryGraph(query))
{
    // Read the properties straight off the returned vertex
    Vertex v = r.getVertex();
    friends.add(new Player(
        v.value("name").toString(),
        v.value("id").toString()));
}
```
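
If you do end up needing valueMap(), the casting can at least be kept in one place. For a vertex, valueMap() returns a map in which each property name points to a list of values, so a small helper can pull out the first value. This is only a sketch built on the standard library; the class `ValueMaps` and its method `first` are hypothetical names, not part of TinkerPop.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper for reading single values out of a valueMap() result. */
final class ValueMaps
{
    /**
     * Returns the first value stored under key as a string,
     * or null when the key is missing or its list is empty.
     */
    static String first(Map<?, ?> valueMap, String key)
    {
        Object values = valueMap.get(key);
        if (!(values instanceof List) || ((List<?>) values).isEmpty())
        {
            return null;
        }
        return String.valueOf(((List<?>) values).get(0));
    }
}
```

Inside the lambda version, the nested casts would then shrink to a single cast of `r.getObject()` to `Map<?, ?>`, followed by `ValueMaps.first(map, "name")` and `ValueMaps.first(map, "id")`.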

You now know enough about Gremlin to be dangerous with it. Yay! If you
want to do more than basic things with Gremlin, I highly suggest that
you look at the tutorial [SQL 2 Gremlin](http://sql2gremlin.com/). If
you plan on deploying this to production, keep in mind that the
default Gremlin Server setup used here keeps the graph in memory, so
it is recommended that you use a persistent storage back end such as
HBase.

# Resources

- [SQL 2 Gremlin](http://sql2gremlin.com/)
- [Practical Gremlin](http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html)
- [Apache TinkerPop](http://tinkerpop.apache.org/)
- [Steam Friends Graph (Personal Gremlin Project)](https://github.com/jrtechs/SteamFriendsGraph)