diff --git a/blogContent/headerImages/playTimes.png b/blogContent/headerImages/playTimes.png
new file mode 100644
index 0000000..fd4256f
Binary files /dev/null and b/blogContent/headerImages/playTimes.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_11_1.png b/blogContent/posts/data-science/media/steamGames/output_11_1.png
new file mode 100644
index 0000000..1c2e3ed
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_11_1.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_13_1.png b/blogContent/posts/data-science/media/steamGames/output_13_1.png
new file mode 100644
index 0000000..8b1cbb0
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_13_1.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_15_1.png b/blogContent/posts/data-science/media/steamGames/output_15_1.png
new file mode 100644
index 0000000..8b86e94
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_15_1.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_17_0.png b/blogContent/posts/data-science/media/steamGames/output_17_0.png
new file mode 100644
index 0000000..8265d56
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_17_0.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_19_0.png b/blogContent/posts/data-science/media/steamGames/output_19_0.png
new file mode 100644
index 0000000..b422f60
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_19_0.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_25_0.png b/blogContent/posts/data-science/media/steamGames/output_25_0.png
new file mode 100644
index 0000000..487c5df
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_25_0.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_27_1.png b/blogContent/posts/data-science/media/steamGames/output_27_1.png
new file mode 100644
index 0000000..da75984
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_27_1.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_31_0.png b/blogContent/posts/data-science/media/steamGames/output_31_0.png
new file mode 100644
index 0000000..d3bbc94
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_31_0.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_34_0.png b/blogContent/posts/data-science/media/steamGames/output_34_0.png
new file mode 100644
index 0000000..24ec6be
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_34_0.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_37_0.png b/blogContent/posts/data-science/media/steamGames/output_37_0.png
new file mode 100644
index 0000000..985dc41
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_37_0.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_38_0.png b/blogContent/posts/data-science/media/steamGames/output_38_0.png
new file mode 100644
index 0000000..c515f84
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_38_0.png differ
diff --git a/blogContent/posts/data-science/media/steamGames/output_9_1.png b/blogContent/posts/data-science/media/steamGames/output_9_1.png
new file mode 100644
index 0000000..28d566c
Binary files /dev/null and b/blogContent/posts/data-science/media/steamGames/output_9_1.png differ
diff --git a/blogContent/posts/data-science/time-spent-in-steam-games.md b/blogContent/posts/data-science/time-spent-in-steam-games.md
new file mode 100644
index 0000000..6087528
--- /dev/null
+++ b/blogContent/posts/data-science/time-spent-in-steam-games.md
@@ -0,0 +1,900 @@
+Last week I scraped a bunch of data from the Steam API using my [Steam Graph Project](https://github.com/jrtechs/SteamFriendsGraph).
+This project captures Steam users, their friends, and the games that they own.
+Using the JanusGraph traversal object, I use the Gremlin graph query language to pull this data.
+Since I am storing the time played in a game as a property on the relationship between a player and a game node, I had to make a "join" statement to get the play time property with the game information in a single query.
+
+```java
+// For every game vertex, bind its id, its name, and the play time stored on each incoming edge.
+Object o = graph.con.getTraversal()
+        .V()
+        .hasLabel(Game.KEY_DB)
+        .match(
+                __.as("c").values(Game.KEY_STEAM_GAME_ID).as("gameID"),
+                __.as("c").values(Game.KEY_GAME_NAME).as("gameName"),
+                __.as("c").inE(Game.KEY_RELATIONSHIP).values(Game.KEY_PLAY_TIME).as("time")
+        ).select("gameID", "time", "gameName").toList();
+// Lower-casing the JSON also lower-cases the keys (gameid, time, gamename).
+WrappedFileWriter.writeToFile(new Gson().toJson(o).toLowerCase(), "games.json");
+```
+
+Using the game indexing property on the players, I found that I had fully indexed the games of only 481 players after 8 hours.
+
+```java
+// Count the players whose game list has been fully crawled.
+graph.con.getTraversal()
+        .V()
+        .hasLabel(SteamGraph.KEY_PLAYER)
+        .has(SteamGraph.KEY_CRAWLED_GAME_STATUS, 1)
+        .count().next();
+```
+
+We now transition to Python and Matplotlib to visualize the data exported from our JanusGraph query as a JSON object.
+The dependencies for this [notebook](https://github.com/jrtechs/RandomScripts/tree/master/notebooks) can be installed using pip.
+
+
+```python
+!pip install pandas
+!pip install matplotlib
+```
+
+```
+    Collecting pandas
+      Downloading pandas-1.0.5-cp38-cp38-manylinux1_x86_64.whl (10.0 MB)
+         |████████████████████████████████| 10.0 MB 4.3 MB/s eta 0:00:01
+    Collecting pytz>=2017.2
+      Downloading pytz-2020.1-py2.py3-none-any.whl (510 kB)
+         |████████████████████████████████| 510 kB 2.9 MB/s eta 0:00:01
+    Requirement already satisfied: numpy>=1.13.3 in /home/jeff/Documents/python/ml/lib/python3.8/site-packages (from pandas) (1.18.5)
+    Requirement already satisfied: python-dateutil>=2.6.1 in /home/jeff/Documents/python/ml/lib/python3.8/site-packages (from pandas) (2.8.1)
+    Requirement already satisfied: six>=1.5 in /home/jeff/Documents/python/ml/lib/python3.8/site-packages (from python-dateutil>=2.6.1->pandas) (1.15.0)
+    Installing collected packages: pytz, pandas
+    Successfully installed pandas-1.0.5 pytz-2020.1
+```
+
+The first step is to import our JSON data as a pandas DataFrame.
+Pandas is an open-source data analysis and manipulation tool.
+I enjoy pandas because it has native integration with Matplotlib and supports operations like aggregations and groupings.
+
+
+```python
+import matplotlib.pyplot as plt
+import pandas as pd
+
+games_df = pd.read_json('games.json')
+games_df
+```
+
+
+| | gameid | time | gamename |
+|---|---|---|---|
+| 0 | 210770 | 243 | sanctum 2 |
+| 1 | 210770 | 31 | sanctum 2 |
+| 2 | 210770 | 276 | sanctum 2 |
+| 3 | 210770 | 147 | sanctum 2 |
+| 4 | 210770 | 52 | sanctum 2 |
+| ... | ... | ... | ... |
+| 36212 | 9800 | 9 | death to spies |
+| 36213 | 445220 | 0 | avorion |
+| 36214 | 445220 | 25509 | avorion |
+| 36215 | 445220 | 763 | avorion |
+| 36216 | 445220 | 3175 | avorion |
+
+36217 rows × 3 columns
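
The per-game summary that follows (count, min, max, and mean of `time`, indexed by `gamename`) has the shape that a pandas `groupby` with a multi-function `agg` produces. Here is a minimal sketch of that kind of aggregation, assuming the export is a JSON array of records and using a few rows copied from the table above. The names `sample_json`, `sample_df`, and `stats_df` are illustrative, and sorting by count is inferred from the row order of the summary rather than confirmed by the post.

```python
from io import StringIO

import pandas as pd

# A few rows copied from the table above, written as a JSON array of records.
# The records layout is an assumption about how games.json is structured.
sample_json = StringIO("""[
  {"gameid": 210770, "time": 243, "gamename": "sanctum 2"},
  {"gameid": 210770, "time": 31, "gamename": "sanctum 2"},
  {"gameid": 210770, "time": 276, "gamename": "sanctum 2"},
  {"gameid": 210770, "time": 147, "gamename": "sanctum 2"},
  {"gameid": 210770, "time": 52, "gamename": "sanctum 2"},
  {"gameid": 9800, "time": 9, "gamename": "death to spies"},
  {"gameid": 445220, "time": 0, "gamename": "avorion"},
  {"gameid": 445220, "time": 25509, "gamename": "avorion"},
  {"gameid": 445220, "time": 763, "gamename": "avorion"},
  {"gameid": 445220, "time": 3175, "gamename": "avorion"}
]""")
sample_df = pd.read_json(sample_json, orient="records")

# One row per game; the columns become a two-level index: ("time", "count"), ...
stats_df = (
    sample_df.groupby("gamename")
    .agg({"time": ["count", "min", "max", "mean"]})
    .sort_values(by=("time", "count"))
)
print(stats_df)
```

Because the columns are two-level, a single statistic is addressed with a tuple such as `stats_df[("time", "mean")]`.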
+
+| gamename | time (count) | time (min) | time (max) | time (mean) |
+|---|---|---|---|---|
+| 龙魂时刻 | 1 | 14 | 14 | 14.000000 |
+| gryphon knight epic | 1 | 0 | 0 | 0.000000 |
+| growing pains | 1 | 0 | 0 | 0.000000 |
+| shoppy mart: steam edition | 1 | 0 | 0 | 0.000000 |
+| ground pounders | 1 | 0 | 0 | 0.000000 |
+| ... | ... | ... | ... | ... |
+| payday 2 | 102 | 0 | 84023 | 5115.813725 |
+| team fortress 2 | 105 | 7 | 304090 | 25291.180952 |
+| unturned | 107 | 0 | 16974 | 1339.757009 |
+| garry's mod | 121 | 0 | 311103 | 20890.314050 |
+| counter-strike: global offensive | 129 | 0 | 506638 | 46356.209302 |
+
+9235 rows × 4 columns
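
Many games in the summary above were recorded for a single player, which makes their averages uninformative. The next table keeps only games with a count of 11 or more, so a threshold filter on the count column is one plausible way to get there. The sketch below uses hypothetical data and an assumed cutoff (`min_owners = 10`); neither comes from the original post.

```python
import pandas as pd

# Hypothetical play times: one game recorded for 12 players, one for 2.
sample_df = pd.DataFrame(
    {
        "gamename": ["popular game"] * 12 + ["niche game"] * 2,
        "time": list(range(0, 120, 10)) + [5, 15],
    }
)

stats_df = sample_df.groupby("gamename").agg(
    {"time": ["count", "min", "max", "mean"]}
)

# Assumed cutoff: keep games recorded for more than 10 players.
min_owners = 10
popular_df = stats_df[stats_df[("time", "count")] > min_owners]
print(popular_df)
```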
+
+| gamename | time (count) | time (min) | time (max) | time (mean) |
+|---|---|---|---|---|
+| serious sam hd: the second encounter | 11 | 0 | 329 | 57.909091 |
+| grim fandango remastered | 11 | 0 | 248 | 35.000000 |
+| evga precision x1 | 11 | 0 | 21766 | 2498.181818 |
+| f.e.a.r. 2: project origin | 11 | 0 | 292 | 43.272727 |
+| transistor | 11 | 0 | 972 | 298.727273 |
+| ... | ... | ... | ... | ... |
+| payday 2 | 102 | 0 | 84023 | 5115.813725 |
+| team fortress 2 | 105 | 7 | 304090 | 25291.180952 |
+| unturned | 107 | 0 | 16974 | 1339.757009 |
+| garry's mod | 121 | 0 | 311103 | 20890.314050 |
+| counter-strike: global offensive | 129 | 0 | 506638 | 46356.209302 |
+
+701 rows × 4 columns
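
The two tables that follow each show the rows for a single game, with the original row labels preserved. One way to select such rows is a boolean mask on `gamename`. This is a small sketch using rows copied from those tables; `rows_for_game` is a hypothetical helper name, not something from the original notebook.

```python
import pandas as pd

# Rows copied from the per-game tables, keeping their original row labels.
sample_df = pd.DataFrame(
    {
        "gameid": [304930, 304930, 730, 730],
        "time": [140, 723, 742, 16019],
        "gamename": [
            "unturned",
            "unturned",
            "counter-strike: global offensive",
            "counter-strike: global offensive",
        ],
    },
    index=[167, 168, 13196, 13197],
)


def rows_for_game(df, name):
    """Return the rows of df whose gamename equals name (hypothetical helper)."""
    return df[df["gamename"] == name]


csgo_df = rows_for_game(sample_df, "counter-strike: global offensive")
print(csgo_df)
```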
+
+| | gameid | time | gamename |
+|---|---|---|---|
+| 13196 | 730 | 742 | counter-strike: global offensive |
+| 13197 | 730 | 16019 | counter-strike: global offensive |
+| 13198 | 730 | 1781 | counter-strike: global offensive |
+| 13199 | 730 | 0 | counter-strike: global offensive |
+| 13200 | 730 | 0 | counter-strike: global offensive |
+| ... | ... | ... | ... |
+| 13320 | 730 | 3867 | counter-strike: global offensive |
+| 13321 | 730 | 174176 | counter-strike: global offensive |
+| 13322 | 730 | 186988 | counter-strike: global offensive |
+| 13323 | 730 | 103341 | counter-strike: global offensive |
+| 13324 | 730 | 10483 | counter-strike: global offensive |
+
+129 rows × 3 columns
+
+| | gameid | time | gamename |
+|---|---|---|---|
+| 167 | 304930 | 140 | unturned |
+| 168 | 304930 | 723 | unturned |
+| 169 | 304930 | 1002 | unturned |
+| 170 | 304930 | 1002 | unturned |
+| 171 | 304930 | 0 | unturned |
+| ... | ... | ... | ... |
+| 269 | 304930 | 97 | unturned |
+| 270 | 304930 | 768 | unturned |
+| 271 | 304930 | 1570 | unturned |
+| 272 | 304930 | 23 | unturned |
+| 273 | 304930 | 115 | unturned |
+
+107 rows × 3 columns
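
In the table that follows, the `time` values are the ones above divided by 60 (140 becomes 2.333333 and 723 becomes 12.05), which is consistent with play time being recorded in minutes and then converted to hours. The sketch below shows that conversion on a subset of games, using rows copied from the earlier tables. Which games were actually selected is not recoverable here, so `games_of_interest` is a hypothetical choice.

```python
import pandas as pd

# Rows copied from the earlier tables; time is assumed to be in minutes.
sample_df = pd.DataFrame(
    {
        "gameid": [304930, 304930, 304930, 210770],
        "time": [140, 723, 1002, 243],
        "gamename": ["unturned", "unturned", "unturned", "sanctum 2"],
    },
    index=[167, 168, 169, 0],
)

# Hypothetical selection of games to keep.
games_of_interest = ["unturned"]

# Copy the filtered rows so the conversion does not touch sample_df.
hours_df = sample_df[sample_df["gamename"].isin(games_of_interest)].copy()
hours_df["time"] = hours_df["time"] / 60  # minutes -> hours
print(hours_df)
```

Taking a `.copy()` of the filtered frame before assigning keeps the original minute values intact for later steps.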
+
+| | gameid | time | gamename |
+|---|---|---|---|
+| 167 | 304930 | 2.333333 | unturned |
+| 168 | 304930 | 12.050000 | unturned |
+| 169 | 304930 | 16.700000 | unturned |
+| 170 | 304930 | 16.700000 | unturned |
+| 171 | 304930 | 0.000000 | unturned |
+| ... | ... | ... | ... |
+| 22682 | 578080 | 51.883333 | playerunknown's battlegrounds |
+| 22683 | 578080 | 47.616667 | playerunknown's battlegrounds |
+| 22684 | 578080 | 30.650000 | playerunknown's battlegrounds |
+| 22685 | 578080 | 170.083333 | playerunknown's battlegrounds |
+| 22686 | 578080 | 399.950000 | playerunknown's battlegrounds |
+
+1099 rows × 3 columns
+