Let's do a deep dive and start visualizing my life using Fitbit and |
|
|
|
Matplotlib. |
|
|
|
|
|
|
|
# What is Fitbit |
|
|
|
|
|
|
|
[Fitbit](https://www.fitbit.com) is a fitness watch that tracks your sleep, heart rate, and activity.
Beyond counting steps, Fitbit can also detect multiple types of
activity like running, walking, "sport", and biking.
|
|
|
|
|
|
|
# What is Matplotlib |
|
|
|
|
|
|
|
[Matplotlib](https://matplotlib.org/) is a Python visualization library that lets you create bar graphs, line graphs, distributions, and much more.
Being able to visualize your results is essential for anyone working with data at any scale.
Although I like [ggplot2](https://ggplot2.tidyverse.org/) in R more than Matplotlib, Matplotlib is still my go-to graphing library for Python.
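To give a quick taste of the library before we touch any Fitbit data, here is a minimal sketch that plots a handful of made-up values as a line graph:

```python
import matplotlib.pyplot as plt

# A tiny, self-contained example: plot a few made-up values as a line.
values = [3, 5, 4, 7, 6]
plt.plot(values)
plt.xlabel("Index")
plt.ylabel("Value")
plt.title("A Minimal Matplotlib Line Graph")
plt.show()
```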
|
|
|
|
|
|
|
# Getting Your Fitbit Data |
|
|
|
|
|
|
|
There are two main ways that you can get your Fitbit data: |
|
|
|
|
|
|
|
- Fitbit API |
|
|
|
- Data Archival Export |
|
|
|
|
|
|
|
|
|
|
|
Since connecting to the API and setting up all the webhooks can be a
pain, I'm just going to use the data export option because this is
only for one person. You can export your data here:
[https://www.fitbit.com/settings/data/export](https://www.fitbit.com/settings/data/export).
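If you do want to go the API route instead, a minimal sketch might look like the following. It assumes you have already registered an app and obtained an OAuth 2.0 access token (stored here in a hypothetical ACCESS_TOKEN variable); the URL shown is Fitbit's activity time-series endpoint, so double-check the current Web API docs before relying on it.

```python
import requests

# Hypothetical token obtained after registering an app at dev.fitbit.com
ACCESS_TOKEN = "your-oauth2-access-token"

# Fitbit Web API activity time series: steps for the last 7 days ("-" = current user)
url = "https://api.fitbit.com/1/user/-/activities/steps/date/today/7d.json"
response = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
response.raise_for_status()

for entry in response.json()["activities-steps"]:
    print(entry["dateTime"], entry["value"])
```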
|
|
|
|
|
|
|
![Data export on fitbit's website](media/vis_my_life/dataExport.png) |
|
|
|
|
|
|
|
The Fitbit data archive was meticulously organized, with everything
split into separate JSON files labeled by date. Fitbit keeps around
1MB of data on you per day, and most of it comes from the heart rate
sensors. Although 1MB per day may sound like a ton of data, it would
take far less space in a format other than JSON. The compressed
download was 20MB, but when I extracted it, it was 380MB! I've only
been using Fitbit for 11 months at this point.
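If you are curious where all that space goes, a small sketch like the one below tallies the extracted archive by top-level subfolder (the "data" path is an assumption based on where I unzipped my export; yours may differ):

```python
import os
from collections import defaultdict

# Assumed path to the extracted Fitbit archive; adjust to your export location.
ARCHIVE_DIR = "data"

sizes = defaultdict(int)
for root, _, files in os.walk(ARCHIVE_DIR):
    for name in files:
        # Attribute each file's size to its top-level subfolder.
        top = os.path.relpath(root, ARCHIVE_DIR).split(os.sep)[0]
        sizes[top] += os.path.getsize(os.path.join(root, name))

for folder, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{folder}: {size / 1e6:.1f} MB")
```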
|
|
|
|
|
|
|
![compressed data](media/vis_my_life/compression.png) |
|
|
|
|
|
|
|
## Sleep |
|
|
|
|
|
|
|
Sleep is something fun to visualize. No matter how much of it you
get, you still feel tired as a college student. In the "sleep_score"
folder of the exported data you will find a single CSV file with your
resting heart rate and Fitbit's computed sleep scores. Interestingly
enough, this is the only file that comes in CSV format; everything
else is a JSON file.
|
|
|
|
|
|
|
We can read in all the data with a one-liner using the
[Pandas](https://pandas.pydata.org/) Python library.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
import matplotlib.pyplot as plt |
|
|
|
import pandas as pd |
|
|
|
|
|
|
|
sleep_score_df = pd.read_csv('data/sleep/sleep_score.csv') |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
print(sleep_score_df) |
|
|
|
``` |
|
|
|
|
|
|
|
sleep_log_entry_id timestamp overall_score \ |
|
|
|
0 26093459526 2020-02-27T06:04:30Z 80 |
|
|
|
1 26081303207 2020-02-26T06:13:30Z 83 |
|
|
|
2 26062481322 2020-02-25T06:00:30Z 82 |
|
|
|
3 26045941555 2020-02-24T05:49:30Z 79 |
|
|
|
4 26034268762 2020-02-23T08:35:30Z 75 |
|
|
|
.. ... ... ... |
|
|
|
176 23696231032 2019-09-02T07:38:30Z 79 |
|
|
|
177 23684345925 2019-09-01T07:15:30Z 84 |
|
|
|
178 23673204871 2019-08-31T07:11:00Z 74 |
|
|
|
179 23661278483 2019-08-30T06:34:00Z 73 |
|
|
|
180 23646265400 2019-08-29T05:55:00Z 80 |
|
|
|
|
|
|
|
composition_score revitalization_score duration_score \ |
|
|
|
0 20 19 41 |
|
|
|
1 22 21 40 |
|
|
|
2 22 21 39 |
|
|
|
3 17 20 42 |
|
|
|
4 20 16 39 |
|
|
|
.. ... ... ... |
|
|
|
176 20 20 39 |
|
|
|
177 22 21 41 |
|
|
|
178 18 21 35 |
|
|
|
179 17 19 37 |
|
|
|
180 21 21 38 |
|
|
|
|
|
|
|
deep_sleep_in_minutes resting_heart_rate restlessness |
|
|
|
0 65 60 0.117330 |
|
|
|
1 85 60 0.113188 |
|
|
|
2 95 60 0.120635 |
|
|
|
3 52 61 0.111224 |
|
|
|
4 43 59 0.154774 |
|
|
|
.. ... ... ... |
|
|
|
176 88 56 0.170923 |
|
|
|
177 95 56 0.133268 |
|
|
|
178 73 56 0.102703 |
|
|
|
179 50 55 0.121086 |
|
|
|
180 61 57 0.112961 |
|
|
|
|
|
|
|
[181 rows x 9 columns] |
|
|
|
|
|
|
|
|
|
|
|
With the Pandas library you can generate Matplotlib graphs directly
from a data frame. Although you can use Matplotlib by itself, the
Pandas wrapper functions make it easier to work with.
|
|
|
|
|
|
|
## Sleep Score Histogram |
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
sleep_score_df.hist(column='overall_score') |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fc2c0a270d0>]], |
|
|
|
dtype=object) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_7_1.png) |
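The default histogram picks its own bins and leaves the axes unlabeled. A small sketch of the same hist call with explicit bin edges and labels (the bin range here is just an assumption that fits my scores):

```python
# Control the bin edges and label the axes on the sleep score histogram.
ax = sleep_score_df.hist(column='overall_score', bins=range(60, 100, 2))[0][0]
ax.set_xlabel("Fitbit Sleep Score")
ax.set_ylabel("Nights")
plt.show()
```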
|
|
|
|
|
|
|
|
|
|
|
## Heart Rate |
|
|
|
|
|
|
|
Fitbit keeps its calculated resting heart rates in the sleep score
file rather than with the raw heart rate data. Knowing your resting
heart rate is useful because it is a good indicator of your overall
health.
|
|
|
|
|
|
|
![](media/vis_my_life/restingHeartRate.jpg) |
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
sleep_score_df.hist(column='resting_heart_rate') |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fc2917a6090>]], |
|
|
|
dtype=object) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_9_1.png) |
|
|
|
|
|
|
|
|
|
|
|
## Resting Heart Rate Time Graph |
|
|
|
|
|
|
|
Using the pandas wrapper we can quickly create a heart rate graph over |
|
|
|
time. |
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)") |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<matplotlib.axes._subplots.AxesSubplot at 0x7fc28f609b50> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_11_1.png) |
|
|
|
|
|
|
|
|
|
|
|
However, as we can see in the graph above, the time axis is wack. In
the Pandas data frame the timestamps were stored as plain strings. We
can convert them into datetime objects by telling Pandas to parse the
dates as it reads the file.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
sleep_score_df = pd.read_csv('data/sleep/sleep_score.csv', parse_dates=[1]) |
|
|
|
sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)") |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<matplotlib.axes._subplots.AxesSubplot at 0x7fc28f533510> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_13_1.png) |
|
|
|
|
|
|
|
|
|
|
|
To fully control the graphs, we need to use some Matplotlib code to
do things like set the axis labels or place multiple plots right next
to each other. We can grab the current axes being used by Matplotlib
with plt.gca().
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate Graph", ax=ax, figsize=(10, 5)) |
|
|
|
plt.xlabel("Date") |
|
|
|
plt.ylabel("Resting Heart Rate (BPM)") |
|
|
|
plt.show() |
|
|
|
|
|
|
|
#plt.savefig('restingHeartRate.svg') |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_15_0.png) |
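Since I mentioned placing multiple plots next to each other, here is a minimal sketch of that using plt.subplots() with the same sleep score data frame (two panels side by side):

```python
# Two side-by-side panels drawn from the same data frame.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
sleep_score_df.plot(kind='line', y='resting_heart_rate', x='timestamp', legend=False, ax=ax1)
ax1.set_title("Resting Heart Rate (BPM)")
sleep_score_df.plot(kind='line', y='overall_score', x='timestamp', legend=False, ax=ax2)
ax2.set_title("Sleep Score")
plt.tight_layout()
plt.show()
```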
|
|
|
|
|
|
|
|
|
|
|
The same thing can be done with sleep scores. It is interesting to
note that the sleep scores rarely stray outside the 75 to 85 range.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
sleep_score_df.plot(kind='line', y='overall_score', x ='timestamp', legend=False, title="Sleep Score Time Series Graph", ax=ax) |
|
|
|
plt.xlabel("Date") |
|
|
|
plt.ylabel("Fitbit's Sleep Score") |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_17_0.png) |
|
|
|
|
|
|
|
|
|
|
|
Using Pandas we can generate a new column with a specific date
attribute like year, day, month, or weekday. If we add a new column
for weekday, we can then group by weekday and collapse all the rows
for each day of the week into a single row by summing or averaging
the values.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
temp = pd.DatetimeIndex(sleep_score_df['timestamp']) |
|
|
|
sleep_score_df['weekday'] = temp.weekday |
|
|
|
|
|
|
|
print(sleep_score_df) |
|
|
|
``` |
|
|
|
|
|
|
|
sleep_log_entry_id timestamp overall_score \ |
|
|
|
0 26093459526 2020-02-27 06:04:30+00:00 80 |
|
|
|
1 26081303207 2020-02-26 06:13:30+00:00 83 |
|
|
|
2 26062481322 2020-02-25 06:00:30+00:00 82 |
|
|
|
3 26045941555 2020-02-24 05:49:30+00:00 79 |
|
|
|
4 26034268762 2020-02-23 08:35:30+00:00 75 |
|
|
|
.. ... ... ... |
|
|
|
176 23696231032 2019-09-02 07:38:30+00:00 79 |
|
|
|
177 23684345925 2019-09-01 07:15:30+00:00 84 |
|
|
|
178 23673204871 2019-08-31 07:11:00+00:00 74 |
|
|
|
179 23661278483 2019-08-30 06:34:00+00:00 73 |
|
|
|
180 23646265400 2019-08-29 05:55:00+00:00 80 |
|
|
|
|
|
|
|
composition_score revitalization_score duration_score \ |
|
|
|
0 20 19 41 |
|
|
|
1 22 21 40 |
|
|
|
2 22 21 39 |
|
|
|
3 17 20 42 |
|
|
|
4 20 16 39 |
|
|
|
.. ... ... ... |
|
|
|
176 20 20 39 |
|
|
|
177 22 21 41 |
|
|
|
178 18 21 35 |
|
|
|
179 17 19 37 |
|
|
|
180 21 21 38 |
|
|
|
|
|
|
|
deep_sleep_in_minutes resting_heart_rate restlessness weekday |
|
|
|
0 65 60 0.117330 3 |
|
|
|
1 85 60 0.113188 2 |
|
|
|
2 95 60 0.120635 1 |
|
|
|
3 52 61 0.111224 0 |
|
|
|
4 43 59 0.154774 6 |
|
|
|
.. ... ... ... ... |
|
|
|
176 88 56 0.170923 0 |
|
|
|
177 95 56 0.133268 6 |
|
|
|
178 73 56 0.102703 5 |
|
|
|
179 50 55 0.121086 4 |
|
|
|
180 61 57 0.112961 3 |
|
|
|
|
|
|
|
[181 rows x 10 columns] |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
print(sleep_score_df.groupby('weekday').mean()) |
|
|
|
``` |
|
|
|
|
|
|
|
sleep_log_entry_id overall_score composition_score \ |
|
|
|
weekday |
|
|
|
0 2.483733e+10 79.576923 20.269231 |
|
|
|
1 2.485200e+10 77.423077 20.423077 |
|
|
|
2 2.490383e+10 80.880000 21.120000 |
|
|
|
3 2.483418e+10 76.814815 20.370370 |
|
|
|
4 2.480085e+10 79.769231 20.961538 |
|
|
|
5 2.477002e+10 78.840000 20.520000 |
|
|
|
6 2.482581e+10 77.230769 20.269231 |
|
|
|
|
|
|
|
revitalization_score duration_score deep_sleep_in_minutes \ |
|
|
|
weekday |
|
|
|
0 19.153846 40.153846 88.000000 |
|
|
|
1 19.000000 38.000000 83.846154 |
|
|
|
2 19.400000 40.360000 93.760000 |
|
|
|
3 19.037037 37.407407 82.592593 |
|
|
|
4 19.346154 39.461538 94.461538 |
|
|
|
5 19.080000 39.240000 93.720000 |
|
|
|
6 18.269231 38.692308 89.423077 |
|
|
|
|
|
|
|
resting_heart_rate restlessness |
|
|
|
weekday |
|
|
|
0 58.576923 0.139440 |
|
|
|
1 58.538462 0.142984 |
|
|
|
2 58.560000 0.138661 |
|
|
|
3 58.333333 0.135819 |
|
|
|
4 58.269231 0.129791 |
|
|
|
5 58.080000 0.138315 |
|
|
|
6 58.153846 0.147171 |
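Beyond .mean(), the same groupby can report several statistics at once. A quick sketch using .agg() on just the two columns we care about:

```python
# Mean and standard deviation of sleep score and resting heart rate per weekday.
weekday_stats = sleep_score_df.groupby('weekday')[['overall_score', 'resting_heart_rate']].agg(['mean', 'std'])
print(weekday_stats)
```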
|
|
|
|
|
|
|
|
|
|
|
## Sleep Score Based on Day of Week
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
sleep_score_df.groupby('weekday').mean().plot(kind='line', y='overall_score', ax = ax) |
|
|
|
plt.ylabel("Sleep Score") |
|
|
|
plt.title("Sleep Scores on Varying Days of Week") |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_22_0.png) |
|
|
|
|
|
|
|
|
|
|
|
## Resting Heart Rate Based on Day of Week
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
sleep_score_df.groupby('weekday').mean().plot(kind='line', y='resting_heart_rate', ax = ax) |
|
|
|
plt.ylabel("Resting heart rate (BPM)") |
|
|
|
plt.title("Resting Heart Rate Varying Days of Week") |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_24_0.png) |
|
|
|
|
|
|
|
|
|
|
|
# Calories |
|
|
|
|
|
|
|
Fitbit keeps all of its calorie data in JSON files of time series
data at 1-minute increments. To get daily totals we need to group the
data by day and then sum within each day to get the total calories
burned per day.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
calories_df = pd.read_json("data/calories/calories-2019-07-01.json", convert_dates=True) |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
print(calories_df) |
|
|
|
``` |
|
|
|
|
|
|
|
dateTime value |
|
|
|
0 2019-07-01 00:00:00 1.07 |
|
|
|
1 2019-07-01 00:01:00 1.07 |
|
|
|
2 2019-07-01 00:02:00 1.07 |
|
|
|
3 2019-07-01 00:03:00 1.07 |
|
|
|
4 2019-07-01 00:04:00 1.07 |
|
|
|
... ... ... |
|
|
|
43195 2019-07-30 23:55:00 1.07 |
|
|
|
43196 2019-07-30 23:56:00 1.07 |
|
|
|
43197 2019-07-30 23:57:00 1.07 |
|
|
|
43198 2019-07-30 23:58:00 1.07 |
|
|
|
43199 2019-07-30 23:59:00 1.07 |
|
|
|
|
|
|
|
[43200 rows x 2 columns] |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
import datetime |
|
|
|
calories_df['date_minus_time'] = calories_df["dateTime"].apply( lambda calories_df : |
|
|
|
datetime.datetime(year=calories_df.year, month=calories_df.month, day=calories_df.day)) |
|
|
|
|
|
|
|
calories_df.set_index(calories_df["date_minus_time"],inplace=True) |
|
|
|
|
|
|
|
print(calories_df) |
|
|
|
``` |
|
|
|
|
|
|
|
dateTime value date_minus_time |
|
|
|
date_minus_time |
|
|
|
2019-07-01 2019-07-01 00:00:00 1.07 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 00:01:00 1.07 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 00:02:00 1.07 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 00:03:00 1.07 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 00:04:00 1.07 2019-07-01 |
|
|
|
... ... ... ... |
|
|
|
2019-07-30 2019-07-30 23:55:00 1.07 2019-07-30 |
|
|
|
2019-07-30 2019-07-30 23:56:00 1.07 2019-07-30 |
|
|
|
2019-07-30 2019-07-30 23:57:00 1.07 2019-07-30 |
|
|
|
2019-07-30 2019-07-30 23:58:00 1.07 2019-07-30 |
|
|
|
2019-07-30 2019-07-30 23:59:00 1.07 2019-07-30 |
|
|
|
|
|
|
|
[43200 rows x 3 columns] |
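As an aside, the lambda above works, but Pandas' datetime accessor can strip the time component more directly. A sketch of that one step using .dt.floor('D'):

```python
# Equivalent to the lambda above: drop the time-of-day from each timestamp.
calories_df['date_minus_time'] = calories_df['dateTime'].dt.floor('D')
```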
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
calories_per_day = calories_df.resample('D').sum() |
|
|
|
print(calories_per_day) |
|
|
|
``` |
|
|
|
|
|
|
|
value |
|
|
|
date_minus_time |
|
|
|
2019-07-01 3422.68 |
|
|
|
2019-07-02 2705.85 |
|
|
|
2019-07-03 2871.73 |
|
|
|
2019-07-04 4089.93 |
|
|
|
2019-07-05 3917.91 |
|
|
|
2019-07-06 2762.55 |
|
|
|
2019-07-07 2929.58 |
|
|
|
2019-07-08 2698.99 |
|
|
|
2019-07-09 2833.27 |
|
|
|
2019-07-10 2529.21 |
|
|
|
2019-07-11 2634.25 |
|
|
|
2019-07-12 2953.91 |
|
|
|
2019-07-13 4247.45 |
|
|
|
2019-07-14 2998.35 |
|
|
|
2019-07-15 2846.18 |
|
|
|
2019-07-16 3084.39 |
|
|
|
2019-07-17 2331.06 |
|
|
|
2019-07-18 2849.20 |
|
|
|
2019-07-19 2071.63 |
|
|
|
2019-07-20 2746.25 |
|
|
|
2019-07-21 2562.11 |
|
|
|
2019-07-22 1892.99 |
|
|
|
2019-07-23 2372.89 |
|
|
|
2019-07-24 2320.42 |
|
|
|
2019-07-25 2140.87 |
|
|
|
2019-07-26 2430.38 |
|
|
|
2019-07-27 3769.04 |
|
|
|
2019-07-28 2036.24 |
|
|
|
2019-07-29 2814.87 |
|
|
|
2019-07-30 2077.82 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
calories_per_day.plot(kind='hist', title="Calorie Distribution", legend=False, ax=ax) |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_30_0.png) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
calories_per_day.plot(kind='line', y='value', legend=False, title="Calories Per Day", ax=ax) |
|
|
|
plt.xlabel("Date") |
|
|
|
plt.ylabel("Calories") |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_31_0.png) |
|
|
|
|
|
|
|
|
|
|
|
## Calories Per Day Box Plot |
|
|
|
|
|
|
|
Using this data we can build a box plot, which makes it easier to
visualize the distribution of calories burned during the month of
July.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
ax.set_title('Calorie Distribution for July') |
|
|
|
ax.boxplot(calories_per_day['value'], vert=False,manage_ticks=False, notch=True) |
|
|
|
plt.xlabel("Calories Burned") |
|
|
|
ax.set_yticks([]) |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_33_0.png) |
|
|
|
|
|
|
|
|
|
|
|
# Steps |
|
|
|
|
|
|
|
Fitbit is best known for tracking the number of steps someone takes
per day. Similar to calories burned, steps are stored as time series
data at 1-minute increments. Since we are interested in day-level
data, we first need to remove the time component so that we can group
all the data by date. Once everything is grouped by date, we can sum
to produce steps per day.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
steps_df = pd.read_json("data/steps-2019-07-01.json", convert_dates=True) |
|
|
|
|
|
|
|
steps_df['date_minus_time'] = steps_df["dateTime"].apply( lambda steps_df : |
|
|
|
datetime.datetime(year=steps_df.year, month=steps_df.month, day=steps_df.day)) |
|
|
|
|
|
|
|
steps_df.set_index(steps_df["date_minus_time"],inplace=True) |
|
|
|
print(steps_df) |
|
|
|
``` |
|
|
|
|
|
|
|
dateTime value date_minus_time |
|
|
|
date_minus_time |
|
|
|
2019-07-01 2019-07-01 04:00:00 0 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 04:01:00 0 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 04:02:00 0 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 04:03:00 0 2019-07-01 |
|
|
|
2019-07-01 2019-07-01 04:04:00 0 2019-07-01 |
|
|
|
... ... ... ... |
|
|
|
2019-07-31 2019-07-31 03:55:00 0 2019-07-31 |
|
|
|
2019-07-31 2019-07-31 03:56:00 0 2019-07-31 |
|
|
|
2019-07-31 2019-07-31 03:57:00 0 2019-07-31 |
|
|
|
2019-07-31 2019-07-31 03:58:00 0 2019-07-31 |
|
|
|
2019-07-31 2019-07-31 03:59:00 0 2019-07-31 |
|
|
|
|
|
|
|
[41116 rows x 3 columns] |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
steps_per_day = steps_df.resample('D').sum() |
|
|
|
print(steps_per_day) |
|
|
|
``` |
|
|
|
|
|
|
|
value |
|
|
|
date_minus_time |
|
|
|
2019-07-01 11285 |
|
|
|
2019-07-02 4957 |
|
|
|
2019-07-03 13119 |
|
|
|
2019-07-04 16034 |
|
|
|
2019-07-05 11634 |
|
|
|
2019-07-06 6860 |
|
|
|
2019-07-07 3758 |
|
|
|
2019-07-08 9130 |
|
|
|
2019-07-09 10960 |
|
|
|
2019-07-10 7012 |
|
|
|
2019-07-11 5420 |
|
|
|
2019-07-12 4051 |
|
|
|
2019-07-13 15980 |
|
|
|
2019-07-14 23109 |
|
|
|
2019-07-15 11247 |
|
|
|
2019-07-16 10170 |
|
|
|
2019-07-17 4905 |
|
|
|
2019-07-18 10769 |
|
|
|
2019-07-19 4504 |
|
|
|
2019-07-20 5032 |
|
|
|
2019-07-21 8953 |
|
|
|
2019-07-22 2200 |
|
|
|
2019-07-23 9392 |
|
|
|
2019-07-24 5666 |
|
|
|
2019-07-25 5016 |
|
|
|
2019-07-26 5879 |
|
|
|
2019-07-27 19492 |
|
|
|
2019-07-28 4987 |
|
|
|
2019-07-29 9943 |
|
|
|
2019-07-30 3897 |
|
|
|
2019-07-31 166 |
|
|
|
|
|
|
|
|
|
|
|
## Steps Per Day Box Plot
|
|
|
|
|
|
|
After the data is in the form that we want, graphing it is
straightforward. Two extra things I like to do for box plots are to
display them horizontally and to add notches.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
ax.set_title('Steps Distribution for July') |
|
|
|
ax.boxplot(steps_per_day['value'], vert=False,manage_ticks=False, notch=True) |
|
|
|
plt.xlabel("Steps Per Day") |
|
|
|
ax.set_yticks([]) |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_38_0.png) |
|
|
|
|
|
|
|
|
|
|
|
Wrapping all of that into a pair of helper functions, we get something like this:
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
def readFileIntoDataFrame(fName): |
|
|
|
steps_df = pd.read_json(fName, convert_dates=True) |
|
|
|
|
|
|
|
steps_df['date_minus_time'] = steps_df["dateTime"].apply( lambda steps_df : |
|
|
|
datetime.datetime(year=steps_df.year, month=steps_df.month, day=steps_df.day)) |
|
|
|
|
|
|
|
steps_df.set_index(steps_df["date_minus_time"],inplace=True) |
|
|
|
return steps_df.resample('D').sum() |
|
|
|
|
|
|
|
def graphBoxAndWhiskers(data, title, xlab): |
|
|
|
ax = plt.gca() |
|
|
|
ax.set_title(title) |
|
|
|
ax.boxplot(data['value'], vert=False, manage_ticks=False, notch=True) |
|
|
|
plt.xlabel(xlab) |
|
|
|
ax.set_yticks([]) |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
graphBoxAndWhiskers(readFileIntoDataFrame("data/steps-2020-01-27.json"), "Steps In January", "Steps Per Day") |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_41_0.png) |
|
|
|
|
|
|
|
|
|
|
|
That is cool, but what if we could view the distribution for each
month in the same graph? Based on the two previous graphs, my step
distribution during July looked distinctly different from my step
distribution in January. The first difficulty is reading in all the
files, since Fitbit creates a new file for every month. The next step
is to group them by month and then graph them.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
import os |
|
|
|
files = os.listdir("data") |
|
|
|
print(files) |
|
|
|
``` |
|
|
|
|
|
|
|
['steps-2019-04-02.json', 'steps-2019-08-30.json', 'steps-2020-02-26.json', 'steps-2019-10-29.json', 'steps-2019-07-01.json', 'steps-2020-01-27.json', 'steps-2019-07-31.json', 'steps-2019-06-01.json', 'steps-2019-09-29.json', '.ipynb_checkpoints', 'steps-2019-12-28.json', 'steps-2019-05-02.json', 'calories', 'steps-2019-11-28.json', 'sleep'] |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
dfs = [] |
|
|
|
for file in files: # this can take 15 seconds |
|
|
|
if "steps" in file: # finds the steps files |
|
|
|
dfs.append(readFileIntoDataFrame("data/" + file)) |
|
|
|
``` |
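Filtering the os.listdir() output works fine; a slightly tidier alternative (just a sketch) is to let glob match the monthly step files directly:

```python
import glob

# Match only the monthly step exports, e.g. data/steps-2019-07-01.json
step_files = sorted(glob.glob("data/steps-*.json"))
dfs = [readFileIntoDataFrame(f) for f in step_files]
```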
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
stepsPerDay = pd.concat(dfs) |
|
|
|
graphBoxAndWhiskers(stepsPerDay, "Steps Per Day Last 11 Months", "Steps per Day") |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_45_0.png) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
print(type(stepsPerDay['value'].to_numpy())) |
|
|
|
print(stepsPerDay['value'].keys()) |
|
|
|
|
|
|
|
stepsPerDay['month'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).month |
|
|
|
stepsPerDay['week_day'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).weekday |
|
|
|
|
|
|
|
print(stepsPerDay) |
|
|
|
``` |
|
|
|
|
|
|
|
<class 'numpy.ndarray'> |
|
|
|
DatetimeIndex(['2019-04-03', '2019-04-04', '2019-04-05', '2019-04-06', |
|
|
|
'2019-04-07', '2019-04-08', '2019-04-09', '2019-04-10', |
|
|
|
'2019-04-11', '2019-04-12', |
|
|
|
... |
|
|
|
'2019-12-19', '2019-12-20', '2019-12-21', '2019-12-22', |
|
|
|
'2019-12-23', '2019-12-24', '2019-12-25', '2019-12-26', |
|
|
|
'2019-12-27', '2019-12-28'], |
|
|
|
dtype='datetime64[ns]', name='date_minus_time', length=342, freq=None) |
|
|
|
value month week_day |
|
|
|
date_minus_time |
|
|
|
2019-04-03 510 4 2 |
|
|
|
2019-04-04 11453 4 3 |
|
|
|
2019-04-05 12684 4 4 |
|
|
|
2019-04-06 12910 4 5 |
|
|
|
2019-04-07 3368 4 6 |
|
|
|
... ... ... ... |
|
|
|
2019-12-24 5779 12 1 |
|
|
|
2019-12-25 4264 12 2 |
|
|
|
2019-12-26 4843 12 3 |
|
|
|
2019-12-27 9609 12 4 |
|
|
|
2019-12-28 2218 12 5 |
|
|
|
|
|
|
|
[342 rows x 3 columns] |
|
|
|
|
|
|
|
|
|
|
|
## Graphing Steps by Month |
|
|
|
|
|
|
|
Now that we have columns for the total number of steps per day and
the month, we can plot all the data in a single figure using the by
argument of Pandas' boxplot method.
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
ax.set_title('Steps Distribution for July\n') |
|
|
|
stepsPerDay.boxplot(column=['value'], by='month',ax=ax, notch=True) |
|
|
|
plt.xlabel("Month") |
|
|
|
plt.ylabel("Steps Per Day") |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_48_0.png) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```python |
|
|
|
ax = plt.gca() |
|
|
|
ax.set_title('Steps Distribution By Week Day\n') |
|
|
|
stepsPerDay.boxplot(column=['value'], by='week_day',ax=ax, notch=True) |
|
|
|
plt.xlabel("Week Day") |
|
|
|
plt.ylabel("Steps Per Day") |
|
|
|
plt.show() |
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
![png](media/vis_my_life/output_49_0.png) |
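The weekday axis above shows Pandas' numeric codes (0 = Monday through 6 = Sunday). If you prefer readable labels, a small sketch that relabels the ticks after plotting:

```python
ax = plt.gca()
stepsPerDay.boxplot(column=['value'], by='week_day', ax=ax, notch=True)
# Pandas weekday codes run 0 (Monday) through 6 (Sunday).
ax.set_xticklabels(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
plt.xlabel("Day of Week")
plt.ylabel("Steps Per Day")
plt.show()
```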
|
|
|
|
|
|
|
|
|
|
|
## Future Work |
|
|
|
|
|
|
|
Moving forward, I would like to do more visualizations with the sleep
and heart rate data.