Let's do a deep dive and start visualizing my life using Fitbit and
Matplotlib.  

# What is Fitbit

[Fitbit](https://www.fitbit.com) is a fitness watch that tracks your sleep, heart rate, and activity.
Besides counting your steps, it can detect multiple types of activity,
such as running, walking, "sport", and biking.

# What is Matplotlib

[Matplotlib](https://matplotlib.org/) is a Python visualization library that enables you to create bar graphs, line graphs, distributions, and much more.
Being able to visualize your results is essential to any person working with data at any scale.
Although I like [ggplot2](https://ggplot2.tidyverse.org/) in R more than Matplotlib, Matplotlib is still my go-to graphing library for Python.

# Getting Your Fitbit Data

There are two main ways that you can get your Fitbit data: 

- Fitbit API
- Data Archival Export


Since connecting to the API and setting up all the web hooks can be a
pain, I'm just going to use the data export option because this is
only for one person. You can export your data here:
[https://www.fitbit.com/settings/data/export](https://www.fitbit.com/settings/data/export).

![Data export on fitbit's website](media/vis_my_life/dataExport.png)

The Fitbit data archive was very organized and kept meticulous records
of everything. All of the data was organized in separate JSON files
labeled by date. Fitbit keeps around 1MB of data on you per day; most
of this data is from the heart rate sensors. Although 1MB may sound
like a ton of data, it would probably take far less space in a format
other than JSON. When I downloaded the compressed file it was 20MB,
but when I extracted it, it was 380MB! I've only been using Fitbit for
11 months at this point.

![compressed data](media/vis_my_life/compression.png)
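
To get a feel for how much of that size is JSON overhead, here is a rough sketch that writes the same made-up per-minute readings as JSON and as CSV and compares the byte counts. The "dateTime" and "value" keys mirror the column names that show up later in this post; the readings themselves are synthetic, so treat the numbers as an illustration and not a measurement of the real archive.

```python
import csv
import io
import json

# Synthetic per-minute records; the keys mirror the columns seen later
# in this post, but the values are made up for illustration.
records = [
    {"dateTime": f"2019-07-01 00:{minute:02d}:00", "value": 1.07}
    for minute in range(60)
]

# JSON repeats both key names for every single reading.
json_text = json.dumps(records)

# CSV states the column names once in the header row.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["dateTime", "value"])
writer.writeheader()
writer.writerows(records)
csv_text = csv_buffer.getvalue()

json_size = len(json_text.encode("utf-8"))
csv_size = len(csv_text.encode("utf-8"))
print(f"JSON: {json_size} bytes, CSV: {csv_size} bytes")
```

The repeated key names are also a big part of why the archive compresses so well.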

## Sleep

Sleep is something fun to visualize. No matter how much of it you get,
you still feel tired as a college student. In the "sleep_score" folder
of the exported data you will find a single CSV file with your resting
heart rate and Fitbit's computed sleep scores. Interestingly enough,
this is the only file that comes in CSV format; everything else is a
JSON file.

We can read in all the data with a one-liner using the
[Pandas](https://pandas.pydata.org/) Python library.



```python
import matplotlib.pyplot as plt
import pandas as pd

sleep_score_df = pd.read_csv('data/sleep/sleep_score.csv')
```


```python
print(sleep_score_df)
```

         sleep_log_entry_id             timestamp  overall_score  \
    0           26093459526  2020-02-27T06:04:30Z             80   
    1           26081303207  2020-02-26T06:13:30Z             83   
    2           26062481322  2020-02-25T06:00:30Z             82   
    3           26045941555  2020-02-24T05:49:30Z             79   
    4           26034268762  2020-02-23T08:35:30Z             75   
    ..                  ...                   ...            ...   
    176         23696231032  2019-09-02T07:38:30Z             79   
    177         23684345925  2019-09-01T07:15:30Z             84   
    178         23673204871  2019-08-31T07:11:00Z             74   
    179         23661278483  2019-08-30T06:34:00Z             73   
    180         23646265400  2019-08-29T05:55:00Z             80   

         composition_score  revitalization_score  duration_score  \
    0                   20                    19              41   
    1                   22                    21              40   
    2                   22                    21              39   
    3                   17                    20              42   
    4                   20                    16              39   
    ..                 ...                   ...             ...   
    176                 20                    20              39   
    177                 22                    21              41   
    178                 18                    21              35   
    179                 17                    19              37   
    180                 21                    21              38   

         deep_sleep_in_minutes  resting_heart_rate  restlessness  
    0                       65                  60      0.117330  
    1                       85                  60      0.113188  
    2                       95                  60      0.120635  
    3                       52                  61      0.111224  
    4                       43                  59      0.154774  
    ..                     ...                 ...           ...  
    176                     88                  56      0.170923  
    177                     95                  56      0.133268  
    178                     73                  56      0.102703  
    179                     50                  55      0.121086  
    180                     61                  57      0.112961  

    [181 rows x 9 columns]


With the Pandas library you can generate Matplotlib graphs. Although
you can use Matplotlib directly, the Pandas wrapper functions make it
easier to use.
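
To make "wrapper" a bit more concrete, here is a small sketch of the direct Matplotlib route. The ten scores are copied from the first and last rows of the printout above, and `demo_df` is just a stand-in name for the real data frame. Calling `demo_df.hist(column="overall_score")` would draw roughly the same picture in one line.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A handful of overall scores copied from the printout above.
demo_df = pd.DataFrame(
    {"overall_score": [80, 83, 82, 79, 75, 79, 84, 74, 73, 80]}
)

# Direct Matplotlib: create the axes ourselves and hand over the raw values.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(demo_df["overall_score"], bins=10)
ax.set_title("overall_score")
ax.set_xlabel("Sleep score")
ax.set_ylabel("Nights")
```

With the wrapper, pandas picks the column, creates the axes, and sets the title for you, which is why I reach for it first.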

## Sleep Score Histogram


```python
sleep_score_df.hist(column='overall_score')
```








![png](media/vis_my_life/output_7_1.png)


## Heart Rate

Fitbit keeps its calculated resting heart rates in the sleep scores
file rather than with the heart rate data. Knowing your resting heart
rate is useful because it is a good indicator of your overall health.

![](media/vis_my_life/restingHeartRate.jpg)


```python
sleep_score_df.hist(column='resting_heart_rate')
```








![png](media/vis_my_life/output_9_1.png)


## Resting Heart Rate Time Graph

Using the pandas wrapper we can quickly create a heart rate graph over
time. 


```python
sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)")
```








![png](media/vis_my_life/output_11_1.png)


However, as the graph above shows, the time axis is wack. In the
pandas data frame the timestamps were stored as strings. We can
convert them into datetime objects by telling pandas to parse the
dates as it reads the file.


```python
sleep_score_df = pd.read_csv('data/sleep/sleep_score.csv', parse_dates=[1])
sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)")
```








![png](media/vis_my_life/output_13_1.png)
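
If the data frame is already loaded, an alternative is to convert the column after the fact with `pd.to_datetime`. Here is a minimal sketch on three timestamps copied from the printout above; `demo_df` is a throwaway name used only for this example.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "timestamp": [
        "2020-02-27T06:04:30Z",
        "2020-02-26T06:13:30Z",
        "2020-02-25T06:00:30Z",
    ],
    "resting_heart_rate": [60, 60, 60],
})
print(demo_df["timestamp"].dtype)  # a string-like dtype, not a datetime

# Convert the strings in place; the trailing "Z" marks the times as UTC.
demo_df["timestamp"] = pd.to_datetime(demo_df["timestamp"])
print(demo_df["timestamp"].dtype)  # now a timezone-aware datetime dtype
```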


To fully manipulate the graphs, we need to use some Matplotlib code to
do things like setting the axis labels or making multiple plots right
next to each other. We can grab the current axis being used by
Matplotlib by calling plt.gca().


```python
ax = plt.gca()
sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate Graph", ax=ax, figsize=(10, 5))
plt.xlabel("Date")
plt.ylabel("Resting Heart Rate (BPM)")
plt.show()

#plt.savefig('restingHeartRate.svg')
```


![png](media/vis_my_life/output_15_0.png)
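
For the "multiple plots right next to each other" case, `plt.subplots` hands back one axis per panel, and each axis can be passed to pandas through the same `ax` argument. The sketch below uses five rows copied from the printout earlier in this post, with `demo_df` as a placeholder name.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Five nights copied from the sleep score printout, oldest first.
demo_df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-02-23", "2020-02-24", "2020-02-25", "2020-02-26", "2020-02-27"]
    ),
    "overall_score": [75, 79, 82, 83, 80],
    "resting_heart_rate": [59, 61, 60, 60, 60],
})

# One row, two columns: each panel gets its own axis object.
fig, (score_ax, heart_ax) = plt.subplots(1, 2, figsize=(12, 4))

demo_df.plot(kind="line", x="timestamp", y="overall_score",
             legend=False, ax=score_ax)
score_ax.set_xlabel("Date")
score_ax.set_ylabel("Fitbit's Sleep Score")

demo_df.plot(kind="line", x="timestamp", y="resting_heart_rate",
             legend=False, ax=heart_ax)
heart_ax.set_xlabel("Date")
heart_ax.set_ylabel("Resting Heart Rate (BPM)")

fig.tight_layout()  # keep the two panels from overlapping
```

Calling `plt.show()` afterwards displays both panels in a single figure.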


The same thing can be done with sleep scores. It is interesting to
note that the sleep scores rarely stray outside the range of 75 to 85.


```python
ax = plt.gca()
sleep_score_df.plot(kind='line', y='overall_score', x ='timestamp', legend=False, title="Sleep Score Time Series Graph", ax=ax)
plt.xlabel("Date")
plt.ylabel("Fitbit's Sleep Score")
plt.show()
```


![png](media/vis_my_life/output_17_0.png)


Using Pandas we can generate a new column with a specific date
attribute like year, day, month, or weekday. If we add a new column
for weekday, we can then group by weekday and collapse each group into
a single row by summing or averaging its values.


```python
temp = pd.DatetimeIndex(sleep_score_df['timestamp'])
sleep_score_df['weekday'] = temp.weekday

print(sleep_score_df)
```

         sleep_log_entry_id                 timestamp  overall_score  \
    0           26093459526 2020-02-27 06:04:30+00:00             80   
    1           26081303207 2020-02-26 06:13:30+00:00             83   
    2           26062481322 2020-02-25 06:00:30+00:00             82   
    3           26045941555 2020-02-24 05:49:30+00:00             79   
    4           26034268762 2020-02-23 08:35:30+00:00             75   
    ..                  ...                       ...            ...   
    176         23696231032 2019-09-02 07:38:30+00:00             79   
    177         23684345925 2019-09-01 07:15:30+00:00             84   
    178         23673204871 2019-08-31 07:11:00+00:00             74   
    179         23661278483 2019-08-30 06:34:00+00:00             73   
    180         23646265400 2019-08-29 05:55:00+00:00             80   

         composition_score  revitalization_score  duration_score  \
    0                   20                    19              41   
    1                   22                    21              40   
    2                   22                    21              39   
    3                   17                    20              42   
    4                   20                    16              39   
    ..                 ...                   ...             ...   
    176                 20                    20              39   
    177                 22                    21              41   
    178                 18                    21              35   
    179                 17                    19              37   
    180                 21                    21              38   

         deep_sleep_in_minutes  resting_heart_rate  restlessness  weekday  
    0                       65                  60      0.117330        3  
    1                       85                  60      0.113188        2  
    2                       95                  60      0.120635        1  
    3                       52                  61      0.111224        0  
    4                       43                  59      0.154774        6  
    ..                     ...                 ...           ...      ...  
    176                     88                  56      0.170923        0  
    177                     95                  56      0.133268        6  
    178                     73                  56      0.102703        5  
    179                     50                  55      0.121086        4  
    180                     61                  57      0.112961        3  

    [181 rows x 10 columns]
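
The weekday numbers follow the pandas convention of Monday being 0 and Sunday being 6. If the numbers are hard to read, the `.dt.day_name()` accessor gives the names instead. A quick sketch on two timestamps from the table above:

```python
import pandas as pd

stamps = pd.to_datetime(
    pd.Series(["2020-02-27T06:04:30Z", "2020-02-23T08:35:30Z"])
)
demo_df = pd.DataFrame({"timestamp": stamps})

# Monday is 0 and Sunday is 6.
demo_df["weekday"] = demo_df["timestamp"].dt.weekday
demo_df["weekday_name"] = demo_df["timestamp"].dt.day_name()
print(demo_df)
```

The two rows come out as weekday 3 (Thursday) and weekday 6 (Sunday), which matches the weekday column in the printout.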



```python
# numeric_only=True keeps the timestamp column out of the average
print(sleep_score_df.groupby('weekday').mean(numeric_only=True))
```

             sleep_log_entry_id  overall_score  composition_score  \
    weekday                                                         
    0              2.483733e+10      79.576923          20.269231   
    1              2.485200e+10      77.423077          20.423077   
    2              2.490383e+10      80.880000          21.120000   
    3              2.483418e+10      76.814815          20.370370   
    4              2.480085e+10      79.769231          20.961538   
    5              2.477002e+10      78.840000          20.520000   
    6              2.482581e+10      77.230769          20.269231   

             revitalization_score  duration_score  deep_sleep_in_minutes  \
    weekday                                                                
    0                   19.153846       40.153846              88.000000   
    1                   19.000000       38.000000              83.846154   
    2                   19.400000       40.360000              93.760000   
    3                   19.037037       37.407407              82.592593   
    4                   19.346154       39.461538              94.461538   
    5                   19.080000       39.240000              93.720000   
    6                   18.269231       38.692308              89.423077   

             resting_heart_rate  restlessness  
    weekday                                    
    0                 58.576923      0.139440  
    1                 58.538462      0.142984  
    2                 58.560000      0.138661  
    3                 58.333333      0.135819  
    4                 58.269231      0.129791  
    5                 58.080000      0.138315  
    6                 58.153846      0.147171  


## Sleep Score Based on Day of Week


```python
ax = plt.gca()
sleep_score_df.groupby('weekday').mean().plot(kind='line', y='overall_score', ax = ax)
plt.ylabel("Sleep Score")
plt.title("Sleep Scores on Varying Days of Week")
plt.show()
```


![png](media/vis_my_life/output_22_0.png)


## Resting Heart Rate Based on Day of Week


```python
ax = plt.gca()
sleep_score_df.groupby('weekday').mean().plot(kind='line', y='resting_heart_rate', ax = ax)
plt.ylabel("Resting heart rate (BPM)")
plt.title("Resting Heart Rate Varying Days of Week")
plt.show()
```


![png](media/vis_my_life/output_24_0.png)


# Calories

Fitbit keeps all of its calorie data in JSON files as time series at
1 minute increments. To get the total calories burned per day, we need
to group the readings by day and then sum each group.


```python
calories_df = pd.read_json("data/calories/calories-2019-07-01.json",  convert_dates=True)
```


```python
print(calories_df)
```

                     dateTime  value
    0     2019-07-01 00:00:00   1.07
    1     2019-07-01 00:01:00   1.07
    2     2019-07-01 00:02:00   1.07
    3     2019-07-01 00:03:00   1.07
    4     2019-07-01 00:04:00   1.07
    ...                   ...    ...
    43195 2019-07-30 23:55:00   1.07
    43196 2019-07-30 23:56:00   1.07
    43197 2019-07-30 23:57:00   1.07
    43198 2019-07-30 23:58:00   1.07
    43199 2019-07-30 23:59:00   1.07

    [43200 rows x 2 columns]



```python
import datetime

# Drop the time of day so that every reading carries only its date.
calories_df['date_minus_time'] = calories_df["dateTime"].apply(lambda ts:
    datetime.datetime(year=ts.year, month=ts.month, day=ts.day))

calories_df.set_index(calories_df["date_minus_time"],inplace=True)

print(calories_df)
```

                               dateTime  value date_minus_time
    date_minus_time                                           
    2019-07-01      2019-07-01 00:00:00   1.07      2019-07-01
    2019-07-01      2019-07-01 00:01:00   1.07      2019-07-01
    2019-07-01      2019-07-01 00:02:00   1.07      2019-07-01
    2019-07-01      2019-07-01 00:03:00   1.07      2019-07-01
    2019-07-01      2019-07-01 00:04:00   1.07      2019-07-01
    ...                             ...    ...             ...
    2019-07-30      2019-07-30 23:55:00   1.07      2019-07-30
    2019-07-30      2019-07-30 23:56:00   1.07      2019-07-30
    2019-07-30      2019-07-30 23:57:00   1.07      2019-07-30
    2019-07-30      2019-07-30 23:58:00   1.07      2019-07-30
    2019-07-30      2019-07-30 23:59:00   1.07      2019-07-30

    [43200 rows x 3 columns]



```python
# Sum only the numeric column; recent pandas versions refuse to sum datetime columns.
calories_per_day = calories_df[['value']].resample('D').sum()
print(calories_per_day)
```

                       value
    date_minus_time         
    2019-07-01       3422.68
    2019-07-02       2705.85
    2019-07-03       2871.73
    2019-07-04       4089.93
    2019-07-05       3917.91
    2019-07-06       2762.55
    2019-07-07       2929.58
    2019-07-08       2698.99
    2019-07-09       2833.27
    2019-07-10       2529.21
    2019-07-11       2634.25
    2019-07-12       2953.91
    2019-07-13       4247.45
    2019-07-14       2998.35
    2019-07-15       2846.18
    2019-07-16       3084.39
    2019-07-17       2331.06
    2019-07-18       2849.20
    2019-07-19       2071.63
    2019-07-20       2746.25
    2019-07-21       2562.11
    2019-07-22       1892.99
    2019-07-23       2372.89
    2019-07-24       2320.42
    2019-07-25       2140.87
    2019-07-26       2430.38
    2019-07-27       3769.04
    2019-07-28       2036.24
    2019-07-29       2814.87
    2019-07-30       2077.82



```python
ax = plt.gca()
calories_per_day.plot(kind='hist', title="Calorie Distribution", legend=False, ax=ax)
plt.show()
```


![png](media/vis_my_life/output_30_0.png)



```python
ax = plt.gca()
calories_per_day.plot(kind='line', y='value', legend=False, title="Calories Per Day", ax=ax)
plt.xlabel("Date")
plt.ylabel("Calories")
plt.show()
```


![png](media/vis_my_life/output_31_0.png)


## Calories Per Day Box Plot

We can turn this data into a box plot to make it easier to visualize
the distribution of calories burned during the month of July.


```python
ax = plt.gca()
ax.set_title('Calorie Distribution for July')
ax.boxplot(calories_per_day['value'], vert=False,manage_ticks=False, notch=True)
plt.xlabel("Calories Burned")
ax.set_yticks([])
plt.show()
```


![png](media/vis_my_life/output_33_0.png)
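
To see where the box and whiskers come from, here is a sketch that computes the same summary numbers by hand. It only uses the first ten daily totals from the table above, so the values will not match the plot exactly. The 1.5 multiplier is Matplotlib's default whisker setting.

```python
import pandas as pd

# First ten daily calorie totals from the table above.
calories = pd.Series([3422.68, 2705.85, 2871.73, 4089.93, 3917.91,
                      2762.55, 2929.58, 2698.99, 2833.27, 2529.21])

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = calories.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# By default, points more than 1.5 * IQR beyond the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = calories[(calories < lower_fence) | (calories > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"{len(outliers)} point(s) outside the whisker range")
```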


# Steps

Fitbit is known for counting the number of steps someone takes per
day. Similar to calories burned, steps taken are stored as time series
data at 1 minute increments. Since we are interested in day-level
data, we first need to remove the time component of the dataframe so
that we can group all the data by date. Once we have everything
grouped by date, we can sum and produce steps per day.


```python
steps_df = pd.read_json("data/steps-2019-07-01.json",  convert_dates=True)

# Drop the time of day so that every reading carries only its date.
steps_df['date_minus_time'] = steps_df["dateTime"].apply(lambda ts:
    datetime.datetime(year=ts.year, month=ts.month, day=ts.day))

steps_df.set_index(steps_df["date_minus_time"],inplace=True)
print(steps_df)
```

                               dateTime  value date_minus_time
    date_minus_time                                           
    2019-07-01      2019-07-01 04:00:00      0      2019-07-01
    2019-07-01      2019-07-01 04:01:00      0      2019-07-01
    2019-07-01      2019-07-01 04:02:00      0      2019-07-01
    2019-07-01      2019-07-01 04:03:00      0      2019-07-01
    2019-07-01      2019-07-01 04:04:00      0      2019-07-01
    ...                             ...    ...             ...
    2019-07-31      2019-07-31 03:55:00      0      2019-07-31
    2019-07-31      2019-07-31 03:56:00      0      2019-07-31
    2019-07-31      2019-07-31 03:57:00      0      2019-07-31
    2019-07-31      2019-07-31 03:58:00      0      2019-07-31
    2019-07-31      2019-07-31 03:59:00      0      2019-07-31

    [41116 rows x 3 columns]



```python
# Sum only the numeric column; recent pandas versions refuse to sum datetime columns.
steps_per_day = steps_df[['value']].resample('D').sum()
print(steps_per_day)
```

                     value
    date_minus_time       
    2019-07-01       11285
    2019-07-02        4957
    2019-07-03       13119
    2019-07-04       16034
    2019-07-05       11634
    2019-07-06        6860
    2019-07-07        3758
    2019-07-08        9130
    2019-07-09       10960
    2019-07-10        7012
    2019-07-11        5420
    2019-07-12        4051
    2019-07-13       15980
    2019-07-14       23109
    2019-07-15       11247
    2019-07-16       10170
    2019-07-17        4905
    2019-07-18       10769
    2019-07-19        4504
    2019-07-20        5032
    2019-07-21        8953
    2019-07-22        2200
    2019-07-23        9392
    2019-07-24        5666
    2019-07-25        5016
    2019-07-26        5879
    2019-07-27       19492
    2019-07-28        4987
    2019-07-29        9943
    2019-07-30        3897
    2019-07-31         166


## Steps Per Day Box Plot

After the data is in the form that we want, graphing the data is
straightforward. Two extra things I like to do for normal box plots
are to set the display to horizontal and to add the notches.


```python
ax = plt.gca()
ax.set_title('Steps Distribution for July')
ax.boxplot(steps_per_day['value'], vert=False,manage_ticks=False, notch=True)
plt.xlabel("Steps Per Day")
ax.set_yticks([])
plt.show()
```


![png](media/vis_my_life/output_38_0.png)


Wrapping that all into a single function we get something like this: 


```python
def readFileIntoDataFrame(fName):
    """Read one Fitbit per-minute JSON file and return per-day totals."""
    df = pd.read_json(fName, convert_dates=True)

    # Drop the time of day so that every reading carries only its date.
    df['date_minus_time'] = df["dateTime"].apply(lambda ts:
        datetime.datetime(year=ts.year, month=ts.month, day=ts.day))

    df.set_index(df["date_minus_time"], inplace=True)
    # Sum only the numeric column; the datetime columns cannot be summed.
    return df[['value']].resample('D').sum()

def graphBoxAndWhiskers(data, title, xlab):
    ax = plt.gca()
    ax.set_title(title)
    ax.boxplot(data['value'], vert=False, manage_ticks=False, notch=True)
    plt.xlabel(xlab)
    ax.set_yticks([])
    plt.show()
```


```python
graphBoxAndWhiskers(readFileIntoDataFrame("data/steps-2020-01-27.json"), "Steps In January", "Steps Per Day")
```


![png](media/vis_my_life/output_41_0.png)


That is cool, but what if we could view the distribution for each
month in the same graph? Based on the two previous graphs, my step
distribution during July looked distinctly different from my step
distribution in January. The first difficulty is reading in all the
files, since Fitbit creates a new file for every month. The next step
is to group the data by month and then graph it.


```python
import os
files = os.listdir("data")
print(files)
```

    ['steps-2019-04-02.json', 'steps-2019-08-30.json', 'steps-2020-02-26.json', 'steps-2019-10-29.json', 'steps-2019-07-01.json', 'steps-2020-01-27.json', 'steps-2019-07-31.json', 'steps-2019-06-01.json', 'steps-2019-09-29.json', '.ipynb_checkpoints', 'steps-2019-12-28.json', 'steps-2019-05-02.json', 'calories', 'steps-2019-11-28.json', 'sleep']



```python
dfs = []
for file in files: # this can take 15 seconds
    if "steps" in file: # finds the steps files
        dfs.append(readFileIntoDataFrame("data/" + file))
```
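
As a side note, the substring check works here, but a glob pattern is a slightly stricter way to pick out the step files. Below is a sketch using `pathlib`; `find_step_files` is a helper name I made up for this example, and the demo runs against a throwaway directory so it does not depend on the real export.

```python
import tempfile
from pathlib import Path


def find_step_files(data_dir):
    """Return the steps-*.json files in data_dir, sorted by file name."""
    return sorted(Path(data_dir).glob("steps-*.json"))


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["steps-2019-07-01.json", "steps-2019-04-02.json",
                 "calories", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = [path.name for path in find_step_files(tmp)]

print(found)
```

Because the file names contain dates written year first, sorting by name also puts the files in chronological order.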


```python
stepsPerDay = pd.concat(dfs)
graphBoxAndWhiskers(stepsPerDay, "Steps Per Day Last 11 Months", "Steps per Day")
```


![png](media/vis_my_life/output_45_0.png)



```python
print(type(stepsPerDay['value'].to_numpy()))
print(stepsPerDay['value'].keys())

stepsPerDay['month'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).month 
stepsPerDay['week_day'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).weekday

print(stepsPerDay)
```

    <class 'numpy.ndarray'>
    DatetimeIndex(['2019-04-03', '2019-04-04', '2019-04-05', '2019-04-06',
                   '2019-04-07', '2019-04-08', '2019-04-09', '2019-04-10',
                   '2019-04-11', '2019-04-12',
                   ...
                   '2019-12-19', '2019-12-20', '2019-12-21', '2019-12-22',
                   '2019-12-23', '2019-12-24', '2019-12-25', '2019-12-26',
                   '2019-12-27', '2019-12-28'],
                  dtype='datetime64[ns]', name='date_minus_time', length=342, freq=None)
                     value  month  week_day
    date_minus_time                        
    2019-04-03         510      4         2
    2019-04-04       11453      4         3
    2019-04-05       12684      4         4
    2019-04-06       12910      4         5
    2019-04-07        3368      4         6
    ...                ...    ...       ...
    2019-12-24        5779     12         1
    2019-12-25        4264     12         2
    2019-12-26        4843     12         3
    2019-12-27        9609     12         4
    2019-12-28        2218     12         5

    [342 rows x 3 columns]


## Graphing Steps by Month

Now that we have columns for the total number of steps per day and the
month, we can plot all the data on a single plot using the by argument
of the pandas boxplot method.


```python
ax = plt.gca()
ax.set_title('Steps Distribution By Month\n')
stepsPerDay.boxplot(column=['value'], by='month',ax=ax, notch=True)
plt.xlabel("Month")
plt.ylabel("Steps Per Day")
plt.show()
```


![png](media/vis_my_life/output_48_0.png)
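
One thing to keep in mind is that grouping by month number orders the boxes from 1 to 12, not chronologically, and it would merge the same month from different years if the data ever spanned more than twelve months. A possible workaround, sketched here on a few rows, is to group by a year-month label instead. The December values come from the table above, the January values are made up, and `year_month` is a column name I picked for the example.

```python
import pandas as pd

days = pd.to_datetime(["2019-12-27", "2019-12-28", "2020-01-27", "2020-01-28"])
# December values are from the table above; January values are made up.
demo = pd.DataFrame({"value": [9609, 2218, 5000, 7000]}, index=days)

# Labels like "2019-12" sort chronologically and keep the years apart.
demo["year_month"] = demo.index.to_period("M").astype(str)
print(demo.groupby("year_month")["value"].median())
```

Passing `by='year_month'` to the box plot call would then give one box per calendar month in date order.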



```python
ax = plt.gca()
ax.set_title('Steps Distribution By Week Day\n')
stepsPerDay.boxplot(column=['value'], by='week_day',ax=ax, notch=True)
plt.xlabel("Week Day")
plt.ylabel("Steps Per Day")
plt.show()
```


![png](media/vis_my_life/output_49_0.png)


## Future Work

Moving forward with this I would like to do more visualizations with
sleep data and heart rate.