jrtechs
/
jrtechs-NodeJSBlog
mirror of https://github.com/jrtechs/NodeJSBlog.git

Let's do a deep dive and start visualizing my life using Fitbit andMatplotlib.  
# What is Fitbit

[Fitbit](https://www.fitbit.com) is a fitness watch that tracks your sleep, heart rate, and activity.Fitbit is able to track your steps, however, it is also able to detect multiple types of activitylike running, walking, "sport" and biking.
# What is Matplotlib

[Matplotlib](https://matplotlib.org/) is a python visualization library that enables you to create bar graphs, line graphs, distributions and many more things.Being able to visualize your results is essential to any person working with data at any scale.Although I like [GGplot](https://ggplot2.tidyverse.org/) in R more than Matplotlib, Matplotlib is still my go to graphing library for Python. 
# Getting Your Fitbit Data

There are two main ways that you can get your Fitbit data: 
- Fitbit API- Data Archival Export

Since connecting to the API and setting up all the web hooks can be apain, I'm just going to use the data export option because this isonly for one person. You can export your data here:[https://www.fitbit.com/settings/data/export](https://www.fitbit.com/settings/data/export).
![Data export on fitbit's website](media/vis_my_life/dataExport.png)
The Fitbit data archive was very organized and kept meticulous recordsof everything.  All of the data was organized in separate JSON fileslabeled by date. Fitbit keeps around 1MB of data on you per day; mostof this data is from the heart rate sensors. Although 1MB of data maysound like a ton of data, it is probably a lot less if you store it informats other than JSON.  When I downloaded the compressed file it was20MB, but when I extracted it, it was 380MB! I've only been usingFitbit for 11 months at this point.  
![compressed data](media/vis_my_life/compression.png)
## Sleep

Sleep is something fun to visualize. No matter how much of it you getyou still feel tired as a college student. In the "sleep_score" folderof the exported data you will find a single CSV file with your restingheart rate and Fitbit's computed sleep scores. Interesting enough,this is the only file that comes in the CSV format, everything else isJSON file.  
We can read in all the data using a single liner with the[Pandas](https://pandas.pydata.org/) python library. 


```pythonimport matplotlib.pyplot as pltimport pandas as pd
sleep_score_df = pd.read_csv('data/sleep/sleep_score.csv')```

```pythonprint(sleep_score_df)```
         sleep_log_entry_id             timestamp  overall_score  \    0           26093459526  2020-02-27T06:04:30Z             80       1           26081303207  2020-02-26T06:13:30Z             83       2           26062481322  2020-02-25T06:00:30Z             82       3           26045941555  2020-02-24T05:49:30Z             79       4           26034268762  2020-02-23T08:35:30Z             75       ..                  ...                   ...            ...       176         23696231032  2019-09-02T07:38:30Z             79       177         23684345925  2019-09-01T07:15:30Z             84       178         23673204871  2019-08-31T07:11:00Z             74       179         23661278483  2019-08-30T06:34:00Z             73       180         23646265400  2019-08-29T05:55:00Z             80   
         composition_score  revitalization_score  duration_score  \    0                   20                    19              41       1                   22                    21              40       2                   22                    21              39       3                   17                    20              42       4                   20                    16              39       ..                 ...                   ...             ...       176                 20                    20              39       177                 22                    21              41       178                 18                    21              35       179                 17                    19              37       180                 21                    21              38   
         deep_sleep_in_minutes  resting_heart_rate  restlessness      0                       65                  60      0.117330      1                       85                  60      0.113188      2                       95                  60      0.120635      3                       52                  61      0.111224      4                       43                  59      0.154774      ..                     ...                 ...           ...      176                     88                  56      0.170923      177                     95                  56      0.133268      178                     73                  56      0.102703      179                     50                  55      0.121086      180                     61                  57      0.112961  
    [181 rows x 9 columns]

With the Pandas library you can generate Matplotlib graphs. Althoughyou can directly use Matplotlib, the wrapper functions using Pandasmakes it easier to use. 
## Sleep Score Histogram


```pythonsleep_score_df.hist(column='overall_score')```


    array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fc2c0a270d0>]],          dtype=object)


![png](media/vis_my_life/output_7_1.png)

## Heart Rate

Fitbit keeps their calculated heart rates in the sleep scores filerather than heart. Knowing your resting heart rate is useful becauseit is a good indicator of your overall health. 
![](media/vis_my_life/restingHeartRate.jpg)

```pythonsleep_score_df.hist(column='resting_heart_rate')```


    array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fc2917a6090>]],          dtype=object)


![png](media/vis_my_life/output_9_1.png)

## Resting Heart Rate Time Graph

Using the pandas wrapper we can quickly create a heart rate graph overtime. 

```pythonsleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)")```


    <matplotlib.axes._subplots.AxesSubplot at 0x7fc28f609b50>


![png](media/vis_my_life/output_11_1.png)

However, as we notice with the graph above, the time axis is wack. Inthe pandas data frame everything was stored as a string timestamp. Wecan convert this into a datetime object by telling pandas to parse thedate as it reads it. 

```pythonsleep_score_df = pd.read_csv('data/sleep/sleep_score.csv', parse_dates=[1])sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)")```


    <matplotlib.axes._subplots.AxesSubplot at 0x7fc28f533510>


![png](media/vis_my_life/output_13_1.png)

To fully manipulate the graphs, we need to use some matplotlib code todo things like setting the axis labels or make multiple plots rightnext to each other. We can create grab the current axis being used bymatplotlib by using plt.gca(). 

```pythonax = plt.gca()sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate Graph", ax=ax, figsize=(10, 5))plt.xlabel("Date")plt.ylabel("Resting Heart Rate (BPM)")plt.show()
#plt.savefig('restingHeartRate.svg')
```

![png](media/vis_my_life/output_15_0.png)

The same thing can be done with sleep scores. It is interesting tonote that  the sleep scores rarely vary anything between 75 and 85. 

```pythonax = plt.gca()sleep_score_df.plot(kind='line', y='overall_score', x ='timestamp', legend=False, title="Sleep Score Time Series Graph", ax=ax)plt.xlabel("Date")plt.ylabel("Fitbit's Sleep Score")plt.show()```

![png](media/vis_my_life/output_17_0.png)

Using Pandas we can generate a new column with a specific dateattribute like year, day, month, or weekday. If we add a new columnfor weekday, we can then group by weekday and collapse them all into asingle column by summing or averaging the value. 

```pythontemp = pd.DatetimeIndex(sleep_score_df['timestamp'])sleep_score_df['weekday'] = temp.weekday
print(sleep_score_df)```
         sleep_log_entry_id                 timestamp  overall_score  \    0           26093459526 2020-02-27 06:04:30+00:00             80       1           26081303207 2020-02-26 06:13:30+00:00             83       2           26062481322 2020-02-25 06:00:30+00:00             82       3           26045941555 2020-02-24 05:49:30+00:00             79       4           26034268762 2020-02-23 08:35:30+00:00             75       ..                  ...                       ...            ...       176         23696231032 2019-09-02 07:38:30+00:00             79       177         23684345925 2019-09-01 07:15:30+00:00             84       178         23673204871 2019-08-31 07:11:00+00:00             74       179         23661278483 2019-08-30 06:34:00+00:00             73       180         23646265400 2019-08-29 05:55:00+00:00             80   
         composition_score  revitalization_score  duration_score  \    0                   20                    19              41       1                   22                    21              40       2                   22                    21              39       3                   17                    20              42       4                   20                    16              39       ..                 ...                   ...             ...       176                 20                    20              39       177                 22                    21              41       178                 18                    21              35       179                 17                    19              37       180                 21                    21              38   
         deep_sleep_in_minutes  resting_heart_rate  restlessness  weekday      0                       65                  60      0.117330        3      1                       85                  60      0.113188        2      2                       95                  60      0.120635        1      3                       52                  61      0.111224        0      4                       43                  59      0.154774        6      ..                     ...                 ...           ...      ...      176                     88                  56      0.170923        0      177                     95                  56      0.133268        6      178                     73                  56      0.102703        5      179                     50                  55      0.121086        4      180                     61                  57      0.112961        3  
    [181 rows x 10 columns]


```pythonprint(sleep_score_df.groupby('weekday').mean())```
             sleep_log_entry_id  overall_score  composition_score  \    weekday                                                             0              2.483733e+10      79.576923          20.269231       1              2.485200e+10      77.423077          20.423077       2              2.490383e+10      80.880000          21.120000       3              2.483418e+10      76.814815          20.370370       4              2.480085e+10      79.769231          20.961538       5              2.477002e+10      78.840000          20.520000       6              2.482581e+10      77.230769          20.269231   
             revitalization_score  duration_score  deep_sleep_in_minutes  \    weekday                                                                    0                   19.153846       40.153846              88.000000       1                   19.000000       38.000000              83.846154       2                   19.400000       40.360000              93.760000       3                   19.037037       37.407407              82.592593       4                   19.346154       39.461538              94.461538       5                   19.080000       39.240000              93.720000       6                   18.269231       38.692308              89.423077   
             resting_heart_rate  restlessness      weekday                                        0                 58.576923      0.139440      1                 58.538462      0.142984      2                 58.560000      0.138661      3                 58.333333      0.135819      4                 58.269231      0.129791      5                 58.080000      0.138315      6                 58.153846      0.147171  

## Sleep Score Based on Day


```pythonax = plt.gca()sleep_score_df.groupby('weekday').mean().plot(kind='line', y='overall_score', ax = ax)plt.ylabel("Sleep Score")plt.title("Sleep Scores on Varying Days of Week")plt.show()```

![png](media/vis_my_life/output_22_0.png)

## Sleep Score Based on Days of Week


```pythonax = plt.gca()sleep_score_df.groupby('weekday').mean().plot(kind='line', y='resting_heart_rate', ax = ax)plt.ylabel("Resting heart rate (BPM)")plt.title("Resting Heart Rate Varying Days of Week")plt.show()```

![png](media/vis_my_life/output_24_0.png)

# Calories

Fitbit keeps all of their calorie data in JSON files representingsequence data at 1 minute increments. To extrapolate calorie data weneed to group by day and then sum the days to get the total caloriesburned per day. 

```pythoncalories_df = pd.read_json("data/calories/calories-2019-07-01.json",  convert_dates=True)```

```pythonprint(calories_df)```
                     dateTime  value    0     2019-07-01 00:00:00   1.07    1     2019-07-01 00:01:00   1.07    2     2019-07-01 00:02:00   1.07    3     2019-07-01 00:03:00   1.07    4     2019-07-01 00:04:00   1.07    ...                   ...    ...    43195 2019-07-30 23:55:00   1.07    43196 2019-07-30 23:56:00   1.07    43197 2019-07-30 23:57:00   1.07    43198 2019-07-30 23:58:00   1.07    43199 2019-07-30 23:59:00   1.07
    [43200 rows x 2 columns]


```pythonimport datetimecalories_df['date_minus_time'] = calories_df["dateTime"].apply( lambda calories_df :     datetime.datetime(year=calories_df.year, month=calories_df.month, day=calories_df.day))	
calories_df.set_index(calories_df["date_minus_time"],inplace=True)
print(calories_df)```
                               dateTime  value date_minus_time    date_minus_time                                               2019-07-01      2019-07-01 00:00:00   1.07      2019-07-01    2019-07-01      2019-07-01 00:01:00   1.07      2019-07-01    2019-07-01      2019-07-01 00:02:00   1.07      2019-07-01    2019-07-01      2019-07-01 00:03:00   1.07      2019-07-01    2019-07-01      2019-07-01 00:04:00   1.07      2019-07-01    ...                             ...    ...             ...    2019-07-30      2019-07-30 23:55:00   1.07      2019-07-30    2019-07-30      2019-07-30 23:56:00   1.07      2019-07-30    2019-07-30      2019-07-30 23:57:00   1.07      2019-07-30    2019-07-30      2019-07-30 23:58:00   1.07      2019-07-30    2019-07-30      2019-07-30 23:59:00   1.07      2019-07-30
    [43200 rows x 3 columns]


```pythoncalories_per_day = calories_df.resample('D').sum()print(calories_per_day)```
                       value    date_minus_time             2019-07-01       3422.68    2019-07-02       2705.85    2019-07-03       2871.73    2019-07-04       4089.93    2019-07-05       3917.91    2019-07-06       2762.55    2019-07-07       2929.58    2019-07-08       2698.99    2019-07-09       2833.27    2019-07-10       2529.21    2019-07-11       2634.25    2019-07-12       2953.91    2019-07-13       4247.45    2019-07-14       2998.35    2019-07-15       2846.18    2019-07-16       3084.39    2019-07-17       2331.06    2019-07-18       2849.20    2019-07-19       2071.63    2019-07-20       2746.25    2019-07-21       2562.11    2019-07-22       1892.99    2019-07-23       2372.89    2019-07-24       2320.42    2019-07-25       2140.87    2019-07-26       2430.38    2019-07-27       3769.04    2019-07-28       2036.24    2019-07-29       2814.87    2019-07-30       2077.82


```pythonax = plt.gca()calories_per_day.plot(kind='hist', title="Calorie Distribution", legend=False, ax=ax)plt.show()```

![png](media/vis_my_life/output_30_0.png)


```pythonax = plt.gca()calories_per_day.plot(kind='line', y='value', legend=False, title="Calories Per Day", ax=ax)plt.xlabel("Date")plt.ylabel("Calories")plt.show()```

![png](media/vis_my_life/output_31_0.png)

## Calories Per Day Box Plot

Using this data we can turn this into a boxplot to make it easier tovisualize the distribution of calories burned during the month ofJuly. 

```pythonax = plt.gca()ax.set_title('Calorie Distribution for July')ax.boxplot(calories_per_day['value'], vert=False,manage_ticks=False, notch=True)plt.xlabel("Calories Burned")ax.set_yticks([])plt.show()```

![png](media/vis_my_life/output_33_0.png)

# Steps

Fitbit is known for taking the amount of steps someone takes per day.Similar to calories burned, steps taken is stored in time series dataat 1 minute increments. Since we are interested at the day level data,we need to first remove the time component of the dataframe so that wecan group all the data by date. Once we have everything grouped bydate, we can sum and produce steps per day.  

```pythonsteps_df = pd.read_json("data/steps-2019-07-01.json",  convert_dates=True)
steps_df['date_minus_time'] = steps_df["dateTime"].apply( lambda steps_df :     datetime.datetime(year=steps_df.year, month=steps_df.month, day=steps_df.day))	
steps_df.set_index(steps_df["date_minus_time"],inplace=True)print(steps_df)```
                               dateTime  value date_minus_time    date_minus_time                                               2019-07-01      2019-07-01 04:00:00      0      2019-07-01    2019-07-01      2019-07-01 04:01:00      0      2019-07-01    2019-07-01      2019-07-01 04:02:00      0      2019-07-01    2019-07-01      2019-07-01 04:03:00      0      2019-07-01    2019-07-01      2019-07-01 04:04:00      0      2019-07-01    ...                             ...    ...             ...    2019-07-31      2019-07-31 03:55:00      0      2019-07-31    2019-07-31      2019-07-31 03:56:00      0      2019-07-31    2019-07-31      2019-07-31 03:57:00      0      2019-07-31    2019-07-31      2019-07-31 03:58:00      0      2019-07-31    2019-07-31      2019-07-31 03:59:00      0      2019-07-31
    [41116 rows x 3 columns]


```pythonsteps_per_day = steps_df.resample('D').sum()print(steps_per_day)```
                     value    date_minus_time           2019-07-01       11285    2019-07-02        4957    2019-07-03       13119    2019-07-04       16034    2019-07-05       11634    2019-07-06        6860    2019-07-07        3758    2019-07-08        9130    2019-07-09       10960    2019-07-10        7012    2019-07-11        5420    2019-07-12        4051    2019-07-13       15980    2019-07-14       23109    2019-07-15       11247    2019-07-16       10170    2019-07-17        4905    2019-07-18       10769    2019-07-19        4504    2019-07-20        5032    2019-07-21        8953    2019-07-22        2200    2019-07-23        9392    2019-07-24        5666    2019-07-25        5016    2019-07-26        5879    2019-07-27       19492    2019-07-28        4987    2019-07-29        9943    2019-07-30        3897    2019-07-31         166

## Steps Per Day Histogram

After the data is in the form that we want, graphing the data isstraight forward. Two added things I like to do for normal box plotsis to set the displays to horizontal add the notches.  

```pythonax = plt.gca()ax.set_title('Steps Distribution for July')ax.boxplot(steps_per_day['value'], vert=False,manage_ticks=False, notch=True)plt.xlabel("Steps Per Day")ax.set_yticks([])plt.show()```

![png](media/vis_my_life/output_38_0.png)

Wrapping that all into a single function we get something like this: 

```pythondef readFileIntoDataFrame(fName):    steps_df = pd.read_json(fName,  convert_dates=True)
    steps_df['date_minus_time'] = steps_df["dateTime"].apply( lambda steps_df :         datetime.datetime(year=steps_df.year, month=steps_df.month, day=steps_df.day))	
    steps_df.set_index(steps_df["date_minus_time"],inplace=True)    return steps_df.resample('D').sum()
def graphBoxAndWhiskers(data, title, xlab):    ax = plt.gca()    ax.set_title(title)    ax.boxplot(data['value'], vert=False, manage_ticks=False, notch=True)    plt.xlabel(xlab)    ax.set_yticks([])    plt.show()```

```pythongraphBoxAndWhiskers(readFileIntoDataFrame("data/steps-2020-01-27.json"), "Steps In January", "Steps Per Day")```

![png](media/vis_my_life/output_41_0.png)

That is cool, but, what if we could view the distribution for eachmonth in the same graph? Based on the two previous graphs, my stepdistribution during July looked distinctly different from my stepdistribution in January.  The first difficultly would be to read inall the files since Fitbit creates a new file for every month. Thenext thing would be to group them by month and then graph it. 

```pythonimport osfiles = os.listdir("data")print(files)```
    ['steps-2019-04-02.json', 'steps-2019-08-30.json', 'steps-2020-02-26.json', 'steps-2019-10-29.json', 'steps-2019-07-01.json', 'steps-2020-01-27.json', 'steps-2019-07-31.json', 'steps-2019-06-01.json', 'steps-2019-09-29.json', '.ipynb_checkpoints', 'steps-2019-12-28.json', 'steps-2019-05-02.json', 'calories', 'steps-2019-11-28.json', 'sleep']


```pythondfs = []for file in files: # this can take 15 seconds    if "steps" in file: # finds the steps files        dfs.append(readFileIntoDataFrame("data/" + file))```

```pythonstepsPerDay = pd.concat(dfs)graphBoxAndWhiskers(stepsPerDay, "Steps Per Day Last 11 Months", "Steps per Day")```

![png](media/vis_my_life/output_45_0.png)


```pythonprint(type(stepsPerDay['value'].to_numpy()))print(stepsPerDay['value'].keys())
stepsPerDay['month'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).month stepsPerDay['week_day'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).weekday
print(stepsPerDay)```
    <class 'numpy.ndarray'>    DatetimeIndex(['2019-04-03', '2019-04-04', '2019-04-05', '2019-04-06',                   '2019-04-07', '2019-04-08', '2019-04-09', '2019-04-10',                   '2019-04-11', '2019-04-12',                   ...                   '2019-12-19', '2019-12-20', '2019-12-21', '2019-12-22',                   '2019-12-23', '2019-12-24', '2019-12-25', '2019-12-26',                   '2019-12-27', '2019-12-28'],                  dtype='datetime64[ns]', name='date_minus_time', length=342, freq=None)                     value  month  week_day    date_minus_time                            2019-04-03         510      4         2    2019-04-04       11453      4         3    2019-04-05       12684      4         4    2019-04-06       12910      4         5    2019-04-07        3368      4         6    ...                ...    ...       ...    2019-12-24        5779     12         1    2019-12-25        4264     12         2    2019-12-26        4843     12         3    2019-12-27        9609     12         4    2019-12-28        2218     12         5
    [342 rows x 3 columns]

## Graphing Steps by Month

Now that we have columns for the total amount of steps per day and themonths, we can plot all the data on a single plot using the group byoperator in the plotting library. 

```pythonax = plt.gca()ax.set_title('Steps Distribution for July\n')stepsPerDay.boxplot(column=['value'], by='month',ax=ax, notch=True)plt.xlabel("Month")plt.ylabel("Steps Per Day")plt.show()```

![png](media/vis_my_life/output_48_0.png)


```pythonax = plt.gca()ax.set_title('Steps Distribution By Week Day\n')stepsPerDay.boxplot(column=['value'], by='week_day',ax=ax, notch=True)plt.xlabel("Week Day")plt.ylabel("Steps Per Day")plt.show()```

![png](media/vis_my_life/output_49_0.png)

## Future Work

Moving forward with this I would like to do more visualizations withsleep data and heart rate.