| Let's do a deep dive and start visualizing my life using Fitbit and | |||||
| Matplotlib. | |||||
| # What is Fitbit | |||||
[Fitbit](https://www.fitbit.com) is a fitness watch that tracks your sleep, heart rate, and activity.
Beyond counting your steps, it can also detect multiple types of activity
like running, walking, "sport" and biking.
| # What is Matplotlib | |||||
[Matplotlib](https://matplotlib.org/) is a Python visualization library that enables you to create bar graphs, line graphs, distributions and many more things.
Being able to visualize your results is essential to any person working with data at any scale.
Although I like [ggplot2](https://ggplot2.tidyverse.org/) in R more than Matplotlib, Matplotlib is still my go-to graphing library for Python.
| # Getting Your Fitbit Data | |||||
| There are two main ways that you can get your Fitbit data: | |||||
| - Fitbit API | |||||
| - Data Archival Export | |||||
| Since connecting to the API and setting up all the web hooks can be a | |||||
| pain, I'm just going to use the data export option because this is | |||||
| only for one person. You can export your data here: | |||||
| [https://www.fitbit.com/settings/data/export](https://www.fitbit.com/settings/data/export). | |||||
|  | |||||
| The Fitbit data archive was very organized and kept meticulous records | |||||
| of everything. All of the data was organized in separate JSON files | |||||
| labeled by date. Fitbit keeps around 1MB of data on you per day; most | |||||
| of this data is from the heart rate sensors. Although 1MB of data may | |||||
| sound like a ton of data, it is probably a lot less if you store it in | |||||
| formats other than JSON. When I downloaded the compressed file it was | |||||
| 20MB, but when I extracted it, it was 380MB! I've only been using | |||||
| Fitbit for 11 months at this point. | |||||
|  | |||||
| ## Sleep | |||||
Sleep is something fun to visualize. No matter how much of it you get
you still feel tired as a college student. In the "sleep_score" folder
of the exported data you will find a single CSV file with your resting
heart rate and Fitbit's computed sleep scores. Interestingly enough,
this is the only file that comes in the CSV format; everything else is
a JSON file.
We can read in all the data with a one-liner using the
[Pandas](https://pandas.pydata.org/) Python library.
| ```python | |||||
| import matplotlib.pyplot as plt | |||||
| import pandas as pd | |||||
| sleep_score_df = pd.read_csv('data/sleep/sleep_score.csv') | |||||
| ``` | |||||
| ```python | |||||
| print(sleep_score_df) | |||||
| ``` | |||||
         sleep_log_entry_id             timestamp  overall_score  \
    0           26093459526  2020-02-27T06:04:30Z             80
    1           26081303207  2020-02-26T06:13:30Z             83
    2           26062481322  2020-02-25T06:00:30Z             82
    3           26045941555  2020-02-24T05:49:30Z             79
    4           26034268762  2020-02-23T08:35:30Z             75
    ..                  ...                   ...            ...
    176         23696231032  2019-09-02T07:38:30Z             79
    177         23684345925  2019-09-01T07:15:30Z             84
    178         23673204871  2019-08-31T07:11:00Z             74
    179         23661278483  2019-08-30T06:34:00Z             73
    180         23646265400  2019-08-29T05:55:00Z             80

         composition_score  revitalization_score  duration_score  \
    0                   20                    19              41
    1                   22                    21              40
    2                   22                    21              39
    3                   17                    20              42
    4                   20                    16              39
    ..                 ...                   ...             ...
    176                 20                    20              39
    177                 22                    21              41
    178                 18                    21              35
    179                 17                    19              37
    180                 21                    21              38

         deep_sleep_in_minutes  resting_heart_rate  restlessness
    0                       65                  60      0.117330
    1                       85                  60      0.113188
    2                       95                  60      0.120635
    3                       52                  61      0.111224
    4                       43                  59      0.154774
    ..                     ...                 ...           ...
    176                     88                  56      0.170923
    177                     95                  56      0.133268
    178                     73                  56      0.102703
    179                     50                  55      0.121086
    180                     61                  57      0.112961

    [181 rows x 9 columns]
With the Pandas library you can generate Matplotlib graphs. Although
you can use Matplotlib directly, the Pandas wrapper functions
make it easier to use.
| ## Sleep Score Histogram | |||||
| ```python | |||||
| sleep_score_df.hist(column='overall_score') | |||||
| ``` | |||||
|  | |||||
| ## Heart Rate | |||||
Fitbit stores its computed resting heart rate in the sleep score file
rather than in the heart rate files. Knowing your resting heart rate is
useful because it is a good indicator of your overall health.
|  | |||||
| ```python | |||||
| sleep_score_df.hist(column='resting_heart_rate') | |||||
| ``` | |||||
|  | |||||
| ## Resting Heart Rate Time Graph | |||||
| Using the pandas wrapper we can quickly create a heart rate graph over | |||||
| time. | |||||
| ```python | |||||
| sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)") | |||||
| ``` | |||||
|  | |||||
However, as the graph above shows, the time axis is wack. In
the pandas data frame the timestamps were stored as plain strings, so
they are not treated as dates. We can convert them into datetime objects
by telling pandas to parse the dates as it reads the file.
| ```python | |||||
| sleep_score_df = pd.read_csv('data/sleep/sleep_score.csv', parse_dates=[1]) | |||||
| sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate(BPM)") | |||||
| ``` | |||||
|  | |||||
To fully manipulate the graphs, we need to use some matplotlib code to
do things like setting the axis labels or making multiple plots right
next to each other. We can grab the current axis being used by
matplotlib by using plt.gca().
| ```python | |||||
| ax = plt.gca() | |||||
| sleep_score_df.plot(kind='line', y='resting_heart_rate', x ='timestamp', legend=False, title="Resting Heart Rate Graph", ax=ax, figsize=(10, 5)) | |||||
| plt.xlabel("Date") | |||||
| plt.ylabel("Resting Heart Rate (BPM)") | |||||
| plt.show() | |||||
| #plt.savefig('restingHeartRate.svg') | |||||
| ``` | |||||
|  | |||||
The same thing can be done with sleep scores. It is interesting to
note that the sleep scores rarely fall outside the range of 75 to 85.
| ```python | |||||
| ax = plt.gca() | |||||
| sleep_score_df.plot(kind='line', y='overall_score', x ='timestamp', legend=False, title="Sleep Score Time Series Graph", ax=ax) | |||||
| plt.xlabel("Date") | |||||
| plt.ylabel("Fitbit's Sleep Score") | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
Using Pandas we can generate a new column with a specific date
attribute like year, day, month, or weekday. If we add a new column
for weekday, we can then group by weekday and collapse each group into a
single row by summing or averaging the values.
| ```python | |||||
| temp = pd.DatetimeIndex(sleep_score_df['timestamp']) | |||||
| sleep_score_df['weekday'] = temp.weekday | |||||
| print(sleep_score_df) | |||||
| ``` | |||||
| sleep_log_entry_id timestamp overall_score \ | |||||
| 0 26093459526 2020-02-27 06:04:30+00:00 80 | |||||
| 1 26081303207 2020-02-26 06:13:30+00:00 83 | |||||
| 2 26062481322 2020-02-25 06:00:30+00:00 82 | |||||
| 3 26045941555 2020-02-24 05:49:30+00:00 79 | |||||
| 4 26034268762 2020-02-23 08:35:30+00:00 75 | |||||
| .. ... ... ... | |||||
| 176 23696231032 2019-09-02 07:38:30+00:00 79 | |||||
| 177 23684345925 2019-09-01 07:15:30+00:00 84 | |||||
| 178 23673204871 2019-08-31 07:11:00+00:00 74 | |||||
| 179 23661278483 2019-08-30 06:34:00+00:00 73 | |||||
| 180 23646265400 2019-08-29 05:55:00+00:00 80 | |||||
| composition_score revitalization_score duration_score \ | |||||
| 0 20 19 41 | |||||
| 1 22 21 40 | |||||
| 2 22 21 39 | |||||
| 3 17 20 42 | |||||
| 4 20 16 39 | |||||
| .. ... ... ... | |||||
| 176 20 20 39 | |||||
| 177 22 21 41 | |||||
| 178 18 21 35 | |||||
| 179 17 19 37 | |||||
| 180 21 21 38 | |||||
| deep_sleep_in_minutes resting_heart_rate restlessness weekday | |||||
| 0 65 60 0.117330 3 | |||||
| 1 85 60 0.113188 2 | |||||
| 2 95 60 0.120635 1 | |||||
| 3 52 61 0.111224 0 | |||||
| 4 43 59 0.154774 6 | |||||
| .. ... ... ... ... | |||||
| 176 88 56 0.170923 0 | |||||
| 177 95 56 0.133268 6 | |||||
| 178 73 56 0.102703 5 | |||||
| 179 50 55 0.121086 4 | |||||
| 180 61 57 0.112961 3 | |||||
| [181 rows x 10 columns] | |||||
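Pandas numbers the weekdays from 0 (Monday) to 6 (Sunday), which is easy to forget when reading the graphs. A short sketch of adding a readable name next to the number, using three timestamps from the printout above; the `weekday_name` column is a hypothetical addition and is not used elsewhere in this post.

```python
import pandas as pd

# Three timestamps copied from the printout
frame = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-02-27T06:04:30Z",
        "2020-02-24T05:49:30Z",
        "2020-02-23T08:35:30Z",
    ])
})

# The .dt accessor exposes date attributes for a datetime column
frame["weekday"] = frame["timestamp"].dt.weekday
frame["weekday_name"] = frame["timestamp"].dt.day_name()
print(frame)
```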
```python
# numeric_only=True averages only the numeric columns and skips the timestamp
print(sleep_score_df.groupby('weekday').mean(numeric_only=True))
```
| sleep_log_entry_id overall_score composition_score \ | |||||
| weekday | |||||
| 0 2.483733e+10 79.576923 20.269231 | |||||
| 1 2.485200e+10 77.423077 20.423077 | |||||
| 2 2.490383e+10 80.880000 21.120000 | |||||
| 3 2.483418e+10 76.814815 20.370370 | |||||
| 4 2.480085e+10 79.769231 20.961538 | |||||
| 5 2.477002e+10 78.840000 20.520000 | |||||
| 6 2.482581e+10 77.230769 20.269231 | |||||
| revitalization_score duration_score deep_sleep_in_minutes \ | |||||
| weekday | |||||
| 0 19.153846 40.153846 88.000000 | |||||
| 1 19.000000 38.000000 83.846154 | |||||
| 2 19.400000 40.360000 93.760000 | |||||
| 3 19.037037 37.407407 82.592593 | |||||
| 4 19.346154 39.461538 94.461538 | |||||
| 5 19.080000 39.240000 93.720000 | |||||
| 6 18.269231 38.692308 89.423077 | |||||
| resting_heart_rate restlessness | |||||
| weekday | |||||
| 0 58.576923 0.139440 | |||||
| 1 58.538462 0.142984 | |||||
| 2 58.560000 0.138661 | |||||
| 3 58.333333 0.135819 | |||||
| 4 58.269231 0.129791 | |||||
| 5 58.080000 0.138315 | |||||
| 6 58.153846 0.147171 | |||||
| ## Sleep Score Based on Day | |||||
| ```python | |||||
| ax = plt.gca() | |||||
| sleep_score_df.groupby('weekday').mean().plot(kind='line', y='overall_score', ax = ax) | |||||
| plt.ylabel("Sleep Score") | |||||
| plt.title("Sleep Scores on Varying Days of Week") | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
## Resting Heart Rate Based on Days of Week
| ```python | |||||
| ax = plt.gca() | |||||
| sleep_score_df.groupby('weekday').mean().plot(kind='line', y='resting_heart_rate', ax = ax) | |||||
| plt.ylabel("Resting heart rate (BPM)") | |||||
| plt.title("Resting Heart Rate Varying Days of Week") | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
| # Calories | |||||
Fitbit keeps all of its calorie data in JSON files as a time series
at 1 minute increments. To get daily totals we need to group the rows
by day and then sum each group to get the total calories
burned per day.
| ```python | |||||
| calories_df = pd.read_json("data/calories/calories-2019-07-01.json", convert_dates=True) | |||||
| ``` | |||||
| ```python | |||||
| print(calories_df) | |||||
| ``` | |||||
                      dateTime  value
    0      2019-07-01 00:00:00   1.07
    1      2019-07-01 00:01:00   1.07
    2      2019-07-01 00:02:00   1.07
    3      2019-07-01 00:03:00   1.07
    4      2019-07-01 00:04:00   1.07
    ...                    ...    ...
    43195  2019-07-30 23:55:00   1.07
    43196  2019-07-30 23:56:00   1.07
    43197  2019-07-30 23:57:00   1.07
    43198  2019-07-30 23:58:00   1.07
    43199  2019-07-30 23:59:00   1.07

    [43200 rows x 2 columns]
```python
import datetime

# Strip the time of day so every row carries only its calendar date
calories_df['date_minus_time'] = calories_df["dateTime"].apply(
    lambda ts: datetime.datetime(year=ts.year, month=ts.month, day=ts.day))
# Index by date so that resample() can group the rows by day
calories_df.set_index(calories_df["date_minus_time"], inplace=True)
print(calories_df)
```
                                dateTime  value  date_minus_time
    date_minus_time
    2019-07-01       2019-07-01 00:00:00   1.07       2019-07-01
    2019-07-01       2019-07-01 00:01:00   1.07       2019-07-01
    2019-07-01       2019-07-01 00:02:00   1.07       2019-07-01
    2019-07-01       2019-07-01 00:03:00   1.07       2019-07-01
    2019-07-01       2019-07-01 00:04:00   1.07       2019-07-01
    ...                              ...    ...              ...
    2019-07-30       2019-07-30 23:55:00   1.07       2019-07-30
    2019-07-30       2019-07-30 23:56:00   1.07       2019-07-30
    2019-07-30       2019-07-30 23:57:00   1.07       2019-07-30
    2019-07-30       2019-07-30 23:58:00   1.07       2019-07-30
    2019-07-30       2019-07-30 23:59:00   1.07       2019-07-30

    [43200 rows x 3 columns]
```python
# Keep only the numeric column: datetime columns cannot be summed
calories_per_day = calories_df[['value']].resample('D').sum()
print(calories_per_day)
```
| value | |||||
| date_minus_time | |||||
| 2019-07-01 3422.68 | |||||
| 2019-07-02 2705.85 | |||||
| 2019-07-03 2871.73 | |||||
| 2019-07-04 4089.93 | |||||
| 2019-07-05 3917.91 | |||||
| 2019-07-06 2762.55 | |||||
| 2019-07-07 2929.58 | |||||
| 2019-07-08 2698.99 | |||||
| 2019-07-09 2833.27 | |||||
| 2019-07-10 2529.21 | |||||
| 2019-07-11 2634.25 | |||||
| 2019-07-12 2953.91 | |||||
| 2019-07-13 4247.45 | |||||
| 2019-07-14 2998.35 | |||||
| 2019-07-15 2846.18 | |||||
| 2019-07-16 3084.39 | |||||
| 2019-07-17 2331.06 | |||||
| 2019-07-18 2849.20 | |||||
| 2019-07-19 2071.63 | |||||
| 2019-07-20 2746.25 | |||||
| 2019-07-21 2562.11 | |||||
| 2019-07-22 1892.99 | |||||
| 2019-07-23 2372.89 | |||||
| 2019-07-24 2320.42 | |||||
| 2019-07-25 2140.87 | |||||
| 2019-07-26 2430.38 | |||||
| 2019-07-27 3769.04 | |||||
| 2019-07-28 2036.24 | |||||
| 2019-07-29 2814.87 | |||||
| 2019-07-30 2077.82 | |||||
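The daily totals can also be computed without building a separate date column first. A minimal sketch, assuming a synthetic two-day stand-in for the calories file with the same `dateTime` and `value` columns; the `on=` argument tells `resample` which column holds the timestamps.

```python
import pandas as pd

# Synthetic stand-in for the calories file: two days of one-minute rows
minutes = pd.date_range("2019-07-01", periods=2 * 24 * 60, freq="min")
calories = pd.DataFrame({"dateTime": minutes, "value": 1.07})

# Group the rows into calendar days by the dateTime column and sum each day
daily = calories.resample("D", on="dateTime")["value"].sum()
print(daily)
```

This skips the `apply` and `set_index` steps, at the cost of being a little less explicit about what is going on.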
| ```python | |||||
| ax = plt.gca() | |||||
| calories_per_day.plot(kind='hist', title="Calorie Distribution", legend=False, ax=ax) | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
| ```python | |||||
| ax = plt.gca() | |||||
| calories_per_day.plot(kind='line', y='value', legend=False, title="Calories Per Day", ax=ax) | |||||
| plt.xlabel("Date") | |||||
| plt.ylabel("Calories") | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
| ## Calories Per Day Box Plot | |||||
We can turn this data into a box plot to make it easier to
visualize the distribution of calories burned during the month of
July.
| ```python | |||||
| ax = plt.gca() | |||||
| ax.set_title('Calorie Distribution for July') | |||||
| ax.boxplot(calories_per_day['value'], vert=False,manage_ticks=False, notch=True) | |||||
| plt.xlabel("Calories Burned") | |||||
| ax.set_yticks([]) | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
| # Steps | |||||
Fitbit is known for counting the number of steps someone takes per day.
Similar to calories burned, steps taken are stored as time series data
at 1 minute increments. Since we are interested in day-level data,
we need to first remove the time component of the dataframe so that we
can group all the data by date. Once we have everything grouped by
date, we can sum and produce steps per day.
| ```python | |||||
| steps_df = pd.read_json("data/steps-2019-07-01.json", convert_dates=True) | |||||
| steps_df['date_minus_time'] = steps_df["dateTime"].apply( lambda steps_df : | |||||
| datetime.datetime(year=steps_df.year, month=steps_df.month, day=steps_df.day)) | |||||
| steps_df.set_index(steps_df["date_minus_time"],inplace=True) | |||||
| print(steps_df) | |||||
| ``` | |||||
                                dateTime  value  date_minus_time
    date_minus_time
    2019-07-01       2019-07-01 04:00:00      0       2019-07-01
    2019-07-01       2019-07-01 04:01:00      0       2019-07-01
    2019-07-01       2019-07-01 04:02:00      0       2019-07-01
    2019-07-01       2019-07-01 04:03:00      0       2019-07-01
    2019-07-01       2019-07-01 04:04:00      0       2019-07-01
    ...                              ...    ...              ...
    2019-07-31       2019-07-31 03:55:00      0       2019-07-31
    2019-07-31       2019-07-31 03:56:00      0       2019-07-31
    2019-07-31       2019-07-31 03:57:00      0       2019-07-31
    2019-07-31       2019-07-31 03:58:00      0       2019-07-31
    2019-07-31       2019-07-31 03:59:00      0       2019-07-31

    [41116 rows x 3 columns]
```python
# Keep only the numeric column: datetime columns cannot be summed
steps_per_day = steps_df[['value']].resample('D').sum()
print(steps_per_day)
```
| value | |||||
| date_minus_time | |||||
| 2019-07-01 11285 | |||||
| 2019-07-02 4957 | |||||
| 2019-07-03 13119 | |||||
| 2019-07-04 16034 | |||||
| 2019-07-05 11634 | |||||
| 2019-07-06 6860 | |||||
| 2019-07-07 3758 | |||||
| 2019-07-08 9130 | |||||
| 2019-07-09 10960 | |||||
| 2019-07-10 7012 | |||||
| 2019-07-11 5420 | |||||
| 2019-07-12 4051 | |||||
| 2019-07-13 15980 | |||||
| 2019-07-14 23109 | |||||
| 2019-07-15 11247 | |||||
| 2019-07-16 10170 | |||||
| 2019-07-17 4905 | |||||
| 2019-07-18 10769 | |||||
| 2019-07-19 4504 | |||||
| 2019-07-20 5032 | |||||
| 2019-07-21 8953 | |||||
| 2019-07-22 2200 | |||||
| 2019-07-23 9392 | |||||
| 2019-07-24 5666 | |||||
| 2019-07-25 5016 | |||||
| 2019-07-26 5879 | |||||
| 2019-07-27 19492 | |||||
| 2019-07-28 4987 | |||||
| 2019-07-29 9943 | |||||
| 2019-07-30 3897 | |||||
| 2019-07-31 166 | |||||
## Steps Per Day Box Plot
After the data is in the form that we want, graphing the data is
straightforward. Two extra things I like to do with normal box plots
are to make the display horizontal and to add the notches.
| ```python | |||||
| ax = plt.gca() | |||||
| ax.set_title('Steps Distribution for July') | |||||
| ax.boxplot(steps_per_day['value'], vert=False,manage_ticks=False, notch=True) | |||||
| plt.xlabel("Steps Per Day") | |||||
| ax.set_yticks([]) | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
| Wrapping that all into a single function we get something like this: | |||||
```python
def readFileIntoDataFrame(fName):
    """Read one Fitbit JSON export file and return its values summed per day."""
    steps_df = pd.read_json(fName, convert_dates=True)
    # Strip the time of day so rows can be grouped by calendar date
    steps_df['date_minus_time'] = steps_df["dateTime"].apply(
        lambda ts: datetime.datetime(year=ts.year, month=ts.month, day=ts.day))
    steps_df.set_index(steps_df["date_minus_time"], inplace=True)
    # Keep only the numeric column: datetime columns cannot be summed
    return steps_df[['value']].resample('D').sum()


def graphBoxAndWhiskers(data, title, xlab):
    """Draw a horizontal, notched box plot of the 'value' column."""
    ax = plt.gca()
    ax.set_title(title)
    ax.boxplot(data['value'], vert=False, manage_ticks=False, notch=True)
    plt.xlabel(xlab)
    ax.set_yticks([])
    plt.show()
```
| ```python | |||||
| graphBoxAndWhiskers(readFileIntoDataFrame("data/steps-2020-01-27.json"), "Steps In January", "Steps Per Day") | |||||
| ``` | |||||
|  | |||||
That is cool, but what if we could view the distribution for each
month in the same graph? Based on the two previous graphs, my step
distribution during July looked distinctly different from my step
distribution in January. The first difficulty is reading in
all the files, since Fitbit creates a new file for every month. The
next step is to group them by month and then graph it.
| ```python | |||||
| import os | |||||
| files = os.listdir("data") | |||||
| print(files) | |||||
| ``` | |||||
| ['steps-2019-04-02.json', 'steps-2019-08-30.json', 'steps-2020-02-26.json', 'steps-2019-10-29.json', 'steps-2019-07-01.json', 'steps-2020-01-27.json', 'steps-2019-07-31.json', 'steps-2019-06-01.json', 'steps-2019-09-29.json', '.ipynb_checkpoints', 'steps-2019-12-28.json', 'steps-2019-05-02.json', 'calories', 'steps-2019-11-28.json', 'sleep'] | |||||
```python
dfs = []
for file in files:  # this can take 15 seconds
    if "steps" in file:  # finds the steps files
        dfs.append(readFileIntoDataFrame("data/" + file))
```
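An alternative to filtering the `os.listdir` output by substring is to let `pathlib` match the file name pattern. The sketch below is an assumption-laden illustration: `find_step_files` is a hypothetical helper, and it is demonstrated on a throwaway directory that mimics part of the listing above rather than on the real export.

```python
import tempfile
from pathlib import Path


def find_step_files(data_dir):
    """Return the steps-*.json files directly inside data_dir, sorted by name."""
    return sorted(Path(data_dir).glob("steps-*.json"))


# Demonstrate on a throwaway directory that mimics part of the listing
with tempfile.TemporaryDirectory() as tmp:
    for name in ["steps-2019-08-30.json", "steps-2019-07-01.json", "calories", "sleep"]:
        path = Path(tmp) / name
        if name.endswith(".json"):
            path.write_text("[]")  # empty placeholder file
        else:
            path.mkdir()  # folders such as "calories" and "sleep"
    found = [p.name for p in find_step_files(tmp)]

print(found)
```

Because the file names embed the date in year-month-day order, sorting by name also sorts the files chronologically.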
| ```python | |||||
| stepsPerDay = pd.concat(dfs) | |||||
| graphBoxAndWhiskers(stepsPerDay, "Steps Per Day Last 11 Months", "Steps per Day") | |||||
| ``` | |||||
|  | |||||
| ```python | |||||
| print(type(stepsPerDay['value'].to_numpy())) | |||||
| print(stepsPerDay['value'].keys()) | |||||
| stepsPerDay['month'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).month | |||||
| stepsPerDay['week_day'] = pd.DatetimeIndex(stepsPerDay['value'].keys()).weekday | |||||
| print(stepsPerDay) | |||||
| ``` | |||||
| <class 'numpy.ndarray'> | |||||
| DatetimeIndex(['2019-04-03', '2019-04-04', '2019-04-05', '2019-04-06', | |||||
| '2019-04-07', '2019-04-08', '2019-04-09', '2019-04-10', | |||||
| '2019-04-11', '2019-04-12', | |||||
| ... | |||||
| '2019-12-19', '2019-12-20', '2019-12-21', '2019-12-22', | |||||
| '2019-12-23', '2019-12-24', '2019-12-25', '2019-12-26', | |||||
| '2019-12-27', '2019-12-28'], | |||||
| dtype='datetime64[ns]', name='date_minus_time', length=342, freq=None) | |||||
| value month week_day | |||||
| date_minus_time | |||||
| 2019-04-03 510 4 2 | |||||
| 2019-04-04 11453 4 3 | |||||
| 2019-04-05 12684 4 4 | |||||
| 2019-04-06 12910 4 5 | |||||
| 2019-04-07 3368 4 6 | |||||
| ... ... ... ... | |||||
| 2019-12-24 5779 12 1 | |||||
| 2019-12-25 4264 12 2 | |||||
| 2019-12-26 4843 12 3 | |||||
| 2019-12-27 9609 12 4 | |||||
| 2019-12-28 2218 12 5 | |||||
| [342 rows x 3 columns] | |||||
| ## Graphing Steps by Month | |||||
Now that we have columns for the total number of steps per day and the
month, we can plot all the data on a single plot using the `by`
argument of the Pandas boxplot function.
```python
ax = plt.gca()
ax.set_title('Steps Distribution By Month\n')
stepsPerDay.boxplot(column=['value'], by='month', ax=ax, notch=True)
plt.xlabel("Month")
plt.ylabel("Steps Per Day")
plt.show()
```
|  | |||||
| ```python | |||||
| ax = plt.gca() | |||||
| ax.set_title('Steps Distribution By Week Day\n') | |||||
| stepsPerDay.boxplot(column=['value'], by='week_day',ax=ax, notch=True) | |||||
| plt.xlabel("Week Day") | |||||
| plt.ylabel("Steps Per Day") | |||||
| plt.show() | |||||
| ``` | |||||
|  | |||||
| ## Future Work | |||||
| Moving forward with this I would like to do more visualizations with | |||||
| sleep data and heart rate. | |||||