From 89d2cf37c44361f023e14d9c3b1e3966e665c8a6 Mon Sep 17 00:00:00 2001
From: Ryan Missel
Date: Sat, 30 Mar 2019 09:56:40 -0400
Subject: [PATCH] Added docs folder to hold relevant CSV docs.

---
 data_preparation/docs/wellness.txt | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 data_preparation/docs/wellness.txt

diff --git a/data_preparation/docs/wellness.txt b/data_preparation/docs/wellness.txt
new file mode 100644
index 0000000..5df9c2c
--- /dev/null
+++ b/data_preparation/docs/wellness.txt
@@ -0,0 +1,22 @@
+Features in Wellness:
+    Pain [0, 1] - no NaNs
+    Illness [0, 0.5, 1] - no NaNs
+    Menstruation [0, 1] - 16 NaNs, filled with 0. Not a big statistical difference, so this is fine
+    Nutrition [0, 0.5, 1] - 837 NaNs, filled with 0. Not a useful feature
+    NutritionAdj [0, 1] - 745 NaNs, filled with 0. Again, not useful
+    USGMeasurement [0, 1] - 168 NaNs, filled with 0.
+    USG [1.0...] - 4382 NaNs, not a useful feature
+    TrainingReadiness [0..1] - no NaNs
+
+Useful features include Pain, Illness, Menstruation, and TrainingReadiness.
+The others either have too many NaNs to extract any useful meaning or are unhelpful features
+to begin with, like Nutrition.
+
+
+Notnormalized_with_0NaN_wellness.csv:
+
+- Among the useful features, only Menstruation had its NaN values filled in (with 0): only 16 NaNs were present,
+so filling them would not make a meaningful statistical difference either way.
+
+- The notnormalized_with_0NaN_wellness CSV should be usable as-is, but any string columns must be removed
+before passing it to algorithms, as they are not removed in this version.
\ No newline at end of file
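As a rough illustration of the preparation these notes describe (filling the Menstruation NaNs with 0, keeping only the useful features, and stripping string columns before the data goes into an algorithm), here is a minimal Python sketch using only the standard library. It assumes the CSV has a header row whose column names match the feature names listed in wellness.txt. The helpers `prepare_wellness` and `to_number`, their parameters, and the sample rows are hypothetical and are not taken from the repository.

```python
import csv
import io
import math

# Features the notes list as useful; assumed (not confirmed) to match the CSV header names.
USEFUL_FEATURES = ("Pain", "Illness", "Menstruation", "TrainingReadiness")


def to_number(value):
    """Parse one CSV cell: blank or 'nan' -> NaN, numeric text -> float, other text -> None."""
    if value is None:
        return math.nan
    text = value.strip()
    if text == "" or text.lower() == "nan":
        return math.nan
    try:
        return float(text)
    except ValueError:
        return None  # a genuine string value, e.g. a date or a name


def prepare_wellness(csv_text, keep=USEFUL_FEATURES, fill_zero=("Menstruation",)):
    """Hypothetical helper: drop string columns, keep the chosen features,
    and replace NaNs with 0 in the fill_zero columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    parsed = [{col: to_number(val) for col, val in row.items()} for row in reader]
    fieldnames = reader.fieldnames or []
    # A column counts as numeric only if every cell parsed as a float or NaN.
    numeric = [c for c in fieldnames if all(row[c] is not None for row in parsed)]
    columns = [c for c in numeric if c in keep]
    cleaned = []
    for row in parsed:
        clean = {}
        for c in columns:
            value = row[c]
            if c in fill_zero and math.isnan(value):
                value = 0.0  # same zero-fill the notes describe for Menstruation
            clean[c] = value
        cleaned.append(clean)
    return columns, cleaned


# Usage with a small made-up sample (not real data from the project).
sample = (
    "Date,Pain,Illness,Menstruation,TrainingReadiness,Nutrition\n"
    "2018-01-01,0,0.5,,0.8,1\n"
    "2018-01-02,1,0,1,0.6,0.5\n"
)
columns, rows = prepare_wellness(sample)
print(columns)  # ['Pain', 'Illness', 'Menstruation', 'TrainingReadiness']
print(rows[0])  # {'Pain': 0.0, 'Illness': 0.5, 'Menstruation': 0.0, 'TrainingReadiness': 0.8}
```

The `Date` column is dropped because it cannot be parsed as a number, and `Nutrition` is dropped because it is not in `keep`. Passing a different `keep` tuple retains other numeric columns if they turn out to be useful after all.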