Recently, I learned how to parse data from Fitbit.com in Stata. On the Log -> Activities page, I open an individual activity such as a walk or a run, view the page source, and save it as an .html file. From that file I can parse out the data my Fitbit collects during an activity: duration, calories burned, distance, latitude, longitude, heart rate zones, heartbeat, pace, and speed.
Above, I've graphed my heart rate along with the map of last Saturday's run using GPS coordinates. The hearts (<3) represent my heart rate, which I thought was a really creative way of using the mlab option in the -scatter- command. Who says you're limited to circles, diamonds, squares or triangles in Stata? I made the heart symbols by using a variable I set to "<3". The mlab option also helped me make the "Start" and "End" markers in the map plot.

In the next week or so, I plan on using Fitbit's API to make more use of my personal data. Stay tuned :)

If you'd like to try this out, copy and paste this into your do-file editor. Make sure to change the global name of the file (fname) to the name of your html file and make sure you're in that directory when running this .do file. As always, feel free to reach out if you have any questions, suggestions, or comments!

    /*Author: Belen Chavez*/
    /* Description: Parse .HTML code of GPS activity from Fitbit.com
       to bring into Stata and create graphs */
    clear all
    version 12.1
    global fname "runactivity2.html"
    tempfile f1

    /*************** Remove double quotes and some html formatting ****************/
    filefilter $fname `f1', f("\Q") t("")
    tokenize </span> </td> </tr> <tr> > td class=line-number ///
        <td class=line-content> <span class=html-tag> ///
        <span class=html-attribute-value> class=html-attribute-name> ///
        < <br> -07:00
    local i = 1
    local j = 2
    tempfile f2 f30 f31 f32 f33 f34 f35
    while "`1'" != ""{
        filefilter `f`i'' `f`j'', f("`1'") t("") replace
        mac shift
        local ++i
        local ++j
        tempfile f`j'
    }
    filefilter `f`i'' `f`j'', f("<<") t("<")
    filefilter `f`j'' `f31', f(",{date") t("\n{date")
    filefilter `f31' `f32', f("}]") t("}\n")
    filefilter `f32' `f33', f("},{") t("}\n{")
    filefilter `f33' `f34', f(\n\n) t(\n) replace
    filefilter `f34' `f35', f(",{paused") t("\n{paused")

    replace v1 = itrim(v1)
    replace v1 = subinstr(v1,"<<","<",.)
    keep if regexm(v1,"{date:")

    /* Drop empty variables*/
    replace v1 = subinstr(v1,"trackpoints: [","",.)
    qui desc
    forv i = 1/`r(k)'{
        cap assert v`i'==""
        if _rc==0{
            drop v`i'
        }
    }

    /* Rename variables */
    compress
    forv j = 1/15{
        replace v`j' = subinstr(v`j',"{","",.)
        local nname "`=substr(v`j'[1],1,strpos(v`j'[1],":")-1)'"
        di "`nname'"
        replace v`j' = subinstr(v`j', "`nname':","",.)
        ren v`j' `=proper("`nname'")'
        destring `=proper("`nname'")', ignore("{""null" ) replace
    }
    gen Heartzone = real(substr(v23,-1,1))+1 if regexm(v23,"BELOW")!=1
    replace Heartzone = 1 if regexm(v23,"BELOW")==1
    move Heartzone Heartrate

    /* Drop unnecessary variables */
    qui desc
    local vr = `r(k)'-1
    drop v16 - v`vr'

    /* Format time variable */
    replace Date = subinstr(Date,"`=substr(Date,-6,6)'","",.)
    replace Date = subinstr(Date,"T"," ",.)
    /* Datetime values need double storage to keep their precision */
    gen double time = Clock(Date,"YMD hms")
    format time %tC
    move time Date
    drop Date
    replace Dur = Dur/60/1000

    * Replace missing Heartzone values:
    ren Heartrate Heartbeat
    forv j = 1/4{
        cap qui summ Heartbeat if Heartzone==`j'
        cap replace Heartzone = `j' if Heartzone ==. & Heartbeat>=`r(min)' & Heartbeat<=`r(max)'
    }

    * Label variables
    la def m 1 "No-Zone" 2 "Fat Burn" 3 "Cardio" 4 "Peak", replace
    la val Heartzone m
    la var Heartbeat "Beats per Minute"
    la var Heartzone "Heart Rate Zone"
    la var Cal "Calories Burned"
    la var Speed "Miles per Hour"
    la var Pace "Seconds per Mile"
    la var Elev "Feet"
    la var Dis "Miles"
    la var Dur "Minutes"
    la var Lat "Latitude"
    la var Long "Longitude"
    la var Steps "Steps"

    gen lab = "<3"
    gen tick = "Start" in 1
    replace tick = "End" in l
    gen every_10 = 1 if mod(_n,10)==1

    summ Dis
    local Dis: di %4.2fc `r(max)'
    di `Dis'
    summ Dur
    local tim: di %2.0fc `r(max)'
    di `tim'
    summ time
    local da: di %tdDay_Mon_dd,_CCYY dofc(`r(min)')

    twoway (scatter Heartbeat Dur if every_10==1 ,ylab(80(20)200) ///
        mlab(lab) msymb(none) mlabcolor(red) mlabangle(vertical) ///
        mlabpos(12)) , ///
        tit("Beats per minute during run") name(minutes, replace)

    scatter Lat Long, mlab(tick) msymb(smcircle) tit("Map of run") ///
        lpattern(dash) mcolor(blue) ///
        xlab(none) ylab(none) name(maps, replace)

    graph combine minutes maps, ///
        title(Fitbit Parsed Data) subtitle("For Run on `da'") ///
        note("Summary: Total Time= `tim' Minutes, Total Distance= `Dis' Miles")
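The heart markers come down to two things: hiding the marker symbol and plotting a string variable as the marker label. Here is a minimal sketch of that trick on its own. It assumes the auto dataset that ships with Stata as stand-in data, so mpg and weight are placeholders rather than Fitbit variables, and the title is made up for the example.

```stata
* Sketch only: the auto dataset is stand-in data, not Fitbit output
sysuse auto, clear

* A string variable holding the text to draw in place of each marker
gen lab = "<3"

* msymbol(none) hides the usual marker; mlabel() draws the text instead,
* and mlabposition(0) centers it on the data point
twoway scatter mpg weight, ///
    msymbol(none) mlabel(lab) mlabcolor(red) mlabposition(0) ///
    title("Hearts instead of markers")
```

Any short string works the same way, which is how the "Start" and "End" labels in the map plot are drawn.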
It's crazy knowing how much data we put out there ourselves as consumers of social media. Take LinkedIn, for example. You may think you're only posting your skills, current/previous job titles held, and connecting with people, but you're also giving that information away for anyone to see and scrutinize. I'm not just talking about strangers -- e.g. future co-workers, old classmates, people you met once -- I'm talking about anybody you've let into your network.
Today, for example, I went through two profiles, for person X and person Y, and noticed just how much data I can analyze and what I can deduce from it.

Person X: Person X is a contact in my network. The first thing I notice is that overstatements abound. Person X used VBA once and listed VBA as a skill (does anybody else do this? Or am I the only one who thinks it's slightly deceiving?). This person looks at some large data (10k rows, maybe), but calls it "Big Data". This person also has 3-4 pretty lengthy bullet points about a new job, but that position was started not even 3 months ago. Many of these bullet points describe projects that are still underway as if they had been completed, and some are pure hyperbole. I could conclude the following: this person is on the job market again and is marketing him/herself for another job. (Another conclusion could be made, but I noticed that this person has consistently ranked in the top 1% for profile views, which I'm assuming come from interested recruiters.)

Person Y: This person is a 3rd degree connection in my network. From this person's profile I can gather current and previous jobs, along with a timeline for when those positions were held. I can also see this person's education and degree information. Person Y piques my interest because this person is working in a position for which their qualifications don't quite match the job description -- think MPA working as a private banker. Something doesn't quite add up. Looking over their profile didn't clear up the confusion, but, just as you might be, I was curious.

These are only two tiny examples. There is everybody else who uses LinkedIn. You can estimate people's age (or age range) if they have posted graduation dates on their profiles (creepy!). At the end of the day, you can tell (or at least try to tell) a story with someone's information.

Take me, for example: I've had interview questions regarding my movement from UC Irvine to Florida for work, then Duke for graduate school, San Francisco for public policy work, and finally San Diego. I can only guess that others who don't know me might wonder the same thing. LinkedIn is only one of the social media websites where data is available for people to check out through posted profiles. Personal public (and even private) profiles on LinkedIn, Facebook, Instagram, and other social media websites are up for anyone to see, analyze, and judge. Just think: somebody, somewhere could be looking at the data on your profile right now. You never know!
Author: My name is Belen. I like to play with data using Stata during work hours and in my free time. I like blogging about my Fitbit, Stata, and random musings.