messing around with statistics
Brian Hicks, March 5, 2024
The basic idea behind TagTime is:
- Choose how often you want to be asked what you're up to. This rate is called λ. Let's assume that λ = 1 for now, which will mean "once per hour." If you wanted to be asked every 30 minutes (on average) you could set this to 2.
- Schedule pings (instances of being asked what you're doing) into the future following a Poisson process, which means the time between pings follows an exponential distribution with rate λ (so an average gap of 1/λ hours.)
- It seems like the way to do this might be to generate a random value between 0 and 1 and plug it in like so (there's a quick sanity check of this right after the list):
  math.log(random.random()) / lam * -1
  (where lam stands in for λ, since lambda is a reserved word in Python.)
- Since the time between pings averages out to 1/λ hours, you perform the analysis as if each ping were worth 1/λ hours (so if λ is 1, meaning one ping per hour, each ping is worth one hour.)
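Here's that sanity check: the formula is just inverse-transform sampling of an exponential distribution, so the gaps it generates should average out to 1/λ hours, and they should match what Python's built-in random.expovariate gives:

import math
import random

lam = 1.0  # λ: average pings per hour

# gaps from the formula above; their average should be close to 1/λ hours
# (random.random() can technically return 0.0, but the odds are ~2**-53 per draw)
gaps = [math.log(random.random()) / lam * -1 for _ in range(100_000)]
print(sum(gaps) / len(gaps))  # prints something near 1.0

# the standard library's exponential sampler does the same thing
gaps_stdlib = [random.expovariate(lam) for _ in range(100_000)]
print(sum(gaps_stdlib) / len(gaps_stdlib))  # also near 1.0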
I'm going to try this out and see how it works, just to test my understanding of the math. I'm going to write a Python program simulating someone with a very simple and strict schedule. Specifically:
What | Start | End |
---|---|---|
Sleep | 10:00pm | 6:00am |
Morning Activities | 6:00am | 7:30am |
Commute | 7:30am | 8:00am |
Work | 8:00am | 12:00pm |
Lunch | 12:00pm | 1:00pm |
Work | 1:00pm | 2:30pm |
Afternoon Coffee | 2:30pm | 2:36pm |
Work | 2:36pm | 5:00pm |
Commute | 5:00pm | 5:30pm |
Evening Activities | 5:30pm | 10:00pm |
Some points of interest:
- Lots of the schedule doesn't fall exactly on hour boundaries.
- A few of the activities are shorter than an hour.
- One activity is tiny! Making coffee every day takes 6 minutes exactly (one tenth of an hour.)
I'm doing that to see how much data this system actually needs in order to capture smaller changes in activity throughout the day.
Here's a Python script that can generate pings according to this schedule:
#!/usr/bin/env python3
import argparse
from datetime import datetime, timedelta
import json
import math
import random
import sys


def tag(ping):
    time = (ping.hour, ping.minute)
    if time >= (22, 0) or time < (6, 0): # 8 hours
        return 'sleep'
    elif time >= (6, 0) and time < (7, 30): # 1.5 hours
        return 'morning'
    elif time >= (7, 30) and time < (8, 0): # 0.5 hours
        return 'commute'
    elif time >= (8, 0) and time < (12, 0): # 4 hours
        return 'work'
    elif time >= (12, 0) and time < (13, 0): # 1 hour
        return 'lunch'
    elif time >= (13, 0) and time < (14, 30): # 1.5 hours
        return 'work'
    elif time >= (14, 30) and time < (14, 36): # 0.1 hours
        return 'coffee'
    elif time >= (14, 36) and time < (17, 0): # 2.4 hours
        return 'work'
    elif time >= (17, 0) and time < (17, 30): # 0.5 hours
        return 'commute'
    elif time >= (17, 30) and time < (22, 0): # 4.5 hours
        return 'evening'
    else:
        return None


def main(args):
    entries = []
    next_ping = datetime.now()
    end = next_ping + timedelta(days = args.days)

    while next_ping <= end:
        entries.append({ "at": next_ping.isoformat(), "tag": tag(next_ping) })
        next_gap = math.log(random.random()) / args.l * -1
        next_ping += timedelta(hours = next_gap)

    json.dump(entries, sys.stdout, indent=2)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('days', type=int)
    parser.add_argument('-l', type=float, default=1.0)
    main(parser.parse_args())
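For reference, the script dumps a JSON array to stdout, one object per ping; it looks something like this (these particular timestamps are made up for illustration):

[
  {
    "at": "2024-03-05T08:17:42.123456",
    "tag": "work"
  },
  {
    "at": "2024-03-05T09:03:15.481212",
    "tag": "work"
  },
  ...
]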
Next we assume that each ping is worth 1/λ hours and calculate the total time across all the pings. That total comes out pretty close to what I'd expect: for a week, I'd expect 168 pings if there's an average of one per hour, and I'm getting numbers like 173, 141, 167, 148, etc.
We sum up the total time per tag, then work out a standard error for each tag to get a range around its total. Here's the Python code that slurps up the stdout of the last script:
#!/usr/bin/env python3
import argparse
import collections
from datetime import datetime
import json
import math
import sys


class Ping:
    def __init__(self, at, tag):
        self.at = at
        self.tag = tag

    @classmethod
    def from_json(cls, obj):
        return cls(datetime.fromisoformat(obj['at']), obj['tag'])

    def __repr__(self):
        return f"<Ping at={repr(self.at)} tag={repr(self.tag)}>"


def main(args):
    pings = [Ping.from_json(obj) for obj in json.load(sys.stdin)]

    total_hours = len(pings) * (1 / args.l)
    print(f"From {total_hours} hours tracked...\n")

    total_hours_by_tag = collections.Counter()
    for ping in pings:
        total_hours_by_tag[ping.tag] += 1 / args.l

    for (tag, tag_total) in total_hours_by_tag.most_common():
        proportion = tag_total / total_hours
        other_ping_proportion = 1 - proportion
        sem = math.sqrt(proportion * other_ping_proportion / total_hours)
        print(f"{tag}\t{tag_total} hours\tplus or minus {sem * total_hours:.2f} hours")


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-l', type=float, default=1)
    main(parser.parse_args())
That outputs things like this:
From 158.0 hours tracked...
sleep 56.0 hours plus or minus 6.01 hours
work 45.0 hours plus or minus 5.67 hours
evening 26.0 hours plus or minus 4.66 hours
morning 12.0 hours plus or minus 3.33 hours
commute 10.0 hours plus or minus 3.06 hours
lunch 9.0 hours plus or minus 2.91 hours
Or this:
From 193.0 hours tracked...
sleep 67.0 hours plus or minus 6.61 hours
work 58.0 hours plus or minus 6.37 hours
evening 40.0 hours plus or minus 5.63 hours
morning 13.0 hours plus or minus 3.48 hours
lunch 7.0 hours plus or minus 2.60 hours
commute 6.0 hours plus or minus 2.41 hours
coffee 2.0 hours plus or minus 1.41 hours
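As a spot check on the arithmetic, here's the ± for the "sleep" line in the first sample worked out by hand (56 of the 158 tracked hours were tagged sleep):

import math

total_hours = 158
sleep_hours = 56

p = sleep_hours / total_hours                # ~0.354 of the tracked time was "sleep"
sem = math.sqrt(p * (1 - p) / total_hours)   # standard error of that proportion
print(f"{sem * total_hours:.2f}")            # 6.01 — the range printed above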
These are reasonably accurate! Since the schedule repeats every day, any 7-day period has the same totals, so here's how the "actual" time compares to the statistics:
Tag | Actual | Sample 1 | Sample 2 |
---|---|---|---|
sleep | 56h | ✅ 56 ± 6.01 | ❌ 67 ± 6.61 |
work | 55.3h | ❌ 45 ± 5.67 | ✅ 58 ± 6.37 |
evening | 31.5h | ❌ 26 ± 4.66 | ❌ 40 ± 5.63 |
morning | 10.5h | ✅ 12 ± 3.33 | ✅ 13 ± 3.48 |
commute | 7h | ✅ 10 ± 3.06 | ✅ 6 ± 2.41 |
lunch | 7h | ✅ 9 ± 2.91 | ✅ 7 ± 2.60 |
coffee | 0.7h | ❌ did not capture | ✅ 2 ± 1.41 |
I put a ✅ when the range covers the actual value and an ❌ when it doesn't. As you can see, these samples get in the ballpark but the ranges don't always cover the actual values. However, these would definitely be good enough to get a sense of how you're spending your life as a whole, so maybe it's OK!
I wonder, though, whether it gets more accurate if you sample more frequently. An average of half an hour between pings seems like it'd get annoying (because of the exponential distribution, some pings would be very close together), but I wonder about 45 minutes. Let's try; that's a λ of 1⅓. Here are the results:
Tag | Actual | Sample 1 | Sample 2 |
---|---|---|---|
sleep | 56h | ✅ 60.75 ± 6.3 | ✅ 61.5 ± 6.14 |
work | 55.3h | ✅ 59.25 ± 6.26 | ❌ 48.75 ± 5.81 |
evening | 31.5h | ✅ 27 ± 4.78 | ✅ 29.25 ± 4.89 |
morning | 10.5h | ✅ 10.5 ± 3.14 | ❌ 4.5 ± 2.09 |
commute | 7h | ✅ 6.75 ± 2.55 | ✅ 8.25 ± 2.8 |
lunch | 7h | ✅ 9 ± 2.92 | ✅ 6.75 ± 2.54 |
coffee | 0.7h | ✅ 1.5 ± 1.22 | ❌ did not capture |
That seems about the same. The ranges don't feel like they're that much smaller to me. Maybe half an hour really would be better?
Tag | Actual | Sample 1 | Sample 2 |
---|---|---|---|
sleep | 56h | ✅ 54 ± 6.04 | ✅ 52.5 ± 5.98 |
work | 55.3h | ✅ 52 ± 5.98 | ✅ 49.5 ± 5.89 |
evening | 31.5h | ✅ 33.5 ± 5.17 | ✅ 34 ± 5.2 |
morning | 10.5h | ✅ 10 ± 3.07 | ✅ 14 ± 3.58 |
commute | 7h | ✅ 7 ± 2.59 | ✅ 7 ± 2.59 |
lunch | 7h | ✅ 8 ± 2.76 | ✅ 7 ± 2.59 |
coffee | 0.7h | ✅ 1.5 ± 1.22 | ✅ 1 ± 1 |
Those two samples happen to be all green, but some of them barely squeaked in. I still think an average of one ping every 30 minutes (λ = 2) would be far too annoying, so I'm going to leave it out.
Next I'm going to go and see if this is the same stuff that the Perl version of TagTime actually uses, and then maybe repeat this analysis!