what's between two pings?

Brian Hicks, March 5, 2024

I got to thinking about how pings work in this system (see the last post) and realized there's a possible optimization. Right now I'm treating them as though they're all the same size (that's safe because of the law of large numbers, remember), but they're not all the same size! The time between pings varies.

Say you have these pings with these tags:

Time	Tag
8:00	work
9:00	work
9:30	coffee
10:30	work

Assuming we're comfortable with this small sample as being representative of what you actually did, you can say with confidence that between 8:00 and 9:00 you were working. But sometime (vaguely) between 9:00 and 9:30 you transitioned to making coffee, and sometime (vaguely) between 9:30 and 10:30 you transitioned back to working.

If we treat every ping as equal, assuming λ is 1 hour, this will be reported as 3 hours (± 0.85 hours) working and one hour (± 0.85 hours) getting coffee. That's pretty good—or at least enough to get a sense of how you're spending your time.
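
A minimal sketch of that equal-weight accounting, assuming λ means pings per hour. The variable names are illustrative, and the error bars are left out:

```python
from collections import Counter

# Tags from the sample table above, one per ping.
tags = ["work", "work", "coffee", "work"]

# Assumption: lam is the ping rate in pings per hour (λ = 1 in the example),
# so each ping stands in for 1 / lam hours.
lam = 1.0
hours_per_ping = 1 / lam

# Every ping counts the same, so hours per tag is just count * hours_per_ping.
hours_by_tag = {tag: count * hours_per_ping for tag, count in Counter(tags).items()}
print(hours_by_tag)
```

With the four sample pings this credits work with 3 hours and coffee with 1 hour, matching the figures above.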

But what if we take advantage of the fact that pings aren't exactly hourly? We'd have to take care of the vagueness of when you transitioned. In the absence of other data, we might just take the time halfway between two pings as the transition time. So that means that our times look like this:

Time	Tag	Duration
8:00	work	0:30 (halfway to 9)
9:00	work	0:45 (halfway back to 8 plus halfway to 9:30)
9:30	coffee	0:45 (halfway back to 9:00 plus halfway to 10:30)
10:30	work	0:30 (halfway back to 9:30)

This makes it look like we have less time, though: we now have 2.5 hours tracked instead of 4. This is probably not a problem in real life: we can take pings continuously and tag any that aren't answered as "afk." If we really need to, it's probably safe to double the duration of the first and last ping, giving us a total of 3.5 hours in this sample.
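
Here is a small sketch of the halfway rule with the optional doubling of the first and last ping. The function name `halfway_durations`, the `double_ends` flag, and the calendar date are assumptions made for illustration:

```python
from datetime import datetime, timedelta

# Sample pings from the table above; the calendar date is arbitrary.
pings = [
    (datetime(2024, 3, 5, 8, 0), "work"),
    (datetime(2024, 3, 5, 9, 0), "work"),
    (datetime(2024, 3, 5, 9, 30), "coffee"),
    (datetime(2024, 3, 5, 10, 30), "work"),
]


def halfway_durations(pings, double_ends=False):
    """Give each ping half of the gap to each of its neighbors."""
    durations = [timedelta(0) for _ in pings]
    for i in range(1, len(pings)):
        half = (pings[i][0] - pings[i - 1][0]) / 2
        durations[i - 1] += half
        durations[i] += half
    if double_ends and len(pings) > 1:
        # The first and last pings only have one neighbor, so double them.
        durations[0] *= 2
        durations[-1] *= 2
    return durations


plain = sum(halfway_durations(pings), timedelta(0))
doubled = sum(halfway_durations(pings, double_ends=True), timedelta(0))
print(plain, doubled)
```

For the sample this gives 2:30:00 without doubling and 3:30:00 with it, the same totals worked out by hand above.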

But does it give us better insight into our life? Let's see. Doing this by hand:

Tag	Ping as hour	Ping as halfway between
work	3h ± 0.85h	2.75h ± 0.82h
coffee	1h ± 0.85h	0.75h ± 0.82h

It feels weird to me that the error bar goes below zero for coffee now. I definitely didn't spend no time on it, much less negative time. But let's pretend that getting coffee took 15 minutes and the remainder of the 4 hours was spent working: both of these systems produce a perfectly acceptable answer to the question of "where did my day go?"

Given that, I think the first version of this system should assume that each ping is worth 1 hour / λ or similar instead of trying to get fancy. The transformation is not that hard (I'll attach a Python script below that can evaluate the same data I generated in the last post), so it would be feasible to change later if it looked like there was a big advantage to doing so. I want to be careful to avoid giving precise-but-fuzzy numbers, though: sticking with a rougher-grained unit as a base unit probably makes a ton of sense for setting expectations… you wouldn't want to bill a client on data from this system, for example!

All this talk of coffee has made me want some. brb.

#!/usr/bin/env python3
import argparse
import collections
from datetime import datetime, timedelta
import json
import math
import sys


class Ping:
    def __init__(self, at, tag, duration):
        self.at = at
        self.tag = tag
        self.duration = duration

    @classmethod
    def from_json(cls, obj):
        return cls(datetime.fromisoformat(obj['at']), obj['tag'], timedelta(0))

    def __repr__(self):
        return f"<Ping at={self.at.isoformat()} tag={repr(self.tag)}, duration={str(self.duration)}>"


def main(args):
    pings = [Ping.from_json(obj) for obj in json.load(sys.stdin)]

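    # Split the gap between each pair of consecutive pings in half and
    # credit one half to each side. Pings are assumed to be in time order.
    for (i, ping) in enumerate(pings):
        if i == 0:
            # The first ping has no predecessor to split a gap with.
            continue

        before = pings[i-1]

        halfway = (ping.at - before.at) / 2
        ping.duration += halfway
        before.duration += halfway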

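    total_seconds = sum((ping.duration.total_seconds() for ping in pings))
    if total_seconds == 0:
        # With fewer than two pings (or identical timestamps) there is no
        # tracked time, and the proportions below would divide by zero.
        sys.exit("Need at least two pings at different times.")
    print(f"From {timedelta(seconds=total_seconds)} hours tracked...\n")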

    total_seconds_by_tag = collections.Counter()
    for ping in pings:
        total_seconds_by_tag[ping.tag] += ping.duration.total_seconds()

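    for (tag, tag_total) in total_seconds_by_tag.most_common():
        # Share of the tracked time that carries this tag.
        proportion = tag_total / total_seconds
        other_ping_proportion = 1 - proportion
        # Standard error of a proportion, with the ping count as sample size.
        sem = math.sqrt(proportion * other_ping_proportion / len(pings))
        # Scale the error back up from a proportion to a duration.
        plus_minus = sem * total_seconds

        print(f"{tag}\t{tag_total/60/60:.2f} hours\tplus or minus {plus_minus/60/60:.2f} hours")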


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Note: -l is parsed but not used by the halfway calculation in main().
    parser.add_argument('-l', type=float, default=1)

    main(parser.parse_args())
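
The script reads a JSON array from standard input, where each object has an `at` ISO 8601 timestamp and a `tag`. As a sketch, input for the sample pings in this post could be generated like this; the calendar date is an assumption, since the table only gives times of day:

```python
import json
from datetime import datetime

# The four sample pings from the post; the date itself is arbitrary.
sample = [
    {"at": "2024-03-05T08:00:00", "tag": "work"},
    {"at": "2024-03-05T09:00:00", "tag": "work"},
    {"at": "2024-03-05T09:30:00", "tag": "coffee"},
    {"at": "2024-03-05T10:30:00", "tag": "work"},
]

# Sanity-check that every timestamp parses the same way the script parses it.
parsed = [datetime.fromisoformat(obj["at"]) for obj in sample]

payload = json.dumps(sample, indent=2)
print(payload)
```

Piping that output into the script above should report 2:30:00 tracked, since the script does not double the first and last ping.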