Alloy

Brian Hicks, December 12, 2022

Alloy is a tool for modeling software. It's best at representing things like databases or data structures, but in my experience it does fine with things like UI states or processes as well.

I've found Alloy most helpful in exploring the relationship between different parts of a system. Unlike other tools in the formal methods space, Alloy has a visualizer. For me, this is a huge advantage: not only does it make it easier for me to think about my models, but it also makes communicating about specs with people who are not familiar with Alloy much nicer. Having something you can talk about or mark up enables better communication, as opposed to having to read states and relationships out of a table.

To illustrate that, here's an oversimplified version of email. I've told Alloy that there are many accounts on a server, and that accounts can send emails back and forth. This should even work across servers, as long as the servers can deliver to each other. The code looks like this:

sig Email {
  from: one Account,
  to: one Account,
}

sig Account {
  server: one Server,
}

sig Server {
  deliversTo: set Server,
}

To start off, Alloy can generate a "metamodel" (that is, the relationships between things it knows about.) It basically looks like you might expect given the code:

a metamodel visualization generated by Alloy, with Email pointing "to" and "from" arrows to Account, Account pointing a "server" error to Server, and Server pointing "deliversTo" arrow to itself.

With no code changes, it can also generate as many instances of the model as you care to ask it for. Here's one, where an email is being sent from one server to another:

an example generated by Alloy, with an email being delivered from an account on one server to an account on another

I've found it useful to set up relationships like this, then ask the tool to generate examples. If something looks off or out of the ordinary, I can go back and have a think (maybe with other people) about whether or not that thing should be allowed. For example, I notice that one email server above can't see the other, and the other can't deliver to its own accounts. Is that reasonable? Depending on the situation you're modeling, it may or may not be!

You can use Alloy to make assertions about your models as well. In the example above, for example, you could assert that there couldn't be any undeliverable email (that is, any email going between servers without a deliversTo relationship.) If you've allowed that possibility somehow, Alloy will tell you why and generate a counterexample. Here's how we'd do that:

check NoUndeliverableEmail {
  all e: Email | e.to.server in e.from.server.*deliversTo
}

In English, this means "for an email to be deliverable, the server in from must be able to reach the server in to through the deliversTo relation, 0 or more times. And, by the way, every Email is deliverable." Of course, we haven't guaranteed that assertion in any way, so Alloy comes up with a counterexample:

an example generated by Alloy, with an account trying to send email to a server that its server cannot deliver to

In this, Server1 can deliver everywhere, Server0 can deliver to Server2, and Server2 can't deliver anywhere. Unfortunately for everyone involved, Account0 on Server2 wants to send an email to their buddy on Server1. Guess you're outta luck, pal!

We can try to fix this, of course, by saying that all of these deliversTo relations are symmetric. We'd just establish some fact that Alloy will take as given:

fact "all delivery relationships are symmetric" {
  deliversTo = ~deliversTo
}

This works because deliversTo is actually a set of tuples; the server.deliversTo is doing a lookup into it! The ~ operator reverses all the tuples, so we're asserting that the forward version is equal to the backwards one—symmetric!

But in the face of this stunning victory, Alloy replies "ah, yes, but what if the servers are on completely separate networks?"

an example generated by Alloy showing two completely independent networks of email servers with one account trying to send an email across them. There are no connections between the networks, so this email is undeliverable.

We can keep going down this rabbit hole of generating examples and making assertions until we're satisfied that the system has few enough problems to implement!

Why is this Useful?

I've used Alloy in a lot of situations now where I would previously have just been guessing at edge cases in the model. It's been really useful to show me exactly what problems exist in my model and helping me communicate with other people about why we should care about them.

Alloy helps simplify models, as well! For example, at work we were building a feature with inline commenting (like Google Docs.) We started by modeling Google Docs' comments and it turned out that each comment had something like 32 unique states. We realized this was going to be way too much for our use case, and simplified it down to 4. We probably could have had this realization without using Alloy, but it made the conversation much simpler—the designer and I sat down and looked at the visualizations, then kept saying "but for us, states X and Y would be the same…" until we arrived at a simpler model. It was really nice, and only took about an hour and a half of modeling to come up with a solution that satisfied all our requirements!

Yeah, but…

So all that's super exciting, and I love using Alloy, but it'd be silly to pretend that it's perfect. Here's some examples:

It's very easy to give Alloy contradicting instructions that rule out some case you care about. Sometimes this even means that Alloy can't come up with any examples at all! It can be really difficult to debug this when you're getting started. (But if you change the solver to "MiniSat with Unsat Core" you can at least get a hint at what might be conflicting.)
Alloy only checks a finite size of the models you generate (4 of everything by default.) This means that it won't find bugs that require there to be a huge number of something. However, almost everything I've done in Alloy gives useful feedback with the default limits—it checks all states comprehensively within the given size—and it's easy (bordering on trivial) to raise the limit if you suspect there would be a bug if you had more instances. Just know going in that you're not going to get proof for an infinitely sized model. This property does mean that Alloy is very, very fast, though!
As a beginner, it's hard to know how to learn. The usual learning path is to buy and work through Software Abstractions by Alloy's author, Daniel Jackson. However, Alloy has evolved over time and now has important features—like temporal logic—that are not covered in the book. To make matters worse, documentation outside of the normal learning path often means reading the spec. However, once you know a bit about Alloy, and even while you're learning, Hillel Wayne's Alloy reference is great.

Another one of Alloy's weakness used to be time—that is, it was hard to model things that could change—but the release of Alloy 6 added temporal logic and made this situation a lot more tenable. I still haven't completely come to grips with the new features, but I see concepts in the spec that I first learned about in TLA+—although I appreciate that Alloy uses English-language keywords instead of symbol operators like [] (always in Alloy) or <> (eventually in Alloy.)

All told, I recommend learning Alloy! I've found it tremendously helpful and I think it's too bad that it's not better-known by industry programmers.