1 of 39

Equational Theories Project

Daniel Weber, Vlad Tsyrklevich

https://tinyurl.com/LeanTogetherETPslides

Jan 14, 2025

2 of 39

Contributors

3 of 39

Origin of the project

Sept 25, 2024: Terence Tao announces a new project on his blog:

So what is the Equational Theories Project? At the end of September of last year, Terence Tao announced a new project on his blog. The gist of his blog post is that:

Mathematicians tend to work in small trusted groups. It’s hard to work with the public or with AI due to the risk of errors invalidating your results. But ITPs can counteract the risk of errors.
Many formalization projects have tended to focus on formalizing existing results, but proof assistants could also be used for exploring new problems. The blog post announced a pilot project with two goals: to study a specific mathematical problem, but also to explore how to collaboratively work on novel mathematics in a public setting, using Lean-formalization as the medium.
The idea is to collaboratively work on research, kind of like Polymath, with the additional correctness guarantees from formalizing the results. The recent completion of the Busy Beaver Challenge is an example of this style of project. The Busy Beaver Challenge was a collaborative online research project that studied and formalized a result about Turing machines in Coq.
Terence’s idea was that, rather than working on an individual problem, crowdsourcing a class of problems would be a good approach for this style of project. This makes it easy for many people to participate, including those without deep knowledge of a specific subject area. The problem he chose, in universal algebra, is general enough to be widely accessible.

As I showed on the previous slide, in the end there were many contributors from a number of different backgrounds. From Lean-experts working for the FRO, to those with no experience in Lean. From accomplished mathematicians to people with no graduate-level training.

4 of 39

Definitions

A magma is a set M with a closed binary operation ◇ : M×M → M
We consider magmas with an additional constraint, an equational law such as ∀ x, y ∈ M : x ◇ y = y ◇ x
The project studies magmas satisfying a single equational law
Some example equational laws:

Equation 1: x = x
Equation 2: x = y
Equation 46: x ◇ y = z ◇ w
Equation 4512: (x ◇ y) ◇ z = x ◇ (y ◇ z)

We say Equation A implies Equation B, if all models of A are also models of B
Otherwise Equation A refutes Equation B

Before describing the research goal of the project, I have to set down a number of definitions.

A magma is a set with a closed binary operation. I’ll write this operation using a diamond throughout the presentation.

A magma is an algebraic structure just like a group or a monoid, except it has no constraints whatsoever on the operation. In this sense it’s not very interesting, unless we add some constraints to study.

We can consider magmas with an additional constraint, an equational law, which is just an equality of terms containing variables and potentially making use of the magma operation. One example is the commutative law, which I’ve written ∀ x, y ∈ M, x ◇ y = y ◇ x. In all of the equations studied by the project, the variables, x and y in this case, are always free variables: they do not refer to individual elements of the set M. So going forward, I will write equations without the for-all quantifier. I will just write x ◇ y = y ◇ x, but the variables are implicitly universally quantified.

The equational theories project only studies magmas that satisfy one equational law.

I give some example equational laws. Note that the project has numbered these equations to give us consistent ways to reference them. The numbers themselves don’t have meaning.

Equation 1 is the trivial equation x = x, every single magma satisfies this equation.

The equation below it, Equation 2, x = y, is its opposite. The only magmas that satisfy the equation x = y are magmas over the empty set or a set with one element. No other magma satisfies this equation.

I give two other examples below, just for illustration. Equation 46 defines magmas where x ◇ y is a constant value for all values of x and y. The last one is the associative law. Note that since the magma operation is not assumed to be associative or commutative, parentheses and order matter.

The last definitions that we need are for implication and refutation. We are interested in whether all magmas satisfying one equation, also satisfy another equation.

We say Equation A implies Equation B if every model of Equation A is also a model of Equation B. If this is not true, we say Equation A refutes Equation B.

5 of 39

Example

The model on the right satisfies Equation 3: x = x ◇ x

0 = 0 ◇ 0
1 = 1 ◇ 1

The model does not satisfy Equation 4: x = x ◇ y

0 ≠ 0 ◇ 1

Hence, Equation 3 refutes Equation 4

However, Equation 4 implies Equation 3:

Proof: If x = x ◇ y for all y, set y = x to derive x = x ◇ x

◇	0	1
0	0	1
1	1	1

I’m going to give you an example now to make this concrete. I define a magma over the set {0,1}, with the operation defined by the table on the right.

This magma satisfies Equation 3, which is x = x ◇ x. Remember that x is universally quantified, so this is saying that 0 = 0 ◇ 0 and 1 = 1 ◇ 1. Looking at the table, this is true, so this is a model of Equation 3.

Furthermore, I claim this model does not satisfy Equation 4, x = x ◇ y. If I pick x=0 and y=1, we should have 0 = 0 ◇ 1, but this is not true. Hence, this is not a model of Equation 4.

Therefore, since there is a model of Equation 3 that is not a model of Equation 4, we say that Equation 3 refutes Equation 4.

Looking at the opposite direction, I can prove that Equation 4 implies Equation 3. The proof is straightforward, if we have a magma that satisfies the equation x = x ◇ y for all x and y, then set y = x. We get that x = x ◇ x. So a magma that satisfies Equation 4, x = x ◇ y, must also satisfy Equation 3, x = x ◇ x.

This is a fairly simple example, but the arguments I used are general. Almost all refutations in the equational theories project were proven by constructing an explicit counterexample like we did above. And for implications, all implications are proven using syntactic rewriting like above, though the rewriting could be more involved and complex.
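
The check on this slide can be replayed mechanically. The following is a minimal Python sketch, not project code: the names `TABLE`, `op`, `satisfies` and `counterexamples` are made up for illustration, and the table is the two-element operation table shown on the slide.

```python
from itertools import product

# Operation table from the slide: TABLE[x][y] is x ◇ y on the set {0, 1}.
TABLE = [[0, 1],
         [1, 1]]

def op(x, y):
    """The magma operation x ◇ y, read off the table."""
    return TABLE[x][y]

def satisfies(law, num_vars, size=2):
    """True if the law holds for every assignment of its variables."""
    return all(law(*vals) for vals in product(range(size), repeat=num_vars))

def counterexamples(law, num_vars, size=2):
    """All variable assignments for which the law fails."""
    return [vals for vals in product(range(size), repeat=num_vars)
            if not law(*vals)]

eq3 = lambda x: x == op(x, x)      # Equation 3: x = x ◇ x
eq4 = lambda x, y: x == op(x, y)   # Equation 4: x = x ◇ y

print(satisfies(eq3, 1))           # True
print(satisfies(eq4, 2))           # False
print(counterexamples(eq4, 2))     # [(0, 1)]
```

The single failing assignment, x = 0 and y = 1, is the one used in the argument above.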

6 of 39

Goal of the project

Goal: Determine the implication graph for all equational laws up to 4 operations
There are 4,694 equations with up to 4 operations (up to normalization)
4694² ≈ 22 million total implications/refutations between equations of order 4
In practice, we require far fewer proofs to resolve the full graph

Many equations are equivalent, so we can reason about ~1.4k equivalence classes
Furthermore, our list includes dual equations, which can be ‘translated’ by switching the order of the operation
The full implication graph is determined by ~10k theorems

So with the definitions out of the way, I can explain the research goal of the equational theories project.

The goal of the project is to fully determine the implication graph for all equational laws that make use of up to 4 operations. Or stated otherwise, determine whether every equation (up to 4 operations) does or does not imply every other equation. We count operations on both the left- and right-hand sides of the equation.

You can generate an infinite number of equations with up to four operations, for example x = x, y = y, z = z, etc. Normalizing the list of equations to get rid of obvious duplicate equations like x = x and y = y, or x = y and y = x, you are left with 4,694 equations with up to 4 operations.

This means there are 4694^2 pairs of equations, or about 22 million hypothetical implications or refutations to be determined.

But in practice, we don’t have to prove all of these individually. Many equations are equivalent, and we can use transitivity and some other methods to reduce the number of statements we have to prove.

We also have a concept of duality. For example, any magma that satisfies x = x ◇ y satisfies x = y ◇ x once you flip the order of the arguments to the magma operation. Or visually, once you flip the operation table across its diagonal.

In the end, the project consists of about 10 thousand theorems in Lean, and Daniel will talk more about how we minimize proofs later.
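
Both the pair count and the duality remark are easy to check by hand or by machine. The sketch below is illustrative only: `dual`, `left_proj` and the three-element carrier are assumptions made for the example, not something taken from the project.

```python
from itertools import product

N = 4694
print(N * N)  # 22033636, i.e. about 22 million ordered pairs

def dual(table):
    """Flip the table across its diagonal: the new x ◇' y is the old y ◇ x."""
    n = len(table)
    return [[table[y][x] for y in range(n)] for x in range(n)]

# Left projection on {0, 1, 2}: x ◇ y = x, a model of x = x ◇ y.
left_proj = [[x for _ in range(3)] for x in range(3)]
flipped = dual(left_proj)

pairs = list(product(range(3), repeat=2))
sat_original = all(x == left_proj[x][y] for x, y in pairs)  # x = x ◇ y
sat_dual = all(x == flipped[y][x] for x, y in pairs)        # x = y ◇' x

print(sat_original, sat_dual)  # True True
```

Because the dual of a model of a law is always a model of the dual law, a result proved for one equation transfers to its dual without a separate proof.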

7 of 39

Project organization

Used familiar tools: Zulip, GitHub, CI, blueprint
Terence Tao’s personal log
Project management tooling by Pietro Monticone and Shreyas Srinivas
@[equational_result] attribute to track results
Implication status: unknown → conjecture → theorem
Some custom tools developed for the project:

I’m going to talk a little about the organization of the project, because some elements of it differ from other Lean-formalization projects and I think it may be interesting to highlight how.
For anyone familiar with Lean, we used familiar tools like Zulip, GitHub, continuous integration, and Lean blueprint.
I also link a helpful log that Terence Tao maintained. He summarized results and Zulip discussions as they occurred, so it offers a historical perspective on the progression of the project.
The project also used new tooling developed by Pietro Monticone and Shreyas Srinivas for managing the project. Maintainers create GitHub issues, and then participants can claim them. This reduces duplicated work among participants, and it reduces the overhead for maintainers of keeping track of claimed work.
We made use of a custom attribute, @[equational_result]. Proofs of implications or refutations in the project would add the @[equational_result] attribute, which served three functions. First, it verified that theorem statements were well-formed. Second, it enforced that tracked theorems were sorry-free: in contrast to other formalization projects, results here are largely independent of one another, so sorry-ed theorem statements are undesirable. Lastly, it allowed us to export the known results to tooling outside of Lean.
In addition to a given implication being unknown or proven, we also had a third category for tracking ‘informal’ results. If we had an informal human proof (or some machine-assisted proof that we did not yet know how to formalize), then this could be marked as a conjecture. These conjectures were Lean statements in the repo, so they were also collected and exported alongside the formalized results. Many implications and refutations went straight from unknown to Lean-formalized without ever being a conjecture, but it was a useful tool for tracking results and making sure participants didn’t waste time working on solved problems.
We have a tool called extract_implications that extracted data about theorems and conjectures. This was run in CI and then that data powered some custom web applications built specifically for this project.
Custom tools were necessary because of the volume of data: no one can manually track 22 million pairs. But the web applications were also useful because there are project participants with no Lean experience who don’t have working development environments, but who still made valuable contributions, like novel informal proofs, on Zulip. Having web applications outside of the IDE allows everyone to participate, and the web apps just ended up being much more useful than the scripts they replaced.
Now I’ll briefly give a summary of the individual tools.

8 of 39

Dashboard

9 of 39

Equation Explorer

10 of 39

Graphiti

11 of 39

12 of 39

Results

Formal proof for all implications completed on Day 9 (Oct 4)
Refutations were informally resolved on Day 57 (Nov 21)
Implication graph is conjecturally, i.e. informally, complete

Awaiting 3 formalizations

With all the high-level organization and infrastructure of the project covered, what is the actual current status of the project? For the original research goal of the project: all but 30 refutations have been formalized. Of the 30 remaining refutations, all of them have informal proofs that are awaiting formalization. And we are not awaiting 30 formalizations, there are actually only 3 remaining constructions to complete the graph. In short, the project believes it has fully determined the implication graph and the formalization is close to complete.

Proving implications turned out to be much easier than proving refutations. Automated Theorem Provers, or ATPs, were able to resolve all positive implications (and many refutations). Daniel finished formalizing the positive implications for the project on Day 9! But the work to finish proving refutations continued for almost another 2 months. The last refutation was not informally resolved until Day 57. So most of the project was spent proving refutations, and then the continued work of completing formalizations afterwards. As you might expect, a number of the informal proofs also had to be corrected in the course of formalization.

ATPs were not able to resolve all refutations. As the project progressed the remaining refutations required increasingly more effort to resolve. Participants made extensive use of ATPs to guide their research. Not just having ATPs directly solve a given problem, but guiding them and using them interactively. The ATP could confirm or deny whether adding an additional hypothesis would invalidate a given avenue of attack, or the ATPs could also be directed to explore specific parts of the search space that participants thought were likely to bear fruit. The use of ATPs in the project was quite extensive, even though I believe that most participants did not have prior experience with them.

13 of 39

Finite graph

Some implications/refutations depend on whether the underlying magma is finite or infinite
Original aim of the project: Determine the implication graph for all magmas
Also possible to study the implication graph for finite magmas
The graphs are almost identical: <1k differences

All differences are where a finite implication is an infinite refutation

Finding finite implications is no longer first order

Most finite implications automatically proven using injectivity ⇔ surjectivity
The only implications that required human proofs were finite, using the fact that finite functions must be eventually periodic

Next I’ll describe a related problem that the project also studied.

Some implications are true for all finite magmas, but not for magmas with infinite order. In the original aim of the project, this would count as a refutation since there are infinite models of one equation that do not satisfy the other equation. The proof would generally then require constructing an infinite model to demonstrate the refutation.

So if the original goal of the project was to resolve implications for all magmas, this introduces another question: What does the implication graph look like if we restrict ourselves to finite magmas? I’ll call the implication graph for finite and infinite magmas the ‘general graph’, in contrast to the ‘finite graph’ of implications for only finite magmas.

It turns out that in almost all cases, the finite graph and the ‘general graph’ agree. Implications proven by syntactic rewrites, as I illustrated earlier, hold for both finite and infinite magmas. And for refutations, the project had proven the majority of them using finite magmas, with the remainder using infinite models or meta-theoretic arguments.

One interesting thing about the finite graph, is that in contrast to the general graph, proving implications is not quite as easy. Proving a finite implication is no longer a first-order problem that an automated theorem prover can solve out-of-the-box. Outside of a few implications at the very beginning of the project, every implication for the general graph was proven using an ATP. This was not the case in the finite graph.

Most finite implications were still proven by making use of ATPs, by giving them the additional axiom that injectivity is equivalent to surjectivity, which is true for finite magmas. But there were a handful of proofs, discovered by hand, that made use of the fact that repeated function application in finite contexts is eventually periodic. This is one way that work on the finite graph differed from the general graph. In the case of the general graph, participants believed that the only remaining unknowns were all refutations beginning on Day 9. For the finite graph this was never clear.
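
The two finiteness facts mentioned here can be illustrated in a few lines. This is a hedged sketch: `inj_iff_surj`, `rho` and the sample map `f` are hypothetical names chosen for the example. In the project the relevant maps would be things like x ↦ x ◇ a on a finite magma.

```python
from itertools import product

def inj_iff_surj(n):
    """Check that every self-map of {0, ..., n-1} is injective iff surjective."""
    for images in product(range(n), repeat=n):
        injective = len(set(images)) == n
        surjective = set(images) == set(range(n))
        if injective != surjective:
            return False
    return True

def rho(f, start):
    """Iterate f from start until a value repeats.

    Returns (mu, lam): the first repeated value is f^mu(start), and lam is
    the length of the cycle that the iteration then stays in forever.
    """
    seen = {}
    x, i = start, 0
    while x not in seen:
        seen[x] = i
        x = f(x)
        i += 1
    mu = seen[x]
    return mu, i - mu

f = lambda x: (x * x + 1) % 12   # an arbitrary map on a 12-element set

print(inj_iff_surj(3))  # True
print(rho(f, 0))        # (2, 2): 0, 1, 2, 5, 2, 5, ...
```

Neither fact holds on an infinite carrier, which is why these arguments only yield implications in the finite graph.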

14 of 39

Finite graph

A number of refutations originally determined using infinite constructions had finite counter-examples

Some difficult cases that required new methods to construct finite counter-examples
The largest finite counter-example found by the project, of order 232, was for a refutation originally resolved using infinite methods

Working on the finite graph also led us to discover that we had used infinite constructions to prove a number of refutations when finite countermodels existed. Generally, the refutation was established using an infinite counterexample because using ATPs to find finite counterexamples had not been fruitful earlier. Going back to address these cases meant refining old methods or developing new tools to work on the most-difficult-to-establish cases.

Prior to the work on the finite graph beginning, the largest finite counterexample was of order 32. But work to resolve the finite graph led to a counterexample of order 232. Note that this is not necessarily the minimal size needed to prove that refutation; we’re not sure what that is. It’s just an illustration that going back to look at these cases led us to discover new phenomena.

15 of 39

Results: finite graph

Open question:

Does Equation 677 x = y ◇ (x ◇ ((y ◇ x) ◇ y)) imply or refute Equation 255 x = ((x ◇ x) ◇ x) ◇ x ?

16 of 39

Results

A paper is in the works
Project participants also studied a number of other related questions:

Which equations are equivalent to the Higman-Neumann equation characterizing division in groups?
Which equations have satisfying finite models of every cardinality?
Which equations of order 5 have satisfying finite/infinite models?
and more

17 of 39

18 of 39

Infrastructure

Goal: Maintaining all possible implications, and their status.

Automatically use transitivity:

𝑎→𝑏 and 𝑏→𝑐 implies 𝑎→𝑐.
𝑎→𝑏 and ¬(𝑎→𝑐) implies ¬(𝑏→𝑐).
𝑏→𝑐 and ¬(𝑎→𝑐) implies ¬(𝑎→𝑏).

19 of 39

Infrastructure – continued

Positive implications: transitive closure
For anti-implications, we build a representation graph

Two copies of each law, 𝑥 and 𝑥′
𝑥→𝑦 becomes 𝑦→𝑥 and 𝑦′→𝑥′
¬(𝑥→𝑦) becomes the edge 𝑥→𝑦′

Transitive closure of the representation graph

A path 𝑥→𝑦′ is an anti-implication ¬(𝑥→𝑦)
A path 𝑦→𝑥 is the positive implication 𝑥→𝑦
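
The representation graph described above can be sketched as follows. This is an illustrative Python reconstruction, not the project's implementation: the function name `closure`, the tuple encoding `(law, primed)` and the sample laws a, b, c, d are all assumptions.

```python
from collections import defaultdict

def closure(implications, anti_implications):
    """Derive every implication and anti-implication that follows by transitivity.

    Nodes are (law, primed). An implication x → y adds the edges y → x and
    y′ → x′; an anti-implication ¬(x → y) adds the edge x → y′.
    """
    edges = defaultdict(set)
    for x, y in implications:
        edges[(y, False)].add((x, False))
        edges[(y, True)].add((x, True))
    for x, y in anti_implications:
        edges[(x, False)].add((y, True))

    def reachable(src):
        # Plain depth-first search over the representation graph.
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    laws = {law for pair in list(implications) + list(anti_implications)
            for law in pair}
    implied, refuted = set(), set()
    for y in laws:
        for x, primed in reachable((y, False)):
            if primed:
                refuted.add((y, x))   # path y → x′ means ¬(y → x)
            else:
                implied.add((x, y))   # path y → x means x → y
    return implied, refuted

implied, refuted = closure([("a", "b"), ("b", "c")], [("a", "d")])
print(sorted(implied))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(sorted(refuted))  # [('a', 'd'), ('b', 'd'), ('c', 'd')]
```

From two implications and one anti-implication, the closure recovers a → c as well as ¬(b → d) and ¬(c → d), matching the three transitivity rules on the previous slide.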

20 of 39

Infrastructure – Lean Technicalities

Mark results relevant to the implication graph using the attribute @[equational_result].
Many anti-implications from a single magma.
Custom Facts syntax Facts M list1 list2

Satisfies the laws in list1
Refutes the laws in list2

Transitive closure with a dummy node

𝑥→𝑀 for laws in list1
𝑀→𝑥′ for laws in list2

Anti-implications can be proven in multiple magmas - match up magmas and anti-implications to minimize verification work.

21 of 39

Proof Automation

Low hanging fruit without much automation

Simple rewrites
Fixed proof scripts
Developed concurrently with the infrastructure

Two types of proof automation

Lean tactics: tightly integrated, produce proofs in Lean
ATPs: independent of Lean, soundness risk

Translating proofs from ATPs to Lean

22 of 39

Proof Automation – Classification

ATP directly in Lean as a tactic
A Lean tactic which uses an external ATP, and translates its output to a Lean proof
An external script which uses the ATP to produce Lean code.
Most used for new automations in this project, less user-friendly

23 of 39

Proof Automation – Vampire

Vampire is a state-of-the-art automated theorem prover for first-order logic (Kovács, L., Voronkov, A.: First-Order Theorem Proving and Vampire, CAV 2013). It implements the superposition calculus, a generalization of resolution for equational logic. Its main deductive step, written here in the standard form for unit equations with the ordering side conditions omitted, is

𝑙 = 𝑟,  𝑡[𝑠] = 𝑢  ⊢  (𝑡[𝑟] = 𝑢)𝜃

where 𝜃 is the most general unifier of 𝑙 and 𝑠.

24 of 39

Proof Automation – Example Interaction With Vampire

problem.tptp

fof(lhs, axiom, X = mul(X, mul(mul(X, X), Y))).

fof(rhs, conjecture, X = mul(X, mul(X, X))).

danielweber:~$ vampire problem.tptp

1. ! [X0] : ! [X1] : mul(X0,mul(mul(X0,X0),X1)) = X0 [input]

2. ! [X0] : mul(X0,mul(X0,X0)) = X0 [input]

3. ~! [X0] : mul(X0,mul(X0,X0)) = X0 [negated conjecture 2]

4. ! [X0,X1] : mul(X0,mul(mul(X0,X0),X1)) = X0 [flattening 1]

5. ? [X0] : mul(X0,mul(X0,X0)) != X0 [ennf transformation 3]

6. ? [X0] : mul(X0,mul(X0,X0)) != X0 => sK0 != mul(sK0,mul(sK0,sK0)) [choice axiom]

7. sK0 != mul(sK0,mul(sK0,sK0)) [skolemisation 5,6]

8. mul(X0,mul(mul(X0,X0),X1)) = X0 [cnf transformation 4]

9. sK0 != mul(sK0,mul(sK0,sK0)) [cnf transformation 7]

10. mul(X0,mul(X0,X0)) = X0 [superposition 8,8]

11. sK0 != sK0 [superposition 9,10]

13. $false [trivial inequality removal 11]

Equation 100: 𝑥 = 𝑥 ◇ ((𝑥 ◇ 𝑥) ◇ 𝑦) → Equation 8: 𝑥 = 𝑥 ◇ (𝑥 ◇ 𝑥)

25 of 39

Proof Automation – Example Implication

Suppose for contradiction that (1) 𝑥 ◇ ((𝑥 ◇ 𝑥) ◇ 𝑦) = 𝑥 and (∗) 𝑥₀ ≠ 𝑥₀ ◇ (𝑥₀ ◇ 𝑥₀).
Superpose equation (1) with itself and get (2) 𝑥 ◇ (𝑥 ◇ 𝑥) = 𝑥.

𝑥
= 𝑥 ◇ ((𝑥 ◇ 𝑥) ◇ 𝑦)
= 𝑥 ◇ (𝑥′ ◇ ((𝑥′ ◇ 𝑥′) ◇ 𝑦′)) (setting 𝑥′ ≔ 𝑥 ◇ 𝑥, 𝑦 ≔ (𝑥′ ◇ 𝑥′) ◇ 𝑦′)
= 𝑥 ◇ 𝑥′ (by equation (1) applied to 𝑥′, 𝑦′)
= 𝑥 ◇ (𝑥 ◇ 𝑥)

Rewrite with (2) at (∗) and get 𝑥₀ ≠ 𝑥₀.
Contradiction.
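
As a sanity check on this implication (not a proof, and not how the project established it), one can enumerate every magma on at most three elements and look for a model of Equation 100 that violates Equation 8. The helper names below are hypothetical.

```python
from itertools import product

def all_magmas(n):
    """Yield every operation table on {0, ..., n-1}; t[x][y] is x ◇ y."""
    for flat in product(range(n), repeat=n * n):
        yield [flat[i * n:(i + 1) * n] for i in range(n)]

def eq100(t, n):
    # Equation 100: x = x ◇ ((x ◇ x) ◇ y)
    return all(x == t[x][t[t[x][x]][y]] for x in range(n) for y in range(n))

def eq8(t, n):
    # Equation 8: x = x ◇ (x ◇ x)
    return all(x == t[x][t[x][x]] for x in range(n))

def find_counterexample(max_size):
    """Return a table satisfying Equation 100 but not Equation 8, if any."""
    for n in range(1, max_size + 1):
        for t in all_magmas(n):
            if eq100(t, n) and not eq8(t, n):
                return t
    return None

print(find_counterexample(3))  # None
```

An exhaustive search like this can only ever refute an implication. It is the rewriting argument above, checked in Lean, that shows the implication holds for all magmas.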

26 of 39

Proof Automation – Superposition in Lean

To justify superposition in Lean, we need to show it from simple substitution.

Given (by Vampire) an equation 𝑎 which is superposed at 𝑏 to give 𝑐, we will match up 𝑏 and 𝑐, using 𝑎 or its inverse to resolve leftover equalities.
This can be approximated by convert b .. <;> (first | apply a | apply (a ..).symm).
convert can sometimes assign the metavariables incorrectly

In the previous example the simple approach would’ve tried to set y := x and prove 𝑥 ◇ 𝑥 = 𝑥

We use a custom backtracking implementation
There can also be leftover unassigned metavariables, which were erased by 𝑎:

Rewriting 𝑥 ◇ (𝑥′ ◇ ((𝑥′ ◇ 𝑥′) ◇ 𝑦′)) with 𝑥′ ◇ ((𝑥′ ◇ 𝑥′) ◇ 𝑦′) = 𝑥′ erases 𝑦′.
Assigned using assumption, which works because we used contradiction to get some 𝑥₀,… which don’t satisfy the target equation.

27 of 39

Counterexamples

Vampire resolved all positive implications
Finite counterexamples from Vampire’s finite model building
We also used Vampire to find infinite models
Vampire’s algorithm is complete: when it finishes without a proof, there is no proof. The refutation can be seen as finding a confluent set of rewriting rules for an equation, which can resolve all implications from that equation.

Rewriting rules: directed equalities
Confluence: every value reduces to a unique normal form by greedily applying rewriting rules
Wasn’t done automatically

28 of 39

Asterix Equation

Equation 65: x = y ◇ (x ◇ (y ◇ x)) - The “Asterix” equation
Equation 1491: x = (y ◇ x) ◇ (y ◇ (y ◇ x)) - The “Obelix” equation
Asterix implies Obelix for finite magmas.
Does it imply it for infinite magmas?

29 of 39

Ruleset Approach

A new approach for deciding equational theories modulo an equation
Define a rule: if (some list of hypotheses of the form aᵢ ◇ aⱼ = aₖ) then (either a conclusion of the form aᵢ ◇ aⱼ = aₖ or aᵢ = aⱼ).
A ruleset is valid if all finite partial binary operators ◇ which satisfy all the rules can be completed to a total binary operator which satisfies all of the rules (possibly on a larger base set).

30 of 39

Ruleset Approach – Continued

A ruleset is necessary for a given equation if the equation implies all of the rules.
A ruleset is complete for an equation 𝑒 if it’s valid, necessary for it, and implies it for total operators.
We also add a rule for ◇ being single-valued: if a₁ ◇ a₂ = a₃ and a₁ ◇ a₂ = a₄ then a₃ = a₄. This is needed because we represent a partial function as a predicate ◇(a₁, a₂, a₃).

31 of 39

Ruleset Approach – Decision Procedure

A complete ruleset gives a decision procedure - assign a variable for each intermediate value in the equation, and repeatedly apply the rules to figure out the operation on these values, and which ones are equal.

If the two sides of the equation become equal, then by necessity it’s implied by our equation, and if they don’t, then by validity it can be completed to a counterexample.

32 of 39

Validity from Greedy Extension

The method we used for showing validity of rulesets is greedy extension
The construction is over an arbitrary countable set
At each step the partial function is finite, so we can find a “fresh” value 𝑐 which isn’t mentioned at all.
Choose some a ◇ b which is unassigned and set a ◇ b = 𝑐
Make sure other rules are satisfied

Apply one iteration of the rules, making sure they’re satisfied when the hypotheses use the old operation and the conclusion uses the new operation.
Only assign x ◇ y = z if one of x,y,z is 𝑐

Prove that the rules are still preserved

33 of 39

Greedy Extension – Example

Consider for example Equation 1648 - x = (x ◇ y) ◇ ((x ◇ y) ◇ y).

As a rule it becomes x ◇ y = a, a ◇ y = b -> a ◇ b = x.

Adding the rule for left injectivity, a ◇ x = y, b ◇ x = y -> a = b, this becomes a complete ruleset for equation 1648 via greedy extension.
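
To make the two rules concrete, here is a small checker for partial operations. It is a sketch under assumptions: the dictionary encoding and the name `violations` are invented for illustration, and it only tests whether a given partial table obeys the rules. It does not perform the greedy extension itself.

```python
def violations(op):
    """List rule violations of a partial operation.

    op maps (x, y) to x ◇ y wherever the operation is defined.
    Rule 1:           x ◇ y = a, a ◇ y = b  ->  a ◇ b = x
    Left injectivity: a ◇ x = y, b ◇ x = y  ->  a = b
    """
    found = []
    for (x, y), a in op.items():
        b = op.get((a, y))
        # The conclusion a ◇ b = x must be defined and hold.
        if b is not None and op.get((a, b)) != x:
            found.append(("rule 1", x, y))
    for (a, x), y in op.items():
        for (b, x2), y2 in op.items():
            if x2 == x and y2 == y and a != b:
                found.append(("left injectivity", a, b))
    return found

print(violations({(0, 1): 2}))                        # []
print(violations({(0, 1): 2, (2, 1): 3}))             # [('rule 1', 0, 1)]
print(violations({(0, 1): 2, (2, 1): 3, (2, 3): 0}))  # []
```

In the second table the hypotheses 0 ◇ 1 = 2 and 2 ◇ 1 = 3 hold but 2 ◇ 3 is still undefined. Assigning 2 ◇ 3 = 0 repairs it, which is the kind of forced assignment an extension step has to make.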

34 of 39

Greedy Extension – Visualization

x ◇ b = a, a ◇ b = c -> a ◇ c = x

Refutes 3253 x ◇ x = x ◇ (x ◇ (x ◇ x))

◇	1	2	3
1	2	3	3
2
3

◇	1	2	3	4
1	2	3	3
2	4			1
3
4

◇	1	2	3	4	5
1	2	3	3
2	4	5		1
3
4
5

◇	1	2	3	4	5	6
1	2	3	3
2	4	5		1
3	6
4
5
6

◇	1	2	3	4	5	6	7
1	2	3	3	7			2
2	4	5		1
3	6
4
5
6
7

◇	1	2	3	4	5	6	7	8
1	2	3	3	7			2
2	4	5	8	1
3	6
4
5
6
7
8

◇	1	2	3	4	5	6	7	8	9
1	2	3	3	7			2
2	4	5	8	1
3	6	9							1
4
5
6
7
8
9

◇	1	2	3	4	5	6	7	8	9	10
1	2	3	3	7			2
2	4	5	8	1
3	6	9							1
4	10									2
5
6
7
8
9

35 of 39

Proof Automation – Ruleset Construction

Maintain a ruleset
Use Vampire to prove the rules are preserved when defining the new operation
If they aren’t, find a finite counterexample.
Either we need some x ◇ y = z but it’s false, or we need some x = y but they’re different.

We can add a new rule to make sure this rule will be satisfied if we were to do this again.

If the violation doesn’t involve 𝑐, we can add the rule with the assumptions from the finite counterexample.
If it’s 𝑐 = z we change the conclusion to a ◇ b = z.
Otherwise, we also need to add the assumption that a ◇ b = 𝑐.

Continue until we can prove the rules are preserved (or there are too many rules for it to either prove or find a counterexample in reasonable time)

36 of 39

Ruleset Construction – Example

Initial rules:

1. a ◇ b = c, a ◇ b = d -> c = d
2. x ◇ b = a, a ◇ b = c -> a ◇ c = x

	a	b	x	c
a		c		b/x
b	a	a	a
x	a	a	a
c

New rule:

3. b ◇ a = a, b ◇ b = a, b ◇ x = a, x ◇ a = a, x ◇ b = a, x ◇ x = a -> b = x

Problem – excessively specific rule

	a	b	x
a
b	a	a	a
x	a	a	a

37 of 39

Proof Automation – Rule minimization

Vampire sometimes finds counterexamples which aren’t minimal in terms of the rules they produce.

Sometimes hypotheses can be dropped, or there are values which are equal but don’t necessarily need to be.

To make the code more efficient, before we add a rule to our ruleset, we minimize it: we attempt to drop hypotheses, and attempt to replace different occurrences of a variable with distinct variables. We use Vampire to check that the minimized rules remain consequences of the equation.

In the previous example, the rule b ◇ a = a, b ◇ b = a, b ◇ x = a, x ◇ a = a, x ◇ b = a, x ◇ x = a -> b = x would be minimized to b ◇ a = a, x ◇ a = a -> b = x by dropping hypotheses, and then to b ◇ a = y, x ◇ a = y -> b = x by making occurrences different.
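
The hypothesis-dropping half of minimization is a simple greedy loop. The sketch below assumes a callback `is_consequence` that stands in for the Vampire check. The toy oracle used here is hard-coded to mirror the outcome described on this slide and is not a real prover call.

```python
def drop_hypotheses(hypotheses, conclusion, is_consequence):
    """Greedily remove hypotheses while the rule stays a consequence.

    is_consequence(hyps, conclusion) is an assumed oracle; in the project
    this role is played by Vampire.
    """
    kept = list(hypotheses)
    for h in list(hypotheses):
        trial = [k for k in kept if k != h]
        if is_consequence(trial, conclusion):
            kept = trial
    return kept

# Toy oracle (an assumption for illustration): the rule is accepted
# exactly when both of these hypotheses are still present.
NEEDED = {"b ◇ a = a", "x ◇ a = a"}
toy_oracle = lambda hyps, conclusion: NEEDED <= set(hyps)

rule = ["b ◇ a = a", "b ◇ b = a", "b ◇ x = a",
        "x ◇ a = a", "x ◇ b = a", "x ◇ x = a"]
print(drop_hypotheses(rule, "b = x", toy_oracle))
# ['b ◇ a = a', 'x ◇ a = a']
```

Each hypothesis costs one oracle call, so a rule with k hypotheses needs k calls. The second pass, which makes repeated occurrences of a variable distinct, would follow the same try-and-check pattern.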

38 of 39

Theoretical Tools

In addition to what was discussed here, many approaches were used in the project for handling specific equations, for example:

Linear operations: x ◇ y = a x + b y
Translation invariant magmas: x ◇ y = x + f(y-x)
Magma twisting: x ◇’ y = Tx ◇ Uy
Magma cohomology, similar to group cohomology.
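
As a small illustration of the first item, linear operations give a cheap family of candidate counterexamples. The sketch below works over the integers mod 5, an arbitrary choice, and searches for coefficients (a, b) whose magma satisfies Equation 3 but not Equation 4. The names `P`, `linear_op` and `holds` are hypothetical.

```python
from itertools import product

P = 5  # modulus; an arbitrary small prime chosen for illustration

def linear_op(a, b):
    """The linear operation x ◇ y = a x + b y on the integers mod P."""
    return lambda x, y: (a * x + b * y) % P

def holds(law, op, num_vars):
    """True if the law holds in the magma for all variable assignments."""
    return all(law(op, *vals) for vals in product(range(P), repeat=num_vars))

eq3 = lambda op, x: x == op(x, x)      # Equation 3: x = x ◇ x
eq4 = lambda op, x, y: x == op(x, y)   # Equation 4: x = x ◇ y

witnesses = [(a, b) for a, b in product(range(P), repeat=2)
             if holds(eq3, linear_op(a, b), 1)
             and not holds(eq4, linear_op(a, b), 2)]
print(witnesses)  # [(0, 1), (2, 4), (3, 3), (4, 2)]
```

Equation 3 reduces to a + b = 1 mod 5 and Equation 4 to a = 1, b = 0, so every solution of the first condition other than (1, 0) is a linear witness that Equation 3 refutes Equation 4.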

39 of 39

Conclusion

Proof assistants like Lean let us run mathematical projects at a large scale, allowing for contribution from anybody, with a high guarantee of correctness.

https://tinyurl.com/LeanTogetherETPslides