In the previous post, I showed that the Control Flow Graphs we built for our programs match how those programs are really executed. This means that we can rely on these graphs to compute program information. In this post, we finally get to compute that information. Here’s a quick paraphrase from last time that summarizes our approach:

  1. We will construct a finite-height lattice. Every single element of this lattice will contain information about each variable at each node in the Control Flow Graph.
  2. We will then define a monotonic function that updates this information using the structure encoded in the CFG’s edges and nodes.
  3. Then, using the fixed-point algorithm, we will find that function’s least fixed point, which will give us a precise description of all program variables at all points in the program.
  4. Because we have just validated our CFGs to be faithful to the language’s semantics, we’ll be able to prove that our algorithm produces accurate results.

Let’s jump right into it!

Choosing a Lattice

For a while now, we have been talking about lattices, particularly lattices of finite height. These structures represent things we know about the program, and provide operators like (\sqcup) and (\sqcap) that help us combine such knowledge.
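To make these structures concrete, here is a minimal sketch, in Python rather than this series’ Agda, of the finite-height sign lattice and its two operators. All names here are illustrative, not part of the actual development:

```python
# Sketch of the sign lattice: bottom, the three signs, and top ("unknown").
BOT, NEG, ZERO, POS, TOP = "⊥", "-", "0", "+", "⊤"

def leq(a, b):
    # Partial order: ⊥ is below everything, ⊤ is above everything,
    # and the three signs -, 0, + are mutually incomparable.
    return a == b or a == BOT or b == TOP

def join(a, b):
    # Least upper bound (⊔): combines knowledge, "a or b".
    if leq(a, b): return b
    if leq(b, a): return a
    return TOP  # incomparable signs join to "unknown"

def meet(a, b):
    # Greatest lower bound (⊓).
    if leq(a, b): return a
    if leq(b, a): return b
    return BOT  # incomparable signs meet at "impossible"
```

Joining conflicting facts, e.g. `join("-", "+")`, yields ⊤, mirroring how the analysis treats a variable whose sign differs between two sources of knowledge.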

The forward analysis code I present here will work with any finite-height lattice, with the additional constraint that equivalence of lattice elements is decidable. That constraint comes from the implementation of the fixed-point algorithm, in which we routinely check whether a function’s output is the same as its input.

From Forward.agda, lines 4 through 8
module Analysis.Forward
    {L : Set} {h}
    {_≈ˡ_ : L → L → Set} {_⊔ˡ_ : L → L → L} {_⊓ˡ_ : L → L → L}
    (isFiniteHeightLatticeˡ : IsFiniteHeightLattice L h _≈ˡ_ _⊔ˡ_ _⊓ˡ_)
    (≈ˡ-dec : IsDecidable _≈ˡ_) where

The finite-height lattice L is intended to describe the state of a single variable. One example of a lattice that can be used as L is our sign lattice. We’ve been using the sign lattice in our examples from the very beginning, and we will stick with it for the purposes of this explanation. However, this lattice alone does not describe our program, since it only talks about a single sign; programs have lots of variables, all of which can have different signs! So, we might go one step further and define a map lattice from variables to their signs:

\text{Variable} \to \text{Sign}

We have seen that we can turn any lattice L into a map lattice A \to L, for any type of keys A. In this case, we will define A \triangleq \text{Variable} and L \triangleq \text{Sign}. The sign lattice has a finite height, and I’ve proven that, as long as we pick a finite set of keys, map lattices A \to L have a finite height if L has a finite height. Since a program’s text is finite, \text{Variable} is a finite set, and we have ourselves a finite-height lattice \text{Variable} \to \text{Sign}.
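As a quick sketch of why this lifting works, assuming both maps share the same key set (as they do here, since every map covers all program variables), the join of two maps is computed pointwise. This is illustrative Python, not the Agda construction:

```python
TOP = "⊤"

def join_sign(a, b):
    # Flat join on signs for this sketch: equal signs stay,
    # conflicting signs become "unknown".
    return a if a == b else TOP

def join_map(m1, m2):
    # Pointwise join of two Variable -> Sign maps over a shared key set.
    return {k: join_sign(m1[k], m2[k]) for k in m1}
```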

We’re on the right track, but even the lattice we have so far is not sufficient. That’s because variables have different signs at different points in the program! You might initialize a variable with x = 1, making it positive, and then go on to compute some arbitrary function using loops and conditionals. For each variable, we need to keep track of its sign at various points in the code. When we defined Control Flow Graphs, we split our programs into sequences of statements that are guaranteed to execute together — basic blocks. For our analysis, we’ll keep per-variable information for each basic block in the program. Since basic blocks are nodes in the Control Flow Graph of our program, our whole lattice will be as follows:

\text{Info} \triangleq \text{NodeId} \to (\text{Variable} \to \text{Sign})

We follow the same logic we just did for the variable-sign lattice; since \text{Variable} \to \text{Sign} is a lattice of finite height, and since \text{NodeId} is a finite set, the whole \text{Info} map will be a lattice with a finite height.

Notice that both the sets \text{Variable} and \text{NodeId} depend on the program in question. The lattice we use is slightly different for each input program! We can use Agda’s parameterized modules to automatically parameterize all our functions over programs:

From Forward.agda, lines 36 through 37
module WithProg (prog : Program) where
    open Program prog

Now, let’s turn the informal descriptions above into code by instantiating our map lattice modules. First, I invoked the code for the smaller variable-sign lattice. This ended up being quite long, because of all the renaming of variables I brought into scope. I will collapse the relevant code block; suffice it to say that I used the superscript ᵛ (e.g., renaming _⊔_ to _⊔ᵛ_) for properties and operators to do with variable-sign maps (in Agda: VariableValuesFiniteMap).

(Click here to expand the module uses for variable-sign maps)
From Forward.agda, lines 41 through 82
    module VariableValuesFiniteMap = Lattice.FiniteValueMap.WithKeys _≟ˢ_ isLatticeˡ vars
    open VariableValuesFiniteMap
        using ()
        renaming
            ( FiniteMap to VariableValues
            ; isLattice to isLatticeᵛ
            ; _≈_ to _≈ᵛ_
            ; _⊔_ to _⊔ᵛ_
            ; _≼_ to _≼ᵛ_
            ; ≈₂-dec⇒≈-dec to ≈ˡ-dec⇒≈ᵛ-dec
            ; _∈_ to _∈ᵛ_
            ; _∈k_ to _∈kᵛ_
            ; _updating_via_ to _updatingᵛ_via_
            ; locate to locateᵛ
            ; m₁≼m₂⇒m₁[k]≼m₂[k] to m₁≼m₂⇒m₁[k]ᵛ≼m₂[k]ᵛ
            ; ∈k-dec to ∈k-decᵛ
            ; all-equal-keys to all-equal-keysᵛ
            )
        public
    open IsLattice isLatticeᵛ
        using ()
        renaming
            ( ⊔-Monotonicˡ to ⊔ᵛ-Monotonicˡ
            ; ⊔-Monotonicʳ to ⊔ᵛ-Monotonicʳ
            ; ⊔-idemp to ⊔ᵛ-idemp
            )
    open Lattice.FiniteValueMap.IterProdIsomorphism _≟ˢ_ isLatticeˡ
        using ()
        renaming
            ( Provenance-union to Provenance-unionᵐ
            )
    open Lattice.FiniteValueMap.IterProdIsomorphism.WithUniqueKeysAndFixedHeight _≟ˢ_ isLatticeˡ vars-Unique ≈ˡ-dec _ fixedHeightˡ
        using ()
        renaming
            ( isFiniteHeightLattice to isFiniteHeightLatticeᵛ
            ; ⊥-contains-bottoms to ⊥ᵛ-contains-bottoms
            )

    ≈ᵛ-dec = ≈ˡ-dec⇒≈ᵛ-dec ≈ˡ-dec
    joinSemilatticeᵛ = IsFiniteHeightLattice.joinSemilattice isFiniteHeightLatticeᵛ
    fixedHeightᵛ = IsFiniteHeightLattice.fixedHeight isFiniteHeightLatticeᵛ
    ⊥ᵛ = Chain.Height.⊥ fixedHeightᵛ

I then used this lattice as an argument to the map module again, to construct the top-level Info\text{Info} lattice (in Agda: StateVariablesFiniteMap). This also required a fair bit of code, most of it to do with renaming.

(Click here to expand the module uses for the top-level lattice)
From Forward.agda, lines 85 through 112
    module StateVariablesFiniteMap = Lattice.FiniteValueMap.WithKeys _≟_ isLatticeᵛ states
    open StateVariablesFiniteMap
        using (_[_]; []-∈; m₁≼m₂⇒m₁[ks]≼m₂[ks]; m₁≈m₂⇒k∈m₁⇒k∈km₂⇒v₁≈v₂)
        renaming
            ( FiniteMap to StateVariables
            ; isLattice to isLatticeᵐ
            ; _≈_ to _≈ᵐ_
            ; _∈_ to _∈ᵐ_
            ; _∈k_ to _∈kᵐ_
            ; locate to locateᵐ
            ; _≼_ to _≼ᵐ_
            ; ≈₂-dec⇒≈-dec to ≈ᵛ-dec⇒≈ᵐ-dec
            ; m₁≼m₂⇒m₁[k]≼m₂[k] to m₁≼m₂⇒m₁[k]ᵐ≼m₂[k]ᵐ
            )
        public
    open Lattice.FiniteValueMap.IterProdIsomorphism.WithUniqueKeysAndFixedHeight _≟_ isLatticeᵛ states-Unique ≈ᵛ-dec _ fixedHeightᵛ
        using ()
        renaming
            ( isFiniteHeightLattice to isFiniteHeightLatticeᵐ
            )
    open IsFiniteHeightLattice isFiniteHeightLatticeᵐ
        using ()
        renaming
            ( ≈-sym to ≈ᵐ-sym
            )

    ≈ᵐ-dec = ≈ᵛ-dec⇒≈ᵐ-dec ≈ᵛ-dec
    fixedHeightᵐ = IsFiniteHeightLattice.fixedHeight isFiniteHeightLatticeᵐ

Constructing a Monotone Function

We now have a lattice in hand; the next step is to define a function over this lattice. For us to be able to use the fixed-point algorithm on this function, it will need to be monotonic.

Our goal with static analysis is to compute information about our program; that’s what we want the function to do. When the lattice we’re using is the sign lattice, we’re trying to determine the signs of each of the variables in various parts of the program. How do we go about this?

Each piece of code in the program might change a variable’s sign. For instance, if x has sign 0, and we run the statement x = x - 1, the sign of x will be -. If we have an expression y + z, we can use the signs of y and z to compute the sign of the whole thing. This is a form of abstract interpretation, in which we almost-run the program, but forget some details (e.g., the exact values of x, y, and z, leaving only their signs). The exact details of how this partial evaluation is done are analysis-specific; in general, we simply require an analysis to provide an evaluator. We will define an evaluator for the sign lattice below.
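Here is a hedged sketch of such an evaluator in Python, using a hypothetical tuple encoding of expressions and only a partial plus table (the full tables, including minus, appear later in the post):

```python
TOP, NEG, ZERO, POS = "⊤", "-", "0", "+"

# Partial abstract addition table; pairs not listed (like + plus -)
# could be anything, so they fall back to "unknown".
PLUS = {(POS, POS): POS, (POS, ZERO): POS, (ZERO, POS): POS,
        (NEG, NEG): NEG, (NEG, ZERO): NEG, (ZERO, NEG): NEG,
        (ZERO, ZERO): ZERO}

def eval_abs(expr, env):
    # Abstractly interpret an expression to a sign, given variable signs.
    kind = expr[0]
    if kind == "num":
        return ZERO if expr[1] == 0 else (POS if expr[1] > 0 else NEG)
    if kind == "var":
        return env.get(expr[1], TOP)
    if kind == "+":
        return PLUS.get((eval_abs(expr[1], env), eval_abs(expr[2], env)), TOP)
```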

From Forward.agda, lines 166 through 167
    module WithEvaluator (eval : Expr → VariableValues → L)
                         (eval-Mono : ∀ (e : Expr) → Monotonic _≼ᵛ_ _≼ˡ_ (eval e)) where

From this, we know how each statement and basic block will change the variables in the program. But we have described this process as “if a variable has sign X, it becomes sign Y” – how do we know what sign a variable has before the code runs? Fortunately, the Control Flow Graph tells us exactly what code could be executed before any given basic block. Recall that edges in the graph describe all possible jumps that could occur; thus, for any node, the incoming edges describe all possible blocks that can precede it. This is why we spent all that time defining the predecessors function.

We proceed as follows: for any given node, find its predecessors. By accessing our \text{Info} map for each predecessor, we can determine our current best guess of variable signs at that point, in the form of a \text{Variable} \to \text{Sign} map (more generally, a \text{Variable} \to L map in an arbitrary analysis). We know that any of these predecessors could’ve been the previous point of execution; if a variable x has sign + in one predecessor and - in another, it can be either one or the other when we start executing the current block. Early on, we saw that the (\sqcup) operator models disjunction (“A or B”). So, we apply (\sqcup) to the variable-sign maps of all predecessors. The reference Static Program Analysis text calls this operation \text{JOIN}:

\textit{JOIN}(v) = \bigsqcup_{w \in \textit{pred}(v)} \llbracket w \rrbracket

The Agda implementation uses a foldr:

From Forward.agda, lines 139 through 140
    joinForKey : State → StateVariables → VariableValues
    joinForKey k states = foldr _⊔ᵛ_ ⊥ᵛ (states [ incoming k ])

Computing the “combined incoming states” for any node is a monotonic function. This follows from the monotonicity of (\sqcup) — in both arguments — and the definition of foldr.
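In Python, the same join-of-predecessors can be sketched as a fold over a node’s incoming edges; here `bottom` plays the role of ⊥ᵛ, and all names are illustrative:

```python
BOT, TOP = "⊥", "⊤"

def join_sign(a, b):
    # Sketch of the sign join: ⊥ is the identity, conflicts go to ⊤.
    if a == BOT: return b
    if b == BOT: return a
    return a if a == b else TOP

def join_map(m1, m2):
    # Pointwise join over a shared key set.
    return {k: join_sign(m1[k], m2[k]) for k in m1}

def join_for_key(node, info, preds, bottom):
    # JOIN(v): fold the map join over the states of v's predecessors.
    acc = bottom
    for w in preds[node]:
        acc = join_map(info[w], acc)
    return acc
```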

(Click here to expand the general proof)
From Lattice.agda, lines 143 through 151
    foldr-Mono : ∀ (l₁ l₂ : List A) (f : A → B → B) (b₁ b₂ : B) →
                 Pairwise _≼₁_ l₁ l₂ → b₁ ≼₂ b₂ →
                 (∀ b → Monotonic _≼₁_ _≼₂_ (λ a → f a b)) →
                 (∀ a → Monotonic _≼₂_ _≼₂_ (f a)) →
                 foldr f b₁ l₁ ≼₂ foldr f b₂ l₂
    foldr-Mono List.[] List.[] f b₁ b₂ _ b₁≼b₂ _ _ = b₁≼b₂
    foldr-Mono (x ∷ xs) (y ∷ ys) f b₁ b₂ (x≼y ∷ xs≼ys) b₁≼b₂ f-Mono₁ f-Mono₂ =
        ≼₂-trans (f-Mono₁ (foldr f b₁ xs) x≼y)
                 (f-Mono₂ y (foldr-Mono xs ys f b₁ b₂ xs≼ys b₁≼b₂ f-Mono₁ f-Mono₂))

From this, we can formally state that \text{JOIN} is monotonic. Note that the input and output lattices are different: the input lattice is the lattice of variable states at each block, and the output lattice is a single variable-sign map, representing the combined preceding state at a given node.

From Forward.agda, lines 145 through 149

Above, the m₁≼m₂⇒m₁[ks]≼m₂[ks] lemma states that for two maps with the same keys, where one map is less than another, all the values for any subset of keys ks are pairwise less than each other (i.e. m₁[k]≼m₂[k], and m₁[l]≼m₂[l], etc.). This follows from the definition of “less than” for maps.

So those are the two pieces: first, join all the preceding states, then use the abstract interpretation function. I opted to do both of these in bulk:

  1. Take an initial Info\text{Info} map, and update every basic block’s entry to be the join of its predecessors.
  2. In the new joined map, each key now contains the variable state at the beginning of the block; so, apply the abstract interpretation function via eval to each key, computing the state at the end of the block.

I chose to do these in bulk because this way, after each application of the function, we have updated each block with exactly one round of information. The alternative — which is specified in the reference text — is to update one key at a time. The difference there is that updates to later keys might be “tainted” by updates to keys that came before them. This is probably fine (and perhaps more efficient, in that it “moves faster”), but it’s harder to reason about.
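One round of this bulk update can be sketched as follows, where `join_preds` and `transfer` stand in for the join-of-predecessors and abstract-evaluation steps (illustrative Python, not the Agda code):

```python
def analyze_step(info, join_preds, transfer):
    # Step 1: replace every block's entry with the join of its predecessors.
    joined = {node: join_preds(node, info) for node in info}
    # Step 2: run each block's abstract transfer function on the joined state.
    return {node: transfer(node, joined[node]) for node in joined}
```

Because step 2 reads only from the already-joined map, every block is updated with exactly one round of information per application, which is what makes the function easy to reason about.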

Generalized Update

To implement bulk assignment, I needed to implement the source text’s Exercise 4.26:

Exercise 4.26: Recall that f[a \leftarrow x] denotes the function that is identical to f except that it maps a to x. Assume f : L_1 \to (A \to L_2) and g : L_1 \to L_2 are monotone functions where L_1 and L_2 are lattices and A is a set, and let a \in A. (Note that the codomain of f is a map lattice.)

Show that the function h : L_1 \to (A \to L_2) defined by h(x) = f(x)[a \leftarrow g(x)] is monotone.

In fact, I generalized this statement to update several keys at once, as follows:

h(x) = f(x)[a_1 \leftarrow g(a_1, x),\ ...,\ a_n \leftarrow g(a_n, x)]

I called this operation “generalized update”.

At first, the exercise may not obviously correspond to the bulk operation I’ve described. Particularly confusing is the fact that it has two lattices, L_1 and L_2. In fact, the exercise results in a very general theorem; we can exploit a more concrete version of the theorem by setting L_1 \triangleq A \to L_2, resulting in an overall signature for f and h:

f : (A \to L_2) \to (A \to L_2)

In other words, if we give the entire operation in Exercise 4.26 a type, it would look like this:

\text{ex}_{4.26} : \underbrace{K}_{\text{value of}\ a} \to \underbrace{(\text{Map} \to V)}_{\text{updater}} \to \underbrace{\text{Map} \to \text{Map}}_{f} \to \underbrace{\text{Map} \to \text{Map}}_{h}

That’s still more general than we need. This version allows us to modify any map-to-map function by updating a certain key in that function’s output. If we just want to update keys (as we do for the purposes of static analysis), we can recover a simpler version by setting f \triangleq \text{id}, which results in an updater h(x) = x[a \leftarrow g(x)], and a signature for the exercise:

\text{ex}_{4.26} : \underbrace{K}_{\text{value of}\ a} \to \underbrace{(\text{Map} \to V)}_{\text{updater}\ g} \to \underbrace{\text{Map}}_{\text{old map}} \to \underbrace{\text{Map}}_{\text{updated map}}

This looks just like Haskell’s Data.Map.adjust function, except that it can take the entire map into consideration when updating a key.
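A sketch of this generalized update in Python: unlike `adjust`, the updater g receives both the key and the entire map, and a whole list of keys is updated at once. The function name is illustrative:

```python
def updating_via(m, keys, g):
    # h(x) = x[a₁ ← g(a₁, x), ..., aₙ ← g(aₙ, x)]: update every listed key,
    # letting the updater inspect the whole original map.
    return {k: (g(k, m) if k in keys else v) for k, v in m.items()}
```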

My generalized version takes in a list of keys to update, and makes the updater accept a key so that its behavior can be specialized for each entry it changes. The sketch of the implementation is in the _updating_via_ function from the Map module, and its helper transform. Here, I collapse its definition, since it’s not particularly important.

(Click here to see the definition of transform)
From Map.agda, lines 926 through 931
    transform : List (A × B) → List A → (A → B) → List (A × B)
    transform [] _ _ = []
    transform ((k , v) ∷ xs) ks f
        with k∈-dec k ks
    ...   | yes _ = (k , f k) ∷ transform xs ks f
    ...   | no _ = (k , v) ∷ transform xs ks f

The proof of monotonicity — which is the solution to the exercise — is actually quite complicated. I will omit its description, and show it here in another collapsed block.

(Click here to see the proof of monotonicity of h)
From Map.agda, lines 1042 through 1105
        f'-Monotonic : Monotonic _≼ˡ_ _≼_ f'
        f'-Monotonic {l₁} {l₂} l₁≼l₂ = (f'l₁f'l₂⊆f'l₂ , f'l₂⊆f'l₁f'l₂)
            where
                fl₁fl₂⊆fl₂ = proj₁ (f-Monotonic l₁≼l₂)
                fl₂⊆fl₁fl₂ = proj₂ (f-Monotonic l₁≼l₂)

                f'l₁f'l₂⊆f'l₂ : ((f' l₁)  (f' l₂))  f' l₂
                f'l₁f'l₂⊆f'l₂ k v k,v∈f'l₁f'l₂
                    with Expr-Provenance-≡ ((` (f' l₁))  (` (f' l₂))) k,v∈f'l₁f'l₂
                ...   | in (single k,v∈f'l₁) k∉kf'l₂ =
                        let
                            k∈kfl₁ = updating-via-∈k-backward (f l₁) ks (updater l₁) (forget k,v∈f'l₁)
                            k∈kfl₁fl₂ = union-preserves-∈k₁ {l₁ = proj₁ (f l₁)} {l₂ = proj₁ (f l₂)} k∈kfl₁
                            (v' , k,v'∈fl₁l₂) = locate {m = (f l₁  f l₂)} k∈kfl₁fl₂
                            (v'' , (v'≈v'' , k,v''∈fl₂)) = fl₁fl₂⊆fl₂ k v' k,v'∈fl₁l₂
                            k∈kf'l₂ = updating-via-∈k-forward (f l₂) ks (updater l₂) (forget k,v''∈fl₂)
                        in
                            ⊥-elim (k∉kf'l₂ k∈kf'l₂)
                ...   | in k∉kf'l₁ (single k,v'∈f'l₂) =
                        (v , (IsLattice.≈-refl lB , k,v'∈f'l₂))
                ...   | bothᵘ (single {v₁} k,v₁∈f'l₁) (single {v₂} k,v₂∈f'l₂)
                        with k∈-dec k ks
                ...       | yes k∈ks
                            with refl ← updating-via-k∈ks-≡ (f l₁) (updater l₁) k∈ks k,v₁∈f'l₁
                            with refl ← updating-via-k∈ks-≡ (f l₂) (updater l₂) k∈ks k,v₂∈f'l₂ =
                            (updater l₂ k , (g-Monotonicʳ k l₁≼l₂ , k,v₂∈f'l₂))
                ...       | no k∉ks =
                                let
                                    k,v₁∈fl₁ = updating-via-k∉ks-backward (f l₁) (updater l₁) k∉ks k,v₁∈f'l₁
                                    k,v₂∈fl₂ = updating-via-k∉ks-backward (f l₂) (updater l₂) k∉ks k,v₂∈f'l₂
                                    k,v₁v₂∈fl₁fl₂ = ⊔-combines {m₁ = f l₁} {m₂ = f l₂} k,v₁∈fl₁ k,v₂∈fl₂
                                    (v' , (v'≈v₁v₂ , k,v'∈fl₂)) = fl₁fl₂⊆fl₂ k _ k,v₁v₂∈fl₁fl₂
                                    k,v'∈f'l₂ = updating-via-k∉ks-forward (f l₂) (updater l₂) k∉ks k,v'∈fl₂
                                in
                                    (v' , (v'≈v₁v₂ , k,v'∈f'l₂))

                f'l₂⊆f'l₁f'l₂ : f' l₂  ((f' l₁)  (f' l₂))
                f'l₂⊆f'l₁f'l₂ k v k,v∈f'l₂
                    with k∈kfl₂ ← updating-via-∈k-backward (f l₂) ks (updater l₂) (forget k,v∈f'l₂)
                    with (v' , k,v'∈fl₂) ← locate {m = f l₂} k∈kfl₂
                    with (v'' , (v'≈v'' , k,v''∈fl₁fl₂)) ← fl₂⊆fl₁fl₂ k v' k,v'∈fl₂
                    with Expr-Provenance-≡ ((` (f l₁))  (` (f l₂))) k,v''∈fl₁fl₂
                ...   | in (single k,v''∈fl₁) k∉kfl₂ = ⊥-elim (k∉kfl₂ k∈kfl₂)
                ...   | in k∉kfl₁ (single k,v''∈fl₂) =
                        let
                            k∉kf'l₁ = updating-via-∉k-forward (f l₁) ks (updater l₁) k∉kfl₁
                        in
                            (v , (IsLattice.≈-refl lB , union-preserves-∈₂ k∉kf'l₁ k,v∈f'l₂))
                ...   | bothᵘ (single {v₁} k,v₁∈fl₁) (single {v₂} k,v₂∈fl₂)
                        with k∈-dec k ks
                ...       | yes k∈ks with refl ← updating-via-k∈ks-≡ (f l₂) (updater l₂) k∈ks k,v∈f'l₂ =
                            let
                                k,uv₁∈f'l₁ = updating-via-k∈ks-forward (f l₁) (updater l₁) k∈ks (forget k,v₁∈fl₁)
                                k,uv₂∈f'l₂ = updating-via-k∈ks-forward (f l₂) (updater l₂) k∈ks (forget k,v₂∈fl₂)
                                k,uv₁uv₂∈f'l₁f'l₂ = ⊔-combines {m₁ = f' l₁} {m₂ = f' l₂} k,uv₁∈f'l₁ k,uv₂∈f'l₂
                            in
                                (updater l₁ k ⊔₂ updater l₂ k , (IsLattice.≈-sym lB (g-Monotonicʳ k l₁≼l₂) , k,uv₁uv₂∈f'l₁f'l₂))
                ...       | no k∉ks
                            with k,v₁∈f'l₁ ← updating-via-k∉ks-forward (f l₁) (updater l₁) k∉ks k,v₁∈fl₁
                            with k,v₂∈f'l₂ ← updating-via-k∉ks-forward (f l₂) (updater l₂) k∉ks k,v₂∈fl₂
                            with k,v₁v₂∈f'l₁f'l₂ ← ⊔-combines {m₁ = f' l₁} {m₂ = f' l₂} k,v₁∈f'l₁ k,v₂∈f'l₂
                            with refl ← Map-functional {m = f' l₂} k,v∈f'l₂ k,v₂∈f'l₂
                            with refl ← Map-functional {m = f l₂} k,v'∈fl₂ k,v₂∈fl₂ =
                            (v₁ ⊔₂ v , (v'≈v'' , k,v₁v₂∈f'l₁f'l₂))

Given a proof of the exercise, all that’s left is to instantiate the theorem with the argument I described. Specifically:

In the equation for g, I explicitly insert the map m instead of leaving it implicit as the textbook does. In Agda, this instantiation for joining all predecessors looks like this (using states as the list of keys to update, indicating that we should update every key):

From Forward.agda, lines 152 through 157
    open StateVariablesFiniteMap.GeneralizedUpdate states isLatticeᵐ (λ x → x) (λ a₁≼a₂ → a₁≼a₂) joinForKey joinForKey-Mono states
        renaming
            ( f' to joinAll
            ; f'-Monotonic to joinAll-Mono
            ; f'-k∈ks-≡ to joinAll-k∈ks-≡
            )

And the one for evaluating all basic blocks looks like this:

From Forward.agda, lines 215 through 220
        open StateVariablesFiniteMap.GeneralizedUpdate states isLatticeᵐ (λ x → x) (λ a₁≼a₂ → a₁≼a₂) updateVariablesForState updateVariablesForState-Monoʳ states
            renaming
                ( f' to updateAll
                ; f'-Monotonic to updateAll-Mono
                ; f'-k∈ks-≡ to updateAll-k∈ks-≡
                )

Actually, we haven’t yet seen that updateVariablesFromStmt. This is a function that we can define using the user-provided abstract interpretation eval. Specifically, it handles the job of updating the sign of a variable once it has been assigned to (or doing nothing if the statement is a no-op).

The updateVariablesFromExpression function is new, however; it is yet another map update, which changes the sign of a variable k to the one we get from running eval on a given expression. Map updates are instances of the generalized update; this time, the updater g is eval. The exercise requires the updater to be monotonic, which constrains the user-provided evaluation function to be monotonic too.

From Forward.agda, lines 173 through 181
        private module _ (k : String) (e : Expr) where
            open VariableValuesFiniteMap.GeneralizedUpdate vars isLatticeᵛ (λ x → x) (λ a₁≼a₂ → a₁≼a₂) (λ _ → eval e) (λ _ {vs₁} {vs₂} vs₁≼vs₂ → eval-Mono e {vs₁} {vs₂} vs₁≼vs₂) (k ∷ [])
                renaming
                    ( f' to updateVariablesFromExpression
                    ; f'-Monotonic to updateVariablesFromExpression-Mono
                    ; f'-k∈ks-≡ to updateVariablesFromExpression-k∈ks-≡
                    ; f'-k∉ks-backward to updateVariablesFromExpression-k∉ks-backward
                    )
                public

We finally write the analyze function as the composition of the two bulk updates.

Instantiating with the Sign Lattice

Thus far, I’ve been talking about the sign lattice throughout, but implementing the Agda code in terms of a general lattice L and evaluation function eval. In order to actually run the Agda code, we do need to provide an eval function, one which implements the logic we used above, in which a zero-sign variable x minus one was determined to be negative. For binary operators specifically, I’ve used the tables provided in the textbook; here they are:

Cayley tables for abstract interpretation of plus and minus


These are pretty much common sense.

The Agda encoding for the plus function is as follows, and the one for minus is similar.

From Sign.agda, lines 76 through 94
plus : SignLattice → SignLattice → SignLattice
plus ⊥ᵍ _ = ⊥ᵍ
plus _ ⊥ᵍ = ⊥ᵍ
plus ⊤ᵍ _ = ⊤ᵍ
plus _ ⊤ᵍ = ⊤ᵍ
plus [ + ]ᵍ [ + ]ᵍ = [ + ]ᵍ
plus [ + ]ᵍ [ - ]ᵍ = ⊤ᵍ
plus [ + ]ᵍ [ 0ˢ ]ᵍ = [ + ]ᵍ
plus [ - ]ᵍ [ + ]ᵍ = ⊤ᵍ
plus [ - ]ᵍ [ - ]ᵍ = [ - ]ᵍ
plus [ - ]ᵍ [ 0ˢ ]ᵍ = [ - ]ᵍ
plus [ 0ˢ ]ᵍ [ + ]ᵍ = [ + ]ᵍ
plus [ 0ˢ ]ᵍ [ - ]ᵍ = [ - ]ᵍ
plus [ 0ˢ ]ᵍ [ 0ˢ ]ᵍ = [ 0ˢ ]ᵍ

-- this is incredibly tedious: 125 cases per monotonicity proof, and tactics
-- are hard. postulate for now.
postulate plus-Monoˡ : ∀ (s₂ : SignLattice) → Monotonic _≼ᵍ_ _≼ᵍ_ (λ s₁ → plus s₁ s₂)
postulate plus-Monoʳ : ∀ (s₁ : SignLattice) → Monotonic _≼ᵍ_ _≼ᵍ_ (plus s₁)

As the comment in the block says, it would be incredibly tedious to verify the monotonicity of these tables, since you would have to consider roughly 125 cases per argument: for each (fixed) sign s and two other signs s_1 \le s_2, we’d need to show that s\ \hat{+}\ s_1 \le s\ \hat{+}\ s_2. I therefore commit the faux pas of using postulate. Fortunately, the proof of monotonicity is not used for the execution of the program, so we will get away with this, barring any meddling kids.
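As an aside, since the lattice is tiny, the postulated fact is at least easy to sanity-check by brute force outside of Agda. A sketch in Python that enumerates all 125 triples, assuming the plus table from above:

```python
BOT, NEG, ZERO, POS, TOP = "⊥", "-", "0", "+", "⊤"
SIGNS = [BOT, NEG, ZERO, POS, TOP]

def leq(a, b):
    # ⊥ below everything, ⊤ above everything, signs incomparable.
    return a == b or a == BOT or b == TOP

def plus(a, b):
    # Mirrors the Agda clause order: ⊥ wins first, then ⊤, then the table.
    if BOT in (a, b): return BOT
    if TOP in (a, b): return TOP
    table = {(POS, POS): POS, (POS, NEG): TOP, (POS, ZERO): POS,
             (NEG, POS): TOP, (NEG, NEG): NEG, (NEG, ZERO): NEG,
             (ZERO, POS): POS, (ZERO, NEG): NEG, (ZERO, ZERO): ZERO}
    return table[(a, b)]

def plus_is_monotone():
    # Check monotonicity in both arguments over every comparable pair.
    return all(leq(plus(s, a), plus(s, b)) and leq(plus(a, s), plus(b, s))
               for s in SIGNS for a in SIGNS for b in SIGNS if leq(a, b))
```

This is evidence, not a proof; the Agda postulate is still a leap of faith.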

From this, all that’s left is to show that for any expression e, the evaluation function:

\text{eval} : \text{Expr} \to (\text{Variable} \to \text{Sign}) \to \text{Sign}

is monotonic. It’s defined straightforwardly and very much like an evaluator / interpreter, suggesting that “abstract interpretation” is the correct term here.

From Sign.agda, lines 176 through 184
    eval : ∀ (e : Expr) → VariableValues → SignLattice
    eval (e₁ + e₂) vs = plus (eval e₁ vs) (eval e₂ vs)
    eval (e₁ - e₂) vs = minus (eval e₁ vs) (eval e₂ vs)
    eval (` k) vs
        with ∈k-decᵛ k (proj₁ (proj₁ vs))
    ...   | yes k∈vs = proj₁ (locateᵛ {k} {vs} k∈vs)
    ...   | no _ = ⊤ᵍ
    eval (# 0) _ = [ 0ˢ ]ᵍ
    eval (# (suc n')) _ = [ + ]ᵍ

Though it won’t happen in practice, it was easier to just handle the case where there’s an undefined variable; I give it “any sign” (⊤ᵍ). Otherwise, the function simply consults the sign tables for + or -, as well as the known signs of the variables. For natural number literals, it assigns 0 the “zero” sign, and any other natural number the “+” sign.

To prove monotonicity, we need to consider two variable maps (one less than the other), and show that the abstract interpretation respects that ordering. This boils down to the fact that the plus and minus tables are monotonic in both arguments (thus, if their sub-expressions are evaluated monotonically given an environment, then so is the whole addition or subtraction), and to the fact that for two maps m₁ ≼ m₂, the values at corresponding keys are similarly ordered: m₁[k] ≼ m₂[k]. We saw that above.

(Click to expand the proof that the evaluation function for signs is monotonic)
From Sign.agda, lines 186 through 223
    eval-Mono : ∀ (e : Expr) → Monotonic _≼ᵛ_ _≼ᵍ_ (eval e)
    eval-Mono (e₁ + e₂) {vs₁} {vs₂} vs₁≼vs₂ =
        let
            -- TODO: can this be done with less boilerplate?
            g₁vs₁ = eval e₁ vs₁
            g₂vs₁ = eval e₂ vs₁
            g₁vs₂ = eval e₁ vs₂
            g₂vs₂ = eval e₂ vs₂
        in
            ≼ᵍ-trans
                {plus g₁vs₁ g₂vs₁} {plus g₁vs₂ g₂vs₁} {plus g₁vs₂ g₂vs₂}
                (plus-Monoˡ g₂vs₁ {g₁vs₁} {g₁vs₂} (eval-Mono e₁ {vs₁} {vs₂} vs₁≼vs₂))
                (plus-Monoʳ g₁vs₂ {g₂vs₁} {g₂vs₂} (eval-Mono e₂ {vs₁} {vs₂} vs₁≼vs₂))
    eval-Mono (e₁ - e₂) {vs₁} {vs₂} vs₁≼vs₂ =
        let
            -- TODO: here too -- can this be done with less boilerplate?
            g₁vs₁ = eval e₁ vs₁
            g₂vs₁ = eval e₂ vs₁
            g₁vs₂ = eval e₁ vs₂
            g₂vs₂ = eval e₂ vs₂
        in
            ≼ᵍ-trans
                {minus g₁vs₁ g₂vs₁} {minus g₁vs₂ g₂vs₁} {minus g₁vs₂ g₂vs₂}
                (minus-Monoˡ g₂vs₁ {g₁vs₁} {g₁vs₂} (eval-Mono e₁ {vs₁} {vs₂} vs₁≼vs₂))
                (minus-Monoʳ g₁vs₂ {g₂vs₁} {g₂vs₂} (eval-Mono e₂ {vs₁} {vs₂} vs₁≼vs₂))
    eval-Mono (` k) {vs₁@((kvs₁ , _) , _)} {vs₂@((kvs₂ , _), _)} vs₁≼vs₂
        with ∈k-decᵛ k kvs₁ | ∈k-decᵛ k kvs₂
    ...   | yes k∈kvs₁ | yes k∈kvs₂ =
            let
                (v₁ , k,v₁∈vs₁) = locateᵛ {k} {vs₁} k∈kvs₁
                (v₂ , k,v₂∈vs₂) = locateᵛ {k} {vs₂} k∈kvs₂
            in
                m₁≼m₂⇒m₁[k]ᵛ≼m₂[k]ᵛ vs₁ vs₂ vs₁≼vs₂ k,v₁∈vs₁ k,v₂∈vs₂
    ...   | yes k∈kvs₁ | no k∉kvs₂ = ⊥-elim (k∉kvs₂ (subst (λ l → k ∈ˡ l) (all-equal-keysᵛ vs₁ vs₂) k∈kvs₁))
    ...   | no k∉kvs₁ | yes k∈kvs₂ = ⊥-elim (k∉kvs₁ (subst (λ l → k ∈ˡ l) (all-equal-keysᵛ vs₂ vs₁) k∈kvs₂))
    ...   | no k∉kvs₁ | no k∉kvs₂ = IsLattice.≈-refl isLatticeᵍ
    eval-Mono (# 0) _ = ≈ᵍ-refl
    eval-Mono (# (suc n')) _ = ≈ᵍ-refl

That’s all we need. With this, I just instantiate the Forward module we have been working with, and make use of the result. I also used a show function (which I defined) to stringify that output.

From Sign.agda, lines 225 through 229
    module ForwardWithEval = ForwardWithProg.WithEvaluator eval eval-Mono
    open ForwardWithEval using (result)

    -- For debugging purposes, print out the result.
    output = show result

But wait, result? We haven’t seen a result just yet. That’s the last piece, and it involves finally making use of the fixed-point algorithm.

Invoking the Fixed Point Algorithm

Our \text{Info} lattice is of finite height, and the function we have defined is monotonic (by virtue of being constructed only from map updates, which are monotonic by Exercise 4.26, and from function composition, which preserves monotonicity). We can therefore apply the fixed-point algorithm, and compute the least fixed point:

From Forward.agda, lines 235 through 238
        open import Fixedpoint ≈ᵐ-dec isFiniteHeightLatticeᵐ analyze (λ {m₁} {m₂} m₁≼m₂ → analyze-Mono {m₁} {m₂} m₁≼m₂)
            using ()
            renaming (aᶠ to result; aᶠ≈faᶠ to result≈analyze-result)
            public

With this, result, the least fixed point of analyze, is the output of our forward analysis!
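Stripped of the lattice machinery, the fixed-point computation itself is a short loop: start at the bottom element and iterate until the function’s output equals its input. A sketch, with a toy monotone function on the finite chain 0 ≤ 1 ≤ ... ≤ 5 standing in for analyze:

```python
def fix(f, bottom, eq):
    # Kleene-style iteration: on a finite-height lattice, iterating a
    # monotone f from the bottom element terminates at the least fixed point.
    x = bottom
    while True:
        fx = f(x)
        if eq(x, fx):
            return x
        x = fx
```

Decidable equality (the `eq` argument) is exactly the ≈ᵐ-dec parameter the Agda Fixedpoint module demands.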

In a Main.agda file, I invoked this analysis on a sample program:

testCode : Stmt
testCode =
    ⟨ "zero" ← (# 0) ⟩ then
    ⟨ "pos" ← ((` "zero") Expr.+ (# 1)) ⟩ then
    ⟨ "neg" ← ((` "zero") Expr.- (# 1)) ⟩ then
    ⟨ "unknown" ← ((` "pos") Expr.+ (` "neg")) ⟩

testProgram : Program
testProgram = record
    { rootStmt = testCode
    }

open WithProg testProgram using (output; analyze-correct)

main = run {0} (putStrLn output)

The result is verbose, since it shows variable signs for each statement in the program. However, the key is the last basic block, which shows the variables at the end of the program. It reads:

{"neg" ↦ -, "pos" ↦ +, "unknown" ↦ ⊤, "zero" ↦ 0, }

Verifying the Analysis

We now have a general framework for running forward analyses: you provide an abstract interpretation function for expressions, as well as a proof that this function is monotonic, and you get an Agda function that takes a program and tells you the variable states at every point. If your abstract interpretation function is for determining the signs of expressions, the final result is an analysis that determines all possible signs for all variables, anywhere in the code. It’s pretty easy to instantiate this framework with another type of forward analysis — in fact, by switching the plus function to one that uses AboveBelow ℤ, rather than AboveBelow Sign:

plus : ConstLattice → ConstLattice → ConstLattice
plus ⊥ᶜ _ = ⊥ᶜ
plus _ ⊥ᶜ = ⊥ᶜ
plus ⊤ᶜ _ = ⊤ᶜ
plus _ ⊤ᶜ = ⊤ᶜ
plus [ z₁ ]ᶜ [ z₂ ]ᶜ = [ z₁ Int.+ z₂ ]ᶜ

we can define a constant-propagation analysis. Running it on the same example program produces:

{"neg" ↦ -1, "pos" ↦ 1, "unknown" ↦ 0, "zero" ↦ 0, }

However, we haven’t proved our analysis correct, and we haven’t yet made use of the CFG-semantics equivalence that we proved in the previous section. I was hoping to get to it in this post, but there was just too much to cover. So, I will get to that in the next post, where we will make use of the remaining machinery to demonstrate that the output of our analyzer matches reality.