Basic definition and solution to the biased bisect problem #
We construct a solution to the following problem:
A software has $n+1$ versions $v_0,v_1,\dots,v_n$, and $n$ changes in between. It is discovered one of the change broke a feature of the software, but we don't know which one did. We only know $v_0$ is fine and $v_n$ is broken. How do we quickly find the broken change?
The answer to this classic question is to do binary search. But what if the cost of performing a software test depends on the outcome? For example, a broken software will result in a system crash and requires much longer time to reboot. Which version should we test first, and next?
Let s and t be the cost of a successful and a failed test. The expected cost $F(n)$ is
$$ F(n) = \begin{cases} 0 & n = 1 \\ \min_{1 ≤ w ≤ n - 1} \{\frac{w}{n} (F(w) + t) + \frac{n - w}{n} (F(n - w) + s)\} & n \ge 2 \end{cases} $$
where $w$ is the first version to test. To simplify a little bit, we normalize $F(n)$ with
$$ E(n) = n F(n) $$
where $E(n)$ satisfies
$$ E(n) = \begin{cases} 0 & n = 1 \\ \min_{1 ≤ w ≤ n - 1} \{E(w) + E(n - w) + w t + (n - w) s\} & n \ge 2 \end{cases} $$
Main statements #
IsOptimalCostℤ
andIsOptimalStratℤ
are the target properties of the solution.IsOptimalCost
andIsOptimalStrat
are the modified properties that extend the domain to ℝ.E
is the optimal cost function of the solution with domain in ℝ.E_IsOptimalCost
verifies this is the solution toIsOptimalCost
.
wₛₑₜ
is the optimal strategy function of the solution with domain in ℝ.wₘᵢₙ
is the minimal optimal strategy.wₘₐₓ
is the maximal optimal strategy.wₗᵢ
is the principal strategy.W_IsOptimalStrat
verifies this is the solution toIsOptimalStrat
.
Eℤ
restrictsE
toℤ
.Eℤ_IsOptimalCost
verifies this is the solution toIsOptimalCostℤ
.
wℤ
restrictswₛₑₜ
toℤ
Wℤ_IsOptimalStrat
verifies this is the solution toIsOptimalStratℤ
.
A lot of statements requires positive numbers, so we define a convenient class to pass them around.
Throughout the file, we will use a pair of real positive parameters $s$ and $t$.
We start with the lattic Λ = ℕ × ℕ
and assign each lattice point $(p, q)$ a value
$δ = ps + qt$. Visually, this is drawing a line passing the point with a
fixed slope (namely -s/t) and measures how far away it is from the origin.
All possible $δ$ makes up the set Δ
. One can notice that the "shape" of this set
is different depending on whether $s/t$ is rational or not:
- For irrational $s/t$, each lattice point will get a assigned a unique $δ$, and
Δ
get more dense when we are futher away from the origin. - For rational $s/t$, a line of slope $-s/t$ can pass multiple lattice points, and eventually the gap between $δ$ is stabilized at a fixed value $\gcd(s, t)$.
The set Δ
is symmetric for $s$ and $t$. We will explore this symmetry a lot later on.
Another property we will explore is homogeneity: parameters $(l s, l t)$ is closely related to $(s, t)$, and the associated objects is either the same, or scaled by $l$.
For each lattice point, we can assign a $δ$. As previously mentioned, this is injective only when $s/t$ is irrational.
Similarly, δₚ
is also symmetric, but one needs to swap the coordinates of the input.
We can draw a line with slope $-s/t$ and only consider lattice points enveloped by the line, including those on the line. Equalently, this is considering only points whose assigned $δ$ is less or equal to a given value. We call these subsets as "ceiled".
As an important example, the subset ceiled by $0$ only includes the point $(0, 0)$
If the ceiling is negative, Λceiled
is the empty set.
δₚ
maps all ceiled lattice points to all ceiled $δ$
We would like to prove that Δceiled
is finite.
We first introduce bounded natural numbers, and their products
and show their finiteness.
Λceiled
is always inside a rectangle region, hence finite
Equations
- instFintypeElemProdNatΛceiledOfPosReal s t ceil = Fintype.ofFinite ↑(Λceiled s t ceil)
Consequently Δceiled
well-ordered.
We now can show the whole set Δ is also well-ordered. Although Δ is an infinite set and can become arbitrarily dense for larger elements, its base, as indicated by the ceiled variation, behaves friendly for the order.
Δ always has the smallest element 0
We also introduce "floored" subsets, the complement of ceiled ones. These subsets contain elements where $δ$ is larger than a certain threshold.
Obviously, floored sets are also symmetric.
... and homogeneous.
Floored sets are still infinite, but are well-ordered as subsets.
Floored sets are always non-empty due to the unboundness of Δ.
Since Δ is well-ordered, it is possible to sort all elements and enumerate them starting from the smallest one (0).
We first define the find the next $δ'$ given an element $δ$ using floored subsets.
Note that this function also accepts input outside of Δ
. It simply finds the
smallest $δ$ that's larger than the input.
Again the symmetry is passed on to δnext
.
δnext
is homogeneous.
δnext
is weakly increasing w.r.t floor
δnext
will always output an element larger than the input.
δnext
also effectively gives the "gap" between the input δ and the next δ'.
There is no additional lattice point between this gap,
which means Λceiled
is inert for any bound given between the gap.
We can define the sequence δₖ
by sorting all elements in Δ
.
The index $k$ will also be used a lot for other related sequences.
δₖ
is obviously strictly increasing.
δₖ
is also symmetric.
δₖ
always starts with 0.
δₖ
is homogeneous.
All δₖ
are obviously elements in Δ
.
Together with δₖ_surjΔ
, this shows δₖ
is a bijection between Δ
and ℕ
.
We introduce a new kind of subset of the lattice:
lattice points whose assigned $δ$ is exactly a given constant.
It can be empty if the given constant is not in Δ
.
As one can notice, this subset is a sub-singleton when $s/t$ is irrational, but we won't expand on it here.
This subset is again symmetric with lattice coordinates swapped.
If the line is negative, it won't cover any lattice points.
Elements in Λline
is allowed to shift in coordinates and change their $δ$ by $s$.
Note that this is not saying Λline
of $δ$ and of $δ + s$ are one-to-one.
When shifting $δ$ by $s$, it can potentially introduce a new element with $p' = 0$. This element
is ruled out by the $p' = p + 1 ≥ 1$ in the statement.
By symmetry, we can state similarly for $t$ and $q$.
The line subset at $δ = 0$ gives the singleton $(0, 0)$.
Equations
- instFintypeElemProdNatΛlineOfPosReal s t δ = Fintype.ofFinite ↑(Λline s t δ)
Now we assign each lattice point with another value $J$, which is the Pascal triangle where $p$- and $q$-axies are the sides of the triangle.
Just like the Pascal triangle, Jₚ
follows the recurrence relation.
It should be noted that if we embed $Λ$ in ℤ × ℤ and assign $J = 0$ to the rest of the points, all points still follow this recurrence relation except at $(0, 0)$. This defect will show up again later.
A gross bound for Jₚ
to decompose it to a product of $f(p)$ and $g(q)$.
On Λ, $J$ are all nonzero.
$J$ itself is symmatrical for swapped coordinates.
We can evaluate $J$ for a given $δ$, by summing up $J$ of all points passed by the line.
The evaluation on the line is symmetric for $s$ and $t$.
A helper function to zero the value if the input is zero.
Jline
can be shifted by $s$. The sum will however be affected by the potential point
on the $p = 0$ boundary, hence the equality needs to remove such point.
A similar statement can be said for $t$
Derived from the recurrence of binomial coefficents,
Jline
is also recurrent, except for at $δ = 0$.
At $δ = 0$, Jline gives the "seed" 1 that induces all other values.
Since we have defined the sequence δₖ
for all elements in Δ
,
we can map them to a sequence Jₖ
by Jline
The sequence Jₖ
is symmetric.
The sequence Jₖ
is non-zero.
We also define a pair of sequence Jsₖ
and Jtₖ
similar to Jₖ
,
but the line is shifted by $s$ or $t$.
The shifting can make some line no longer pass any lattice points,
so some Jsₖ
and Jtₖ
are zero.
Just like Jline
for Λline
, we can define Jceiled
for Λceiled
which sums over all lattices bounded by $δ$.
Jceiled
is symmetric.
... and homogeneous.
Jceiled
is weakly increasing with regard to $δ$.
As $δ$ grows, Λceiled can either remain unchanged for include new points.
While Jceiled
is only weakly increasing, and one can't deduce the relation of two $δ$ from their Jceiled
,
it is possible give a relation between $δ$ if one of them is δₖ
The growth of Jceiled
is precisely described by Jline
.
Another way to view this is to say Jceiled = ΣJline
for all lines in the bound.
Since there are gaps between $δ$, Jceiled
stops growing when inside these gaps.
We can also derive a few variants of this lemma:
Now we can define the sequence nₖ
as partial sums of Jₖ
.
The first element n₀
starts at $1$ for reasons we will see later.
This essentially comes from the defect of binomial coefficient at $(0, 0)$.
nₖ
will be the n-coordinate of the vertices of several piecewise functions we will introduce.
nₖ
is also symmetric.
... and homogeneous.
The first two elements of nₖ
are always 1 and 2.
nₖ
grows faster than $k$ it self.
And obviously, nₖ
is strictly increasing.
As a quick corollary, nₖ
are all positive.
Just as we used Jₖ
to define nₖ
, we also use Jsₖ
and Jtₖ
to define
partial sum sequences wₖ'
and wₖ
, respectively.
(The reason wₖ
corresponds to $t$ is mostly historical)
The starting point w₀
= 1 is an artifact, as we will see it doesn't follow
nice properties we will soon see.
The real starting point of this sequence is w₁
= 1.
Recurrence formula of wₖ
: by swapping $s$ and $t$, $w$ becomes $n - w$
This is the first property that shows w₀
doesn't follow the pattern.
A more sensible definition of w₀
that follows the Symmetry can be
w₀ = 1/2
when $s = t$w₀ = c
if $s > t$ else1 - c
But these definitions doesn't add much value to our further arguments, so we will just leavew₀
semantically undefined.
(The equivalent formula wₖ s t k + wₖ' s t k = nₖ s t k
might be more
suitable to be the recurrence formula. This is stated this way for
historical reasons)
Here is a pretty important property of wₖ
and wₖ'
:
Elements of wₖ
and wₖ'
sequence all come from nₖ
.
This means wₖ
and wₖ'
effectively sets up mapping from nₖ
to itself.
It can be showed that this mapping is weakly monotone and contracting,
and we will prove a weaker version of this later.
With sequence δₖ, nₖ, and wₖ introduced, we will construct the following functions:
First, the "cost differential" function $dE(n): [1, ∞) → ℝ$
↑ dE(n)
|
| |-J₀-|-J₁--|---J₂---|-------J₃------|
|
| |
δ₃ --|-- *===============∘
| |
δ₂ --|-- *========∘
| |
| |
δ₁ --|-- *=====∘
| |
| |
δ₀ --+-----*====∘-----|--------|---------------|--------→ n
0 n₀ n₁ n₂ n₃ n₄
(=1)
The function is defined like a stair case. By convension, each interval is defined with left point closed: $$ dE( [n_k, n_{k+1}) ) = δ_k $$
Second, the "strategy" function $w(n): [2, ∞) → P(ℝ)$.
↑ w(n)
|
| |-J₀-|-J₁--|---J₂---|-------J₃------|
| /
w₄ --|-- *----------*- --|--
| /##########/ |
| /##########/ |
| /##########/ | Jt₃
| /##########/ |
w₃ --|-- *------*----------* --|--
| /######/ | Jt₂
w₂ --|-- *---*------* --|--
| /###/ | Jt₁
w₁ --|-- *---* --|--
+----------|-----|--------|---------------|--------→ n
0 n₀ n₁ n₂ n₃ n₄
(=1) (=2)
We first anchor all points $(n₁, w₁)$, $(n₂, w₂)$, ...
and then connect them with parallelogram with an angle of 45°
The parallelogram can be degenerated if Jt
$ = 0$ or Jt
$ = J$.
Then all points enveloped, including the boundary, are in w(n)
Again, because w₀
is semantically undefined,
$w(n)$ is only defined starting from n₁
$ = 2$.
We also write w(n) = [wₘᵢₙ(n), wₘₐₓ(n)]
But before we can define these functions, we need to define how to find $k$ for a given real input $n$.
We define kceiled
as the set of natural numbers $k$ for which nₖ
$ ≤ n$.
kceiled
is also obviously symmetric and finite.
... and homogeneous.
kceiled
is finite, which allows us to take maximum value later.
Equations
- instFintypeElemNatKceiled s t n = Fintype.ofFinite ↑(kceiled s t n)
And obviously, it is also symmetrical.
... and homogeneous.
kₙ
and nₖ
are basically inverse functions to each other.
One can recover the $k$ by composing kₙ
and nₖ
.
Any $n ≥ 1$ should give a valid $k$.
Mean while, $n < 1$ never gives a valid $k$.
Now the cost differential function is defined by clamping to the nearest $k$ and find δₖ
.
... which is symmetric.
... homogeneous.
... and weakly increasing.
The following three lemma show the nice property of wₖ when applied to dE
:
The domain $n ∈ [1, ∞)$ is divided by wₖ k
and wₖ (k + 1)
into three regions:
dE( [1, wₖ k ) ) < δₖ - t
dE( [wₖ k, wₖ (k + 1)) ) = δₖ - t
dE( [wₖ (k + 1), ∞ ) ) > δₖ - t
In other words, wₖ
captures exactly where dE = δₖ - t
(while nₖ
captures where dE = δₖ
)
Note that because 1 ≤ wₖ k ≤ wₖ (k + 1)
are week inequalities,
the intervals listed above can degenerate.
There are similar properties with wₖ'
and δₖ - s
, but the proof is omitted.
As a corollary, we show that wₖ
is not only a monotone mapping from nₖ
to itself,
but also under the mapping, wₖ (k)
and wₖ (k + 1)
are either same, or two nₖ (k')
next to each other.
By symmetry, the same holds for wₖ'
.
The strategy function $w$ is defined by finding wₖ
after clamping to the nearest $k$
The parallelogram is formed by taking certain min and max.
Just like wₖ
, $w(n)$ is bounded within $[1, n - 1]$.
We also define a third kind of $w$ function wₗᵢ
,
which is the diagonals of parallelograms formed by wₘᵢₙ
and wₘₐₓ
.
We also define the dual version wₗᵢ'
We could have done the same for wₘᵢₙ
and wₘₐₓ
,
but we omitted them as they don't add much value.
wₗᵢ
as the diagnonal, is always between wₘᵢₙ
and wₘₐₓ
.
With this, we have the complete ordering:
1 ≤ wₘᵢₙ ≤ wₗᵢ ≤ wₘₐₓ ≤ n - 1
As usual, wₗᵢ
is symmetric
... and has recurrence formula.
It is symmetric
... and weakly increasing w.r.t $w$
We show that wₘᵢₙ
and wₘₐₓ
indicates where dD
is negative, zero, or positive.
In these theorems, we conviniently ignored boundary points at w = wₘᵢₙ
or w = wₘₐₓ
.
dD
value at those points can be found, but it doesn't add much value for our further arguments.
Here is a more useful version with the correction term $s + t$
Now we can construct our main function, the cost function E (n)
↑ E (n)
|
| |-J₀-|-J₁--|---J₂---|-------J₃------|
|
| ·* --|--
| ·· |
| · |
| · |
| ·· |
| · |
| · | (δ₃+s+t)*J₃
| ·· |
| | | | | · |
| | | | | · |
| | | | | · |
E₃ --|-- | | | ··*·--- --|--
| | | | · |
| | | | ·· | (δ₂+s+t)*J₂
| | | | ·· |
E₂ --|-- | | ·*·------------ --|--
| | | · |
| | | ·· | (δ₁+s+t)*J₁
E₁ --|-- | ·*·------------------ --|--
| | ·· | (δ₀+s+t)*J₀
E₀ --+-----*·---|-----|--------|---------------|-----|--→ n
0 n₀ n₁ n₂ n₃ n₄
(=1)
We first pin the vertices Eₖ
on this function.
... which is symmetric.
... which is symmetric.
... and can be expressed as an integral.
While E (n)
itself is defined as partial sum of Jₖ * (δₖ + s + t)
,
we can also show the composed mapping E (w(n))
is the partial sum of Jtₖ * (δₖ + s)
.
And here is the strategy evaluation function.
... which is symmetric.
... and is the integral of the strategy evaluation differential function.
We will now prove several version of the recurrence formula on E
.
Eventually, we reached the major conclusion:
The cost equals the strategy evaluation at the optimal strategy wₗᵢ
But because D
has zero derivative dD
between wₘᵢₙ
and wₘₐₓ
all $w$ in between gives cost = strategy evaluation
And using the fact that the derivative dD
is negative/positive outside the range,
we conclude that the strategy evaluation is larger than the cost everywhere else.
Therefore, the interval bounded by wₘᵢₙ
and wₘₐₓ
idicates where E = D
.
Let's make it its own function
Let's summarize our result in a high level
For any possible cost function $E(n): [1, ∞) → ℝ$
We can define a strategy evaluation function StratEval
$\{E\}(n, w)$
A cost function $E$ is called optimal if the min value of StratEval
is $E$ itself,
and a strategy function $w$ is called optimal if it is the set for StratEval
to reach $E$.
While Eℤ
is easy to understand, we need to show that
wₘᵢₙℤ
and wₘₐₓℤ
remains the same value when lifted.
We can then define the integer version of the optimal criteria,
and proof the optimality of Eℤ
and Wℤ
.
Equations
- IsOptimalCostℤ Efun s t = ∀ n ≥ 2, IsLeast (StratEvalℤ Efun s t n '' Set.Icc 1 (n - 1)) (Efun n)