Basic definition and solution to the biased bisect problem #
We construct a solution to the following problem:
A software has $n+1$ versions $v_0,v_1,\dots,v_n$, and $n$ changes in between. It is discovered one of the change broke a feature of the software, but we don't know which one did. We only know $v_0$ is fine and $v_n$ is broken. How do we quickly find the broken change?
The answer to this classic question is to do binary search. But what if the cost of performing a software test depends on the outcome? For example, a broken software will result in a system crash and requires much longer time to reboot. Which version should we test first, and next?
Let s and t be the cost of a successful and a failed test. The expected cost $F(n)$ is
$$ F(n) = \begin{cases} 0 & n = 1 \\ \min_{1 ≤ w ≤ n - 1} \{\frac{w}{n} (F(w) + t) + \frac{n - w}{n} (F(n - w) + s)\} & n \ge 2 \end{cases} $$
where $w$ is the first version to test. To simplify a little bit, we normalize $F(n)$ with
$$ E(n) = n F(n) $$
where $E(n)$ satisfies
$$ E(n) = \begin{cases} 0 & n = 1 \\ \min_{1 ≤ w ≤ n - 1} \{E(w) + E(n - w) + w t + (n - w) s\} & n \ge 2 \end{cases} $$
Main statements #
IsOptimalCostℤandIsOptimalStratℤare the target properties of the solution.IsOptimalCostandIsOptimalStratare the modified properties that extend the domain to ℝ.Eis the optimal cost function of the solution with domain in ℝ.E_IsOptimalCostverifies this is the solution toIsOptimalCost.
wₛₑₜis the optimal strategy function of the solution with domain in ℝ.wₘᵢₙis the minimal optimal strategy.wₘₐₓis the maximal optimal strategy.wₗᵢis the principal strategy.W_IsOptimalStratverifies this is the solution toIsOptimalStrat.
EℤrestrictsEtoℤ.Eℤ_IsOptimalCostverifies this is the solution toIsOptimalCostℤ.
wℤrestrictswₛₑₜtoℤWℤ_IsOptimalStratverifies this is the solution toIsOptimalStratℤ.
A lot of statements requires positive numbers, so we define a convenient class to pass them around.
Throughout the file, we will use a pair of real positive parameters $s$ and $t$.
We start with the lattic Λ = ℕ × ℕ and assign each lattice point $(p, q)$ a value
$δ = ps + qt$. Visually, this is drawing a line passing the point with a
fixed slope (namely -s/t) and measures how far away it is from the origin.
All possible $δ$ makes up the set Δ. One can notice that the "shape" of this set
is different depending on whether $s/t$ is rational or not:
- For irrational $s/t$, each lattice point will get a assigned a unique $δ$, and
Δget more dense when we are futher away from the origin. - For rational $s/t$, a line of slope $-s/t$ can pass multiple lattice points, and eventually the gap between $δ$ is stabilized at a fixed value $\gcd(s, t)$.
The set Δ is symmetric for $s$ and $t$. We will explore this symmetry a lot later on.
Another property we will explore is homogeneity: parameters $(l s, l t)$ is closely related to $(s, t)$, and the associated objects is either the same, or scaled by $l$.
For each lattice point, we can assign a $δ$. As previously mentioned, this is injective only when $s/t$ is irrational.
Similarly, δₚ is also symmetric, but one needs to swap the coordinates of the input.
We can draw a line with slope $-s/t$ and only consider lattice points enveloped by the line, including those on the line. Equalently, this is considering only points whose assigned $δ$ is less or equal to a given value. We call these subsets as "ceiled".
As an important example, the subset ceiled by $0$ only includes the point $(0, 0)$
If the ceiling is negative, Λceiled is the empty set.
δₚ maps all ceiled lattice points to all ceiled $δ$
We would like to prove that Δceiled is finite.
We first introduce bounded natural numbers, and their products
and show their finiteness.
Λceiled is always inside a rectangle region, hence finite
Equations
- instFintypeElemProdNatΛceiledOfPosReal s t ceil = Fintype.ofFinite ↑(Λceiled s t ceil)
Consequently Δceiled well-ordered.
We now can show the whole set Δ is also well-ordered. Although Δ is an infinite set and can become arbitrarily dense for larger elements, its base, as indicated by the ceiled variation, behaves friendly for the order.
Δ always has the smallest element 0
We also introduce "floored" subsets, the complement of ceiled ones. These subsets contain elements where $δ$ is larger than a certain threshold.
Obviously, floored sets are also symmetric.
... and homogeneous.
Floored sets are still infinite, but are well-ordered as subsets.
Floored sets are always non-empty due to the unboundness of Δ.
Since Δ is well-ordered, it is possible to sort all elements and enumerate them starting from the smallest one (0).
We first define the find the next $δ'$ given an element $δ$ using floored subsets.
Note that this function also accepts input outside of Δ. It simply finds the
smallest $δ$ that's larger than the input.
Again the symmetry is passed on to δnext.
δnext is homogeneous.
δnext is weakly increasing w.r.t floor
δnext will always output an element larger than the input.
δnext also effectively gives the "gap" between the input δ and the next δ'.
There is no additional lattice point between this gap,
which means Λceiled is inert for any bound given between the gap.
We can define the sequence δₖ by sorting all elements in Δ.
The index $k$ will also be used a lot for other related sequences.
δₖ is obviously strictly increasing.
δₖ is also symmetric.
δₖ always starts with 0.
δₖ is homogeneous.
All δₖ are obviously elements in Δ.
Together with δₖ_surjΔ, this shows δₖ is a bijection between Δ and ℕ.
We introduce a new kind of subset of the lattice:
lattice points whose assigned $δ$ is exactly a given constant.
It can be empty if the given constant is not in Δ.
As one can notice, this subset is a sub-singleton when $s/t$ is irrational, but we won't expand on it here.
This subset is again symmetric with lattice coordinates swapped.
If the line is negative, it won't cover any lattice points.
Elements in Λline is allowed to shift in coordinates and change their $δ$ by $s$.
Note that this is not saying Λline of $δ$ and of $δ + s$ are one-to-one.
When shifting $δ$ by $s$, it can potentially introduce a new element with $p' = 0$. This element
is ruled out by the $p' = p + 1 ≥ 1$ in the statement.
By symmetry, we can state similarly for $t$ and $q$.
The line subset at $δ = 0$ gives the singleton $(0, 0)$.
Equations
- instFintypeElemProdNatΛlineOfPosReal s t δ = Fintype.ofFinite ↑(Λline s t δ)
Now we assign each lattice point with another value $J$, which is the Pascal triangle where $p$- and $q$-axies are the sides of the triangle.
Just like the Pascal triangle, Jₚ follows the recurrence relation.
It should be noted that if we embed $Λ$ in ℤ × ℤ and assign $J = 0$ to the rest of the points, all points still follow this recurrence relation except at $(0, 0)$. This defect will show up again later.
A gross bound for Jₚ to decompose it to a product of $f(p)$ and $g(q)$.
On Λ, $J$ are all nonzero.
$J$ itself is symmatrical for swapped coordinates.
We can evaluate $J$ for a given $δ$, by summing up $J$ of all points passed by the line.
The evaluation on the line is symmetric for $s$ and $t$.
A helper function to zero the value if the input is zero.
Jline can be shifted by $s$. The sum will however be affected by the potential point
on the $p = 0$ boundary, hence the equality needs to remove such point.
A similar statement can be said for $t$
Derived from the recurrence of binomial coefficents,
Jline is also recurrent, except for at $δ = 0$.
At $δ = 0$, Jline gives the "seed" 1 that induces all other values.
Since we have defined the sequence δₖ for all elements in Δ,
we can map them to a sequence Jₖ by Jline
The sequence Jₖ is symmetric.
The sequence Jₖ is non-zero.
We also define a pair of sequence Jsₖ and Jtₖ similar to Jₖ,
but the line is shifted by $s$ or $t$.
The shifting can make some line no longer pass any lattice points,
so some Jsₖ and Jtₖ are zero.
Just like Jline for Λline, we can define Jceiled for Λceiled
which sums over all lattices bounded by $δ$.
Jceiled is symmetric.
... and homogeneous.
Jceiled is weakly increasing with regard to $δ$.
As $δ$ grows, Λceiled can either remain unchanged for include new points.
While Jceiled is only weakly increasing, and one can't deduce the relation of two $δ$ from their Jceiled,
it is possible give a relation between $δ$ if one of them is δₖ
The growth of Jceiled is precisely described by Jline.
Another way to view this is to say Jceiled = ΣJline for all lines in the bound.
Since there are gaps between $δ$, Jceiled stops growing when inside these gaps.
We can also derive a few variants of this lemma:
Now we can define the sequence nₖ as partial sums of Jₖ.
The first element n₀ starts at $1$ for reasons we will see later.
This essentially comes from the defect of binomial coefficient at $(0, 0)$.
nₖ will be the n-coordinate of the vertices of several piecewise functions we will introduce.
nₖ is also symmetric.
... and homogeneous.
The first two elements of nₖ are always 1 and 2.
nₖ grows faster than $k$ it self.
And obviously, nₖ is strictly increasing.
As a quick corollary, nₖ are all positive.
Just as we used Jₖ to define nₖ, we also use Jsₖ and Jtₖ to define
partial sum sequences wₖ' and wₖ, respectively.
(The reason wₖ corresponds to $t$ is mostly historical)
The starting point w₀ = 1 is an artifact, as we will see it doesn't follow
nice properties we will soon see.
The real starting point of this sequence is w₁ = 1.
Recurrence formula of wₖ: by swapping $s$ and $t$, $w$ becomes $n - w$
This is the first property that shows w₀ doesn't follow the pattern.
A more sensible definition of w₀ that follows the Symmetry can be
w₀ = 1/2when $s = t$w₀ = cif $s > t$ else1 - cBut these definitions doesn't add much value to our further arguments, so we will just leavew₀semantically undefined.
(The equivalent formula wₖ s t k + wₖ' s t k = nₖ s t k might be more
suitable to be the recurrence formula. This is stated this way for
historical reasons)
Here is a pretty important property of wₖ and wₖ':
Elements of wₖ and wₖ' sequence all come from nₖ.
This means wₖ and wₖ' effectively sets up mapping from nₖ to itself.
It can be showed that this mapping is weakly monotone and contracting,
and we will prove a weaker version of this later.
With sequence δₖ, nₖ, and wₖ introduced, we will construct the following functions:
First, the "cost differential" function $dE(n): [1, ∞) → ℝ$
↑ dE(n)
|
| |-J₀-|-J₁--|---J₂---|-------J₃------|
|
| |
δ₃ --|-- *===============∘
| |
δ₂ --|-- *========∘
| |
| |
δ₁ --|-- *=====∘
| |
| |
δ₀ --+-----*====∘-----|--------|---------------|--------→ n
0 n₀ n₁ n₂ n₃ n₄
(=1)
The function is defined like a stair case. By convension, each interval is defined with left point closed: $$ dE( [n_k, n_{k+1}) ) = δ_k $$
Second, the "strategy" function $w(n): [2, ∞) → P(ℝ)$.
↑ w(n)
|
| |-J₀-|-J₁--|---J₂---|-------J₃------|
| /
w₄ --|-- *----------*- --|--
| /##########/ |
| /##########/ |
| /##########/ | Jt₃
| /##########/ |
w₃ --|-- *------*----------* --|--
| /######/ | Jt₂
w₂ --|-- *---*------* --|--
| /###/ | Jt₁
w₁ --|-- *---* --|--
+----------|-----|--------|---------------|--------→ n
0 n₀ n₁ n₂ n₃ n₄
(=1) (=2)
We first anchor all points $(n₁, w₁)$, $(n₂, w₂)$, ...
and then connect them with parallelogram with an angle of 45°
The parallelogram can be degenerated if Jt$ = 0$ or Jt$ = J$.
Then all points enveloped, including the boundary, are in w(n)
Again, because w₀ is semantically undefined,
$w(n)$ is only defined starting from n₁$ = 2$.
We also write w(n) = [wₘᵢₙ(n), wₘₐₓ(n)]
But before we can define these functions, we need to define how to find $k$ for a given real input $n$.
We define kceiled as the set of natural numbers $k$ for which nₖ$ ≤ n$.
kceiled is also obviously symmetric and finite.
... and homogeneous.
kceiled is finite, which allows us to take maximum value later.
Equations
- instFintypeElemNatKceiled s t n = Fintype.ofFinite ↑(kceiled s t n)
And obviously, it is also symmetrical.
... and homogeneous.
kₙ and nₖ are basically inverse functions to each other.
One can recover the $k$ by composing kₙ and nₖ.
Any $n ≥ 1$ should give a valid $k$.
Mean while, $n < 1$ never gives a valid $k$.
Now the cost differential function is defined by clamping to the nearest $k$ and find δₖ.
... which is symmetric.
... homogeneous.
... and weakly increasing.
The following three lemma show the nice property of wₖ when applied to dE:
The domain $n ∈ [1, ∞)$ is divided by wₖ k and wₖ (k + 1) into three regions:
dE( [1, wₖ k ) ) < δₖ - t
dE( [wₖ k, wₖ (k + 1)) ) = δₖ - t
dE( [wₖ (k + 1), ∞ ) ) > δₖ - t
In other words, wₖ captures exactly where dE = δₖ - t (while nₖ captures where dE = δₖ)
Note that because 1 ≤ wₖ k ≤ wₖ (k + 1) are week inequalities,
the intervals listed above can degenerate.
There are similar properties with wₖ' and δₖ - s, but the proof is omitted.
As a corollary, we show that wₖ is not only a monotone mapping from nₖ to itself,
but also under the mapping, wₖ (k) and wₖ (k + 1) are either same, or two nₖ (k') next to each other.
By symmetry, the same holds for wₖ'.
The strategy function $w$ is defined by finding wₖ after clamping to the nearest $k$
The parallelogram is formed by taking certain min and max.
Just like wₖ, $w(n)$ is bounded within $[1, n - 1]$.
We also define a third kind of $w$ function wₗᵢ,
which is the diagonals of parallelograms formed by wₘᵢₙ and wₘₐₓ.
We also define the dual version wₗᵢ'
We could have done the same for wₘᵢₙ and wₘₐₓ,
but we omitted them as they don't add much value.
wₗᵢ as the diagnonal, is always between wₘᵢₙ and wₘₐₓ.
With this, we have the complete ordering:
1 ≤ wₘᵢₙ ≤ wₗᵢ ≤ wₘₐₓ ≤ n - 1
As usual, wₗᵢ is symmetric
... and has recurrence formula.
It is symmetric
... and weakly increasing w.r.t $w$
We show that wₘᵢₙ and wₘₐₓ indicates where dD is negative, zero, or positive.
In these theorems, we conviniently ignored boundary points at w = wₘᵢₙ or w = wₘₐₓ.
dD value at those points can be found, but it doesn't add much value for our further arguments.
Here is a more useful version with the correction term $s + t$
Now we can construct our main function, the cost function E (n)
↑ E (n)
|
| |-J₀-|-J₁--|---J₂---|-------J₃------|
|
| ·* --|--
| ·· |
| · |
| · |
| ·· |
| · |
| · | (δ₃+s+t)*J₃
| ·· |
| | | | | · |
| | | | | · |
| | | | | · |
E₃ --|-- | | | ··*·--- --|--
| | | | · |
| | | | ·· | (δ₂+s+t)*J₂
| | | | ·· |
E₂ --|-- | | ·*·------------ --|--
| | | · |
| | | ·· | (δ₁+s+t)*J₁
E₁ --|-- | ·*·------------------ --|--
| | ·· | (δ₀+s+t)*J₀
E₀ --+-----*·---|-----|--------|---------------|-----|--→ n
0 n₀ n₁ n₂ n₃ n₄
(=1)
We first pin the vertices Eₖ on this function.
... which is symmetric.
... which is symmetric.
... and can be expressed as an integral.
While E (n) itself is defined as partial sum of Jₖ * (δₖ + s + t),
we can also show the composed mapping E (w(n)) is the partial sum of Jtₖ * (δₖ + s).
And here is the strategy evaluation function.
... which is symmetric.
... and is the integral of the strategy evaluation differential function.
We will now prove several version of the recurrence formula on E.
Eventually, we reached the major conclusion:
The cost equals the strategy evaluation at the optimal strategy wₗᵢ
But because D has zero derivative dD between wₘᵢₙ and wₘₐₓ
all $w$ in between gives cost = strategy evaluation
And using the fact that the derivative dD is negative/positive outside the range,
we conclude that the strategy evaluation is larger than the cost everywhere else.
Therefore, the interval bounded by wₘᵢₙ and wₘₐₓ idicates where E = D.
Let's make it its own function
Let's summarize our result in a high level
For any possible cost function $E(n): [1, ∞) → ℝ$
We can define a strategy evaluation function StratEval$\{E\}(n, w)$
A cost function $E$ is called optimal if the min value of StratEval is $E$ itself,
and a strategy function $w$ is called optimal if it is the set for StratEval to reach $E$.
While Eℤ is easy to understand, we need to show that
wₘᵢₙℤ and wₘₐₓℤ remains the same value when lifted.
We can then define the integer version of the optimal criteria,
and proof the optimality of Eℤ and Wℤ.
Equations
- IsOptimalCostℤ Efun s t = ∀ n ≥ 2, IsLeast (StratEvalℤ Efun s t n '' Set.Icc 1 (n - 1)) (Efun n)