1 Jukasa

Co250 Assignment 17

As I mentioned last week in Bad Faith Asserted by Excess Insurer, there are certain circumstances in which an individual or entity other than the policyholder can assert a bad faith claim against an insurance company. This situation recently arose before the Washington Court of Appeals in Unigard Ins. Co. v. Mutual of Enumclaw Ins. Co., 250 P.3d 121 (April 4, 2011) in an appeal regarding prejudgment interest, jury instructions and clean up costs.

The underlying case arose from Charles Engelmann’s purchase of property in 1979. The property had formerly been used as a dry cleaning facility. Mr. Engelmann later sold the property to Newmarket I, a general partnership. Years later, the Washington State Department of Ecology notified Newmarket that it might be liable for the release of hazardous substances at the property under the Model Toxics Control Act, Chapter RCW 70.105D. Based on the prospect of state enforcement, Newmarket entered into a voluntary clean-up program. An investigation of the property revealed soil and groundwater contamination. Newmarket incurred clean-up costs.

Newmarket then sued Mr. Engelmann and other former owners of the property for contribution. Mr. Engelmann tried to tender his defense to Mutual of Enumclaw Insurance Company (“Mutual”), the carrier of his homeowner’s policy during the time that he owned the property. Mutual denied coverage and refused to defend.

Left to deal with the claims on his own, Engelmann entered into a settlement agreement with Newmarket. He agreed to pay Newmarket $20,000 and to assign Newmarket all his rights against Mutual of Enumclaw. In return, Newmarket released Engelmann from all claims except to the extent they could be satisfied through the assignment of rights. Engelmann did not admit liability. The agreement expressly stated that the settling parties denied liability “for any and all claims related to the Site, the Facility and the Contribution Action.” Newmarket designated the rights it had obtained from Engelmann to its own insurer, Unigard Insurance Company. Unigard sued Mutual of Enumclaw for breach of contract, bad faith, and violation of the Consumer Protection Act. The parties each moved for partial summary judgment on the issue of liability. The trial court granted Unigard’s motion.

The properly executed assignment of rights enabled Newmarket to file a bad faith claim against an insurer that was not its own.

Please consider that the decision above is specific to the state of Washington. Other jurisdictions may rule otherwise.

Lecture Notes by Anthony Zhang.

CO250

Introduction to optimization.

3/1/17

In this course, we will be studying linear programming, integer programming, the Simplex algorithm, and the concept of duality, as well as various ways to efficiently approximate answers to these sorts of problems.

An abstract optimization problem (also known as a "P") is a problem in which we are trying to maximize or minimize a certain objective function - finding the inputs to the function, subject to certain constraints, such that the value of the function is maximised. In other words, given A \subseteq \mb{R}^n (the set of all feasible inputs, also known as the feasible region) and a function f: A \to \mb{R} (the objective function), find x \in A such that f(x) is the global maximum/minimum.

Optimization problems show up in almost every industry - hardware companies might want to simplify their circuit layouts, hotels might want to fill the most rooms possible at the highest price, and a factory might want to generate the most profit given constraints on materials and labour. Also, you can use it to play Pandemic optimally.

There are three types of abstract optimization problems:

  • Linear programming problems (also known as LPs) are abstract optimization problems where A is representable as a set of linear equations/inequalities, and f is a linear function.
  • Integer programming problems (also known as IPs) are linear programming problems where A \subseteq \mb{Z}^n.
  • Non-linear programming problems (also known as NLPs) are abstract optimization problems where A cannot be represented as a set of linear equations/inequalities, or f is non-linear.

Abstract optimization problems are usually given as prose descriptions of the problems, which we then turn into abstract mathematical models/programs/problems. Then, we can take advantage of the many powerful computer solvers available today to automatically find optimal solutions.

Suppose a company produces either plastic carrots (which sell for $5 each and take 4 minutes to make) or rubber potatoes (which sell for $8 each and take 8 minutes to make). The company has 500 minutes available in the rest of the day, and the machine that makes either of these products can only be used up to 100 times - how many of each product should be made to maximize profit?

First, we define our decision variables - the actual values we are trying to find. Let c be the number of carrots, and p be the number of potatoes.
Profit can now be defined as a function of those decision variables: f(c, p) = 5c + 8p.
Next, we can define constraints to specify A. There are only 500 minutes left, and the time spent depends on the number of each product made, so 4c + 8p \le 500. There are only 100 products left that we can make, and this is simply the number of carrots and potatoes, so c + p \le 100. Additionally, c and p must be non-negative, since they represent the number of each product.
We now have a mathematical model of the problem: "maximize 5c + 3p subject to 4c + 3p \le 500 and c + p \le 100".

A feasible solution to an abstract optimization problem is any x \in A - any variable assignment such that the constraints of the problem are satisfied, like c = 50, p = 10. An optimal solution is one that actually solves the problem - a feasible solution that results in the global maximum/minimum of the objective function. Likewise, feasible solutions to prose descriptions of the problem are assignments to the unknowns, like "make 50 carrots and 10 potatoes".

To show that a mathematical model of a problem is correct (properly represents the prose description of the problem), we usually define a bijection between feasible solutions for the mathematical model and feasible solutions for the prose description of the problem, such that the bijection preserves cost - each x \in A for the mathematical model must have f(x) equal to the the profit in the prose description for its corresponding feasible solution.

For example, show that the model from the above example is correct:

4c + 3p \le 500 \land c + p \le 100 can be trivially bijected to "carrots take 4 minutes to make and potatoes take 8 minutes, and the machine can only be used to make 100 products".
Clearly, f(c, p) = 5c + 8p has the same value as the profit since "carrots sell for $5 each and potatoes sell for $8 each".

5/1/17

In this course we will mostly focus on affine optimization problems, since solving general optimization problems is much harder.

An affine function is one that can be written as f(\vec{x}) = \vec{a} \cdot \vec{x} + \beta where \vec{a} \in \mb{R}^n and \beta \in \mb{R}. If \beta = 0, the function is also linear.

A linear program is an abstract optimization problem of the form \min \set{f(x) : x \in \mb{R}^n \wedge \left(\forall 1 \le i \le m, g_i(x) \le b_i\right)} where f is affine, g_1, \ldots, g_m are all linear, and m is finite. We might also replace \min with \max depending on the problem.

Usually we write this as: "minimize f(x) subject to g_1 \le b_1, \ldots, g_m \le b_m", or "maximize f(x) s.t. g_1 \le b_1, \ldots, g_m \le b_m". It's also common to use \vec{x} \le \mb{0} as shorthand for x_1 \ge 0, \ldots, x_n \ge 0.

Suppose an oil company in the next 4 months needs to supply 5000L, 8000L, 9000L, and 6000L, respectively, and oil costs them 0.75/L, 0.72/L, 0.92/L, and 0.90/L, respectively. The company has an oil tank that can store up to 4000L, and starts off with 2000L. Oil is bought at the beginning of the month, is used throughout the month, and is all dumped into the tank at the end of the month. What quantity of oil should be bought in each month to minimize cost?

Clearly, the inputs are the quantity of oil bought in each month, b_1, b_2, b_3, b_4. For convenience, we will also define the tank value at the beginning of each month, t_1, t_2, t_3, t_4. Clearly, 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000, since the capacity of the tank is 4000L.
The objective function we are minimizing is f(b_1, b_2, b_3, b_4) = 0.75b_1 + 0.72b_2 + 0.92b_3 + 0.90b_4.
Clearly, the tank level in month i is t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000, since the tank level is just the leftover oil after supplying the last month's oil.
Clearly, in order to supply enough oil, we must have t_1 + b_1 \ge 5000, t_2 + b_2 \ge 8000, t_3 + b_3 \ge 9000, t_4 + b_4 \ge 6000. However, the first three constraints are redundant with the previously defined constraints t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and 0 \le t_1, 0 \le t_2, 0 \le t_3, 0 \le t_4, so we only need the t_4 + b_4 \ge 6000 constraint.
These encode all of the problem constraints, so we have "minimize 0.75b_1 + 0.72b_2 + 0.92b_3 + 0.90b_4 subject to 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000 and t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and t_4 \ge 6000".
If we use a linear programming solver, we get b_1 = 3000, b_2 = 12000, b_3 = 5000, b_4 = 6000, which gives us the minimum cost 20890.

Same thing, but minimise the maximum amount of oil purchased in any single month, and then show that the model is correct assuming that the above example's model was correct:

From the previous example, we have the same constraints 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000 and t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and t_4 \ge 6000.
However, the objective function is different - let M be the maximum amount of oil bought in a single month, and we can minimize this value. We can define M to be the maximum amount of oil bought in a single month by adding the constraints b_1 \le M, b_2 \le M, b_3 \le M, b_4 \le M.
These encode all of the problem constraints, so we have "minimize M subject to 0 \le t_1 \le 4000, 0 \le t_2 \le 4000, 0 \le t_3 \le 4000, 0 \le t_4 \le 4000 and t_1 = 2000, t_2 = t_1 + b_1 - 5000, t_3 = t_2 + b_2 - 8000, t_4 = t_3 + b_3 - 9000 and t_4 \ge 6000 and b_1 \le M, b_2 \le M, b_3 \le M, b_4 \le M".
Now to show correctness. Suppose that M is an optimal solution that has values b_1, b_2, b_3, b_4. Since M is a feasible solution, M \ge \max\set{b_1, b_2, b_3, b_4}, and since M is an optimal solution, it is the lowest such value that satisfies that, so M = \max\set{b_1, b_2, b_3, b_4}. Therefore, M is also the minimum maximum amount of oil purchased in any single month.

10/1/17

Integer programs are just linear programs plus a integrality constraint - a constraint that all of the variables must be integers. When we formulate integer programs, we just add the constraint x_1, \ldots, x_n \text{ are integral}.

An integer program is mixed if there are still non-integral variables, and pure if it only has integral variables. Integer programs are harder to solve than LPs.

The size of a linear/integer program generally refers to either the number of variables, of the number of constraints. The running time of an IP/LP solver algorithm is the number of steps it takes, as a function of the size of the IP/LP.

Efficient algorithms are polynomially. All LPs can be solved in polynomial time, though with a high exponent. Solving integer programs, however, are in NP, and so are highly unlikely to be solvable in polynomial time.

The knapsack problem is a special case of integer programs. Given a set of n items, with values v_1, \ldots, v_n and weights w_1, \ldots, w_n, how do we maximize the total value of the items we can carry in a knapsack given that the total weight cannot exceed m? From CS341, we know that if the number of items can be fractional, we can just use a greedy algorithm.

As an integer program, we have: maximize v_1 x_1 + \ldots = v_n x_n subject to w_1 x_1 + \ldots + w_n x_n \le m, x_1, \ldots, x_n \text{ are integral}.

Write the IP for a knapsack problem where we have n = 4 and either x_1 + x_2 is at least 4, or x_3 + x_4 is at least 4, and we can only send x_3 if we send at least one x_4:

We're trying to maximize v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4.
Clearly, we can express x_1 + x_2 \ge 4 AND x_3 = x_4 \ge 4, but how do we express x_1 + x_2 \ge 4 OR x_3 + x_4 \ge 4?
One way to do this is by adding a binary variable y to our IP, constrained such that 0 \le y \le 1 and y \text{ is integral}.
We can now express x_1 + x_2 \ge 4 \lor x_3 + x_4 \ge 4 as x_1 + x_2 \ge 4y, x_3 + x_4 \ge 4(1 - y). Whenever y is 0, 4y is 0, so x_1 + x_2 will always be true (since x_1 and x_2 are non-negative), while 4(1 - y) is 4, so the second constraint becomes x_3 + x_4 \ge 4. Whenever y is 1, 4y is 4, so the first constraint becomes x_1 + x_2 \ge 4, while the second constraint is always true (since x_3 and x_4 are non-negative).
Now, how do we express sending x_3 only if we send at least one x_4? We want to express that x_4 = 0 \implies x_3 = 0.
One way to do this is to multiply x_4 by a large enough value so that if it's 1 or more, the resulting value will definitely be greater than x_3. For example, x_3 must be no greater than \floor{\frac m {w_3}}, so we can add the constraint x_3 \le \floor{\frac m {w_3}} x_4 to do this. When x_4 = 0, \floor{\frac m {w_3}} x_4 is also 0, so x_3 must be 0 as well. When x_4 \ge 1, \floor{\frac m {w_3}} x_4 \ge \floor{\frac m {w_3}}, so x_3 \le \floor{\frac m {w_3}} x_4 is always true.
Therefore, we can write this as "maximize v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4 subject to w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \le m, x_1 + x_2 \ge 4y, x_3 + x_4 \ge 4(1 - y), 0 \le y \le 1, y \text{ is integral}, x_3 \le \floor{\frac m {w_3}} x_4, x_1, x_2, x_3, x_4 \ge 0, x_1, x_2, x_3, x_4 \text{ is integral}".

The above example demonstrates a common trick of using a binary variable to express logical constraints involving disjunction. The binary variable lets us make constraints that apply when it has one value, and always be true for other values.

Scheduling is one of the most common optimization problems in industry. Suppose we have the coffee shop that is open on weekdays, hires workers that work for 4 consecutive weekdays per week (can wrap around weeks, and ignoring weekends), and requires 3, 5, 9, 2, and 7 workers to meet demand on Monday through Friday, respectively. How do we hire the smallest number of workers that meets this demand, and how do we schedule them to meet this demand?

First, we could define 5 variables, x_1, \ldots, x_5, for the number of workers who should start working on each weekday. Then, the objective becomes to minimize x_1 + \ldots + x_5.

What are the feasible solutions? Well, the constraints are that the number of workers starting on each day must be non-negative, and that the demand is met each day. To ensure we have enough demand for each day, we can define five constraints for each weekday. For example, for Monday demand to be met, we need the number of workers who start on Monday, Friday, Thursday, or Wednesday to be at least 3, since Monday demand is 3 and on Monday, we have workers from up to 4 days ago: x_1 + x_5 + x_4 + x_3 \ge 3.

So, the problem becomes: "minimize x_1 + \ldots + x_5 subject to x_1 + x_5 + x_4 + x_3 \ge 3, x_2 + x_1 + x_5 + x_4 \ge 5, x_3 + x_2 + x_1 + x_5 \ge 9, x_4 + x_3 + x_2 + x_1 \ge 2, x_5 + x_4 + x_3 + x_2 \ge 7, and x_1, \ldots, x_5 \ge 0".

Suppose we wanted to constrain a variable x to be one of k_1, \ldots, k_n. One way we could do this is to have binary variables y_1, \ldots, y_n, where y_i represents x = k_i - adding a constraint of the form x = k_1 y_1 + \ldots + k_n y_n. Then, we can ensure that exactly one of them is true by adding the constraint y_1 + \ldots y_n = 1.

12/1/17

Another common type of linear programming problem in the real world is graph optimization. For example, how do we find the shortest path on a map? How do we optimize a microchip layout?

For the shortest path on a map example, we can associate street intersections with vertices, and streets with edges, weighted by the distance (a non-negative real number) along its length between intersections.

An s, t-walk in a graph is a sequence of edges s v_1, v_1 v_2, \ldots, v_k t. An s, t-path is an s, t-walk such that non-consecutive edges in the sequence don't share any vertices (we never revisit the same vertex twice).

The length of an s, t-path is the sum of the weights of the edges in the path. So, given a graph G = \tup{V, E}, a weight function w: E \to \mb{R}, and two vertices s, t, we want to find the minimum length s, t-path. How do we write this as an IP?

Suppose we have tasks t = t_1, \ldots, t_n and machines m = m_1, \ldots, m_n, and c_{m_i, t_j} represents the cost of having machine m_i perform task t_j. How do we match machines to tasks to minimize total cost?

Create a graph with one vertex for each job and edges between them weighted by the corresponding cost.
We want to find the perfect (every vertex in the graph is incident to an edge in the matching) matching (subset of edges such that none share vertices) in the bipartite graph with the minimum total weight.
Let \delta(v) be a function that returns the set of all edges incident to a given vertex. Then a set of edges M is a perfect matching if and only if \forall v \in V, \magn{M \cap \delta(v)} = 1.
For our IP, we can have a binary variable x_{\text{machine}, \text{task}} for every edge \tup{\text{machine}, \text{task}}. Now we can simply minimize \sum_{m_i \in m, t_j \in t} c_{m_i, t_j} x_{m_i, t_j}.
To constrain solutions to perfect matchings only, we can add the constraints \sum_{m_i \in m} \sum_{t_j \in \delta(m_i)} x_{m_i, t_j} = 1 and \sum_{t_j \in t} \sum_{m_i \in \delta(t_j)} x_{m_i, t_j} = 1 (for each vertex, choose exactly one incident edge).

17/1/17

The shortest path problem: given a graph G = \tup{V, E}, what is the s, t-path that has the minimum total length? We want to formulate this as an IP.

Recall from MATH239 that a cut in a graph is a partitioning of V into two disjoint subsets, often written as just one of the sets. In other words, a cut is a pair \tup{S, T} such that S \subseteq V, T \subseteq V, S \cap T = \emptyset, S \cup T = V.

An s, t-cut is a set of edges rather than a pair of sets of vertices. Let \delta(S) = \set{\set{u, v} \in E : u \in S, v \notin S} for S \subseteq V. Then given s \in S and t \notin S, \delta(S) is an s, t-cut. To enumerate all s, t-cuts, we can simply find all subsets of V that contain s but not t, then apply \delta to those sets of vertices.

Clearly, if we remove the edges contained in the s, t-cut from the graph, s and t would no longer be connected. Therefore, every s, t-path must contain at least one edge from every s, t-cut.

Also, if S \subseteq E contains at least one edge from every s, t-cut, then S also is a superset of an s, t-path. This is because if S had an edge from every s, t-cut and didn't have an s, t-path, the set of all vertices R that can be reached from s is an s, t-cut (since it includes s but not t, as no s, t-path exists by assumption), so S must include an edge from \delta(R). However, S can't contain any edges from \delta(R), because if it did contain an edge \set{u, v} then both u and v should have been in R (since they're reachable from s). Therefore, S must also contain an s, t-path.

Define binary variables x_1, \ldots, x_m for each edge e_1, \ldots, e_m being or not being in the shortest path, respectively. Let w_1, \ldots, w_n be the weights of each edge e_1, \ldots, e_m, respectively.

Clearly, we want to minimize x_1 w_1 + \ldots + x_m w_m (i.e., the length of the path). What are the feasible solutions (i.e., supersets of s, t-paths)?

To be an s, t-path, the edges of the path must include at least one edge from every s, t-cut. This is a lot easier to write as constraints: we have one constraint for each s, t-cut \delta(S), and each constraint ensures that at least one of the edge variables T \subseteq \set{x_1, \ldots, x_m} corresponding to \delta(S) is true.

For example, given the graph G = \tup{\set{a, b, s, t}, \set{ab, as, at, bs, bt}} with weights w_1, \ldots, w_5, we might write the shortest path problem as "minimize x_1 w_1 + \ldots + x_5 w_5 subject to x_2 + x_4 \ge 1 (for the s, t-cut \set{sa, sb}), x_1 + x_2 + x_3 \ge 1 (for the s, t-cut \set{ab, as, at}), x_1 + x_2 + x_5 \ge 1 (for the s, t-cut \set{ab, as, bt}), x_3 + x_5 (for the s, t-cut \set{at, bt}), and x_1, \ldots, x_m are non-negative integers".

Note that we don't need an upper bound on x_1, \ldots, x_m. This is because if it was greater than zero, then it wouldn't be an optimal solution.

Nonlinear programs are problems of the form "minimize f(x) subject to g_1(x) \le 0, \ldots, g_m(x) \le 0" where f, g_1, \ldots, g_m: \mb{R}^n \to \mb{R}. This is a superset of linear programs and integer programs. Basically, we can do arbitrary constraints, math permitting.

For example, we can minimize Euclidean distance by using an objective function (x_1 - p_1)^2 + (x_2 - p_2)^2 + (x_3 - p_3)^2. Another example is constraining variables to integers: simply add the constraint \sin(\pi x) = 0 to ensure that x is an integer, or x(1 - x) = 0 to ensure that x is either 0 or 1.

It's even possible to write Fermat's last theorem as a single NLP: "minimize (a^n + b^n - c^n)^2 subject to \sin(\pi a) = 0, \sin(\pi b) = 0, \sin(\pi c) = 0, \sin(\pi n) = 0, a, b, c \ge 1, n \ge 3" - the objective function solution is 0 if and only if a^n + b^n = c^n, and the theorem is true if and only if the optimal value is greater than 0.

23/1/17

What does it mean to solve an optimization problem? What is a solution to an optimization problem?

Consider the LP "maximize 2x_1 + 3x_2 subject to x_1 + x_2 \le 1, x_1, x_2 \ge 0". Clearly, in this case the optimal solution is simply x_1 = 1, x_2 = 0. However, in general, it is more difficult to say what exactly can be considered a solution.

A feasible solution is an assignment of values to each of the variables in the LP such that all of the constraints are satisfied - regardless of the objective function value. A feasible optimization problem is one that has at least one feasible solution, and an infeasible optimization problem is one that has none. An optimal solution is a feasible solution where the objective function is maximized (for a maximization problem), or minimized (for a minimization problem).

However, an optimization problem having a feasible solution doesn't mean that it has an optimal solution - the maximum/minimum objective function value could be unbounded, like for "maximize x subject to x \ge 0". If for any feasible solution there exists another with a higher/lower objective function value (depending on whether it's a maximization/minimization problem), the optimization problem is unbounded, and otherwise, it is bounded.

Consider the optimization problem "maximize x subject to x < 1". Clearly, this is feasible (x = 0 is a solution), and bounded (x \le 1 because x < 1), yet it has no optimal solution - the objective function can always get closer to 1 for any given value of x! The same occurs for the optimization problem "minimize \frac 1 x subject to x \ge 1". Note that neither of these are LPs, because one uses a strict inequality, and the other a division.

LPs are a bit better behaved than general optimization problems. According to the Fundamental Theorem of Linear Programming, an LP either has at least one optimal solution, is infeasible, or is unbounded.

An LP solver should therefore output one of the following: an optimal solution to the input LP (plus proof that it's optimal), a proof that the LP is infeasible, or a proof that the LP is unbounded.

Consider the linear system A\vec x = \vec b, \vec x \ge \vec 0. If we can find a \vec y such that {\vec y}^T A \ge {\vec 0}^T and {\vec y}^T \vec b < {\vec 0}^T, then the system can't have any solutions. In other words, we can add/subtract/scale certain equality constraints in the linear system together to get a new constraint such that the coefficients of the constraint are all non-negative, yet the right hand side is negative. Here, \vec y simply represents the fact that we're adding/subtracting/scaling rows in A and elements in \vec b to form a new constraint.

If we have such a constraint (left side coefficients non-negative, right side negative), then the left side of that constraint must be a non-negative value (since the elements of \vec x are also non-negative) and the right side is negative (by assumption), so this constraint is not satisfiable - the LP must be infeasible.

According to Farkas' Lemma, if there's no feasible solutions, then a \vec y that satisfies those conditions must exist. Therefore, a value of \vec y such that {\vec y}^T A \ge {\vec 0}^T and {\vec y}^T \vec b < 0 is a proof of infeasibility. For an LP in SEF, a value \vec y such that {\vec y}^T A = {\vec 0}^T and {\vec y}^T \vec b \ne 0 works just as well for proof purposes - a linear combination of constraints that results in the left side being zero and the right side nonzero.

For example, consider an LP with constraints 3x_1 - 2x_2 = 6, 2x_1 - x_2 = 2. If we choose y = \begin{bmatrix} -1 \\ 2 \end{bmatrix}, then we get {\vec y}^T A = \begin{bmatrix} 1 & 0 \end{bmatrix} and {\vec y}^T \vec b = -4. This means that the two constraints, with the first one subtracted from the second one doubled, imply the constraint x_1 = -2, which is always infeasible because \vec x \ge \vec 0.

To prove that an optimal solution \vec x for an LP is actually optimal, we simply need to prove that for any feasible solution \vec x', the objective function value is less or equal to the objective function value for \vec x - basically, proving that the upper bound of the objective function occurs at \vec x. This is a proof of optimality. We will look at how to construct these proofs systematically later on, using the strong duality theorem.

Consider the LP "maximize 2x_1 + 3x_2 subject to x_1 + x_2 \le 1, x_1, x_2 \ge 0". Clearly, any feasible solution x_1, x_2 must be

To prove that an LP is unbounded, we construct a family of feasible solutions \vec x(t) for all t \ge 0, then show that as t goes to infinity, so does the value of the objective function. In other words, we need to prove that for any arbitrarily high objective function value (assuming a maximization problem), we can construct a variable assignment that is both a feasible solution, and results in that objective function value - a proof of unboundedness.

This family of feasible solutions is always a line in the linear space. Therefore, a family of feasible solutions \vec x(t) = \vec x_0 + t \vec r such that for any z, there exists a t such that \vec x(t) = z is a proof of unboundedness. In matrix form, a proof of unboundedness is simply a value of \vec x_0, \vec r such that \vec x_0 \ge \vec 0, \vec r \ge \vec 0, A \vec x_0 = \vec b, A \vec r = \vec 0, {\vec c}^T \vec r > 0. Also, if the LP is unbounded, those values of \vec x_0, \vec r must exist.

Consider the LP "maximize x subject to x \ge 0". Clearly, for any objective function value z \ge 0, we can construct an assignment x = z, which must be a feasible solution since z \ge 0. Additionally, the objective function value for the feasible solution x must be z. Therefore, the LP is unbounded, and a proof for that would be \vec x_0 = \begin{bmatrix} 0 \end{bmatrix}, \vec r = \begin{bmatrix} 1 \end{bmatrix}.

Basically, whichever outcome an LP has, there must exist a proof that that outcome is the case.

A free variable is a variable that is not constrained.

An LP is in standard equality form (SEF) if and only if all of the following are true:

  • The LP is a maximization problem.
  • Every variable is constrained to be non-negative.
  • All constraints are equality constraints.

In other words, an LP in SEF is a problem of the form "maximize \vec c \cdot \vec x subject to A \vec x = \vec b, \vec x \ge \vec 0", where A is an n by m coefficients matrix and \vec x, \vec b, \vec c \in \mb{R}^m.

Every LP has an "equivalent" LP that is in SEF. By equivalent, we mean the original LP is unbounded if and only if the equivalent LP is unbounded, the original LP is infeasible if and only if the equivalent LP is infeasible, and an optimal solution of one can easily be used to compute an optimal solution of another.

To convert an LP into its equivalent in SEF:

  • If the LP is a minimization problem, turn it into a maximization problem by inverting the objective function: "minimize f(x) subject to ..." becomes "maximize -f(x) subject to ...".
  • Replace constraints of the form c_1 x_1 + \ldots c_k x_k \le b with c_1 x_1 + \ldots c_k x_k + s = b where s \ge 0 is a new variable.
  • Replace constraints of the form c_1 x_1 + \ldots c_k x_k \ge b with c_1 x_1 + \ldots c_k x_k - s = b where s \ge 0 is a new variable.
  • Replace free variables (variables that don't have upper or lower bounds) with a - b where a \ge 0, b \ge 0 are new variables. Their difference can have any value, positive or negative.

Once this is done, we have an equivalent LP in SEF. Also, we can easily convert optimal solutions for the LP in SEF into optimal solutions for the original LP, simply by taking the values of the corresponding variables and ignoring the newly introduced variables.

25/1/17

We can actually solve LPs by hand, by building an intuition for what would maximise the objective function, and trying to satisfy the constraints with that in mind.

For example, suppose we have the LP "maximise 4x_1 + 3x_2 + 7 subject to 3x_1 + 2x_2 + x_3 = 2 and x_1 + x_2 + x_4 = 1 and x_1, x_2, x_3, x_4 \ge 0". Clearly, x_1 = 0, x_2 = 0, x_3 = 2, x_4 = 1 is a feasible solution, because if we set x_1 = x_2 = 0, the constraints become x_2 = 2, x_4 = 1. This has objective value 7. How do we find a feasible solution with a higher objective function value?

To get a better objective function value, we need to either increase x_1 or x_2, the two variables in the objective function. To keep things simple, we'll just arbitrarily choose to modify only one of the variables for now - maximize x_1 while keeping x_2 = 0.

Now the LP becomes "maximise 4x_1 + 7 subject to 3x_1 + x_3 = 2 and x_1 + x_4 = 1 and x_1, x_3, x_4 \ge 0". To maximize this, we just need to maximize x_1. Clearly, the LP is feasible whenever x_3 = 2 - 3x_1 and x_4 = 1 - x_1 - we have a feasible solution for every possible value of x_1 where all the non-negativity constraints are satisfied.

Since x_3 \ge 0 and x_3 = 2 - 3x_1, 2 - 3x_1 \ge 0, so x_1 \le \frac 2 3. Since x_4 \ge 0 and x_4 = 1 - x_1, 1 - x_1 \ge 0, so x_1 \le 1. Therefore, since we also know that x_1 \ge 0, 0 \le x_1 \le \frac 2 3. Since we're trying to find the largest value of x_1 that still satisfies the constraints, x_1 = \frac 2 3. This gives us an objective function value of \frac 8 3 + 7 with the solution x_1 = \frac 2 3, x_2 = 0, x_3 = 0, x_4 = \frac 1 3, which is better than our original solution.

Turns out, however, this solution is still not optimal. Additionally, we can't increase x_2 without decreasing x_1 at this point, so we can't just repeat the trick with x_2 this time instead of x_1. However, it would be possible to use the "maximize one variable while keeping others in the objective function the same" trick again if we rewrote the LP in what's known as canonical form. We will look at how to do this later.

The basic idea behind the simplex algorithm:

  1. Find a feasible solution \vec x. If not found, give a proof of infeasibility and stop.
  2. Rewrite the LP in canonical form.
  3. If \vec x is optimal, give a proof of optimality and stop.
  4. If \vec x is unbounded, give a proof of unboundedness and stop.
  5. Find a feasible solution with a higher objective function value than \vec x.
  6. Repeat steps 2-6 until the algorithm terminates.

The main questions are: What's canonical form and how do we rewrite LPs in that form? Why does it ensure that we can find a better feasible solution?

4/2/17

The column sub-matrix notation allows us to select a subset of columns from a matrix into a new matrix: if A = \begin{bmatrix} a_{1, 1} & \ldots & a_{1, m} \\ \vdots & \vdots & \vdots \\ a_{n, 1} & \ldots & a_{n, m} \end{bmatrix} and B = \set{b_1, \ldots, b_k}, 1 \le b_i \le m, then A_B = \begin{bmatrix} a_{1, b_1} & \ldots & a_{1, b_k} \\ \vdots & \vdots & \vdots \\ a_{n, b_1} & \ldots & a_{n, b_k} \end{bmatrix}. As a short form, we can also do A_j where 1 \le j \le m, which means "column j of the matrix A", a vector of size n.

For a vector \vec x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, we also often use the subscript to denote the row sub-matrix \vec x_B = \begin{bmatrix} x_{b_1} \\ \vdots \\ x_{b_k} \end{bmatrix}. However, this isn't that common and it's generally preferred to write it out in full to avoid confusion.

Bases and basic solutions

A basis of a matrix is a subset of column indices B \subseteq \set{1, \ldots, m} such that A_B is a square matrix and A_B is non-singular. In other words, B is a set of size n, and the columns of A_B are linearly independent (so they can't be written as linear combinations of each other). Not all matrices will have a basis; for example, a 2 by 2 zero matrix (the only value of B that could result in a square A_B is \set{1, 2}, but A_B is singular).

We often denote the inverse of the basis as \overline B = \set{1, \ldots, m} - B - the column indices for columns that are not in the basis B.

From MATH136 we learned that the number of linearly independent columns in a matrix is the same as the number of linearly independent rows. This means that the number of independent columns is between 0 and the number of independent rows. Therefore, if we have a square matrix A_B, it can only possibly be a basis if all of the rows of A are independent; if there were less than n independent rows, there would also be less than n independent columns, so A_B must be singular. Therefore, a matrix has a basis if and only if all of its rows are independent, and that basis is a set of n independent columns in the matrix (which is a maximal set of independent columns in the matrix).

Suppose we have an LP in standard equality form: "maximize {\vec c}^T \vec x subject to A \vec x = \vec b". If B is a basis of A, then a variable x_i is a basic variable for B if i \in B, or a non-basic variable for B if i \notin B (i.e., i \in \overline B). Essentially, the coefficients of each non-basic variable can be written as a linear combination of the coefficients of the basic variables.

A basic solution for an LP and a basis B is a solution \vec x for the LP such that A \vec x = \vec b and all non-basic variables are set to 0. For the basic solution for a basis B = \set{b_1, \ldots, b_k}, the LP becomes "maximize {\vec c}^T \vec x subject to A_B \begin{bmatrix} x_{b_1} \\ \vdots \\ x_{b_k} \end{bmatrix} = \vec b".

Since B is a basis, A_B must be invertible, so A_B^{-1} A_B \begin{bmatrix} x_{b_1} \\ \vdots \\ x_{b_k} \end{bmatrix} = A_B^{-1} \vec b, or \vec x_B = A_B^{-1} \vec b. Interestingly, this means that a basis B has exactly one basic solution, and that solution is a vector \vec x where {\vec x}_B = A_B^{-1} \vec b and {\vec x}_{\overline B} = \vec 0.

Note that a basic solution doesn't necessarily satisfy the non-negativity constraints \vec x \ge \vec 0. A basic solution \vec x is feasible if and only if it does satisfy this constraint \vec x \ge \vec 0, because by definition it already satisfies A \vec x = \vec b. Solutions can be any combination of feasible/not-feasible and basic/not-basic.

A basic solution\vec x for an LP (without a specified basis) is a feasible solution that is a basic solution for the LP for one or more bases for that LP. So for some set of n linearly independent columns \vec x has corresponding values that result in the constraints being satisfied, and for all other columns \vec x has the element 0.

Suppose again that we have an LP in standard equality form: "maximize {\vec c}^T \vec x subject to A \vec x = \vec b". If the rows of A are dependent (at least one row can be written as a linear combination of others), then either the LP is infeasible, or A has a redundant constraint, which we can remove. Here's why: since there are dependent rows, one of the rows could be written as a linear combination of others, so we could cancel out its coefficients by adding/subtracting/scaling constraints to get the new constraint 0x_1 + \ldots + 0x_m = k. If k \ne 0, then the constraint is clearly not satisfiable, so the LP is infeasible, and if k = 0, then that constraint is redundant because it just says 0 = 0.

Canonical form

Suppose again that we have an LP in standard equality form: "maximize {\vec c}^T \vec x subject to A \vec x = \vec b". Suppose B is a basis for A.

The LP is in canonical form if and only if A_B = I (the column sub-matrix is the identity matrix) and c_{b_1} = \ldots = c_{b_k} = 0 (the objective function only contains non-basic variables).

Since A_B = I for an LP in canonical form with respect to a basis B, A_B^{-1} = I^{-1} = I, so the basic solution for B is just \vec x such that {\vec x}_B = \vec b and {\vec x}_{\overline B} = \vec 0.

Interestingly, it's always possible to write an LP into canonical form given a basis, such that the new LP has the same feasible region and feasible solutions have the same objective function value as the original LP. How do we do this?

Suppose we have the LP in SEF "maximize \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x subject to \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and \vec x \ge \vec 0". One basis for this LP is B = \set{2, 3} - it's a square matrix, and its columns are linearly independent.

To convert this LP into canonical form given B, we start by making A_B into the identity matrix. To do this, we subtract the first equality constraint from the second, and then swap them to get the equality constraints \begin{bmatrix} -1 & 1 & 0 & 3 \\ 1 & 0 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.

In general, to do this we left multiply both sides of A \vec x = \vec b by A_B^{-1} to get A_B^{-1} A \vec x = A_B^{-1} \vec b. Since A_B^{-1} A is the matrix formed by multiplying A_B^{-1} with the columns of A individually, the columns in B will end up as columns of the identity matrix, so (A_B^{-1} A)_B is indeed the identity matrix. Because A_B is non-singular, A_B^{-1} A \vec x = A_B^{-1} \vec b has the exact same solutions for \vec x as A \vec x = \vec b, too.

Now, we need to make \begin{bmatrix} c_{b_1} \\ \vdots \\ c_{b_k} \end{bmatrix} = \vec 0 - all basic variables should be removed from the objective function. To do this, we'll take a linear combination \vec y of the equality constraints, to get {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x = {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}, or 0 = -{\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}.

We'll add the objective function to both sides of that, to get \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x + {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix} \vec x = \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}. Rearranging, we get \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x = \left(\begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} - {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix}\right) \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix}.

Clearly, the left side is the objective function, and it's equal to the right side. However, we can manipulate \vec y until the value that's being multiplied with \vec x has zero entries for our basic variables! Basically, we want to choose \vec y such that {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix}_{\set{2, 3}} = \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix}_{\set{2, 3}}. In other words, y_1 0 + y_2 1 = 0, y_1 1 + y_2 1 = 2. Solving this system, we get \vec y = \begin{bmatrix} 2 \\ 0 \end{bmatrix}.

Now \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x = \left(\begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} - {\vec y}^T \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 2 \end{bmatrix}\right) \vec x + {\vec y}^T \begin{bmatrix} 1 \\ 2 \end{bmatrix} becomes \begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} \vec x = \left(\begin{bmatrix} 0 & 0 & 2 & 4 \end{bmatrix} - \begin{bmatrix} 2 & 0 & 2 & -2 \end{bmatrix}\right) \vec x + 2 = \begin{bmatrix} -2 & 0 & 0 & 6 \end{bmatrix} \vec x + 2. So for feasible solutions, the objective function is exactly equal to \begin{bmatrix} -2 & 0 & 0 & 6 \end{bmatrix} \vec x + 2.

In general, we take an arbitrary linear combination of the constraints {\vec y}^T A \vec x = {\vec y} \vec b, and then add the objective function to both sides to get {\vec c}^T \vec x + {\vec y}^T A \vec x = {\vec c}^T \vec x + {\vec y} \vec b, and rearrange to get {\vec c}^T \vec x = \left({\vec c}^T - {\vec y}^T A\right) \vec x + {\vec y} \vec b.

Then, we solve the linear system {\vec c}_B - {\vec y}^T A_B = {\vec 0}^T for \vec y. We can transpose both sides to write the formula as A_B^T \vec y = {\vec c}_B, then multiply both sides by (A_B^{-1})^T to get \vec y = (A_B^{-1})^T {\vec c}_B (this is equivalent because A_B is guaranteed to be non-singular). Then, we subsitute that value of \vec y back into {\vec c}^T \vec x = \left({\vec c}^T - {\vec y}^T A\right) \vec x + {\vec y} \vec b to get another way to write the objective function for feasible values of \vec x.

So in general, given the LP in SEF "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0", the corresponding LP in canonical form for a basis B can be computed as follows:

  1. Let A' = A_B^{-1} A and \vec b' = A_B^{-1} \vec b.
  2. Let \vec c' = \left({\vec c}^T - \vec y^T A\right) \vec x and z' = \vec y^T \vec b where \vec y = (A_B^{-1})^T {\vec c}_B.
  3. The new LP is "maximize {\vec c'}^T \vec x + z' subject to A' \vec x = \vec b' and \vec x \ge \vec 0".

Aside: sometimes in this course we'll use the notation A^{-T}, it represents inverse of transpose ((M^T)^{-1}) or transpose of inverse ((M^{-1})^T), which are equivalent to each other.

Simplex Algorithm

Suppose we have an LP "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0" in canonical form for a basis B, and a basic feasible solution \vec x. The simplex algorithm requires us to find a better feasible solution from this, but how do we do that?

Essentially, to get a better feasible solution from an LP:

  1. Pick a non-basic variable x_k \in \overline B with a positive coefficient in the objective function (c_k > 0).
    • If there are no non-basic variables with a positive coefficient in the objective function, we've found the optimal solution! This is because the basic solution already has non-basic variables set to 0 (the smallest possible value). Since the LP is in canonical form, {\vec c}_B = \vec 0, and {\vec c}_{\overline B} \le 0, and \vec x \ge 0, the objective function \vec c \cdot \vec x + z = {\vec c}_{\overline B} \cdot {\vec x}_{\overline B} + z must be maximized at {\vec x}_{\overline B} = \vec 0.
  2. Find the largest value of x_k that keeps \vec x within the feasible region (i.e., satisfying \vec x \ge 0), by changing x_k and the basic variables, while keeping all the non-basic variables other than x_k at 0. Let x_k' be that value of x_k and x_{b_1}', \ldots, x_{b_k}' be the values of the basic variables that allow that feasible solution.
    • Since the columns in the basis form an identity matrix, A_B {\vec x}_B = {\vec x}_B. Since the other non-basic variables have 0 coefficients, A \vec x = \vec b is just {\vec x}_B + x_k A_k = \vec b, or {\vec x}_B = \vec b - x_k A_k.
    • So to keep \vec x feasible, we just need to find a value such that \vec x \ge \vec 0. Since \vec x was already a solution, we only need to ensure that condition holds for the variables we changed - the basic variables and x_k: {\vec x}_B \ge 0, x_k \ge 0.
    • This can be written as \vec b - x_k A_k \ge \vec 0, x_k \ge 0. If we solve for x_k, we get x_k = \min\set{\frac{b_i}{A_{i, k}} : 1 \le i \le n, A_{i, k} > 0}.
    • If the maximum is infinite, then the LP is unbounded. To prove this, we let \vec x(t) be a function that outputs solutions to the LP, where {\vec x(t)}_B = \vec b - t A_k and {\vec t}_{\overline B} = \vec 0, and show that {\vec c}^t \vec x(t) \to \infty as t \to \infty. The certificate of unboundedness would then be \tup{B, A_k}.
    • Also, the maximum is infinite exactly when A_k \le \vec 0, so if all of the coefficients for x_k are negative, the LP is unbounded. This is because if any element A_{i, k} was positive, eventually x_k A_{i, k} exceeds b_i for a finitely large values of x_k, which is an upper bound for x_k.
  3. Now we have a new feasible solution \vec x' where x_i' = \begin{cases} x_k' &\text{ if } i = k \\ x_i' &\text{ if } i \in B \\ x_i &\text{ otherwise} \end{cases}. In other words, the solution where we maximized x_k and changed the basic variables starting from \vec x. Additionally, this new feasible solution has a greater or equal objective function value.
    • Note that this new feasible solution is also a basic feasible solution!
    • The new feasible solution is basic because when we maximized x_k, it was no longer 0, so it becomes part of the basis for the new feasible solution. At the same time, when we maximize x_k, at least one of the basic variables now has value 0 (if the basic variables ended up all positive, then x_k could have been increased more by decreasing them, so x_k wouldn't have been maximized - a contradiction!) - one of the basic variables that are now 0 are not part of the new feasible solution's basis.
    • Here's why the new feasible solution's basis is in fact a basis: first, we removed one of the existing columns in favor of column k, so the number of columns hasn't changed. Second, the new column must be linearly independent of the other columns in the basis, because the basic variable that was excluded from the new basis was originally non-zero, so it was necessary to form a linear combination of columns equal to x_k's column.

So, we started with an LP in canonical form for a feasible basis, and used it to obtain a better feasible basis (or a proof of unboundedness). We can then rewrite the LP in canonical form again and repeat this process until we either find an optimal solution or prove that it's unbounded. This is the simplex algorithm.

To summarize the procedure of finding an improved feasible basis given an existing feasible basis for an LP in canonical form:

  1. Rewrite the LP "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0" in canonical form for a feasible basis B. Clearly, canonical-form LPs can be written as "maximize {\vec c}_N^T {\vec x}_N subject to {\vec x}_B A_N {\vec x}_N = \vec b and \vec x \ge \vec 0".
    1. Let A' = A_B^{-1} A and \vec b' = A_B^{-1} \vec b.
    2. Let \vec c' = \left({\vec c}^T - A_B^{-1} {\vec c}_B A\right) \vec x and z' = A_B^{-1} {\vec c}_B \vec b.
    3. The new LP is "maximize {\vec c'}^T \vec x + z' subject to A' \vec x = \vec b' and \vec x \ge \vec 0".
  2. Compute \vec x where {\vec x}_B = {\vec b}_B and {\vec x}_{\overline B} = \vec 0 - the basic solution for our basis B, which must be feasible since B is a feasible basis.
  3. If {\vec c}_N \le 0, stop and output \vec x as the optimal solution and B as a proof of optimality. Note that the optimal solution output by this algorithm is always a basic feasible solution!
  4. Choose a k \in \overline B such that c_k > 0 - a non-basic variable with a positive coefficient in the objective function.
  5. If A_k \le 0, stop and output a certificate of unboundedness \tup{B, A_k}.
  6. Let x_k' = \min\set{\frac{b_i}{A_{i, k}} : 1 \le i \le n, A_{i, k} > 0} and {\vec x}_B' = \vec b - x_k A_k and {\vec x}_{\overline B - \set{k}}' = \vec 0.
  7. Choose one of the basic variables r \in B such that x_r' = 0 and c_r > 0. Let B' = (B \cup \set{k}) - \set{r} be the new basis, formed by removing the basic variable that was forced to 0 and adding the non-basic variable we chose with a positive coefficient in the objective function.
  8. Repeat starting from step 1 until the algorithm terminates, with B' instead of B and \vec x' instead of \vec x.

This is essentially the entire simplex algorithm - it takes in an LP in SEF and a feasible basis, and outputs an optimal solution or proof that the LP is unbounded. To actually use this to solve arbitrary LPs though, we need to find a feasible basis first. This will be the next topic.

The only thing left to do is to ensure that the simplex algorithm terminates - how do we know it won't get stuck in a cycle of bases? To do this, we can use Bland's Rule, where whenever we need to choose a variable the leftmost variable in each case (the variable x_i with the smallest possible index i). This affects which variable we choose in step 4 and 7 above.

;wip: Why does Bland's rule work? proof outlined in exercise 10, page 69 of textbook

One of the interesting results we found is that the optimal solution given by the simplex algorithm is always a basic feasible solution, and that the simplex algorithm, with Bland's rule always gives us the optimal solution if there is one. LPs can have infinite feasible solutions, but the number of bases is finite, and so there are a finite number of basic feasible solutions.

This tells us that it's actually possible to find optimal solutions LPs by just enumerating every basis, finding the basic solution for each, and choosing the one that's both feasible and has the highest objective function value. However, this would be really inefficient - the number of possible bases goes up exponentially with respect to the number of constraints. That said, the simplex algorithm can also display exponential behaviour for LPs like the Klee-Minty cube, though it's a lot more efficient on real-world LPs.

11/2/17

Finding a feasible basis for an arbitrary LP

One of the first steps before the simplex algorithm is, given an LP in SEF, to find a feasible basis, or detect that there is none. How do we find a basic feasible solution for an arbitrary LP in SEF? In other words, how do we find an \vec x \ge 0 such that A \vec x = \vec b (the objective function doesn't matter since we're just trying to find a basic feasible solution)?

Now we'll construct an auxilary problem - an LP that has a known feasible basis, whose solutions can be converted into a feasible basic solution for the original LP. We can then solve the auxilary problem using the simplex algorithm (since we have the auxilary LP and a feasible basis for that auxilary LP), and then convert the optimal solution to that auxilary LP into a basic feasible solution for our original LP.

The auxilary LP for an LP in SEF "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0" (where A is an n-row-m-column matrix) can be constructed as follows:

  1. Flip the signs of every constraint that has a negative constant value (a negative value in \vec b), so that \vec b \ge 0. Let the new, sometimes-flipped constraints be given as A' \vec x = \vec b'.
    • For example, x_1 + 2x_3 - 3x_5 = -4 should become -x_1 - 2x_3 + 3x_5 = 4, while x_1 + 2x_3 - 3x_5 = 8 should stay the same.
    • This is necessary in order to ensure that the basic solution for the basis we'll choose later is feasible.
  2. Let A'' = A' \| I and \vec c' be a vector such that c_i' = \begin{cases} 0 &\text{if } 1 \le i \le m \\ 1 &\text{if } m + 1 \le i \le m + n \end{cases} (i.e., \vec c' \cdot \vec x = x_{m + 1} + \ldots + x_{m + n}).
    • A'' is basically A' with an n by n identity matrix appended on the right. In other words, for each constraint 1 \le i \le n of the form A_{i, 1}' x_1 + \ldots + A_{i, m}' x_m = b_i, we add a new variable x_{m + i} to it to get A_{i, 1}' x_1 + \ldots + A_{i, m}' x_m + x_{m + i} = b_i.
    • We can think of the newly introduced variables x_{m + 1}, \ldots, x_{m + n} as "error" variables - extra values that are needed to satisfy the constraint. When all of these are 0, we just get the original constraints back.
    • c' basically means that our objective function is the sum of those new variables.
  3. Let the auxilary LP be "minimize {\vec c'}^T \vec x subject to A'' \vec x = \vec b' and \vec x \ge \vec 0". Alternatively, to make sure the auxilary LP is in SEF, we can write it as "maximize {-\vec c'}^T \vec x subject to A'' \vec x = \vec b' and \vec x \ge \vec 0".
    • Note that this LP obviously has the basis B = \set{m + 1, \ldots, m + n} (i.e., the basis formed by the new variables introduced for each constraint). This is a basis because there are n such new variables, and because the entries form an identity matrix in A'', which means A_B'' = I, which by definition has linearly independent columns.
    • The corresponding basic solution is clearly \vec x such that {\vec x}_B = \vec b' and {\vec x}_{\overline B} = \vec 0. This is always feasible because \vec b' \ge 0 and it always satisfies A'' \vec x = \vec b', since that's equivalent to A_B'' {\vec x}_B = I {\vec x}_B = {\vec x}_B = \vec b.

Now we have the auxilary LP and a feasible basis for the auxilary LP B = \set{m + 1, \ldots, m + n}, so we can find an optimal solution using the simplex algorithm. Clearly, this LP is bounded, because it's a minimization problem where the objective function has all positive coefficients. Therefore, since we know that the LP isn't infeasible (since we have a feasible solution already), and the LP isn't unbounded, it must have an optimal solution!

The optimal solution to the auxilary LP gives us a feasible solution to the original LP, or tells us the original LP is infeasible. If any of the error variables x_{m + 1}, \ldots, x_{m + n} are non-zero, that means the original LP is infeasible, since if the original LP was feasible, it could've satisfied the constraints with all of the error variables set to 0, which would result in a smaller objective function value than we got.

Likewise, if all of the error variables are zero, then we satisfied the constraints without using any of the error variables, so \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} is a feasible solution for the original LP. Additionally, since it's the output of the simplex algorithm, it's also a feasible basic solution, so we can construct a feasible basis from \set{i : x_i > 0, 1 \le i \le n}. This isn't guaranteed to give us n columns, but we can add in other column indices from the original LP until we do get a basis, checking them for linear independence each time.

Therefore, we can solve an arbitrary LP as follows:

  1. Convert the LP into SEF, and solve its auxilary LP using the simplex algorithm to find a feasible basic solution and a feasible basis for the LP.
  2. Use the simplex algorithm to solve the original LP with the feasible basis found in step 1.

This is called the two-phase algorithm.

Now we can actually prove the fundamental theorem of LPs using the simplex algorithm! Since we proved without using that theorem that the simplex algorithm works, and the simplex algorithm outputs either an optimal solution, proof of unboundedness, or proof of infeasibility, we've proved that an arbitrary LP must fall into one of those three categories.

Half-spaces and convexity

In the geometric interpretation of LPs, the feasible region is the space of all possible solutions, \mb{R}^m where m is the number of variables in the LP.

A polyhedronP is a set of vectors that can be defined by P = \set{\vec x : A \vec x \le \vec b}, where A is a matrix and \vec b is a vector. Think of it as a solid n-dimensional shape - a bounded area of the n-dimensional space.

The feasible region of an LP is always a polyhedron. Given an LP "maximize {\vec c}^T \vec x subject to A \vec x = \vec b and \vec x \ge \vec 0", we can write its feasible region as \set{\vec x: \begin{bmatrix} A \\ -A \\ -I \end{bmatrix} \le \begin{bmatrix} \vec b \\ -\vec b \\ \vec 0 \end{bmatrix}}. Basically, this is just saying A \vec x \le \vec b, A \vec x \ge \vec b, and \vec x \ge 0.

A hyperplane is a set of vectors that can be defined by H = \set{\vec x : \vec a \cdot \vec x = \beta}, where \vec a is a non-zero vector and \beta \in \mb{R}. Think of it as an n-dimensional plane - an n - 1-dimensional space within the n-dimensional space. This is by definition the solution to a linear equation.

A halfspace is a set of vectors that can be defined by F = \set{\vec x : \vec a \cdot \vec x \le \beta}, much like a hyperplane. Think of it as the space on one side of a hyperplane, or half of the n-dimensional space. This is by definition the solution to a linear inequality.

A polyhedron can be thought of as an intersection of a finite number of halfspaces - each linear constraint can be thought of as a halfspace where points in the halfspace are those that satisfy that particular constraint, and points outside don't.

Recall that \vec x \cdot \vec y = \magn{x} \magn{y} \cos(\theta), where \theta is the minimum angle between \vec x and \vec y.

The hyperplane H = \set{\vec x : \vec a \cdot \vec x = 0} is the set of all vectors that are orthogonal to \vec a - the ones for which \theta = 90 \deg so \cos \theta = 0. In the same way, the halfspace F = \set{\vec x : \vec a \cdot \vec x \le 0} is the set of all vectors that are on the side of H that doesn't contain \vec a - the ones for which \theta \ge 90 \deg so \cos \theta \le 0.

The translate of a set of vectors S for some vector \vec p is the set \set{\vec x + \vec p : \vec x \in S}.

The hyperplane \set{\vec x : \vec a \cdot \vec x = \beta} is a translate of the hyperplane \set{\vec x : \vec a \cdot \vec x = 0}, by any vector \vec p such that \vec a \cdot \vec p = \beta.

The dimension of a hyperplane\set{\vec x : \vec a \cdot \vec x = \beta} is simply the dimension of \set{\vec x : \vec a \cdot \vec x = 0}, which is a vector space - its dimension is n - \mathrm{rank}(\vec a) = n - 1.

A line through two vectors\vec x, \vec y is defined as \set{\vec x + k(\vec y - \vec x), k \in \mb{R}}. A line segment is a subset of that, defined as \set{\vec x + k(\vec y - \vec x), 0 \le k \le 1}.

A set S \subseteq \mb{R}^n is convex if and only if, for all \vec x, \vec y \in S, the line segment between \vec x and \vec y is a subset of S (i.e., \forall x \in S, \forall y \in S, \set{\vec x + k(\vec y - \vec x), k \in \mb{R}} \subseteq S).

The union of two convex sets S_1 \cup S_2 is not necessarily convex - consider S_1 = \set{0}, S_2 = \set{1}, for example. However, the intersection of two convex sets S_1 \cap S_2 is convex. This is easily proved: any two points \vec x, \vec y in S_1 \cap S_2 is in both S_1 and S_2, by definition. Therefore, the line segment between them is a subset of both S_1 and S_2, and so the line segment is a subset of S_1 \cap S_2, as required.

All halfspaces are convex. This is also easy to prove:

Let F = \set{\vec x : \vec a \cdot \vec x \le \beta} be a halfspace.
Let \vec x_1, \vec x_2 \in F, and \vec v \in \set{\vec x_1 + k(\vec x_2 - \vec x_1), 0 \le k \le 1} be an arbitrary point on the line segment between \vec x_1 and \vec x_2.
So \vec a \cdot \vec v = \vec a \cdot \vec x_1 + \vec a \cdot k(\vec x_2 - \vec x_1) = (1 - k) \vec a \cdot \vec x_1 + k \vec a \cdot \vec x_2 for some 0 \le k \le 1.
Since \vec x_1 and \vec x_2 are in the halfspace, \vec a \cdot \vec x_2 \le \beta and \vec a \cdot \vec x_2 \le \beta \le \beta.
So \vec a \cdot \vec v = (1 - k) \vec a \cdot \vec x_1 + k \vec a \cdot \vec x_2 \le (1 - k) \beta + k \beta = \beta.
Therefore, \vec v \in F, and since \vec v is an arbitrary element of the line segment, the line segment must be a subset of F, which means that F is convex.

As mentioned earlier, a polyhedron is an intersection of a finite number of halfspaces. Since halfspaces are convex, and intersections of convex sets are convex, all polyhedra are convex. Since the feasible region of an LP is a polyhedron, it must also be convex - this is actually why solving LPs is relatively easy!

Extreme points

A point \vec v is properly contained in a line segment L = \set{\vec x + k(\vec y - \vec x), 0 \le k \le 1} if and only if \vec v \in L - \set{\vec x_1, \vec x_2} - when it's part of the line segment but isn't an endpoint.

Suppose we have a convex set S. A point \vec v \in S is an extreme point if and only if there is no line segment L \subseteq S that properly contains \vec v - if the point has to be one of the endpoints of any line segment in S that contains it.

Intuitively, an extreme point is one that's on a "curve" on the surface of the set. For a convex set that looks like a 2D square, the extreme points at the four corners. For a circle, every point is an extreme point - a convex set doesn't necessarily have a finite number of extreme points! A halfspace actually has no extreme points.

Suppose we have an LP with the constraints x_1 \le 3, x_2 \le 2 and x_1, x_2 \ge 0. The solution space is actually 2D for this LP, so each constraint forms a 2D halfspace (a hemiplane). If we plot the feasible region of this LP, we actually get a 3 by 2 square where the bottom left corner is the origin. Here, the extreme points are \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3, 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \end{bmatrix}.

What's interesting is that all of these points, and only these points, satisfy two different constraints with equality at the same time. For example, for \begin{bmatrix} 3 \\ 2 \end{bmatrix}, the constraints x_1 \le 2 and x_2 \le 2 are satisfied with equality, so x_1 = 2 and x_2 = 2. In fact, this generalizes into multiple variables.

For an LP, given a constraint \vec p \cdot \vec x \le k and a solution \vec x, the constraint is tight if and only if it is satisfied with equality by \vec x - when \vec p \cdot x = k.

For an LP with constraints A \vec x \le \vec b, the tight constraints for a solution \vec x are denoted \overline A \vec x \le \overline{\vec b}, where \overline A and \overline{\vec b} are the rows of A and the entries of \vec b corresponding to constraints that are tight for \vec x.

Now we have the tools to define what an extreme point of a polyhedron is in a better way: given a point \vec x in a polyhedron \set{\vec x \in \mb{R}^m : A \vec x \le \vec b}, \vec x is an extreme point if and only if \mathrm{rank}(\overline A) = m. In other words, there's a theorem that says that an extreme point for an m-dimensional polyhedron is a point in the polyhedron that satisfies m linearly independent constraints with equality.

Now we'll prove the above:

Assume \mathrm{rank}(\overline A) = m. Suppose we have an arbitrary point \vec x in an arbitrary polyhedron P = \set{\vec x \in \mb{R}^m : A \vec x \le \vec b}.
Suppose \vec x is not an extreme point. Then \vec x is properly contained by a line segment L = \set{\vec x + k(\vec y - \vec x), 0 \le k \le 1} such that L \subseteq P, with some endpoints \vec v_1, \vec v_2. So there exists a 0 < k < 1 such that \vec x = \vec v_1 + k(\vec v_2 - \vec v_1) = k\vec v_2 + (1 - k)\vec v_1.
Clearly, the tight constraints \overline{\vec b} = \overline A \vec x = \overline A (k\vec v_2 + (1 - k)\vec v_1) = k \overline A \vec v_2 + (1 - k) \overline A \vec v_1.
Since L \subseteq P, \vec v_1 \in P and \vec v_2 \in P, \overline A \vec v_2 \le \overline{\vec b} and \overline A \vec v_1 \le \overline{\vec b}.
For any vectors \vec a, \vec b, \vec b, if \vec a = k \vec b + (1 - k) \vec c and \vec b \le \vec a and \vec c \le \vec a, then it must be that \vec a = \vec b = \vec c (if the weighted average of two numbers is greater or equal to both numbers, it must be equal to those numbers). This is easily proven by showing that \vec a = k\vec b + (1 - k)\vec c \le k\vec a + (1 - k)\vec a = \vec a, so k\vec b + (1 - k)\vec c = k\vec a + (1 - k)\vec a, so \vec b = \vec a and \vec c = \vec a.
Therefore, \overline{\vec b} = \overline A \vec v_1 = \overline A \vec v_2.
Since \mathrm{rank}(\overline A) = m, \overline A is invertible. Since \overline{\vec b} = \overline A \vec v_1 = \overline A \vec v_2, \vec v_1 = \vec v_2.
This is a contradiction, because then \vec x can't possibly be contained in L, if the line segment only has its endpoints. Therefore, if \mathrm{rank}(\overline A) = m, then \vec x must be an extreme point. Basically, we showed that if m independent constraints are satisfied by a point, then any line segment containing that point will only contain that point as both endpoints.
Assume \mathrm{rank}(\overline A) < m. So there must exist a vector \vec d such that \overline A \vec d = \vec 0.
Let \epsilon > 0. Let L be the line segment with endpoints \vec x - \epsilon \vec d and \vec x + \epsilon \vec d. Clearly, \vec x is properly contained by L, so we just need to show that L \subseteq P to prove that \vec x is not an extreme point.
Clearly, for the tight constraints \overline A, \overline A (\vec x - \epsilon \vec d) = \overline A \vec x - \epsilon \overline A \vec d. Since \overline A \vec d = \vec 0, \overline A \vec x - \epsilon \overline A \vec d = \overline A \vec x = \overline{\vec b}. Therefore, \vec x - \epsilon \vec d satisfies the tight constraints. The same argument shows that \vec x + \epsilon \vec d also satisfies the tight constraints.
Clearly, for any non-tight constraint \vec a \cdot \vec x \le \beta, \vec a \cdot (\vec x - \epsilon \vec d) = \vec a \cdot \vec x - \epsilon \vec a \cdot \vec d. Since \vec x \in P, we know that \vec a \cdot \vec x < \beta (the constraint is non-tight, so it isn't satisfied by equality, so we can use strict inequalities), but \vec a \cdot \vec d could be anything.
However, since \vec a \cdot \vec x < \beta, there must exist an \epsilon > 0 such that \vec a \cdot \vec x - \epsilon \vec a \cdot \vec d < \beta. Therefore, \vec x - \epsilon \vec d satisfies an arbitrary non-tight constraints, and by the same argument, so does \vec x + \epsilon \vec d.
Since all the tight constraints and arbitrary non-tight constraints are satisfied by \vec x - \epsilon \vec d and \vec x + \epsilon \vec d, so both are in P. Since P is convex and the endpoints are in P, L \subseteq P, so we have a line in P that properly contains \vec x.
Therefore, \vec x is not an extreme point if \mathrm{rank}(\overline A) < m.

Suppose we now have the feasible region of an LP P = \set{\vec x : \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \end{bmatrix} \vec x = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \vec x \ge 0}. Recall that the feasible region of an LP is always a polyhedron. How do we check if \vec v = \begin{bmatrix} 2 \\ 4 \\ 0 \end{bmatrix} is an extreme point?

First, we can rewrite P in the form \set{\vec x : A \vec x = \vec b}: \set{\vec x : \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \\ -1 & 0 & 1 \\ 0 & -1 & -3 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \vec x = \begin{bmatrix} 2 \\ 4 \\ -2 \\ -4 \\ 0 \\ 0 \\ 0 \end{bmatrix}}.

Now we take the constraints that are satified with equality by \vec v - namely, constraints 1, 2, 3, 4, and 7. So \overline A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \\ -1 & 0 & 1 \\ 0 & -1 & -3 \\ 0 & 0 & -1 \end{bmatrix}. Clearly, \mathrm{rank}(\overline A) = 3, and the LP has 3 variables, so \vec v must be an extreme point.

An interesting consequence of the above theorem is that if we have the feasible region of an LP in SEF P = \set{\vec x : A \vec x = \vec b, \vec x \ge 0}, then \vec x \in P is an extreme point if and only if it's a basic feasible solution for that LP. In other words, the basic feasible solutions for an LP are the extreme points in its feasible region. This is easily proven from the previous theorem and the definition of a basic feasible solution.

Because the simplex algorithm moves between different bases until it reaches the optimal one, it's actually moving between extreme points in n-dimensional space. That means we could enumerate all the extreme points in the feasible region by enumerating all the basic feasible solutions.

1/3/17

Previously we looked at the shortest-path problem formulated as an IP - given a weighted graph G and s, t \in V(G), find the path from s to t that minimizes the total weight. Given a shortest-path problem and an s, t-path P, how do we prove if P is optimal? How do we find an optimal P, anyways?

The cardinality special case is the shortest path problem where all edge weights are 1 - we're trying to find the s, t-path P with the fewest edges. How do we show that P is an optimal solution, if it is?

Recall that any s, t-cut must contain one of the edges of P - if it didn't, then we could remove the edges in the cut and P would still connect s and t, which is a contradiction because removing the edges of an s, t-cut must disconnect s and t. As a result, if S \subseteq E(G) has an edge from every s, t-cut, then S contains an s, t-path.

If we can construct s, t-cuts \delta(U_1), \ldots, \delta(U_k) such that each one is disjoint from the other, then since an s, t

Leave a Comment

(0 Comments)

Your email address will not be published. Required fields are marked *