@ Large Scale Integer Programming Survey
This survey explores techniques for solving very large scale integer programming problems in a distributed manner, and with dynamically changing constraint coefficients. I will focus on the initial problems of partitioning the integer program, techniques for pre-solving sub-problems, techniques best applicable to the sub-domain of programs with dynamically changing coefficient matrices (Bender decomposition). I then explore modern MILP solver software, the techniques used in each package, and the specifics of branch and bound and cutting plane implementations in these systems, and conclude with a review of the Lagrangian relaxation technique for integer programs. What will be missing from this survey is a review of the networking and distribution component for the problem that I’m discussing, this individual component would require a second survey or similar size, and depth to address adequately.
[1] Discusses solving L.Ps so large that “the A matrix cannot be stored in memory” (where A is the constraint coefficient matrix) through a modified version of the simplex algorithm, and the ways in which column generation and cutting planes can be applied to the sub-problems generated as part of this process. Similar techniques can be applied to I.Ps (the modifications necessary for this are also discussed). During the application of simplex a standard L.P represented as (min cx s.t. Ax = b, x ≥ 0) is partitioned into basic(B) and nonbasic elements(N) this changes the original formulation into (min cBxB + cNxN s.t. BxB + NxN = b, xB , xN ≥ 0) [1:p1]. A basis matrix B is then defined to be optimal if cN − cBB−1N ≥ 0 and B−1b ≥ 0. If (cN − cBB−1N) is non-negative for a given B then introducing a nonbasic element to B offers no further reduction to the current objective value, which proves the optimality of the solution. Simplex chooses an initial basis, and defines reduced costs for each non-basic variable rj = cj − cBB−1Aj. Variables are allowed to enter and leave the basis, and the reduced costs are modified to account for these changes, when it is the case that the reduced costs of all the variables in the non-basic set are greater than or equal to 0 then the current basis is optimal. Column generation, cutting plane techniques, Dantzig-Wolf decomposition and Bender’s decomposition can be used to determine which variables should enter the basis and in what order. L.Ps that are so large that the constraint matrix will not fit in memory can be partitioned using the above properties of the simplex algorithm into disparate basic sets each attempting to build up an optimal basic matrix by adding variables from their respective non-basic sets. If any one of the instances encounters a case where the reduced costs calculated for elements in its non-basic set are all non-negative, an optimal solution has been found. Column generation can be used to find non-basic elements to add to the basic set for problems that have many more variables than constraints, and cutting planes methods can be used when the reverse is true.
Column generation starts from an initial sets of variables assumed to be in the basis, and then finds variables from the non-basic set to add to the basic set by iteratively solving the relaxed LP in which only the variables in the current basis are considered (both in the objective and in the constraints). The process stops when an optimal basis (as defined above in bold) is identified. The exact nature of the sub-problems generated, the order in which variables from N are chosen, and the mechanics of variables entering (and in some cases leaving) the basis are problem dependent. For example [1:p3] presents an example where the variables with the most negative reduced costs are iteratively chosen until an optimal basis is found, there is however no set way of choosing variables that is superior to other methods for all problem cases. Sometimes problem sizes can be so large that elements may be forced to leave the basis, again there is no set method for this process that has been found to be superior in all cases.
Since cutting planes methods are best applied to problems with vastly more constraints than variables [1:p3] notes that it can be applied to the duals of problems for which column generation seems well suited. Let’s suppose that we are solving the dual of the problem introduced in the first paragraph of this section, then our basis will contain constraints rather than variables, cutting planes will find an optimal solution by gradually including constraints into the basis rather than variables. Similar to column generation cutting planes iteratively solves relaxations of the original problem defined by a given basis. However unlike column generation cutting planes will solve a sub-problem of the relaxation, then test whether the solution is feasible in the initial, un-relaxed problem. If a solution is feasible in the original problem it is guaranteed to be the optimal solution, otherwise the violated constraints are identified, and added to the basis, and the resulting sub-problem is again solved. It seems clear that a solution to a problem relaxed in this manner should be optimal if it is feasible (since the original objective function is applied even to the sub-problems). Further, if certain constraints from the original problem act to make a solution infeasible, then including those constraints into the new sub-problem ensures that progress is made towards a solution.
[1] Concludes with a discussion of alternative decompositions for problems of a specific form. In particular Danzig-Wolfe decomposition is described as method for solving problems with a block diagonal matrix structure. The solution method given for this case is basically column generation with the stipulation that the master problem be reformulated into Danzig-Wolfe form. Bender’s decomposition is mentioned as a method to solve stochastic problem with significantly more constraints than variables. [1:p6]
The main focus of this paper was the introduction of problem partitioning through the selection of basis variables or basis constraints, and the introduction of column generation, and cutting planes as methods which can be used to solve these sub-problems. Though focusing on the simplex method for solving LPs, it’s clear that both cutting planes and column generation can be used to solve IP sub-problems in the same way as presented in this paper.
[1] Mentions Bender’s decomposition briefly, while [8] gives a higher level view of this technique, and presents an example of its use in industry. [8] Describes Benders decomposition, an alternative technique to Frank-Wolf decomposition, and an approach conceptually similar to Lagrangean decomposition. Rather than partitioning the constraint set into difficult and easier constrains as Lagrangean decomposition did Benders decomposition decomposes the variables into hard and easy variable sets. The hard constraints form a master problem, and each solution of the master problem (a valid setting of the hard variables) then gets solved by a secondary problem that attempts to fix only the easy variables by solving the dual subproblem on only the easy variables. If an optimal solution is found in this second stage then this is the optimal solution for the full problem. If a setting of the first stage variables produces an infeasible or unbounded second stage subproblem, then additional constraints are added to the problem, and the first stage is repeated. If a second stage problem is unbounded, then constraints known as “Benders feasibility cuts” are added to the problem, these constraints help enforce the feasibility of the master problem. If at the second stage an optimal solution is found that defies constraints found in the master problem, those constraints that are defied are added into the relaxed second stage problem, these added constraints are called “Benders optimality cuts” as they bound optimal solutions found at a relaxed problem to a more feasible region. The remainder of [8] explores a particular use of Bender’s decomposition on an radiological problem, and is specific to that problem. I believe that the high level description of the algorithm was the most valuable information from this paper, and that a more detailed description of the algorithm is best sought in an alternative source.
Before any of the techniques described so far should be applied, it makes sense to pre-process the problem instance to make sure that all obvious simplifications of the problem have been made, [2] talks about such techniques.
[2] Discusses MILP pre-processing, the techniques it describes simplify MILP problems from their original form to a form which will produce optimal solutions to the original problem but in less time than if the original problem were solved directly. Secondly pre-solvers are able to gather meta-data regarding the problem instance that will serve to guide the solution method of the instance. For example pre-processing of a given problem instance may determine that Danzig-Wolfe decomposition may be applied to the problem instance. Basic pre-processing may be able to throw out a problem immediately by finding it obviously infeasible or unbounded. [2] Discusses basic and advanced pre-processing techniques, with a concentration on basic pre-processing, as advanced pre-processing is often simply the iterated application of basic techniques on probed instances of the original problem.
Removal:
Under certain conditions either constraints or variables my simply be removed from the problem instance. If all the variable coefficients in a constraint i are 0, and the bi entry for the minimization is non-negative, then the constraint is removed. If the same is true, but the bi entry is negative, then the entire problem is infeasible. If one of the rows in the constraint matrix is a singleton row, then that constraint may be removed, and the domain of the variable in question may be constricted.
If the constraint coefficient matrix has a 0 column then the variable represented by that column can be removed, and the respective term in the objective function can be replaced by the constant lower bound of that variable. If there is a singleton column, with the singleton value at row j, and the variable is unbounded, then both the variable, and the constraint at which that value occurs can be removed. The representative term for that variable in the objective function will then need to be replaced by solving for the value of Xj at constraint i (with respect to all the other variables), and substituting then that value into the Xj term of the objective function. Variable Xj is the that variable with the singleton column, and constraint i is the constraint at which this singleton value occurs.
Rearrangement:
Rearrangement is mentioned fleetingly, and the main idea is to rearrange variable (and hence column) order, and constraint (and hence row) order of the constraint coefficient matrix. The reason to do this is that certain solution methods will benefit from a certain ordering of the variables/constraints. For example if variables are fixed as in branch and bound, it may be beneficial to keep all binary variables adjacent, and at the beginning or end of the coefficient matrix, so that their elimination from the formulation is faster. Rearrangement is highly dependent on the solution technique used to solve the instance.
Granularity:
Granularity calculation is a very powerful pre-processing step as it can quickly determine whether a problem is feasible or bounded in even non-obvious instances. “Granularity is defined as the minimum difference (greater than 0) between the values of activity that a constraint or objective function may have at two different feasible points.” [2:p2]. Here ‘activity’ means the scalar value of the objective function/ constraint after the variable substitutions representing the solution are made. “For a constraint that has only integer-constrained variables, then granularity is at least the greatest common divisor (GCD) of all the coefficients.” [2:p2]. If some coefficients are non integer there will always be some a 10k term where if all coefficients in the constraint or objective function were multiplied by the term., then all coefficients would become integer. The entire problem is infeasible if for any equality constraint the right hand side of the constraint is not a multiple of the GCD. Every inequality constraint can be simplified be decreasing the right hand side of the constraint the the nearest multiple of the GCD less than or equal to its present value. Furthermore, if the objective value of a candidate solution and the objective value of an optimal solution, is less than the granularity of the problem, then the candidate solution is also an optimum solution.
Constraint & Variable Duplication Elimination:
Whenever 2 rows of the constraint matrix are identical the problem can either be found to be infeasible, or one of the rows may be eliminated. The right hand sides of the two constraints indicate which of these two cases occurs, if the constraints are in conflict, then the problem is infeasible, otherwise one of the constraints must subsume the other. Similarly, if one row of the coefficient matrix is a multiple of another row this case can be reduced to the previous case with simple algebra. If two columns in the constraint matrix are found to be equivalent, or multiples of each other then the variables represented by these columns may be merged into one, and therefor one of the columns will be eliminated. To do this merging, the bounds on the domain of the new variable will have to allow all possible settings of the two merged variables, [2:p3] describes these new bounds.
Bound Improvement:
Both constraints, and variables can have their bounds tightened in this process. [2:p3-4] describes how optimal bounds for both constraints and variables in a problem may be found, and notes that in both cases the problem that must be solved to produce the optimal bounds for either constraints or variables can be as hard as the original problem itself. Finding such optimal bounds is therefore reserved for special problem formulations in which it is easy to detect that these alternative problems will be significantly easier to solve than the original. There is a simple alternative method to calculating tighter bounds that though not optimal, can be calculated in time proportional to the size of the constraint matrix. For a given constraint i calculate a lower bound as follows: Take the sum of the product of all positive coefficients aij and their respective lower bounds lj add that to the sum of the product of all negative coefficients aij and their respective upper bounds uj. Calculate an upper bound as follows: Take the sum of the product of all positive coefficients aij and their respective upper bounds uj add that to the sum of the product of all negative coefficients aij and their respective lower bounds lj. Call these new bounds Li and Ui. Li simply calculates the lowest possible value of the left hand side of a constraint i by setting all variables such that the term in which they in the objective function takes on as low a value as possible. Similarly Ui finds the highest possible value of the left hand side. If bi < Li for any constraint, then the problem is infeasible. If bi = Li for some constraint, then all variables in that constraint are bound to the values which produce Li, the constraint can be remove and the variables appearing therein are bound. The calculated Li and Ui bounds for all the constraints can be used to tighten the bounds of the variables in the problem. I can summarize this process thus: for a given constraint i and a variable j, find the smallest value of j which given any setting of the other variables satisfies constraint i. Do this for all constraints i. Take the minimum of these values, and it is the new lower bound for variable j. This new lower bound may be greater than the current lower bound for the current variable, if that is so then the lower bound can be tightened to the newly calculated lower bound. If the lower bound calculated is equal to the current upper bound for the variable, then the variable is set and the column can be removed. If the lower bound calculated is greater than the current upper bound for the variable then the problem is infeasible. A similar process can be used to tighten all upper-bounds on all variables in the problem.
[2] Explores some obvious extensions to these techniques by applying each in turn to instances of the problem where variables have been fixed, for example as during a branch and bound process. A more interesting advanced pre-processing step is constraint derivation. Constraint derivation is a process similar to variable fixing, or iterative branch and bound. The process can best be described by example. If two binary variables are fixed to 1 simultaneously, and the resulting problem instance is found to be infeasible in a prepossessing step, then we have the additional constraint the at least one of these two variables must be set to 0. Following this model, we can fix certain variable and test whether that setting leads to infeasibility, if so a constraint may be derived. These logical constraints can then in turn be combined to gain even more insight into the problem feasibility space. However there can be an exponential number of derivable constraints, and the specifics of which variables to fix and in what combination is dependent on problem specifics, and what logical constraints would be most beneficial for the problem method being utilized.[2] Also cites [3] and [4] as sources for rigorous methods of implication derivation.
[2] Concludes with a discussion of problem classifications where the problems are know to have structure such that pre-processing techniques may be very gainfully applied. Ordered sets, set partitioning, set packing, knapsack problems, generalized upper bound constraints, and variables with lower and upper bounds are all categories of problems whose structure can guide pre-processing. A concluding note of the paper is that problem instances which are solved after pre-processing can be converted back to a valid solution of the original problem by reversing the stack of actions that results in the new problem formulation, and applying these transformations to the solution.
Once a problem is pre-processed, it can be solved by a MILP solver, at the heart of a MILP solver is the branch and cut algorithm that is explored in [5].
[5] Discusses modern MILP solvers the paper gives a terse description of the branch and cut algorithm central to modern solvers, it then discusses the specifics of several packages, and the notable features of each. Modern solvers use a combination of LP relaxation, branch & bound, and cutting plane algorithms to find solutions. The article overviews the branch and bound algorithm. As seen in class, branch and bound iteratively solves the LP relaxation of the problem, chooses a integer variable assigned to a real value, and generated 2 sub-problems one where the integer variable is assigned to the next largest value after the real value in the previous solution, and another where the variable is assigned to the nearest integer value less than the real value found in the LP relaxation. Branch and bound is then applied to each of the two sub-problems generated. The best feasible solution to the original problem is kept and called the incumbent. Sub-problem nodes may be fathomed if their LP relaxed objective function value is lower than that of the current incumbent, or if they are infeasible.
Because LP relaxations do “not naturally well approximate, in general the convex hull of (mixed-) integer solutions of MILPs”[5:p2] cutting plane algorithms are used to tighten each relaxation with additional linear inequalities. Cutting planes are used in the hope that less nodes will need to be ex paned as part of the branch and bound, and that nodes may be fathomed earlier due to either infeasibility or inferiority. Cutting planes iteratively solve the separation problem which is “given a solution s that is feasible in an LP relaxation of the MILP but infeasible in the original MILP find a linear inequality that is satisfied in all feasible solutions of the MILP, but not satisfied in the LP relaxation.”. The addition of such an inequality into the constraint set has the effect of tightening the original relaxation and better modeling the convex hull. Thus modern solvers apply branch and bound with LP relaxation, and constraint tightening via cutting-planes at each node, “without iteratively strengthening the approximation of the convex hull of (mixed-) integer solutions provided by LP relaxation, current ennumerative schemes would fail in solving difficult MILP instances.”. Several classes of separation algorithms are used in current solvers, the group comprising Gomory cuts, and mixed integer rounding cuts act similarly to branch and bound in identifying inequalities by exploiting the integrality of the variables of a single constraint. Clique and cover inequalities are a separate group of cutting plane techniques that can be applied. Branching strategies can be tuned by choosing a node selection algorithm (common examples are best-bound first and depth first), and choosing a variable selection order and example is the strong branching technique. Primal heuristic are an attempt to satisfice the problem by using various rounding techniques to get a feasible MILP solution from an LP relaxation. Improvement heuristics can then execute local search on the rounded solution to obtain a better solution in that neighborhood.
In many of the software packages later discussed many if not all of these variables can be set and tuned, primal heuristics, node and variable selection orders for the branch and bound execution etc can all be modified to suite the needs of a particular problem instance. A last note that [5] makes before comparing packages is that many frameworks implement a parallel version of branch and abound, with the modification that synchronization points are set up so that reproducible behavior is ensured for each execution. [5:p6-8] Contains a terse but useful description of several solver packages, below I will summarize what I believe are the most important points for the most notable packages. CPLEX- full featured suit of pre-processing techniques, cutting plane strategies, search strategies, and branch and bound heuristics. Can be paralelleized trivially, and it caches solutions close to the optimal solution found.Gurobi- a competitor to CPLEX, it implements most of the same features, in Gurobi parallel executions are always deterministic. XPRESS-MP- offers an option to search for special structures on which to branch. BLIS is built on top of the Coin-OR parallel search framework, and so can run on distributed memory platforms, this could include a general internet of commodity servers. Coin-OR uses its open source LP solver (Clp) and cuts are generated from the COIN cut library (CGL). CBC is the Coin-OR solver used for shared-memory parallelism. lp_solve is an open source solver notable for the enormous collection of interfaces through which it can be used. SCIP is notable for being the fastest non commercial MILP solver. Finally the symphony solver is notable for its high degree of customization, it’s use of COIN-OR’s CGL and clp, and has a warm start capability for branch and bound, being able to start a branch and bound process from a previously calculated branch and bound tree even after problem data has been modified.
In the branch and cut algorithm, LP relaxation is used at each node to obtain a bound at that node, an alternative relaxation technique is Lagrangian relaxation, and that technique is explored in [6] and [7].
Both [6] and [7] are overviews of Lagrangian relaxation by Marshall Fisher, and they are similar enough that I shall summarize and overview the major points of both papers in this section. Lagrangian relaxation is a systematic method for utilizing a useful feature of many MILP problems, that feature is that they “can be modeled as a relatively easy problem complicated by a set of side constraints” [7:pg2]. In Lagrangian relaxation, constraints are take from the constraint set, and through dualization, and the use of a vector of Lagrangian multipliers they are incorporated into the objective function. The reasoning is that if a problem becomes easy to solve when the more difficult constraints are removed from the constraint set, then this is a relaxation worth attempting. Lagrangian relaxation can be used directly instead of linear relaxation in a branch an bound execution, and this is indeed the main use that Fisher proposes in both [6] and [7]. The example problem used in [7] is
max(cx)
Ax <= b.
Dx <= e
x>=0 and integral.
Suppose that it is known that Ax <= b is the set of constraints that make this problem difficult to solve. Through Lagrangian relaxation we can remove these constraints from the constraint set, and place them into the objective function. Suppose again that we have a vector u of non-negative Lagrangian multipliers that penalizes variable settings where the solution is unsatisfied. The Lagrangian relaxation for this problem is then:
max( cx + u(b - Ax) )
Dx <= e
x>=0 and integral.
Note that if (b - Ax) >= 0, then the constraint that has been replaced has been satisfied, since the multipliers in u are non-negative then the vector product here will be non-negative, and the maximization becomes cx + k, where k is a positive number. This is roughly equivalent to saying that the situation where the constraint is satisfied results in the original problem with the constraint removed. If however (b - Ax) < 0 then the constraint is not satisfied, then the vector product will produce a negative term that will penilize the maximization.
The main issues to look at when attempting a Lagrangean optimization are:
1) Which constraints to relax, and in what order ?
2) Where to get a vector of good Lagrangean multipliers ?
3) How to get an initial solution for the problem ?
The answer to 1, is that constraints should be chosen the elimination of which would lead to a problem instance that fits into a known class of IP problems that are not very difficult to solve. Barring that constraints should be chosen for elimination based on how easy the resulting problem is estimate to be by some heuristic. There are a few solutions for question 2 presented in both papers, however in a nutshell it is wise to use the general purpose procedure called the “sub-gradient method” Question 3 is specific to the problem instance, and is shouldn't be too hard to come up with any feasible solution for some problem. If all variable settings for all variables other than the one being brought into the objective function are examined, and graphed against the utility function of their dual term then the upper or lower envelope (depending on if it’s a maximization or a minimization) will contain the optimal value for values in the vector u. The envelope calculated will be differentiable except at points where multiple optimal values are present. Sub-gradient decent descent (or ascends depending on the direction of the problem) the gradient of this envelope, and at points where multiple optimal values occur chooses arbitrarily.
All this can be done by iteratively applying the function uk + 1 = max{ 0, uk - tk(b-Axk)}.
uk+1 is the value of the u vector at the k+1’th iteration, xk is the x vector at the k’th iteration. tk is the k’th scalar step size. Step size is a value that “should converge to 0, but not too quickly” [7: p14], it can be optimally determined for different problem instances, but a formula that has been effectively used for many cases is:
[7:p14]
The sub-gradient method is usually terminated after a fixed number of iterations, there is no set bound for it to reach an optimal solution.
[7] States that Geoffrion showed that the bound produced by any Lagrangian relaxation will always be at least as tight as bounds produced by linear relaxation. However Lagrangian relaxation is more expensive computationally so its use needs to be more selective. “With careful choice of which constraints to dualize, Lagrangean relaxation can provide results that are significantly superior to LP-based branch and bound.” [7:p16] Which constraints are chosen for relaxation depends on the structure of the problem being solved, and the skill of the operator in finding easier sub-problems, as such it is both art and science. [7] presents a few templates of how this can be done, either by beginning with the core problem and testing the ease of the relaxations generated when different constraints are relaxed, or by starting with a known easy problem, and modifying it to look like the problem to be solved.
Both [7] and [6] discuss a branch an bound implementation which uses Lagrangean relaxation instead of linear relaxation. The implementation is basically identical to the regular implementation of branch and bound but with a different relaxation method, and as I discuss branch and bound in the context of another paper I will omit a review of this discussion. Lastly both [7] and [6] make the point that Lagrangean relaxations often result in good solutions that are almost feasible an application of an algorithm that will round variables in Lagrangean solution, and then improving such a solution via local search often results in a good feasible solution. Additionally, while both papers discuss iterated application of Lagrangean relaxation until optimality is found, both also mention that another useful application of the technique is as a “Lagrangean heuristic” where solvers stop once a set number of iterations have been hit.
.
Note: Sources with a * are core sources.
*[1] Introduction to large-scale linear programming and applications: STEPHEN J. STOYAN, MAGED M. DESSOUKY, XIAOQING WANG, Daniel J. Epstein, Wiley Encyclopedia of Operations Research and Management Science.
*[2] PRESOLVING MIXED–INTEGER LINEAR PROGRAMS: ASHUTOSH MAHAJAN, Wiley Encyclopedia of Operations Research and Management Science.
[3] Savelsbergh MWP. Preprocessing and probing techniques for mixed integer programming
problems. ORSA J Comput 1994;6:445–454.
[4] Atamturk A, Savelsbergh MWP. Conflict graphs in solving integer programming problems.
Eur J Oper Res 2000;121:40–55.
*[5] MILP SOFTWARE: JEFFREY T. LINDEROTH, ANDREA LODI DEIS, Wiley Encyclopedia of Operations Research and Management Science.
*[6] The Lagrangian Relaxation Method for Solving Integer Programming Problems: Marshall L. Fisher, MANAGEMENT SCIENCE Vol. 50, No. 12 Supplement, December 2004, pp. 1861–1871
*[7] An Applications Oriented Guide to Lagrangian Relaxation: MARSHALL L. FISHER, Interfaces, Vol. 15, No. 2 (Mar. - Apr., 1985), pp. 10-21.
*[8] Benders Decomposition Zeki Caner Taskin, Wiley Encyclopedia of Operations Research and Management Science.