After some work on singular equations and quasi-Newton methods (both were central parts of my PhD thesis in 1980), I became engrossed in automatic or algorithmic differentiation (AD) shortly after my move to Argonne National Laboratory in 1987. This research was as much an effort in searching out and bringing together different communities as in doing mathematical analysis. After all, there is not much to discover about the rules of (smooth) differentiation, except that they can be applied backward in what we call the reverse mode. Bert Speelpenning had worked all that out in his thesis, finished in 1980, though as we all know now, there were many antecedents.
The only thing I got wrong in my first paper, published in the proceedings of the ISMP in Tokyo 1989, was to grossly overestimate the complexity of higher derivatives. Also, with due respect to subsequent developers, ADOL-C, as I wrote it in 1988 during a two-week visit at ZIB in Berlin, already had the key capabilities of AD tools based on operator overloading. Of course, there was no sparsity, no cross-country elimination, no parallelism and no checkpointing. All these things are important and required a lot of imagination and hard work to implement. I am proud to say that my former students Jean Utke, Uwe Naumann and Andrea Walther have played a major role in spreading the gospel by co-authoring books and developing top-notch software. My latest technical contribution to AD as such was an article with Andrea Walther and Kshitij Kulshreshtha that demonstrates the backward stability of first derivative evaluations in some appropriate sense. My most recent contributions to AD were a short survey on automatic differentiation for the Princeton Companion to Applied Mathematics, edited by Nick Higham, and a survey article in Pesquisa Operacional.
Since moving to Berlin in 2003 I have worked mainly on applications of AD in various areas of nonlinear scientific computing. The main thrust was a total quasi-Newton approach to nonlinear optimization, based on the theoretical observation that Jacobians and Hessians can be orders of magnitude more expensive than their products with vectors from the left or the right. However, at least for the problems in the popular test set CUTEr that is never true, as their full Jacobians are never more than 20 times as expensive to evaluate as the underlying vector function itself. Another effort that has soft-stalled is the analysis and numerical solution of DAEs on the basis of the computational graph of the defining equations. This generalized structural analysis continues to be viewed as mathematically unsound by the DAE mainstream. Nevertheless, there is hope in that the recent MANDEA proposal for an ETN, in which my current university Yachay is a partner, will take a broader look at structural analysis.
Since 2006 I have worked extensively with the groups of my former associate Nicholas Gauger, now at Aachen, and my colleague Thomas Slawig on general schemes for one-shot design optimization.
Our joint project within the DFG priority program SPP 1253 came to an end in 2013. Again the focus was on flexible and adaptive procedures that minimize the human effort needed to turn a legacy simulation code into a reasonably efficient optimization code. The results of the project have been reported and compared to similar approaches in a survey paper with Volker Schulz and his group.
For the last four years I have been involved in an extensive collaboration with physicists on the application of quasi-Monte Carlo methods to problems from high energy physics. This was originally a project within the Center of Computational Sciences Adlershof and then became the topic of a three-year DFG grant. From the mathematical side, the project has mainly been carried by my former doctoral student Hernan Leovey, who is a scientific jack of all trades and now works in the finance industry. An outgrowth of this effort is a collaboration between Frances Kuo, Ian Sloan, Hernan and me regarding the handling of integrands with kinks and jumps. While there are very encouraging numerical results, the first paper analyzing and supporting them still awaits completion.
The new departure:
Even though I am supposedly an expert on differentiation in some sense, I never seriously looked at the tremendous development in nonsmooth analysis and generalized differentiation until some eight years ago. The convex origins of much of the theory seemed too restrictive, and nothing appeared to be implementable in an automated, AD-like fashion. With generalized differentiation rules being merely inclusions rather than strict equalities, every family of problems apparently needed to be analyzed by a skillful mathematician to make it computationally tractable. While in some schools of mechanics and economics the Clarke calculus is now well understood and occasionally applied, nonsmooth analysis remains a mystery to most computational practitioners, and there is no general purpose software implementation available.
I think all that can be changed, at least under the following restrictions on the problem-specific functions in question:
• The functions of interest are set in finite dimensions and piecewise smooth.
• The selection and evaluation of the smooth pieces is integrated in an evaluation procedure of uniformly limited length, invoking a finite set of elemental functions, including the absolute value function.
While these may appear to be serious limitations, one can allow, on the other hand, that:
• Nonsmoothness may be superimposed, so that generalized derivative cancellations can occur. The latter effect cannot arise in the case of standard complementarity equations and other simply switched systems, where nonsmooth elemental functions occur only independently of each other. Only in such cases do I consider the popular notion that the evaluation of ‘just one’ generalized Jacobian poses no problem at all to be justified. In fact, similar to the approach of Paul Barton and Kamil Khan, we have found a way to constructively calculate generalized Jacobians that are ‘conically active’, a property that is slightly stronger than ‘essentially active’ in the sense of Scholtes. Wherever the function in question is Fréchet differentiable, only its proper Jacobian is conically active. The generalized derivative cancellation effect mentioned above is properly treated in this approach, so that, for example, the derivative of |x| − |x| is automatically obtained as zero.
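To illustrate the cancellation effect, here is a minimal sketch, in my own hypothetical notation rather than any actual AD tool, of propagating piecewise linear increments through an evaluation procedure. The propagation rule for the absolute value produces the increment |u + Δu| − |u|, which is itself piecewise linear in the input increment:

```python
# Minimal sketch of piecewise-linear increment propagation (hypothetical
# helper classes, not the actual ADOL-C or Tapenade machinery).
# Each node carries its reference value at the development point and a
# function mapping the input increment dx to the increment of this node.

class PLNode:
    def __init__(self, val, inc):
        self.val = val          # value at the development point
        self.inc = inc          # dx -> increment of this node

    def __sub__(self, other):
        # subtraction propagates increments linearly
        return PLNode(self.val - other.val,
                      lambda dx: self.inc(dx) - other.inc(dx))

def pl_abs(u):
    # propagation rule for |.|: the increment is |u + du| - |u|,
    # which is again piecewise linear in du
    return PLNode(abs(u.val),
                  lambda dx: abs(u.val + u.inc(dx)) - abs(u.val))

# f(x) = |x| - |x| at the development point x = 0
x = PLNode(0.0, lambda dx: dx)      # independent variable
f = pl_abs(x) - pl_abs(x)

# The increment of f vanishes identically, so the correct derivative 0
# is recovered, whereas the naive sum rule for Clarke subdifferentials
# only yields the inclusion [-1,1] - [-1,1] = [-2,2].
print(f.val, f.inc(1.0), f.inc(-0.7))
```

The point of the sketch is only that the cancellation happens automatically at the level of increments, without any case analysis by the user.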
Our construction is based on the piecewise linearization (PL) of piecewise smooth (PS) functions in an AD-like fashion and can be found in “On stable piecewise linearization and generalized algorithmic differentiation.” This piecewise linear model can be viewed as a globally defined generalized Taylor expansion with an error that is of second order in the distance to the development point. Moreover, in contrast to all other generalized derivative concepts, which are at best outer semi-continuous in the sense of multifunctions, our non-homogeneous piecewise linear model varies Lipschitz-continuously with respect to the development point. This is an absolutely indispensable property of tools for the construction of stable numerical algorithms. Here is a schematic rendering of a scalar-valued piecewise linear function due to Torsten Bosse.
Another essential aspect is the effective representation and manipulation of piecewise linear functions. The more or less standard, explicit representation by linear pieces on polyhedral subdomains is fraught with redundancy, inconsistency, and combinatorial complexity. Here we have proposed a representation in abs-normal form, which arises naturally from the linearization of an underlying piecewise smooth evaluation procedure. The abs-normal representation can be generated by minor extensions of standard AD tools, which has already been done for ADOL-C and Tapenade.
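Up to renaming of symbols (the letters below are my own notational choices), the abs-normal form couples a vector of switching variables $z$ to the input $x$ and output $y$ via

```latex
\begin{aligned}
z &= c + Z\,x + L\,|z|,\\
y &= b + J\,x + Y\,|z|,
\end{aligned}
```

where $|z|$ is taken componentwise and $L$ is strictly lower triangular, so that $z$ can be computed by a single forward substitution sweep. All the nonsmoothness of the piecewise linear function $x \mapsto y$ is thus concentrated in the finitely many absolute value evaluations.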
As of now we have considered three main applications for the piecewise linearization approach.
• (i) The unconstrained optimization of a piecewise smooth f : ℝn → ℝ
• (ii) The solution of equations F(x) = 0 for piecewise smooth F: ℝn → ℝn
• (iii) The numerical integration of ODEs x′ = F(x) for piecewise smooth F: ℝn → ℝn
In the nonsmooth setting, constrained optimization can always be reduced to unconstrained optimization by exact penalization using the ℓ1 or ℓ∞ norm. Between (i) and (ii) lies least squares fitting for F: ℝn → ℝm, where special adaptations have already been considered but not yet implemented. Similarly, a specialization of (iii) should yield new methods for the numerical quadrature of Lipschitzian integrands.
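For the ℓ1 case, the standard exact penalty reformulation (written here in generic notation of my own choosing) turns the constrained problem into an unconstrained but piecewise smooth one:

```latex
\min_{x}\; f(x) \;\; \text{s.t.}\;\; c(x) = 0,\; g(x) \le 0
\qquad\Longrightarrow\qquad
\min_{x}\; f(x) \;+\; \rho\Bigl(\|c(x)\|_1 + \bigl\|\max(0, g(x))\bigr\|_1\Bigr),
```

which is exact for sufficiently large $\rho$ under suitable regularity conditions. Since the penalty is built from absolute values and componentwise maxima of smooth functions, the result fits exactly into the restricted problem class (i) above.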
A much more significant extension will be the generalization of (ii) and (iii) to algebraic and differential inclusions, where neither the original F nor its local piecewise linearizations are required to be continuous. A symmetric scenario of an algebraic inclusion is F = ∂f with f as in (i). Moreover, it was also found, based on the analysis of Hiriart-Urruty and Lemaréchal, that (i) can be solved by following a true steepest descent trajectory x′ ∈ −∂f(x).
Generally, for (iii) with F the convex, outer semi-continuous envelope of a piecewise smooth function, the theory of Filippov ensures the existence of some solution. However, that solution is usually not unique unless some regularization is applied. Even further beyond lie optimal control and the general calculus of variations, where F is a true multi-function, i.e., set-valued on open parts of its domain.
With regard to (i) we have shown with Andrea Walther and her group that PS functions can be optimized effectively by successive piecewise linearization and that there are constructive first order necessary and second order sufficient conditions for local minimizers. It is more or less clear how the introduction of exact or approximate curvature information can be used to develop SQP-like iterative solvers that achieve quadratic or at least superlinear convergence under generic nondegeneracy conditions.
With regard to (ii), in “Solving piecewise linear systems in abs-normal form” we have reviewed and reformulated a large variety of PL equation solvers and their convergence conditions. The weakest condition currently needed to guarantee that some solver finds at least one solution for any right-hand side is coherent orientation. Of course, for small perturbations of a given image y = F(x) only the local linear pieces need to be coherently oriented, i.e. have the same determinant sign. While small perturbations of coherently oriented PL functions need no longer be coherently oriented, they must still be surjective. This is the basis of our Open Newton Theorem, which establishes quadratic convergence from within some neighborhood of any root at which the piecewise linearization is coherently oriented. The applicability of this result is contingent on the ability to solve surjective but not coherently oriented PL systems, which is a very difficult task.
With regard to (iii) we have shown how the theoretical loss of one convergence order by the midpoint and trapezoidal rules in the PS case can be avoided by suitable generalizations based on piecewise linearization. Moreover, in the transversal case, one additional order can be gained by Richardson extrapolation of the generalized versions. For both methods there is a natural quadratic spline interpolant of the solution and a related error estimator that can be used for adaptive stepsize control.
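To make the setting concrete, here is the classical explicit midpoint rule applied to a kink-crossing test problem; this only illustrates the baseline scenario, not the generalized methods just described, and the test problem and step counts are my own choices:

```python
import math

def midpoint(f, x0, T, n):
    # classical explicit midpoint rule with n uniform steps on [0, T]
    h, x = T / n, x0
    for _ in range(n):
        x = x + h * f(x + 0.5 * h * f(x))
    return x

# Test problem x' = 1 - |x|, x(0) = -1/2: the trajectory crosses the
# kink at x = 0 at time t = ln 2; solving the two smooth pieces exactly
# gives the reference value x(T) = 1 - 2*exp(-T) for T > ln 2.
f = lambda x: 1.0 - abs(x)
T, x0 = 1.5, -0.5
exact = 1.0 - 2.0 * math.exp(-T)

err_coarse = abs(midpoint(f, x0, T, 30) - exact)
err_fine   = abs(midpoint(f, x0, T, 240) - exact)
print(err_coarse, err_fine)
```

Away from the crossing, the rule behaves as usual; the error contributed by the step containing the kink is what the piecewise linearization based generalizations are designed to repair.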
The grand finale:
As of now I would like to dedicate the rest of my professional life to a project tentatively titled ‘Nonsmooth Numerics via Piecewise Linearization’.
It should take at least 3 and hopefully no more than 5 years to become self-sustaining. The components, in reverse order of completion, should be:
(i) Provision of front-ends from various AD tools to fill PLAN-C with piecewise linear models in abs-normal form. Possibly some feedback for outer loop solvers.
(ii) Extension of the core package PLAN-C for performing basic piecewise linear algebra tasks exploiting structure, parallelism etc.
(iii) Analytical ‘downloads’ of the existing non-smooth theory of Mordukhovich and others to the piecewise smooth case in our limited sense.
(iv) Packages for solving the three main applications considered above and hopefully some of their extensions, especially to algebraic and differential inclusions.
(v) A book similar to the SIAM piece on AD. (I feel that one was really well done and has held up in both theory and sales.)
(vi) Generalization to discontinuous PL functions especially in the context of dynamical systems.
There is some hope that the AD community will generalize its approach to Algorithmic Piecewise Differentiation (APD). For that purpose, and also for winning over computational scientists, we need good examples and a coherent terminology for the whole approach. For that, the SIAM book project would be instrumental. I hope that at Yachay we will be able to set up a software development effort to make nonsmooth problems conveniently tractable for end users in many fields.