Published: 31st December 2019 DOI: 10.4204/EPTCS.311 ISSN: 2075-2180
The second ARCADE workshop took place on August 26, 2019 in Natal, Brazil, co-located with the 27th International Conference on Automated Deduction (CADE-27).
ARCADE stands for Automated Reasoning: Challenges, Applications, Directions, Exemplary achievements, and this slogan captures the aim of the workshop series: to bring together key people from various sub-communities of automated reasoning—such as SAT/SMT, resolution, tableaux, theory-specific calculi, interactive theorem proving—to discuss the present, past, and future of the field. In this abstract we report on key data related to submissions and participation, and summarize the presentation sessions and the subsequent discussions.
The second ARCADE workshop was held as a satellite event of CADE-27 as a forum for discussions and for sharing ideas about current challenges, new application areas, and future directions of automated reasoning. In the spirit of the first ARCADE held in 2017 in Gothenburg, the workshop encouraged short non-technical position statements revolving around challenges, applications, directions, and exemplary achievements in automated reasoning. The corresponding extended abstracts were summarized in short presentations at the workshop, with plenty of time for discussion after each talk.
Out of twelve submitted abstracts, the program committee selected eleven for presentation at the event. With 33 registrations and even more attendees, ARCADE was one of the largest workshops at CADE. The event was divided into three sessions comprising five topic blocks. Each session consisted of two short talks (or, in a single case, one), followed by a discussion of the presented topics, driven by questions posed by the authors. The accepted abstracts and proposed questions were published on the workshop’s web page in advance of the workshop to allow attendees to consider these points beforehand.
In this report we first recapitulate the scope and aims of ARCADE and briefly recall the history of the event. Then we present a high-level summary of the talks and associated discussions, and, finally, we end this report with a conclusion.
ARCADE aims to bring together experts from various sub-communities of automated reasoning to debate the present, past, and future of the field. Consequently, the workshop focuses on lively discussion: rather than presenting concluded work, authors are invited to inspire discussion. The title of the workshop is indicative of the encouraged topics:
Challenges: What are the next grand challenges for research on automated reasoning? By this we refer to problems, solving which would imply a significant impact (e.g., shift of focus) on the CADE community and beyond.
Applications: Where is automated reasoning already successfully applied in real-world (industrial) scenarios and where could automated reasoning be relevant in the future?
Directions: Based on the grand challenges and requirements from real-world applications, what are the research directions the community should promote? What bridges between the different sub-communities of CADE need to be strengthened? What new communities should be included (if any)?
Exemplary Achievements: What are the landmark achievements of automated reasoning whose influence reached far beyond the CADE community itself? What can we learn from those successes when shaping our future research?
Despite the focus of ARCADE on debate, it is also encouraged to submit an extended version of the abstracts in post-proceedings. As citations of ARCADE 2017 contributions have shown, ARCADE papers do get referenced to motivate later work. This illustrates that ARCADE indeed has the potential to shape ideas and trigger future research, and its submissions are valuable reference points for the community.
As mentioned above, each of the five sessions consisted of one or two short talks of roughly 15 minutes, followed by a discussion of the presented topics driven by questions posed by the authors. This format stimulated lively debates.
The talks in the first session broached the issue of semantic and syntactic inference restrictions for different domains. More precisely, Christoph Weidenbach asked to what extent these two paradigms may be unifiable. It was remarked that, in contrast to SAT, syntactic restrictions are often essential for more expressive logics; on the other hand, semantic guidance may offer significant advantages, too. Hence a unified approach bears great potential, but a combination is clearly a non-trivial endeavour.
John Hester proposed in his talk an approach for using axiom schemata to reason in first-order set theories which are not finitely axiomatizable, such as ZFC. He asked whether reasoning in NBG is more effective than in ZFC, and whether axiom schemata are actually required for proofs in practice.
The second session was devoted to theorem proving approaches which exploit machine learning techniques in some sense. Claudia Schon presented a novel approach to common sense reasoning in the CoRg system. In a preprocessing phase, knowledge graphs are used as background knowledge; the background knowledge together with target formulae is then fed into a theorem prover, and afterwards a machine learning component is used to predict the relevance of different models obtained. She asked the question whether knowledge graphs can compete with knowledge bases for common sense reasoning tasks, but also raised the more general issue of deduction vs. abduction in this setting.
Sarah Winkler posed questions about the right granularity of features in machine learning for term rewriting and theorem proving, relating to work in the software verification community where hand-crafted features describing program characteristics turned out to be highly useful. In particular, she asked what structural features of theorem proving problems could look like.
Automation of higher-order reasoning was the topic of the third session, where Sophie Tourret reported on the achievements and challenges in the ambitious Matryoshka project. She raised the question to what extent and on which level theorem proving components should be formally verified. A lively discussion afterwards also emphasized the importance of the question of what level of higher-order reasoning is actually required in practice.
The fourth session focused on automation of ethical reasoning. Christoph Benzmüller proposed the area of machine ethics as a new application area for automated reasoning. He presented the Explicit Normative Reasoning and Machine Ethics project, which aims to develop ethical and legal governance components for intelligent autonomous systems.
In the second talk of this session, Matthew Peveler proposed modelling of ethical principles as an application area for quantified modal logic. He emphasized that logics to model ethical theories exhibit different requirements than modal logics for other purposes, and made the point that new approaches for proof automation are also required.
The fifth session revolved around the challenge of making automated reasoning tools more useful in practice. Martin Riener pointed out that automated first-order theorem provers are hardly used in tool chains in the way SAT and SMT solvers are. He posed the question why this is not the case, and how ATPs would need to be improved to make them more suitable for tool chains.
Next, Martin Suda filled in for Giles Reger in making the point that tailoring AR tools to competitions like CASC does not necessarily maximize their applicability in real-world scenarios and that other measures than just the total number of solved problems from some benchmark should be taken into account. A discussion about the suitability of TPTP as a reference point to judge theorem provers followed.
Further new application areas were the topic of the last session. Pedro Quaresma presented challenges for automated reasoning tools within geometry software. In particular, he proposed computer-supported reasoning in geometry as a vehicle to expose students to automated reasoning in general.
Alessandro Gianola pointed out that the verification of data-aware processes offers interesting new problems to the AR community that have hardly been tackled yet. He explained how data-aware processes already constitute an application area for experimenting with automated reasoning techniques, but also stressed that they offer genuinely new research challenges for automated reasoning.
3rd of December, 2019, Martin Suda and Sarah Winkler
This extended abstract presents the contributions to automated reasoning made in the context of the project Matryoshka, funded for five years by the European Research Council.
Interactive theorem proving (ITP) is concerned with using proof assistants
to write computer-checked proofs of theorems, generally expressed in a
variant of higher-order logic (HOL). Proof assistants are
expressive and versatile tools, and their success stories range from
software and hardware verification [23] to pure
mathematics [22]. Despite the higher
trustworthiness of a machine-checked proof compared with a
hand-written one, only a very small fraction of today's software and
mathematical proofs is verified in a proof assistant. One recurrent
complaint about these tools is their lack of automation, which has led
to the creation of hammers
[16,26]
that attempt to delegate the proof obligations to fully automatic
theorem proving (ATP) tools.
Although some automatic theorem provers for HOL exist [14,33], they tend to underperform on problems generated from proof assistants [34]. Thus, hammers rely mostly on automatic provers for first-order logic (FOL), such as SMT (satisfiability modulo theories) solvers [11,17] and superposition-based theorem provers [27,32]. One unpleasant property of hammers is that they obfuscate the structure of terms and formulas through the application of a sequence of encodings. This complicates the task of first-order automatic provers since it is this very structure that they exploit to find proofs efficiently.
To some extent, the ITP and ATP research communities have been working in parallel, largely ignoring each other. Hammers have contributed to bringing the two communities closer. Two years ago, a new division was created in the CASC competition, featuring benchmarks generated by Isabelle's Sledgehammer [35], reflecting the growing interest of the ATP community in ITP. Next year, IJCAR will also encompass the ITP conference.
The Matryoshka project, whose general aim is to bridge the gap between ATP and ITP by strengthening higher-order proof automation, is part of this trend. So is our colleagues' work on Leo-III [33] and Vampire [10]. Since Matryoshka's start almost two and a half years ago, progress has been achieved both in terms of improving ATP with an eye on ITP applications and of using ITP as a vehicle for ATP research, taking the form of several workshops and many publications. This year's edition of the ARCADE workshop is a good opportunity to provide to the community a comprehensive view of our contributions and to sketch general avenues for future work.
To face the challenges offered by HOL starting from FOL, we have chosen as milestones the intermediate logics λ-free HOL and Boolean-free HOL. In λ-free HOL, the only higher-order features allowed are functional variables and partial application of functions. In Boolean-free HOL, λ-expressions are additionally allowed. This fragment is encoded in λ-free HOL by relying on combinators, and λ-free HOL itself is encoded in FOL using the applicative encoding [15]. In these two intermediate logics, variables cannot be of Boolean type. This is the only difference between Boolean-free and full HOL. Both λ-free and Boolean-free HOL are interpreted using Henkin-style semantics [9].
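The applicative encoding mentioned above can be illustrated with a minimal sketch. It assumes a toy term representation (the `Term` class and the symbol name `app` are hypothetical choices for this illustration, not taken from any of the cited provers): every function symbol becomes a constant, and each application step is expressed with a single binary first-order symbol, so that partially applied functions become ordinary first-order terms.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """A lambda-free HOL term: a head (symbol or variable) applied to zero or more arguments."""
    head: str
    args: Tuple["Term", ...] = ()


def applicative_encode(t: Term) -> str:
    """Render t as a first-order term built from constants and a binary symbol `app`."""
    encoded = t.head
    for arg in t.args:
        # Each argument is attached by one more application of `app`.
        encoded = f"app({encoded}, {applicative_encode(arg)})"
    return encoded


a, b = Term("a"), Term("b")
print(applicative_encode(Term("f", (a, b))))  # app(app(f, a), b)
print(applicative_encode(Term("f", (a,))))    # app(f, a)
print(applicative_encode(Term("X", (a,))))    # app(X, a)
```

The last line shows a variable-headed term: after encoding, the functional variable `X` is an ordinary first-order variable, which is what makes the encoding usable with first-order provers, at the price of hiding the original term structure.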
To improve automation in proof assistants, our strategy is to develop calculi for these fragments that extend first-order calculi in a graceful way, meaning that the new calculi should be almost as efficient at first-order reasoning as their first-order counterparts while being additionally able to cope with higher-order reasoning steps in a reasonable manner. Our starting points are the superposition calculus and SMT's CDCL(T) calculus, the most successful first-order approaches for solving hammer benchmarks.
The calculus that Bentkamp, Blanchette, Cruanes, and Waldmann [8] designed for λ-free HOL resembles the first-order superposition calculus. A key ingredient is a term order for λ-free terms. We have developed two such orders that generalize the familiar lexicographic path order (Blanchette, Waldmann, and Wand [13]) and the Knuth--Bendix order with argument coefficients (Becker, Blanchette, Waldmann, and Wand [6]). Both orders are compatible with function contexts, just as their first-order counterparts. In the higher-order case, however, one would also like to have compatibility with arguments (that is, $s \succ s'$ implies $s \: t \succ s' \: t$), but this is hard to obtain. We compensate for this deficiency by restricting the superposition rule to apply at argument subterms (e.g., $\mathsf{a}$ or $\mathsf{b}$ in $\mathsf{f}\>\mathsf{a}\>\mathsf{b}$, but not $\mathsf{f}$ or $\mathsf{f}\>\mathsf{a}$). Other superpositions are made unnecessary by an additional inference rule that adds arguments to partially applied functions.
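To make the distinction between argument subterms and the remaining subterms concrete, the following sketch enumerates both kinds for a λ-free term. The `Term` class and the function names are assumptions made for this illustration only; the sketch lists proper argument subterms and leaves aside the question of whether the term itself is counted.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Term:
    """A lambda-free term in curried notation: head applied to a tuple of arguments."""
    head: str
    args: Tuple["Term", ...] = ()

    def __str__(self) -> str:
        parts = [self.head]
        for arg in self.args:
            text = str(arg)
            parts.append(f"({text})" if arg.args else text)
        return " ".join(parts)


def argument_subterms(t: Term) -> Iterator[Term]:
    """Yield every subterm that occurs as an argument, at any depth."""
    for arg in t.args:
        yield arg
        yield from argument_subterms(arg)


def prefix_subterms(t: Term) -> Iterator[Term]:
    """Yield the partially applied prefixes of t (the head with fewer arguments)."""
    for k in range(len(t.args)):
        yield Term(t.head, t.args[:k])


fab = Term("f", (Term("a"), Term("b")))
print([str(s) for s in argument_subterms(fab)])  # ['a', 'b']
print([str(s) for s in prefix_subterms(fab)])    # ['f', 'f a']
```

In this reading, the restricted superposition rule would only consider positions produced by `argument_subterms`, while the prefixes are the positions that the additional argument-adding rule is meant to compensate for.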
Adding support for λ-terms imposes major modifications on the calculus (Bentkamp, Blanchette, Tourret, Vukmirović, and Waldmann [7]). Unification is no longer unitary, but infinitary, which means that inference rules generate streams of conclusions for a fixed set of premises. Handling these in practice requires dovetailing. Even worse, the structure of some kinds of variable-headed terms and λ-expressions (called fluid terms) can be fundamentally altered under substitution, which breaks the connection between superpositions on clauses and on their ground instances and makes the calculus incomplete. We restore completeness with a new inference rule in which fluid contexts are encoded by a fresh higher-order variable. To reach full HOL, this calculus must be further extended to handle Boolean variables, which can be replaced by arbitrary predicates under substitution, thus breaking the clausal structure of the main formula. Boolean encodings [15] and lazy clausification [21] are promising options to face this final challenge.
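Dovetailing can be sketched with Python generators. This is a generic illustration of fair interleaving, not the scheduling code of any particular prover; the function names and the pair-valued example streams are assumptions. In each round one further stream is activated and every active stream contributes one element, so every conclusion of every stream is reached after finitely many steps even when all streams are infinite.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def dovetail(streams: Iterable[Iterator[T]]) -> Iterator[T]:
    """Fairly interleave a possibly infinite family of possibly infinite streams."""
    pending = iter(streams)
    active: List[Iterator[T]] = []
    more_streams = True
    while more_streams or active:
        if more_streams:
            try:
                active.append(next(pending))  # activate one new stream per round
            except StopIteration:
                more_streams = False
        still_active = []
        for stream in active:
            try:
                item = next(stream)
            except StopIteration:
                continue  # exhausted streams are dropped
            yield item
            still_active.append(stream)
        active = still_active


def conclusions(i: int) -> Iterator[tuple]:
    """Hypothetical infinite stream of conclusions for the i-th premise set."""
    return ((i, j) for j in count())


print(list(islice(dovetail(conclusions(i) for i in count()), 6)))
# [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```

A naive depth-first traversal would stay inside the first infinite stream forever; the round-based schedule is what guarantees fairness.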
A prototype implementation of the λ-free calculus, based on the superposition prover Zipperposition, was developed by Bentkamp [8]. This prover was then extended to Boolean-free HOL by Bentkamp, Vukmirović, and Tourret with promising results. Following on the same path, Vukmirović, with some guidance from Blanchette, Cruanes, and Schulz, also extended the high-performance E prover to λ-free HOL [33]. Partly by chance, the data structures in E were quite easy to adapt. Major changes include the replacement of the simple sort system by a proper type system. λ-free HOL terms are represented using E's existing term data structures. Types are represented as recursive structures and are maximally shared in a type bank inspired by E's shared term architecture. Updated unification and matching algorithms are able to return partial matches and unifiers. All of E's indexing data structures (perfect discrimination trees [24], fingerprint indices [30], and feature vector indices [31]) have successfully been adapted to λ-free HOL. As a result, the extended E runs at essentially the same speed as the first-order E. The next planned step is the extension of E to Boolean-free HOL.
Together with Barbosa, Barrett, Reynolds, and Tinelli from the CVC4 team, the Matryoshka team, and more specifically El Ouraoui, Fontaine, and Tourret, are experimenting with different approaches for extending SMT to HOL. The first, pragmatic approach targets existing state-of-the-art SMT solvers with large code bases and complex data structures optimized for the first-order case. In this approach, the SMT solver is extended with only minimal modifications to its core data structures and algorithms. The second approach, requiring substantial modifications, entails the redesign of the essential data structures and algorithms of the solver to work directly in λ-free HOL. The approaches have been respectively implemented in the CVC4 and veriT solvers [5].
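The idea of maximal sharing in a type bank can be illustrated as follows. This is a small sketch of the general hash-consing technique under assumed names (`Type`, `TypeBank`, `make`); it does not reproduce E's actual data structures, which are written in C. Each structurally distinct type is created once, so equality of types reduces to identity of objects.

```python
from typing import Dict, Tuple


class Type:
    """A type constructor applied to argument types, e.g. ->(int, bool)."""
    __slots__ = ("constructor", "args")

    def __init__(self, constructor: str, args: Tuple["Type", ...]) -> None:
        self.constructor = constructor
        self.args = args


class TypeBank:
    """Creates each structurally distinct type exactly once (maximal sharing)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, Tuple[int, ...]], Type] = {}

    def make(self, constructor: str, *args: Type) -> Type:
        # Arguments are assumed to come from this bank, so their identity
        # already determines their structure and can serve as a lookup key.
        key = (constructor, tuple(id(arg) for arg in args))
        shared = self._table.get(key)
        if shared is None:
            shared = Type(constructor, args)
            self._table[key] = shared
        return shared

    def size(self) -> int:
        return len(self._table)


bank = TypeBank()
int_to_bool = bank.make("->", bank.make("int"), bank.make("bool"))
again = bank.make("->", bank.make("int"), bank.make("bool"))
print(int_to_bool is again)  # True
print(bank.size())           # 3
```

With this representation, comparing two types during unification or matching is a constant-time identity check rather than a recursive traversal.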
The pragmatic approach is based on the applicative encoding of HOL [15]. A preprocessing phase eliminates λ-expressions using λ-lifting: Each λ-expression is replaced by a fresh function symbol, and a quantified formula is introduced to define the symbol in terms of the original expression. Then, the applicative encoding is used lazily to cope with partial applications on the ground level. Trigger-based quantifier instantiation is adapted to handle encoded HOL terms properly. A trick that proves useful in practice is to add axioms that help the instantiation technique to go slightly beyond pure λ-free HOL. More precisely, the ``store'' axiom (stating, for any unary function $\mathsf{f}$, the existence of a function that coincides with $\mathsf{f}$ except for one domain value) allows CVC4 to prove quite a few more formulas from the TPTP library when chosen instances are added [5]. In future work, the other instantiation methods, model-based, enumeration-based, and conflict-based, will be similarly adapted.
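For concreteness, the informal description of the store axiom can be written as a formula. The following is one possible formalization, using juxtaposition for application as elsewhere in this abstract; the variable names are chosen for this illustration, and the exact formulation used in [5] may differ.

```latex
\forall f\; \forall x\; \forall v\; \exists g\; \forall y.\;
  (y = x \rightarrow g \: y = v) \wedge (y \neq x \rightarrow g \: y = f \: y)
```

Here $g$ agrees with $f$ everywhere except at the domain value $x$, where it takes the value $v$. Instantiating the axiom with suitable choices of $f$, $x$, and $v$ provides the solver with function terms that pure λ-free reasoning could not construct.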
In the redesign approach, a suboptimal but simple classical congruence closure algorithm is extended to handle ground HOL terms. This algorithm handles partial applications natively, without relying on the applicative encoding. It is, however, necessary to add axioms to support extensionality. The trigger-based quantifier instantiation method is also adapted to cope with λ-free HOL terms. This involves the extension of the indexing techniques and of the matching algorithms. Store axioms are currently not used in the redesign approach, and we believe this partly explains (together with weaker instantiation techniques in general) the gap of efficiency between veriT (implementing the redesign approach) and CVC4 (implementing the pragmatic approach). Our ambition for the near future is to adapt the theory of the Congruence Closure with Free Variables algorithm [4] to λ-free HOL. An implementation of this algorithm would directly allow us to lift conflict-based instantiation to this logic. Redesigning enumerative instantiation techniques is also on our agenda.
An important aspect of automated reasoning at the service of interactive proof is that, beyond checking the validity (or unsatisfiability) of a formula, the solver should provide some evidence for the result. We advocate proofs for SMT solvers. In particular, the veriT SMT solver generates fairly comprehensive proofs, and we are working on the proof production capabilities of the solver to ease the burden on tools that would want to replay proofs. Barbosa, Blanchette, and Fontaine [3] recently developed a method to output explicit proofs for many of the preprocessing steps that usually happen inside SMT solvers. These results have been further polished, along with the proof format, mostly by Fleury and Schurr [2,20]. In Isabelle, they achieve a reconstruction success rate of about 99%. Our goal is to have a reconstruction success rate of 100%, with a solver that behaves gracefully, i.e., that behaves as much as possible in the same way with proof production enabled as when it is disabled.
The formal verification of ATP leads to a better understanding of its core mechanisms and strengthens its theoretical results. It also facilitates incremental research by allowing researchers to experiment with existing formalisms and easily identify the consequences of their modifications. Sometimes the outcome of verification is a formal framework that can be instantiated by future applications.
Fleury, with some help from Lammich and supervision from Blanchette and Weidenbach, formalized the conflict-driven clause learning (CDCL) calculus implemented in most modern SAT solvers [12]. Compared with other SAT solver verification works, this work emphasizes the stepwise refinement methodology and the connection between calculi variants described in the literature. It also considers clause forgetting, solver restarts, and incremental solving, which had not been the focus of formalization before.
Based on the CDCL formalization, Fleury implemented an optimized verified SAT solver called IsaSAT [18,19]. IsaSAT implements the two-watched-literal scheme and other imperative data structures. From a benchmark suite consisting of the preprocessed SAT Competitions problems from 2009 to 2017, IsaSAT solves 801 problems, compared with 368 for versat (the second fastest verified SAT solver) and 1388 for MiniSAT (the baseline reference for SAT solving) [18].
Schlichtkrull, Blanchette, Traytel, and Waldmann [29] formalized in the Isabelle/HOL proof assistant [25] a first-order prover based on ordered resolution with literal selection. They followed Bachmair and Ganzinger's account [1] from Chapter 2 of the Handbook of Automated Reasoning. The formal development covers the refutational completeness of two resolution calculi for ground clauses and general infrastructure for theorem proving processes and redundancy, culminating with a completeness proof for a first-order prover, called RP, expressed as transition rules operating on triples of clause sets. This material corresponds to the first four sections of Chapter 2. Interestingly, Bachmair and Ganzinger's main completeness result does not hold as stated, due to the improper treatment of inferences involving several copies of the same premise. The formalization uncovered numerous smaller mistakes.
In subsequent work [28], Schlichtkrull, Blanchette, and Traytel specified an executable prover that implements a fixed clause selection strategy and functional data structures, embodying the abstract prover described by Bachmair and Ganzinger. The executable prover is connected to the abstract prover through a chain of refinement steps.
One of the indispensable operations of realistic saturation theorem provers is deletion of subsumed formulas. Unfortunately, the well-known equivalence of dynamic and static refutational completeness holds only for derivations where all deleted formulas are redundant, and the usual definition of redundancy does not cover subsumed formulas. The fact that the equivalence of dynamic and static refutational completeness cannot be exploited directly is one of the main reasons why Bachmair and Ganzinger's refutational completeness proof for the RP prover is rather complicated and nonmodular.
Waldmann, Tourret, Robillard, and Blanchette are currently working on a generic framework for formal refutational completeness proofs of abstract provers that implement saturation proof calculi. The framework relies on a modular extension of arbitrary redundancy criteria, which in the end permits not only to cover subsumption deletion, but to model prover architectures in such a way that the static refutational completeness of the calculus immediately implies the dynamic refutational completeness of, say, an Otter loop or Discount loop prover implementing the calculus.
We thank Daniel El Ouraoui, Mathias Fleury, Hans-Jörg Schurr, and the anonymous reviewers for suggesting textual improvements. We also thank the ARCADE organizers, reviewers, and attendees for the discussions on this topic. The project receives funding from the European Research Council under the European Union's Horizon 2020 research and innovation program (grant agreement No. 713999, Matryoshka).
Automated first-order theorem provers have matured as standalone tools to the point that they can be used within a larger infrastructure like Isabelle's Sledgehammer. Nevertheless, their spread differs significantly from that of SAT solvers, which occur in simple applications like configuration management but are also reliably used in tight loops of larger tool chains, not least in SMT solvers or instantiation/AVATAR-based ATPs. We cannot expect a similar level of integration due to the higher expressiveness of general-purpose theorem proving. Nonetheless, here we identify some aspects that could improve the acceptance in industry.
Automated theorem provers have seen use as back-ends in larger software packages (various hammers, TLAPS) but are rarely used within a tool chain. For decidable theories, SMT solvers certainly have better properties, but in this context we focus mostly on their use as general-purpose theorem provers (in other words, any SMT-LIB logic that includes uninterpreted functions and quantifier support). The ability to produce models still distinguishes SMT solvers, but the support of full first-order logic confronts them with similar problems as other ATPs. Many of these problems are of a technical nature: the integration of a theorem prover into an industrial project brings several external requirements into play that are easily dealt with in interactive modes. The software might need to run in a certain operating system or avoid optional components under the GPL license. The availability of certain prover features also often depends on such optional components. For example, an arbitrary precision library might only provide integers where its differently licensed alternative also provides unbounded real numbers. In interactive use, we can often restate the problem to avoid real numbers altogether. A human can also investigate the reasons for timeouts more easily. We summarize these problems under the titles common availability, standardized interfaces, and reliability. The analysis is centred around CVC4 [1], E Prover [7], iProver [3], SPASS [8], Vampire [4], veriT [2], and Z3 [5].
Most commonly, provers target Linux as the main operating system and are available in source form to be compiled manually. Static Linux binaries are also often available for download at the project homepage, with the exception of Vampire. SPASS and E are packaged in single Linux distributions (Debian/Ubuntu and Fedora respectively) whereas CVC4 and Z3 are widely available. Exceptions are Alt-Ergo, Zipperposition and Beagle, which are usually installed via Opam and sbt, the respective source-based package systems for OCaml and Scala.
Apart from Z3, CVC4 and veriT, which offer native binaries, the support for Windows is usually provided via the Cygwin compatibility layer. MacOS is usually supported via the source-based package manager Homebrew. Due to Apple's strong discouragement of static linking, this is normally the only way of distribution.
From an industrial perspective, external factors often determine the operating system or way of distribution required. When multiple provers are required, the only common denominator is often compilation from source. The introduction of automated builds and testing environments for multiple operating systems is a significant task that is hard to publish on. Although rare, there are research engineer grants that could be used to hire an expert who does this maintenance. Since most provers are under an open source license, one-time engineering tasks like the adaptation to a yet unsupported operating system could be announced as a Google Code project.
The most common way of communication with a theorem prover is via plain text. This has the advantages that the input language remains stable and the prover can be easily exchanged for an alternative. The two main input formats, SMT-LIB and TPTP, have been converging with regard to the supported logics and theories: for example, they both define integer and real arithmetic and higher-order logic. There are differences though: only SMT-LIB defines special theories like bit-vectors, data-types or arrays, even though some provers accept them with a custom syntax, whereas only TPTP supports polymorphic types and modal operators. The choice of the language therefore restricts which features are efficiently supported.
A further restriction is the fact that no prover supports all features of either SMT-LIB or TPTP. This ties the tool chain even tighter to a particular prover, making it hard to use different encodings of the same problem. On the other hand, the close integration cannot be used to share data-structures or avoid repeated parsing of input.
Another distinction regards the output formats. SMT-LIB does not specify a proof format but it has a definition of models in terms of a Herbrand universe extended with abstract values. These abstract values are solely defined by the solver and again tie the tool chain tightly to a particular prover. TPTP's proof format TSTP on the other hand specifies the proof DAG with the single semantic restriction that a conclusion logically follows from its premises. More detailed proofs can be constructed but require additional reasoning. If the proof replay is not tailored to a prover there is also a reasonable chance that it fails. TSTP only specifies a concrete format for finite models. In both cases, an industrial user needs to rely on prover specific behaviour that is often poorly documented.
Some of these issues can be mitigated with an intermediate layer such as pySMT. As an alternative, one can use an API, should it exist and when it has the correct language bindings. The API might change more frequently because it is tied more closely to the implementation.
In each case, only the ATP community can improve the situation. The efforts of harmonizing the SMT-LIB format with TIP seem to have been successful, making the benchmark collections of both formats available to the other community. Due to the larger syntactic differences, this is not easy to achieve between TPTP and SMT-LIB, but it is certainly possible to continue integrating the features of the other side. The veriT SMT solver provides an excellent proof format that could be integrated into SMT-LIB. A stand-alone tool that converts between the formats would also be of great help. This could be a stepping stone towards making the whole benchmark libraries of both formats available to the other. Currently, there is an overlap due to manual conversion efforts (for instance, the AUFNIRA/nasa is an adaptation of the original statements using the array sort instead of an axiomatization). A full problem library that can state the problem in the logic and language required would make such a manual conversion effort unnecessary.
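At its core, a stand-alone converter of the kind suggested above has to render one abstract syntax in both concrete syntaxes. The following sketch does this for a tiny fragment (universal quantification, implication, and predicate atoms over variables). The class names and the uninterpreted sort name `U` are assumptions for this illustration; a real converter would also have to emit sort and symbol declarations and deal with theories that exist in only one of the formats.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[str, ...]  # variable names only; at least one argument is assumed


@dataclass(frozen=True)
class Implies:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Forall:
    var: str
    body: "Formula"


Formula = Union[Atom, Implies, Forall]


def to_tptp(f: Formula) -> str:
    """Render f as a TPTP FOF formula (variables start with an upper-case letter)."""
    if isinstance(f, Atom):
        return f"{f.pred}({','.join(v.upper() for v in f.args)})"
    if isinstance(f, Implies):
        return f"({to_tptp(f.left)} => {to_tptp(f.right)})"
    return f"![{f.var.upper()}]: {to_tptp(f.body)}"


def to_smtlib(f: Formula, sort: str = "U") -> str:
    """Render f as an SMT-LIB term; all variables get the same sort."""
    if isinstance(f, Atom):
        return f"({f.pred} {' '.join(f.args)})"
    if isinstance(f, Implies):
        return f"(=> {to_smtlib(f.left, sort)} {to_smtlib(f.right, sort)})"
    return f"(forall (({f.var} {sort})) {to_smtlib(f.body, sort)})"


formula = Forall("x", Implies(Atom("p", ("x",)), Atom("q", ("x",))))
print(f"fof(ax1, axiom, {to_tptp(formula)}).")  # fof(ax1, axiom, ![X]: (p(X) => q(X))).
print(f"(assert {to_smtlib(formula)})")         # (assert (forall ((x U)) (=> (p x) (q x))))
```

Even this fragment shows where the formats diverge: TPTP FOF is untyped and distinguishes variables by capitalization, whereas SMT-LIB requires a sort for every bound variable and prior declarations of `U`, `p`, and `q`.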
The most dreaded behaviour of a theorem prover which has not been mentioned so far is when it reports "unknown", either due to a timeout or when it rejects the problem as outside of its capabilities.^{1}
This can be due to an unfavourable encoding like expressing equality as a reflexive, symmetric and transitive relation. Other cases are not that obvious: arrays with Boolean values can replace a bit-vector, but whether this encoding is suitable for a problem depends on the decision procedures that are actually implemented. Decisions of the latter kind could be integrated into a framework like pySMT (also to compensate for missing theories) but in general this can only be solved by the creator of the tool chain.
Another way to obtain "unknown" is by enabling incomplete proof search strategies, be it particular instantiation techniques or axiom selection. In general, the number of options for theorem provers is immense and their interaction is hard to predict. Unfortunately, this situation is hard to improve. ATP developers can evaluate the effectiveness of a large number of combinations to find those that work well or eliminate those that don't have any effect. How well this works depends strongly on how representative the benchmarks are for the application. Specialized tool chain developers might be able to tune the options themselves, but this cannot be expected from the average user.
A third possibility for a timeout is lack of knowledge about the (possible) model. A superposition based prover will not saturate the clause set generated by $\forall x : Int, y : Int . x < y$ and continue until it times out. Still, some learned similarity measures could select sufficiently distinct generated clauses that could be of interest for choosing a better suited set of options. A similar phenomenon occurs when SMT solvers run in a refinement loop. The clause above almost immediately produces a model $x = 0, y = 1$. A refined model would exclude these model values, obtaining a new model $x = -1, y = 2$ and so on. A human quickly recognizes that the problem has both infinitely many models and counter-models. Part of this information is also generated when the decision procedure -- here probably the Simplex algorithm -- is run and we should know more about the model theory of that decision procedure. How this knowledge can be preserved in the face of theory combination is not obvious. When we consider the clause $f(x) < f(y)$, whatever model we pick for $f$ overshadows any monotonicity intuitions we had for the simpler case. Nevertheless, it could be beneficial to extract more information from a stopped proof search and feed this back into the refinement loop.
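The behaviour of such a refinement loop can be imitated with a toy stand-in for the solver. The sketch below is not an SMT solver: `find_model` is a hypothetical brute-force search over a bounded range of integers, and the models it returns need not coincide with those a real solver would produce. It only illustrates that blocking the values of each model always leaves room for another one, so the loop stops only because of the artificial bound.

```python
from typing import Iterator, Optional, Set, Tuple


def find_model(blocked_x: Set[int], blocked_y: Set[int],
               bound: int) -> Optional[Tuple[int, int]]:
    """Toy solver: find x < y with |x|, |y| <= bound, avoiding blocked values."""
    for radius in range(bound + 1):
        for x in range(-radius, radius + 1):
            for y in range(-radius, radius + 1):
                if (max(abs(x), abs(y)) == radius and x < y
                        and x not in blocked_x and y not in blocked_y):
                    return (x, y)
    return None


def refinement_loop(bound: int) -> Iterator[Tuple[int, int]]:
    """Repeatedly ask for a model and exclude its values from later models."""
    blocked_x: Set[int] = set()
    blocked_y: Set[int] = set()
    while True:
        model = find_model(blocked_x, blocked_y, bound)
        if model is None:
            return  # reached only because the search space is bounded
        yield model
        blocked_x.add(model[0])
        blocked_y.add(model[1])


print(len(list(refinement_loop(3))), len(list(refinement_loop(6))))
```

Raising the bound always yields further models, which mirrors the observation that the problem has infinitely many models; recognizing this pattern from the sequence of models is exactly the kind of information a stopped search could feed back into the loop.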
We have focused on topics that could be considered obstacles for industrial integration: possible runtime environment restrictions, the need to ship with predetermined parameter schedules, or the general hardships of undecidable problems. Nevertheless, it is worth highlighting the impressive advancements that have already been made in that regard. Modern theorem provers come with an extensive manual that includes their inference rules. Proofs have become significantly more detailed. Finally, there have already been successful integrations of general first-order provers in widely used software, ranging from backends for interactive proof assistants to software synthesis and test case generation. We hope that future development can close the remaining gap between research and its application in industry.
^{1} For example, both CVC4 and Z3 report unknown when trying to find a model of $\exists a : {Array(Int,Int)}, b : Array(Int,Int). a \neq b$ but they can both construct a model for the inequality of constants $c \neq d$, which is how existential quantification is defined.
We propose building a community of intelligent geometry researchers, manifested by the creation of a living Intelligent Geometry Book, to introduce many more people to computer-supported reasoning.
On the one hand, participants at the Big Proof 2019
workshop commented that students (mostly mathematics, but also computing) were
not being introduced to formal methods and computer-supported
reasoning at an early enough stage. On the other hand, for hundreds if
not thousands of years, geometry has been the tool of choice
for introducing formal reasoning. Hence we propose that
computer-supported reasoning in geometry is an ideal vehicle for
exposing many more people to computer-supported reasoning than is
currently the case.
The need for formal reasoning is not just a niche area in mathematics. The whole area of verified software relies on formal reasoning. Everyone who flies in or out of the UK has placed their lives in the hands of the air traffic controllers, supported by a computer system which has been formally verified, and the same is true of Line 14 (driverless) of the Paris Métro, or many railway systems. But the firms that develop such systems have an uphill struggle finding staff. They report that practically no recruit at the graduate level has been exposed to computer-supported reasoning.
Reuse of previous results is vital across science, but nowhere more
than in mathematics. However, in practice re-use of
implementations of ideas has proved much harder than re-use of
abstract ideas. The same algorithm can be implemented many
times. Sometimes this re-implementation is due to real improvements (as
we may provide a shorter or clearer proof of a theorem), but often it
is due to engineering mismatches: different programming languages,
data formats, etc.
Therefore this project aims to build a community of researchers, the Intelligent Geometry Community, experts in the representation and management of geometric knowledge, and knowledge-based intelligent software for exploratory and educational purposes. By pooling the experiences of the participants over the fundamental issues to be addressed by any project in this area and making results open source/open access, the network will lay the ground for advanced research projects contributing to the European research agenda. The creation of this community will be demonstrated by the creation of a living Intelligent Geometry Book, the iGeometryBook.
Hence the main challenge is to organise and harmonise the work of the participants and to integrate different techniques and tools developed by them, to facilitate collaborations within the network.
The iGeometryBook will be an intelligent environment, collaborative, adaptive and adaptable, containing formally represented machine-checked geometric knowledge, historically analysed, with embedded tools for computation, deduction, and knowledge processing. It is envisaged that the knowledge contained in the iGeometryBook will be created by the authors, with contributions from readers, using the embedded tools according to certain pre-designed mechanisms, and that it will be remotely accessible and searchable through natural and visual language interfaces.
Such a superset of a book, freely available on all computational
platforms, adaptable, collaborative and adaptive to each and every
user's profile, would bring together a whole new generation of
mathematical tools with impact at all levels: exploratory research,
applications in mathematics and education.
The need for information and communications technology (ICT) in a learning setting is well recognised. As stated in Mathematics Education in Europe: Common Challenges and National Policies, the use of ICT for teaching mathematics is prescribed or recommended in all European countries. The recommendations range from very specific instructions to more general guidelines [1,2,3,4,5]. In particular, three of the aims in [2] are
It is our thesis that an appropriate interactive geometry tool can support, not only each aim above separately, but also the interplay of these aims.
In terms of the challenges of ARCADE, this should also respond to "How can we attract young people to our field?".
We briefly introduce the line of research on the verification of data-aware processes, with the intention of raising more awareness of it within the automated reasoning community. On the one hand, data-aware processes constitute a concrete setting for validating and experimenting with automated reasoning techniques. On the other hand, they trigger new genuine research challenges for researchers in automated reasoning.
Contemporary organizations rely more and more on business processes to describe, analyze, and regulate their internal work. Business process management (BPM) is now a well-assessed discipline at the intersection between operations management, computer science, and IT engineering. Its grand goal is to support managers, analysts, and domain experts in the design, deployment, enactment, and continuous improvement of processes [21].
One of the essential concepts in BPM is that of a process model. A process model explicitly describes which tasks have to be performed within the organization (such as check order) in response to external events (such as receive order request), and what are the allowed courses of execution (such as deliver order can only be executed if check order has been successfully completed). Several process modeling languages have been proposed for this purpose, such as BPMN [35], UML Activity Diagrams [26], and EPCs [1]. Verification and automated reasoning techniques are in this respect instrumental to formally analyze process models and ascertain their correctness before their actual deployment into corresponding BPM systems.
Traditionally, formal analysis of process models is limited to the process control flow, represented using variants of bounded Petri nets or finite-state transition systems (depending on how concurrency is interpreted). This, however, does not reflect the intrinsic, multi-perspective nature of processes and their corresponding models. In particular, process tasks are executed by resources based on decisions that depend on background and process-related data, in turn manipulated upon task execution.
In this multi-perspective spectrum, the last two decades have seen a huge body of research dedicated to the integration of data and process management to achieve a more comprehensive understanding of how data influence behavior, and how behavior impacts data [38, 20, 37].
The corresponding development of formal frameworks for the verification of data-aware processes has consequently flourished, leading to a wide range of formal models depending on how the data and process components, as well as their interplay, are actually represented.
One stream of research followed the artifact-centric paradigm, where the main focus is that of persistent business objects (such as orders or loans) and their lifecycle [41, 8]. Here, variants of the same model are obtained depending on how such business objects are represented. Notable examples are: (i) relational data with different kinds of constraints [18, 6, 32], (ii) relational data with numerical values and arithmetics [15, 19], (iii) tree-structured data [7]. More minimalistic models have also been brought forward, capturing data-aware processes as a persistent data storage evolved through the application of (conditional) actions that may inject external, possibly fresh values through service calls reminiscent of uninterpreted functions. Two variants of this model have been studied, the first considering persistent relational data with constraints [5, 2], the second operating over description logic knowledge bases whose extensional data are interpreted under incomplete information, and updated in the style of Levesque's functional approach [28, 14].
Another stream of research followed instead the more traditional activity-centric approach, relying on Petri nets as the underlying control-flow backbone of the process. Specifically, Petri net-based models have been enriched with: (i) data items locally carried by tokens [39, 31], (ii) data registers with numerical and non-numerical values [16], (iii) tokens carrying tree-structured data [4], and/or (iv) persistent relational data manipulated with the full power of FOL/SQL [17, 34].
Last but not least, the interplay between data and processes has been studied to build solid foundations for "many-to-many" processes, that is, processes whose tasks co-evolve multiple different objects related to each other (such as e-commerce companies where each order may correspond to multiple shipped packages, and each package may contain items from different orders). Implicit (data-driven) [3] and explicit (token-driven) [22] coreference and synchronization mechanisms have been proposed for this purpose.
On top of these formal models, several verification tasks have been studied. On the one hand, they consider different types of properties, ranging from fundamental properties such as reachability, safety, soundness and liveness, to sophisticated formulae expressed in linear- and branching-time first-order temporal logics [8]. On the other hand, they place different assumptions regarding how data can be manipulated, and whether there are read-only data whose configuration is not known. The resulting verification problems are all undecidable in general, and require properly taming the infinity arising from the presence of data.
All in all, we believe this wide spectrum of verification problems constitutes an extremely interesting application area for automated reasoning techniques. On the one hand, data-aware processes constitute a concrete setting for experimenting with symbolic techniques developed within automated reasoning, so as to enable reasoning on the evolution of data without explicitly representing them. In addition, given the applied flavor of BPM, the feasibility of assumptions and conditions imposed towards guaranteeing good computational properties (such as decidability or tractability) can be assessed in the light of end-user-oriented modeling languages and their corresponding modeling methodologies. On the other hand, data-aware processes trigger new, genuine research challenges for researchers in automated reasoning, arising from the subtle, yet necessary interplay between control-flow aspects and volatile and persistent data with constraints.
To substantiate this claim, we briefly describe next one particular verification problem where automated reasoning techniques are very promising.
Artifact systems formalize data-aware processes using three main components: (i) a read-only database that stores fixed, background information; (ii) a working memory that stores the evolving state of artifacts throughout their lifecycle; (iii) actions that inspect the read-only memory and the working memory, and consequently update the working memory. Different variants of this model, obtained via a careful tuning of the relative expressive power of its three components, have been studied towards decidability of verification problems parameterized over the read-only database (see, e.g., [18, 15, 7, 19, 11, 12, 9]). These are verification problems where a property is checked for every possible configuration of the read-only database, thus guaranteeing that the overall process operates correctly no matter how the read-only data are instantiated.
In the most recent variants of this model, the read-only database is equipped with key and foreign key constraints relating the content of different relations. At the same time, the working memory is relational, with each relation representing an artifact, in principle capable of storing unboundedly many tuples denoting instances of that artifact [19, 32].
In [11], we took inspiration from this approach, studying the model of so-called relational artifact systems (RASs). Notably, we connected RASs to the well-established model of array-based systems within the SMT tradition [24]. This is done in two steps. First, the schema of a read-only database is represented in a functional, algebraic fashion, where relations and constraints are captured using multiple sorts and unary functions. Second, each artifact relation within the working memory is treated as a set of arrays, where each array accounts for one component of the corresponding artifact relation. A tuple (i.e., artifact instance) in an artifact relation is then reconstructed by accessing all such arrays with the same index.
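The two representation steps just described can be illustrated with a minimal sketch. It is plain Python and not the formalization of [11]; the schema, the data values, and the class `ArtifactRelation` are hypothetical and serve only to show the shape of the encoding.

```python
from dataclasses import dataclass, field

# Step 1 (hypothetical schema): the read-only database in functional form.
# Identifiers of a sort act as keys; attributes and foreign keys become
# unary functions, modelled here as dictionaries.
customer_name = {"c1": "Alice", "c2": "Bob"}      # Customer -> String
customer_city = {"c1": "k1", "c2": "k2"}          # Customer -> City (foreign key)
city_name = {"k1": "Natal", "k2": "Gothenburg"}   # City -> String

@dataclass
class ArtifactRelation:
    """Step 2: an artifact relation stored as one array per component."""
    components: tuple
    arrays: dict = field(init=False)

    def __post_init__(self):
        # One array (index -> value) for each component of the relation.
        self.arrays = {name: {} for name in self.components}

    def write(self, index, **values):
        for name in self.components:
            self.arrays[name][index] = values[name]

    def read(self, index):
        # A tuple is reconstructed by accessing all arrays at the same index;
        # None marks an index that holds no artifact instance.
        return tuple(self.arrays[name].get(index) for name in self.components)

orders = ArtifactRelation(components=("customer", "status"))
orders.write(0, customer="c1", status="checked")
orders.write(1, customer="c2", status="received")

customer, status = orders.read(1)
print(customer, status, city_name[customer_city[customer]])
```

Following a foreign key amounts to composing unary functions, and an artifact instance is no more than a shared index into several arrays. This is what makes the model amenable to array-based reasoning in the SMT tradition.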
With these notions at hand, from a logical point of view the behavior of a RAS is specified via: (i) second-order variables for artifact components; (ii) first-order variables for "data", ranging both over the sorts of the read-only database and over numerical (real, integer) domains. Thus, suitable combinations of (linear) arithmetics and EUF can be employed for reasoning about RASs. Non-determinism in system evolution is captured via first-order parameters, that is, further existentially quantified variables occurring in transition formulae, whereas updates of second-order variables are functionally determined by such non-determinism at the first-order level.
On top of this formal model, various problems arise that can be effectively attacked using techniques and solutions within the automated reasoning community in general, and the SMT community in particular. We briefly discuss next some of them.
To sum up, we believe that by employing both well-established and relatively new techniques, the automated reasoning community is ready to face the challenges raised by the emerging area of verification of data-aware processes, providing foundational, algorithmic, and applied advancements.