Verification of Data-Aware Processes: Challenges and Opportunities for Automated Reasoning

Calvanese, Diego; Ghilardi, Silvio; Gianola, Alessandro; Montali, Marco; Rivkin, Andrey

Verification of Data-Aware Processes: Challenges and Opportunities for Automated Reasoning

Diego Calvanese
(Free University of Bozen-Bolzano)

Silvio Ghilardi
(Università degli Studi di Milano)

Alessandro Gianola
(Free University of Bozen-Bolzano)

Marco Montali
(Free University of Bozen-Bolzano)

Andrey Rivkin
(Free University of Bozen-Bolzano)

We briefly introduce the line of research on the verification of data-aware processes, with the intention of raising more awareness of it within the automated reasoning community. On the one hand, data-aware processes constitute a concrete setting for validating and experimenting with automated reasoning techniques. On the other hand, they trigger new genuine research challenges for researchers in automated reasoning.

Introduction

Contemporary organizations rely more and more on business processes to describe, analyze, and regulate their internal work. Business process management (BPM) is now a well-assessed discipline at the intersection between operations management, computer science, and IT engineering. Its grand goal is to support managers, analysts, and domain experts in the design, deployment, enactment, and continuous improvement of processes [21].

One of the essential concepts in BPM is that of a process model. A process model explicitly describes which tasks have to be performed within the organization (such as check order) in response to external events (such as receive order request), and what are the allowed courses of execution (such as deliver order can only be executed if check order has been successfully completed). Several process modeling languages have been proposed for this purpose, such as BPMN [35], UML Activity Diagrams [26], and EPCs [1]. Verification and automated reasoning techniques are in this respect instrumental to formally analyze process models and ascertain their correctness before their actual deployment into corresponding BPM systems.

Traditionally, formal analysis of process models is limited to the process control flow, represented using variants of bounded Petri nets or finite-state transition systems (depending on how concurrency is interpreted). This, however, does not reflect the intrinsic, multi-perspective nature of processes and their corresponding models. In particular, process tasks are executed by resources based on decisions that depend on background and process-related data, in turn manipulated upon task execution.

In this multi-perspective spectrum, the last two decades have seen a huge body of research dedicated to the integration of data and process management to achieve a more comprehensive understanding on how data influence behavior, and how behavior impact data [38, 20, 37].

The corresponding development of formal frameworks for the verification of data-aware processes has consequently flourished, leading to a wide plethora of formal models depending on how the data and process components, as well as their interplay, is actually represented.

One stream of research followed the artifact-centric paradigm, where the main focus is that of persistent business objects (such as orders or loans) and their lifecycle [41, 8]. Here, variants of the same model are obtained depending on how such business objects are represented. Notable examples are: (i) relational data with different kinds of constraints [18, 6,32], (ii) relational data with numerical values and arithmetics [15, 19], (iii) tree-structured data [7]. Also more minimalistic models have been brought forward, capturing data-aware processes as a persistent data storage evolved through the application of (conditional) actions that may inject external, possibly fresh values through service calls reminiscent of uninterpreted functions. Two variants of this model have been studied, the first considering persistent relational data with constraints [5, 2], the second operating over description logic knowledge bases whose extensional data are interpreted under incomplete information, and updated in the style of Levesque functional approach [28, 14].

Another stream of research followed instead the more traditional activity-centric approach, relying on Petri nets as the underlying control-flow backbone of the process. Specifically, Petri net-based models have been enriched with: (i) data items locally carried by tokens [39, 31], (ii) data registers with numerical and non-numerical values [16], (iii) tokens carrying tree-structured data [4], and/or(iv) persistent relational data manipulated with the full power of FOL/SQL [17, 34].

Last but not least, the interplay between data and processes has been studied to build solid foundations for "many-to-many" processes, that is, processes whose tasks co-evolve multiple different objects related to each other (such as e-commerce companies where each order may correspond to multiple shipped packages, and each package may contain items from different orders). Implicit (data-driven) [3] and explicit (token-driven) [22] coreference and synchronization mechanisms have been proposed for this purpose.

On top of these formal models, several verification tasks have been studied. On the one hand, they consider different types of properties, ranging from fundamental properties such as reachability, safety, soundness and liveness, to sophisticated formulae expressed in linear- and branching-time first-order temporal logics [8]. On the other hand, they place different assumptions regarding how data can be manipulated, and whether there are read-only data whose configuration is not known. The resulting verification problems are all undecidable in general, and require to properly tame the infinity arising from the presence of data.

All in all, we believe this wide spectrum of verification problems constitutes an extremely interesting application area for automated reasoning techniques. On the one hand, data-aware processes constitute a concrete setting for experimenting symbolic techniques developed within automated reasoning, so as to enable reasoning on the evolution of data without explicitly representing them. In addition, given the applied flavor of BPM, the feasibility of assumptions and conditions imposed towards guaranteeing good computational properties (such as decidability or tractability) can be assessed in the light of enduser-oriented modeling languages and their corresponding modeling methodologies. On the other hand, data-aware processes trigger new, genuine research challenges for researchers in automated reasoning, arising from the subtle, yet necessary interplay between control-flow aspects and volatile and persistent data with constraints.

To substantiate this claim, we briefly describe next one particular verification problem where automated reasoning techniques are very promising.

The Concrete Case of Relational Artifact Systems

Artifact systems formalize data-aware processes using three main components: (i) a read-only database that stores fixed, background information; (ii) a working memory that stores the evolving state of artifacts throughout their lifecycle; (iii) actions that inspect the read-only memory and the working memory, and consequently update the working memory. Different variants of this model, obtained via a careful tuning of the relative expressive power of its three components, have been studied towards decidability of verification problems parameterized over the read-only database (see, e.g., [18, 15, 7, 19, 11, 12, 9]). These are verification problems where a property is checked for every possible configuration of the read-only database, thus guaranteeing that the overall process operates correctly no matter how the read-only data are instantiated.

In the most recent variants of this model, the read-only database is equipped with key and foreign key constraints relating the content of different relations. At the same time, the working memory is relational, with each relation representing an artifact, in principle capable of storing unboundedly many tuples denoting instances of that artifact [19, 32].

In [11], we took inspiration from this approach, studying the model of so-called relational artifact systems (RASs). Notably, we connected RASs to the well-established model of array-based systems within the SMT tradition [24]. This is done in two steps. First, the schema of a read-only database is represented in a functional, algebraic fashion, where relations and constraints are captured using multiple sorts and unary functions. Second, each artifact relation within the working memory is treated as a set of arrays, where each array accounts for one component of the corresponding artifact relation. A tuple (i.e., artifact instance) in an artifact relation is then reconstructed by accessing all such arrays with the same index.

With these notions at hand, from a logical point of view the behavior of a RAS is specified via: (i) second order variables for artifacts components; (ii) first order variables for "data", ranging both on the sorts of the read-only database and on numerical (real, integer) domains. Thus, suitable combinations of (linear) arithmetics and EUF can be employed for reasoning about RAS systems. Non-determinism in system evolution is captured via first-order parameters, that is, further existentially quantified variables occurring in transition formulae, whereas second-order variables updates are functionally determined by such non-determinism at the first-order level.

On the top of this formal model, various problems arise that can be effectively attacked using techniques and solutions within the automated reasoning community in general, and the SMT community in particular. We briefly discuss next some of them.

By focusing on model checking of safety properties via symbolic backward reachability [24, 25], the main problem is that of avoiding the existential prefix to grow in an uncontrolled way. This, in turn, calls for some form of symbol elimination. This is rather easily achieved for second-order variables − at least when backward search is employed − because, as mentioned above, updates are often functional modulo first-order parameters; however, it is not clear what happens if alternative search strategies are employed, or other relevant properties are checked. At the first-order level, specific challenges instead arise already in this setting. In fact, while numerical existentially quantified variables can be eliminated via well-known methods (such as predicate abstraction [23], interpolation [33, 30], model elimination [29, 36], or even quantifier elimination), the manipulation of variables ranging over theread-only database appears to require completely different techniques, like cover computation [27]. Thanks to cover computation, one can in particular overcome the fact that quantifier elimination is not directly applicable to variables pointing to elements of the read-only database. More technically, the computation of covers is nothing but quantifier elimination in the model completions of the theory used to capture the schema and the constraints of the read-only database schema, as shown in [12]. The idea of using model completions when quantifier elimination is not directly available is presentalso in [40]. Notably, differently from quantifier elimination in linear arithmetics, cover computationin the restricted "unary" case required for RASs turns out to be tractable [27, 12].

Different types of properties are usually required to be verified in the context of data-aware processes, where safety is one of the most typical. Nevertheless, a comprehensive research dedicated to SMT-based techniques for the effective verification of data-aware processes should also consider richer forms of verification going beyond safety (e.g., liveness and fairness), and richer classes of artifact systems incorporating concrete data types and arithmetic operations that should explicitly appear in the specification language.

Database instances are typically built on top of finite structures (although they may contain elementsfrom "value" sorts ranging over infinite domains). Depending on the specific setting under investigation, this feature may require to introduce specific techniques from finite model theory. In particular, advanced applications will presumably require: from the foundational perspective, to investigate suitable versions of the finite model property; from the applied perspective, to integrate common solvers with model finders.

Interesting variants of RASs arise when the data stored therein are interpreted under incomplete information, and in the presence of complex ontological constraints expressing background, structural knowledge of the organisational domain. Transferring model checking techniques such as those re-called in point 1 above to this richer setting is not at all trivial, as reasoning must now be carried outtackling two dimensions at once: the temporal dimension along which the artifact systems evolves, and the structural dimension constraining the manipulated data objects and their mutual relationships.

Towards enabling the concrete exploitation of verification techniques, logic-based formalisms used to formalize RASs or other types of data-aware processes need to be connected to end user-oriented process modeling languages. This interconnection paves the way towards practical reasoning tasks that are relevant for end users, but have not yet addressed by the automated reasoning community. In addition, by considering specific modeling guidelines and methodologies, interesting subclasses of general formal models such as that of RASs may naturally emerge. It would be then important to assess whether such subclasses come with interesting computational guarantees for the corresponding automated reasoning tasks.

We tried to address only some of the most simple problems from the above list. In particular, we have used RASs as a basis for formalizing a data-aware extension of the de-facto process modeling standard BPMN [10], and used the resulting approach to conduct an initial benchmark using some process models from [32], with very encouraging results [13, 11, 10].

To sum up, we believe that by employing both well-established and relatively new techniques, the automated reasoning community is ready to face the challenges raised by the emerging area of verification of data-aware processes, providing foundational, algorithmic, and applied advancements.

References

W.M.P. van der Aalst (1999): Formalization and Verification of Event-driven Process Chains. Information and Software Technology 41(10), pp. 639–650, doi:10.1016/S0950-5849(99)00016-6.
P. A. Abdulla, C. Aiswarya, M. F. Atig, M. Montali & O. Rezine (2016): Recency-Bounded Verification of Dynamic Database-Driven Systems. In: Proc. PODS. ACM Press, pp. 195–210, doi:10.1145/2902251.2902300.
A. Artale, A. Kovtunova, M. Montali & W. M. P. van der Aalst (2019): Modeling and Reasoning over Declarative Data-Aware Processes with Object-Centric Behavioral Constraints. In: Proc. BPM. Springer, pp. 139–156, doi:10.1007/978-3-030-26619-6_11.
E. Badouel, L. Hélouët & C. Morvan (2016): Petri Nets with Structured Data. Fundam. Inform. 146(1), pp. 35–82, doi:10.3233/FI-2016-1375.
B. Bagheri Hariri, D. Calvanese, G. De Giacomo, A. Deutsch & M. Montali (2013): Verification of Relational Data-centric Dynamic Systems with External Services. In: Proc. PODS, pp. 163–174, doi:10.1145/2463664.2465221.
F. Belardinelli, A. Lomuscio & F. Patrizi (2012): An Abstraction Technique for the Verification of Artifact-Centric Systems. In: Proc. of KR. Available at http://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4531.
M. Bojańczyk, L. Segoufin & S. Toruńczyk (2013): Verification of database-driven systems via amalgamation. In: Proc. of PODS, pp. 63–74, doi:10.1145/2463664.2465228.
D. Calvanese, G. De Giacomo & M. Montali (2013): Foundations of Data Aware Process Analysis: A Database Theory Perspective. In: Proc. PODS, pp. 1–12, doi:10.1145/2463664.2467796.
D. Calvanese, S. Ghilardi, A. Gianola, M. Montali & A. Rivkin (2018): Verification of Data-Aware Processes via Array-Based Systems (Extended Version). Technical Report arXiv:1806.11459. arXiv.org. Available at https://arxiv.org/abs/1806.11459.
D. Calvanese, S. Ghilardi, A. Gianola, M. Montali & A. Rivkin (2019): Formal Modeling and SMT-Based Parameterized Verification of Data-Aware BPMN. In: Proc. BPM. Springer, pp. 157–175, doi:10.1007/978-3-030-26619-6_12.
D. Calvanese, S. Ghilardi, A. Gianola, M. Montali & A. Rivkin (2019): From Model Completeness to Verification of Data Aware Processes. In: Description Logic, Theory Combination, and All That. Springer, pp. 212–239, doi:10.1007/978-3-030-22102-7_10.
D. Calvanese, S. Ghilardi, A. Gianola, M. Montali & A. Rivkin (2019): Model Completeness, Covers and Superposition. In: Proc. of CADE, pp. 142–160, doi:10.1007/978-3-030-29436-6_9.
D. Calvanese, S. Ghilardi, A. Gianola, M. Montali & A. Rivkin (To appear): SMT-based Verification of Data-Aware Processes: a Model-Theoretic Approach. Mathematical Structures in Computer Science.
D. Calvanese, M. Montali & A. Santoso (2015): Verification of Generalized Inconsistency-Aware Knowledge and Action Bases. In: Proc. IJCAI. AAAI Press, pp. 2847–2853. Available at http://ijcai.org/Abstract/15/403.
E. Damaggio, A. Deutsch & V. Vianu (2012): Artifact Systems with Data Dependencies and Arithmetic. ACM TODS 37(3), pp. 22:1–22:36, doi:10.1145/2338626.2338628.
M. de Leoni, P. Felli & M. Montali (2018): A Holistic Approach for Soundness Verification of Decision-Aware Process Models. In: Proc. ER, pp. 219–235, doi:10.1007/978-3-030-00847-5_17.
R. De Masellis, C. Di Francescomarino, C. Ghidini, M. Montali & S. Tessaris (2017): Add Data into Business Process Verification: Bridging the Gap between Theory and Practice. In: Proc. AAAI. AAAI Press, pp. 1091–1099. Available at http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14627.
A. Deutsch, R. Hull, F. Patrizi & V. Vianu (2009): Automatic Verification of Data-Centric Business Processes. In: Proc. of ICDT, pp. 252–267, doi:10.1145/1514894.1514924.
A. Deutsch, Y. Li & V. Vianu (2016): Verification of Hierarchical Artifact Systems. In: Proc. PODS. ACM Press, pp. 179–194, doi:10.1145/2902251.2902275.
M. Dumas (2011): On the Convergence of Data and Process Engineering. In: Proc. of ADBIS, pp. 19–26, doi:10.1007/978-3-642-23737-9_2.
M. Dumas, M. La Rosa, J. Mendling & H. A. Reijers (2013): Fundamentals of Business Process Management. Springer, doi:10.1007/978-3-642-33143-5.
D. Fahland (2019): Describing Behavior of Processes with Many-to-Many Interactions. In: Proc. of PETRI NETS, LNCS 11522. Springer, pp. 3–24, doi:10.1007/978-3-030-21571-2_1.
C. Flanagan & S. Qadeer (2002): Predicate abstraction for software verification. In: Proc. of POPL, pp. 191–202, doi:10.1145/503272.503291.
S. Ghilardi, E. Nicolini, S. Ranise & D. Zucchelli (2008): Towards SMT Model Checking of Array-Based Systems. In: Proc. of IJCAR, pp. 67–82, doi:10.1007/978-3-540-71070-7_6.
S. Ghilardi & S. Ranise (2010): Backward Reachability of Array-based Systems by SMT Solving: Termination and Invariant Synthesis. Logical Methods in Computer Science 6(4), doi:10.2168/LMCS-6(4:10)2010.
Object Management Group (2013): OMG Unified Modeling Language 2.5. Http://www.omg.com/uml/.
S. Gulwani & M. Musuvathi (2008): Cover Algorithms and Their Combination. In: Proc. of ESOP, Held as Part of ETAPS, pp. 193–207, doi:10.1007/978-3-540-78739-6_16.
B. Bagheri Hariri, D. Calvanese, G. De Giacomo, R. De Masellis, P. Felli & M. Montali (2012): Verification of Description Logic Knowledge and Action Bases. In: Proc. ECAI, pp. 103–108, doi:10.3233/978-1-61499-098-7-103.
K. Hoder & Nikolaj Bjørner (2012): Generalized Property Directed Reachability. In: Proc. of SAT, pp. 157–171, doi:10.1007/978-3-642-31612-8_13.
L. Kovács & A. Voronkov (2009): Interpolation and Symbol Elimination. In: Proc. of CADE, pp. 199–213, doi:10.1007/978-3-642-02959-2_17.
S. Lasota (2016): Decidability Border for Petri Nets with Data: WQO Dichotomy Conjecture. In: Proc. of PETRI NETS, LNCS 9698. Springer, pp. 20–36, doi:10.1007/978-3-319-39086-4_3.
Y. Li, A. Deutsch & V. Vianu (2017): VERIFAS: A Practical Verifier for Artifact Systems. PVLDB 11(3), pp. 283–296, doi:10.14778/3157794.3157798.
K.L. McMillan (2006): Lazy Abstraction with Interpolants. In: Proc. of CAV, pp. 123–136, doi:10.1007/11817963_14.
M. Montali & A. Rivkin (2017): DB-Nets: on The Marriage of Colored Petri Nets and Relational Databases. TOPNOC 12, pp. 91–118, doi:10.1007/978-3-662-55862-1_5.
OMG (2009): Business Process Model and Notation (BPMN) - Version 2.0, Beta 1.
O. Padon, N. Immerman, S. Shoham, A. Karbyshev & M. Sagiv (2016): Decidability of inferring inductive invariants. In: Proc. of POPL, pp. 217–231, doi:10.1145/2837614.2837640.
M. Reichert (2012): Process and Data: Two Sides of the Same Coin?. In: Proc. of the On the Move Confederated Int. Conf. (OTM 2012), LNCS 7565. Springer, doi:10.1007/978-3-642-33606-5_2.
C. Richardson (2010): Warning: Don't Assume Your Business Processes Use Master Data. In: Proc. of BPM, LNCS 6336. Springer, doi:10.1007/978-3-642-15618-2_3.
F. Rosa-Velardo & D. de Frutos-Escrig (2011): Decidability and complexity of Petri nets with unordered data. Theor. Comput. Sci. 412(34), pp. 4439–4451, doi:10.1016/j.tcs.2011.05.007.
V. Sofronie-Stokkermans (2016): On Interpolation and Symbol Elimination in Theory Extensions.. In: Proc. of IJCAR, Lecture Notes in Computer Science. Springer, pp. 273–289, doi:10.1007/978-3-319-40229-1_19.
V. Vianu (2009): Automatic Verification of Database-Driven Systems: a New Frontier. In: Proc. of ICDT, pp. 1–13, doi:10.1145/1514894.1514896.

In Martin Suda and Sarah Winkler: Proceedings of the Second International Workshop on Automated Reasoning: Challenges, Applications, Directions, Exemplary Achievements (ARCADE 2019), Natal, Brazil, August 26, 2019, Electronic Proceedings in Theoretical Computer Science 311, pp. 53–58.
Published: 31st December 2019.

https://dx.doi.org/10.4204/EPTCS.311.9	bibtex	PDF

References in reconstructed bibtex, XML and HTML format (approximated).

Comments and questions to:

eptcs@eptcs.org

For website issues:

webmaster@eptcs.org