Learning Communicating and Nondeterministic Automata
Carsten Kern
The results of this dissertation are two-fold. On the one hand, inductive
learning techniques are extended and two new inference algorithms for
inferring nondeterministic, and universal, respectively, finite-state
automata are presented. On the other hand, certain learning techniques are
employed and enhanced to semi-automatically infer communicating automata
(also called design models in the software development cycle). For both
topics, theoretical results on the feasibility of the approaches, as
well as an implementation are presented which, in both cases, support
our theory.
Concerning the first objective to derive a so-called active online
learning algorithm for nondeterministic finite-state automata (NFA),
we present, in analogy to Angluin's famous learning algorithm L* for
deterministic finite-state automata (DFA), a version for inferring a
certain subclass of NFA. The automata from this class are called residual
finite-state automata (RFSA). It was shown by Denis et al. that there is
an exponential gap between the size of minimal DFA and their corresponding
minimal RFSA. Even if there are also cases where the canonical (i.e.,
minimal) RFSA is exponentially larger than a corresponding minimal
NFA, we show that the new learning algorithm---called NL*---is a great
improvement compared to L* as the inferred canonical RFSA has always
at most the size of the corresponding minimal DFA but is usually even
considerably smaller and more easy to learn. Unlike a learning procedure
developed by Denis et al.---called DeLeTe2---our algorithm is capable of
deriving canonical RFSA. Like L*, the new algorithm will be applicable in
many fields including pattern recognition, computational linguistics and
biology, speech recognition, and verification. From our point of view,
NL* might especially play a major role in the area of formal verification
where the size of the models that are processed is of enormous importance
and nondeterminism not regarded an unpleasant property.
The second objective of this thesis is to create a method for inferring
distributed design models (CFMs) from a given set of requirements
specified as scenarios (message sequence charts). The main idea is to
extend the L* algorithm to cope with valid and invalid sets of system
runs and, after some iterations, come up with an intermediate design
model (a DFA) which exhibits features that make it distributable
into communicating components (or processes) interacting via FIFO
channels. Theoretical results on which classes of CFMs are learnable in
which time-complexity bounds are presented. We also developed a tool
implementation called Smyle, realizing important theoretical results
evolving from this part of the thesis. Based on this learning formalism
we also derive a software-engineering lifecycle model called the Smyle
Modeling Approach in which we embedded our learning approach.
Additionally, we launched a project for a new learning library
called libalf which includes most of the learning algorithms (and
their extensions) mentioned in this thesis. We hope that, due to its
continuously increasing functionality, libalf will find broad acceptance
among researchers, and that it will be the starting point for an extensive
project of different research groups which will employ libalf, or augment
the library with new algorithms.