Rens Bod and Remko Scha
Deriving
optimal network diagrams by means of Structural Information Theory.
Introduction.
The problem
of automatically generating graphic information presentations consists
of two parts: (1) deciding on the interpretation schema that is
to be applied to the graphics, and (2) deciding on an actual graphic
object that represents the intended information if interpreted according
to this key. In this paper we focus on the second part of this problem.
Assuming that we have decided on the interpretation that is to be
used, it is often non-trivial to design a graphical object that
yields the correct information under this interpretation, and that
also avoids unwarranted suggestions. To assess the semantic validity
and pragmatic felicity of graphical information presentations, a
system must explicitly consider the perceptual Gestalts that the
graphics evokes.
For the sake
of concreteness, this paper focusses on one information presentation
problem: the automatic design of diagrams that represent networks.
It suggests that the diagrammatic representations of a network that
may be considered semantically correct are the ones that evoke a
set of Gestalts such that each of them, if a viewer applies the
contextually assumed interpretation key, (1) can be interpreted
as denoting the intended network, and (2) cannot be interpreted
differently. (We distinguish (1) and (2) because we are not sure
that interpretation keys that are in practice used for graphical
representations guarantee exclude ambiguity.) Among these representations,
we should prefer the one that avoids misleading suggestions to the
largest possible extent; a good candidate for this optimal diagram
is the one that is perceptually simplest.
To elaborate
this idea in mathematical and computational detail, we propose to
extend a formal model of Gestalt perception that was developed by
E. Leeuwenberg. This model computes a preference ordering on alternative
perceptual groupings of an input pattern, based on the information
load of their codings. The grouping that a person actually perceives
is the one with the lowest information load.
Within the
limits of its applicability, this model enables us to predict the
Gestalt that a graphical object will evoke in the person who sees
it. At the same time, Leeuwenberg's notion of information load constitutes
a plausible formal notion of simplicity that can be used as a criterion
for distinguishing between alternative diagrammatic representations
of one network. The ultimate success of this method depends, however,
on the adequacy of the visual coding language that is assumed. Leeuwenberg's
proposals about this suffer from some obvious limitations.
Automatic
diagram design: Marks and Reiter.
We consider
the problem of automatically generating a diagram that represents
a given network -- where we assume as given a definition of the
network as a collection of nodes and edges and predicates associated
with them, and we also assume as given a convention regarding the
graphical representation of the nodes (as squares, for instance),
of the edges (as line segments, for instance), and of the predicates
(as text labels, for instance). If so much is assumed, what is left
is essentially the problem of designing a reasonable lay-out (on
the screen or the page) of a not very large collection of connected
symbols.
Marks and Reiter
(1990) have drawn attention to the complexity of this problem. They
show that the space of lay-outs to be considered is extremely large,
and that different lay-outs may have significantly different properties;
for instance, many of them may evoke misleading suggestions in the
mind of the viewer. Marks and Reiter therefore propose a set of
preference rules to differentiate among the different lay-outs that
might be considered. These rules express plausible preferences between
alternative local decisions about the layout of a diagram; they
aim at avoiding "meaningless" perceptual groupings that
might arise because of gratuitous differentiations between the sizes,
the mutual proximities, and the mutual alignments of the graphical
symbols that constitute the diagram.
Marks and Reiter
have shown the importance of Gestalt perception phenomena for making
optimal decisions about lay-out design. But they do not attempt
to face the full extent of the Gestalt perception problem. Their
rules deal directly with the diagram elements, and don't refer explicitly
to the global Gestalt organization of the viewer's perception. In
the general case, this is not sufficient; the viewer does not assign
interpretations to just any part of the diagram, but only to the
constituents of the Gestalt which the diagram evokes. We cannot
take for granted that whatever is drawn on the screen is always
perceived as such.
Another limitation
of Marks and Reiter's system is that it is not clear how their rules
interact when there are conflicts between different preferences.
For demonstrations with a prototype system, one may sometimes be
able to choose applications which allow all preference rules to
be independent of each other; but a more generally applicable system
cannot rely on this. (Systems that try to account for Gestalt perception
by means of locally applicable preference rules always run into
this rule interaction problem. Another case in point is Lerdahl
and Jackendoff's theory about the perception of musical structure.)
In the remainder
of this paper, we sketch how a more principled method for automatic
lay-out design may be developed, based on Leeuwenberg's method of
formalizing Gestalt perception.
Structural
Information Theory.
Leeuwenberg's
(1969, 1971) coding theory starts out with a plausible intuitive
idea about Gestalt perception: when a person sees a visual pattern
as a certain Gestalt rather than another, this person has a mental
representation which (1) describes all or most of the details of
the visual pattern, and (2) assigns a particular constituent structure
to this pattern. (We may note an obvious analogy with the linguistic
notion of syntactic surface structure as applied to natural language
utterances.) To build a formal model of Gestalt perception, we must
therefore have (1) a formal way to code input patterns, and (2)
a formal way to code visual Gestalts which assign a specific constituent
structure to an input pattern. Assuming such coding systems, we
can then give a mathematical formulation to the problem of modelling
Gestalt perception: given an input code, what is the resulting Gestalt
code?
Leeuwenberg
has proposed a simple but powerful coding system, based on a view
on visual patterns that is now known as "turtle graphics".
The system he develops is limited to drawings that consist of discrete
straight line segments. This makes it possible to characterize input
patterns by means of an extremely simple system. A "primitive
code" of a two-dimensional line pattern is constructed by tracing
the contours of the pattern and successively noting the lengths
of line elements and the angles between them. Such a code thus consists
simply of a sequence: angle1 length1 angle2 length2 ... angleN
lengthN. ("angle1" specifies the angle of the
first linesegment with respect to a given axis; the other angles
are relative to the direction of the previous linesegment. Note
that this coding system does not specify one canonical coding
for a given input pattern; obviously, one pattern can be drawn in
many different orders. The system does, however, establish a well-defined
equivalence relation between alternative input codings; therefore
it is possible, in principle, to use an arbitrary input coding to
stand for this equivalence class.)
The language
that Leeuwenberg proposes for coding Gestalts is an extension of
the language that he uses to code raw input. Thus, concatenations
of pairs <angle, length> do not only serve as input codings,
but are also recognized as possible Gestalts. But, since regularities
in the input may give rise to specific perceived constituent structures,
the Gestalt coding language also provides means to encode such regularities.
For instance, out of arbitrary Gestalt codes a, b, c one
may build more complex codes by means of operations like iteration,
symmetry and alternation: Iteration(3,a) describes
the same pattern as aaa; Symmetry(ab) describes
the same pattern as abba; and Alternation (bc,a)
describes the same pattern as baca.
By employing
the higher order operations that such a Gestalt coding language
provides, an input pattern may be often be recoded in several ways.
Leeuwenberg's central assumption is, that Gestalt perception is
in fact this recoding process. His central claim about this process
is, that it results in the Gestalt that corresponds to the least
complex code -- where complexity (also called "information
load") is measured by simply counting the number of "primitive
elements" (i.e., elements from the primitive code of the input)
that still occur in the Gestalt code. For instance, if a and
b are both primitive elements, then bbaaa may be recoded
as Iteration(2,b) Iteration(3,a)
or as
b Alternation
(ba,a). The first recoding is preferred, because it contains
only 2 occurrences of the primitive elements a and b,
whereas the second recoding contains 4 such occurrences. This
predicts that the subpatterns defined by the subcodes Iteration(2,b)
and Iteration(3,a), i.e., bb and aaa,
will be perceived as constituents of the Gestalt evoked by bbaaa.
A code with
a lower structural information load then alternative codes for the
same pattern, is called a minimum code. Coding theory thus
replaces the classical principles of Gestalt perception with a single
one: the perception of a pattern will result in an interpretation
that corresponds to a minimum code; the information load of what
is actually perceived is lower than the information load of anything
that could have been perceived alternatively. This reformulation
of the old law of Prägnanz is known as the minimum principle
of coding theory.
Leeuwenberg
often talks as if one input pattern evokes exactly one minimum code.
It is attractive to generalize Leeuwenberg's approach to one which
recognizes the possibility of a set of distinct minimum codes for
one input, if the information loads of several alternative analyses
which do not subsume each other lay close together. In the sequel
we sometimes assume such a generalization.
During the
past 20 years, Leeuwenberg and his co-workers have reported on a
number of experiments that tested predictions based on the minimum
principle. They investigated the disambiguation of ambiguous patterns,
embeddedness, closure and the perception of foreground and background.
On the whole, the results of these experiments confirmed the psychological
reality of an ordering of interpretations in terms of a measure
of perceptual simplicity computed on the basis of Structural Information
Theory. We have also used the minimum principle to solve proportional
analogy puzzles (Bod et al., 1993). A statistical generalization
of the minimum principle is one of the key ideas in a new performance
model of natural language processing (Scha, 1992; Bod, 1993).
Coding theory
and the correctness of diagrams.
The semantically
correct representations of a network are the ones that evoke a set
of Gestalts such that each of them, if a viewer applies the contextually
assumed interpretation key, (1) can be interpreted as denoting the
intended network, and (2) cannot be interpreted differently. It
should be emphasized here that is the Gestalts which get interpreted,
not what is physically on the screen or on the page.
Note that this
also means that an interpretation key cannot be simply inverted
to yield a set of "expressive rules" which map a network
definition onto the diagrams that represent it; in general, such
"expressive rules" can only have a heuristic status, in
that they define a set of structures that might represent
the diagram -- candidates for the diagram generation process to
consider.
Network diagrams
generated by reasonable expressive rules may induce Gestalt perceptions
with interpretations that are not compatible with the network they
are intended to represent. For instance, a model consisting of four
nodes, represented as squares, and four relations between these,
represented as line segments, may happen to be represented by a
diagram in which the squares are connected by the lines in such
a way that a fifth square is perceived. Such a diagram should obviously
be rejected, since it conveys the wrong information. To discover
this fifth square the system must apply coding theory.
A system such
as Marks and Reiter's does not explicitly deal with Gestalt structure.
If such a system is to avoid all possibilities for Gestalts with
unintended interpretations, it must use extremely conservative expressive
rules, which consistently avoid crossing lines and closely placed
elements. We feel it is important to try to develop a more generally
applicable approach, where a system may check candidate diagrams
for undesired interpretations, by computing the minimum codes for
each diagram, and applying the interpretation key according to the
structures defined by those minimum codes.
Coding theory
and "conversational implicatures".
Marks and Reiter
have drawn attention to a subtle but important problem: a diagram
that represents a network may, although it is strictly speaking "correct", be viewed as a poor and useless representation
because it creates misleading suggestions. On the analogy with certain
linguistic phenomena, such misleading suggestions have been described
as unwarranted "conversational implicatures". In particular,
Marks and Reiter point ou that it is worthwhile to avoid "meaningless" perceptual groupings that might arise because of gratuitous differentiations
between the sizes, the mutual proximities, and the mutual alignments
of the graphical symbols that constitute the diagram. What this
amounts to, is that the overall Gestalt that is evoked by the diagram
should not contain elements that do not have an interpretation in
the network domain. All structure that is there, should be meaningful.
It seems clear
that one cannot always avoid unwarranted implicatures by just obeying
locally operating preference rules. Since we are interested in properties
of the Gestalt evoked by the diagram, the system must in fact compute
this Gestalt. Leeuwenberg's work shows how we may approach this
problem.
Among the diagrams
that correctly represent a certain network, we are interested in
the diagram with a minimum of unwanted implicatures. Marks and Reiter
show that this may amount largely to a preference for diagrams without
gratuitous differentiation. We want to suggest now that a minimum
of gratuitous differentiation may correlate with minimum complexity
in Leeuwenberg's sense. The most regular diagram may be the optimal
one. To choose the least complex diagram we may therefore use the
minimum principle of Structural Information Theory. We may compare
the information loads of the perceived Gestalts of alternative possible
diagrams of a given network; the 'best' diagram is the one with
lowest information load.
Since the Leeuwenberg-coding
of a diagram contains all perceived structure, there must be a homomorphism
between this coding and the definition of the network. If, from
the point of view of the application, the network is not just a
set of nodes and edges with labels, but is composed in ameaningful
way out of particular subsets, then it is desirable that every subexpression
of a Leeuwenberg coding of a candidate diagram should correspond
to a substructure of the network. Also, further properties of (sub)codings
of a diagram should correspond to properties of the network. For
instance, if (some part of) a diagram contains symmetry, it is preferable
if this symmetry is also found in the properties of the network
itself. If not, the symmetry in the diagram is not meaningful with
respect to the model.
However, it
must be stressed that there are also conventions about representing
network-like structures in a two-dimensional space; these may work
in a different direction than the meaningfulness-constraint. Four
vertices tend to be represented in a square, whatever their connecting
edges be. "Defaults" like this must also be taken into
account.
Conclusion.
Summarizing,
we may describe our model as follows: First the class of diagrams
of a certain network model is defined by means of expressive mappings
(Marks and Reiter, 1990): nodes and edges in a network are directly
mapped to symbols in a network diagram. Secondly, a partial order
over this class is defined by means of the Information Load of the
codings of the diagrams. Finally, the diagram with lowest information
load whose possible interpretations are correct and maximally meaningful
with respect to the model is chosen.
As to the prospect
of implementing such a model, it is clear that comparing exponentially
many diagrams of a certain model takes exponential time. Thus, it
seems that calculating the 'best' diagram is NP-complete. In practice,
however, we are not interested in the absolutely simplest correct
representation of a network model. We might settle for a relatively
regular diagram without ambiguities and false implicatures. In order
to calculate such a diagram in a tractable way, we can apply techniques
like simulated annealing or Monte Carlo methods.
In conclusion,
we raise some basic questions: (1) do we capture all implicatures
with this method?, and (2) what extension of Leeuwenberg's coding
system provides an adequate description of visual Gestalts? These
questions should be answered by testing an implementation of the
method suggested here.
References.
Rens Bod, 1993:
'Using an Annotated Corpus as a Stochastic Grammar', Proceedings
of the European Chapter of the ACL, 1993, Utrecht.
Rens Bod, Mehdi
Dastani and Remko Scha, 1993: 'Solving Proportional Analogies using
Structural Information Theory', to appear in: Proceedings of
the European Workshop on Case-based Reasoning, Kaiserslautern.
Emanuel Leeuwenberg,
1969: 'Quantitative specification of information in sequential patterns'.
Psychological Review, 76, 216-220.
Emanuel Leeuwenberg,
1971: 'A Perceptual Coding Language for Visual and Auditory Patterns'.
American Journal of Psychology, 83, 307-349.
Joseph Marks
and Ehud Reiter, 1990: 'Avoiding Unwanted Conversational Implicatures
in Text and Graphics', Proceedings AAAI '90, Menlo Park,
CA.
Remko Scha,
1992: 'Virtual Grammars and Creative Algorithms', Gramma/TTT 1(1).