automatic diagram design and Gestalt perception

IAAA Theory Remko Scha

[This paper was printed in the ESPRIT report HCI from a Discourse Perspective. HCRC, Edinburgh University, 1993. ]

Rens Bod and Remko Scha

Deriving optimal network diagrams by means of Structural Information Theory.

Introduction.

The problem of automatically generating graphic information presentations consists of two parts: (1) deciding on the interpretation schema that is to be applied to the graphics, and (2) deciding on an actual graphic object that represents the intended information if interpreted according to this key. In this paper we focus on the second part of this problem. Assuming that we have decided on the interpretation that is to be used, it is often non-trivial to design a graphical object that yields the correct information under this interpretation, and that also avoids unwarranted suggestions. To assess the semantic validity and pragmatic felicity of graphical information presentations, a system must explicitly consider the perceptual Gestalts that the graphics evokes.

For the sake of concreteness, this paper focusses on one information presentation problem: the automatic design of diagrams that represent networks. It suggests that the diagrammatic representations of a network that may be considered semantically correct are the ones that evoke a set of Gestalts such that each of them, if a viewer applies the contextually assumed interpretation key, (1) can be interpreted as denoting the intended network, and (2) cannot be interpreted differently. (We distinguish (1) and (2) because we are not sure that interpretation keys that are in practice used for graphical representations guarantee exclude ambiguity.) Among these representations, we should prefer the one that avoids misleading suggestions to the largest possible extent; a good candidate for this optimal diagram is the one that is perceptually simplest.

To elaborate this idea in mathematical and computational detail, we propose to extend a formal model of Gestalt perception that was developed by E. Leeuwenberg. This model computes a preference ordering on alternative perceptual groupings of an input pattern, based on the information load of their codings. The grouping that a person actually perceives is the one with the lowest information load.

Within the limits of its applicability, this model enables us to predict the Gestalt that a graphical object will evoke in the person who sees it. At the same time, Leeuwenberg's notion of information load constitutes a plausible formal notion of simplicity that can be used as a criterion for distinguishing between alternative diagrammatic representations of one network. The ultimate success of this method depends, however, on the adequacy of the visual coding language that is assumed. Leeuwenberg's proposals about this suffer from some obvious limitations.

Automatic diagram design: Marks and Reiter.

We consider the problem of automatically generating a diagram that represents a given network -- where we assume as given a definition of the network as a collection of nodes and edges and predicates associated with them, and we also assume as given a convention regarding the graphical representation of the nodes (as squares, for instance), of the edges (as line segments, for instance), and of the predicates (as text labels, for instance). If so much is assumed, what is left is essentially the problem of designing a reasonable lay-out (on the screen or the page) of a not very large collection of connected symbols.

Marks and Reiter (1990) have drawn attention to the complexity of this problem. They show that the space of lay-outs to be considered is extremely large, and that different lay-outs may have significantly different properties; for instance, many of them may evoke misleading suggestions in the mind of the viewer. Marks and Reiter therefore propose a set of preference rules to differentiate among the different lay-outs that might be considered. These rules express plausible preferences between alternative local decisions about the layout of a diagram; they aim at avoiding "meaningless" perceptual groupings that might arise because of gratuitous differentiations between the sizes, the mutual proximities, and the mutual alignments of the graphical symbols that constitute the diagram.

Marks and Reiter have shown the importance of Gestalt perception phenomena for making optimal decisions about lay-out design. But they do not attempt to face the full extent of the Gestalt perception problem. Their rules deal directly with the diagram elements, and don't refer explicitly to the global Gestalt organization of the viewer's perception. In the general case, this is not sufficient; the viewer does not assign interpretations to just any part of the diagram, but only to the constituents of the Gestalt which the diagram evokes. We cannot take for granted that whatever is drawn on the screen is always perceived as such.

Another limitation of Marks and Reiter's system is that it is not clear how their rules interact when there are conflicts between different preferences. For demonstrations with a prototype system, one may sometimes be able to choose applications which allow all preference rules to be independent of each other; but a more generally applicable system cannot rely on this. (Systems that try to account for Gestalt perception by means of locally applicable preference rules always run into this rule interaction problem. Another case in point is Lerdahl and Jackendoff's theory about the perception of musical structure.)

In the remainder of this paper, we sketch how a more principled method for automatic lay-out design may be developed, based on Leeuwenberg's method of formalizing Gestalt perception.

Structural Information Theory.

Leeuwenberg's (1969, 1971) coding theory starts out with a plausible intuitive idea about Gestalt perception: when a person sees a visual pattern as a certain Gestalt rather than another, this person has a mental representation which (1) describes all or most of the details of the visual pattern, and (2) assigns a particular constituent structure to this pattern. (We may note an obvious analogy with the linguistic notion of syntactic surface structure as applied to natural language utterances.) To build a formal model of Gestalt perception, we must therefore have (1) a formal way to code input patterns, and (2) a formal way to code visual Gestalts which assign a specific constituent structure to an input pattern. Assuming such coding systems, we can then give a mathematical formulation to the problem of modelling Gestalt perception: given an input code, what is the resulting Gestalt code?

Leeuwenberg has proposed a simple but powerful coding system, based on a view on visual patterns that is now known as "turtle graphics". The system he develops is limited to drawings that consist of discrete straight line segments. This makes it possible to characterize input patterns by means of an extremely simple system. A "primitive code" of a two-dimensional line pattern is constructed by tracing the contours of the pattern and successively noting the lengths of line elements and the angles between them. Such a code thus consists simply of a sequence: angle1 length1 angle2 length2 ... angleN lengthN. ("angle1" specifies the angle of the first linesegment with respect to a given axis; the other angles are relative to the direction of the previous linesegment. Note that this coding system does not specify one canonical coding for a given input pattern; obviously, one pattern can be drawn in many different orders. The system does, however, establish a well-defined equivalence relation between alternative input codings; therefore it is possible, in principle, to use an arbitrary input coding to stand for this equivalence class.)

The language that Leeuwenberg proposes for coding Gestalts is an extension of the language that he uses to code raw input. Thus, concatenations of pairs <angle, length> do not only serve as input codings, but are also recognized as possible Gestalts. But, since regularities in the input may give rise to specific perceived constituent structures, the Gestalt coding language also provides means to encode such regularities. For instance, out of arbitrary Gestalt codes a, b, c one may build more complex codes by means of operations like iteration, symmetry and alternation: Iteration(3,a) describes the same pattern as aaa; Symmetry(ab) describes the same pattern as abba; and Alternation (bc,a) describes the same pattern as baca.

By employing the higher order operations that such a Gestalt coding language provides, an input pattern may be often be recoded in several ways. Leeuwenberg's central assumption is, that Gestalt perception is in fact this recoding process. His central claim about this process is, that it results in the Gestalt that corresponds to the least complex code -- where complexity (also called "information load") is measured by simply counting the number of "primitive elements" (i.e., elements from the primitive code of the input) that still occur in the Gestalt code. For instance, if a and b are both primitive elements, then bbaaa may be recoded as Iteration(2,b) Iteration(3,a) or as

b Alternation (ba,a). The first recoding is preferred, because it contains only 2 occurrences of the primitive elements a and b, whereas the second recoding contains 4 such occurrences. This predicts that the subpatterns defined by the subcodes Iteration(2,b) and Iteration(3,a), i.e., bb and aaa, will be perceived as constituents of the Gestalt evoked by bbaaa.

A code with a lower structural information load then alternative codes for the same pattern, is called a minimum code. Coding theory thus replaces the classical principles of Gestalt perception with a single one: the perception of a pattern will result in an interpretation that corresponds to a minimum code; the information load of what is actually perceived is lower than the information load of anything that could have been perceived alternatively. This reformulation of the old law of Prägnanz is known as the minimum principle of coding theory.

Leeuwenberg often talks as if one input pattern evokes exactly one minimum code. It is attractive to generalize Leeuwenberg's approach to one which recognizes the possibility of a set of distinct minimum codes for one input, if the information loads of several alternative analyses which do not subsume each other lay close together. In the sequel we sometimes assume such a generalization.

During the past 20 years, Leeuwenberg and his co-workers have reported on a number of experiments that tested predictions based on the minimum principle. They investigated the disambiguation of ambiguous patterns, embeddedness, closure and the perception of foreground and background. On the whole, the results of these experiments confirmed the psychological reality of an ordering of interpretations in terms of a measure of perceptual simplicity computed on the basis of Structural Information Theory. We have also used the minimum principle to solve proportional analogy puzzles (Bod et al., 1993). A statistical generalization of the minimum principle is one of the key ideas in a new performance model of natural language processing (Scha, 1992; Bod, 1993).

Coding theory and the correctness of diagrams.

The semantically correct representations of a network are the ones that evoke a set of Gestalts such that each of them, if a viewer applies the contextually assumed interpretation key, (1) can be interpreted as denoting the intended network, and (2) cannot be interpreted differently. It should be emphasized here that is the Gestalts which get interpreted, not what is physically on the screen or on the page.

Note that this also means that an interpretation key cannot be simply inverted to yield a set of "expressive rules" which map a network definition onto the diagrams that represent it; in general, such "expressive rules" can only have a heuristic status, in that they define a set of structures that might represent the diagram -- candidates for the diagram generation process to consider.

Network diagrams generated by reasonable expressive rules may induce Gestalt perceptions with interpretations that are not compatible with the network they are intended to represent. For instance, a model consisting of four nodes, represented as squares, and four relations between these, represented as line segments, may happen to be represented by a diagram in which the squares are connected by the lines in such a way that a fifth square is perceived. Such a diagram should obviously be rejected, since it conveys the wrong information. To discover this fifth square the system must apply coding theory.

A system such as Marks and Reiter's does not explicitly deal with Gestalt structure. If such a system is to avoid all possibilities for Gestalts with unintended interpretations, it must use extremely conservative expressive rules, which consistently avoid crossing lines and closely placed elements. We feel it is important to try to develop a more generally applicable approach, where a system may check candidate diagrams for undesired interpretations, by computing the minimum codes for each diagram, and applying the interpretation key according to the structures defined by those minimum codes.

Coding theory and "conversational implicatures".

Marks and Reiter have drawn attention to a subtle but important problem: a diagram that represents a network may, although it is strictly speaking "correct", be viewed as a poor and useless representation because it creates misleading suggestions. On the analogy with certain linguistic phenomena, such misleading suggestions have been described as unwarranted "conversational implicatures". In particular, Marks and Reiter point ou that it is worthwhile to avoid "meaningless" perceptual groupings that might arise because of gratuitous differentiations between the sizes, the mutual proximities, and the mutual alignments of the graphical symbols that constitute the diagram. What this amounts to, is that the overall Gestalt that is evoked by the diagram should not contain elements that do not have an interpretation in the network domain. All structure that is there, should be meaningful.

It seems clear that one cannot always avoid unwarranted implicatures by just obeying locally operating preference rules. Since we are interested in properties of the Gestalt evoked by the diagram, the system must in fact compute this Gestalt. Leeuwenberg's work shows how we may approach this problem.

Among the diagrams that correctly represent a certain network, we are interested in the diagram with a minimum of unwanted implicatures. Marks and Reiter show that this may amount largely to a preference for diagrams without gratuitous differentiation. We want to suggest now that a minimum of gratuitous differentiation may correlate with minimum complexity in Leeuwenberg's sense. The most regular diagram may be the optimal one. To choose the least complex diagram we may therefore use the minimum principle of Structural Information Theory. We may compare the information loads of the perceived Gestalts of alternative possible diagrams of a given network; the 'best' diagram is the one with lowest information load.

Since the Leeuwenberg-coding of a diagram contains all perceived structure, there must be a homomorphism between this coding and the definition of the network. If, from the point of view of the application, the network is not just a set of nodes and edges with labels, but is composed in ameaningful way out of particular subsets, then it is desirable that every subexpression of a Leeuwenberg coding of a candidate diagram should correspond to a substructure of the network. Also, further properties of (sub)codings of a diagram should correspond to properties of the network. For instance, if (some part of) a diagram contains symmetry, it is preferable if this symmetry is also found in the properties of the network itself. If not, the symmetry in the diagram is not meaningful with respect to the model.

However, it must be stressed that there are also conventions about representing network-like structures in a two-dimensional space; these may work in a different direction than the meaningfulness-constraint. Four vertices tend to be represented in a square, whatever their connecting edges be. "Defaults" like this must also be taken into account.

Conclusion.

Summarizing, we may describe our model as follows: First the class of diagrams of a certain network model is defined by means of expressive mappings (Marks and Reiter, 1990): nodes and edges in a network are directly mapped to symbols in a network diagram. Secondly, a partial order over this class is defined by means of the Information Load of the codings of the diagrams. Finally, the diagram with lowest information load whose possible interpretations are correct and maximally meaningful with respect to the model is chosen.

As to the prospect of implementing such a model, it is clear that comparing exponentially many diagrams of a certain model takes exponential time. Thus, it seems that calculating the 'best' diagram is NP-complete. In practice, however, we are not interested in the absolutely simplest correct representation of a network model. We might settle for a relatively regular diagram without ambiguities and false implicatures. In order to calculate such a diagram in a tractable way, we can apply techniques like simulated annealing or Monte Carlo methods.

In conclusion, we raise some basic questions: (1) do we capture all implicatures with this method?, and (2) what extension of Leeuwenberg's coding system provides an adequate description of visual Gestalts? These questions should be answered by testing an implementation of the method suggested here.

References.

Rens Bod, 1993: 'Using an Annotated Corpus as a Stochastic Grammar', Proceedings of the European Chapter of the ACL, 1993, Utrecht.

Rens Bod, Mehdi Dastani and Remko Scha, 1993: 'Solving Proportional Analogies using Structural Information Theory', to appear in: Proceedings of the European Workshop on Case-based Reasoning, Kaiserslautern.

Emanuel Leeuwenberg, 1969: 'Quantitative specification of information in sequential patterns'. Psychological Review, 76, 216-220.

Emanuel Leeuwenberg, 1971: 'A Perceptual Coding Language for Visual and Auditory Patterns'. American Journal of Psychology, 83, 307-349.

Joseph Marks and Ehud Reiter, 1990: 'Avoiding Unwanted Conversational Implicatures in Text and Graphics', Proceedings AAAI '90, Menlo Park, CA.

Remko Scha, 1992: 'Virtual Grammars and Creative Algorithms', Gramma/TTT 1(1).