


[This article was printed in Mediamatic, 7, 1 (Fall 1992), along with a Dutch version and with many illustrations which are omitted here.]




Remko Scha

Institute of Artificial Art, Amsterdam


Virtual Voices

(Mimesis)

The new digital media technologies which are now being developed are often imitative technologies. Future generations may end up viewing the twentieth century as the century of abstraction, and the now imminent turn of the century as the moment of a return to a mimetic aesthetics. The imitation of nature is once again a widely pursued artistic ideal. Sometimes this concerns the imitation, not of the way things look, but of the processes that constitute life, body, or mind. Mimesis is then called artificial life, robotics, or artificial intelligence. But in other cases, it concerns the classical ideal of the perfect simulation of the surface of things. Then it is called ray tracing, paintbox, digital photography, virtual reality.

Music exists between the poles of mathematical abstraction and pure physics. Imitation is not an issue there, one might think, but nothing could be further from the truth. After the failure of 'real' electronic music, which used sinusoids, square waves, noise and modulators to build sound sculptures that people don't particularly want to listen to, there is now an avalanche of digital electronic technologies that simulate the sounds of conventional instruments in great detail, and make them accessible for keyboards and computers with MIDI interfaces.

Artificial speech synthesis is an imitative technology which is closely connected with music. However, its relation with language also lends this medium an entirely unique character. This article explores the history, the techniques and the aesthetics of this medium.

(Voice)

Music is a matter of physics -- not so much because music is usually realized by means of sound, but rather because precisely the structural properties of music (such as metre, rhythm, harmony, melody) are based on physical phenomena.

Language is a matter of symbols. The conceptualization and abstraction of human experience.

Between language and sound: speech. Between mind and matter: the voice.

"Listen to a Russian bass [...]: something is there, manifest and persistent (you only hear that), which is past (or previous to) the meaning of the words, of their form (the litany), of the melisma, and even of the style of the performance: something which is directly the singer's body, brought by one and the same movement to your ear from the depth of the body's cavities, the muscles, the membranes, the cartilage, and from the depths of the Slavonic language, as if a single skin lined the performer's inner flesh and the music he sings."1

(To copy / To fake)

Within the technology of voice imitation, two approaches are usually distinguished: the genetic approach and the gennematic one. The genetic method imitates the physiological processes that generate speech sounds in the human body. The gennematic method is based on the analysis of the speech sounds themselves, and reconstructs these sounds without considering the way in which the human body produces them.

The speaking machines of the eighteenth century were based on the genetic principle: the hardware of the larynx and the oral cavity was reconstructed in a stylized way. If such an imitation is faithful enough, the sounds it generates resemble the sounds of human speech.

In the twentieth century, we see an entirely different approach: digital technology which calculates the shapes of sound signals and then uses loudspeakers to make them audible. The voice is no longer imitated, but its output is faked. The algorithm computes signals that evoke the image of a physical process that never occurred.

The eighteenth-century automaton is a mechanical body, a piece of clockwork claiming the qualities of life. In twentieth-century computer simulation, the mechanics is abstract, the machine dissolves into mathematics. The body has disappeared.

(To copy)

The impulse of classical sculpture: not representation, but imitation. A life-size, colored, three-dimensional model is not a model, but a copy. The master sculptors of classical mythology even managed to duplicate the human body in sculptures that did not only show perfect likeness, but that also could speak and move naturally. In Chinese and Germanic mythology, carpenters and silversmiths displayed similar skills, building treacherously seductive female automata.

The essential step on the road from myth to technology was taken in the seventeenth century. The idea that living organisms function according to the laws of physics, and could in principle be simulated by means of mechanical constructions, is then no longer a vague, alarming suspicion, but a scientific hypothesis. In the early seventeenth century, Descartes presented the thesis that animals are in fact machines.

Thomas Hobbes: "Nature, the art by which God hath made and governs the world, is by the art of man, as in many other things, in this also imitated, that it can make an artificial animal. For seeing life is but a motion of limbs, the beginning whereof is in the principal part within; why may we not say that all automata (engines that move themselves by springs and wheels as doth a watch) have an artificial life? For what is the heart but a spring, and the nerves but so many strings; and the joints but so many wheels giving motion to the whole body, such as was intended by the artificer?"2

In the seventeenth and eighteenth centuries, the construction of automata which imitate bodily functions of man or animal was extremely popular: the development of clockwork technology had made it possible to realize much better imitations than before; and the theories of the Cartesians lent a philosophical interest to such enterprises. Thus, there were dolls that could walk or talk, write letters, or play the flute; birds that could flap their wings, tweet, eat, drink and shit. There is a curious similarity between this kind of automaton building and present-day Artificial Intelligence. In those days, too, the capacities of the most advanced technologies of the moment were exploited with the goal of imitating the outward appearance of certain aspects of human behaviour; then, too, this resulted in products which aroused everybody's interest, because they could be regarded as technological experiments, as biological models, as philosophical existence proofs, as art, or as entertainment.

<<illustration: Mechanical Duck>>

This is illustrated by the career of Jacques de Vaucanson, one of the best-known automaton builders of the eighteenth century. His automata were amusing and astonishing exhibits in popular fairs, while their mechanisms were published in learned scientific articles -- by the designer himself, and also by Diderot and D'Alembert in their Encyclopédie. The production of the automata also generated interesting technological spin-offs. Eventually, Vaucanson became an innovative organizer in the textile industry, and built the most advanced silk-spinning factory of that time. He used the technology of his automatic flute player for the design of the first programmable loom, which would later become the basis for Jacquard's work.

(L'Homme Machine)

Human beings are emphatically excluded from Descartes' reasoning about the mechanical character of animals. He links mechanism with the absence of emotions and the absence of consciousness. Apparently he has no difficulty in viewing animals in this way, but that people could be machines is more of a problem. The Cartesian philosopher Cordemoy formulates the argument against the mechanizability of humans in terms of the idea of an automatic speech machine: "...although I see clearly that a purely mechanical apparatus could utter a few words, I know at the same time that the springs which distribute the air or open the tubes that let out the voices display a certain order between each other, which they could never change. So that, from the moment the first voice sounds, the voices that usually follow must necessarily follow as well -- that is, if the machine still has sufficient air. Contrarily, the words which I hear being uttered by bodies such as mine, are rarely pronounced in the same order."3

From this point of view, the richness of language is connected with the typically human capacity of free will, which is intrinsically incompatible with the rigidity of a clockwork. To account for free will, Descartes provides the -- otherwise mechanical -- human body with an interface to the immortal soul. He situates this interface in the pineal gland -- a small gland deep in the centre of the brain, whose function was unknown at the time.

A hundred years after Descartes, the idea that human beings are machines as well was explicitly defended after all. In "L'Homme machine", La Mettrie argues that "all capacities of the soul are to such an extent dependent on the right organization of the brain and the entire body, that apparently they are nothing but this organization itself." In this theory, there is a material identity between body and soul; therefore, the necessity of a mind-body interface has disappeared. Between man and animal, there are only differences in degree. The Cartesian arguments concerning the mechanical character of animals now apply directly to human beings, but at the same time their meaning changes profoundly. Mechanism no longer implies non-consciousness: consciousness itself is mechanical.4

If the more ambitious automaton builders of this period had based their research programmes on La Mettrie's viewpoint, they would have invented something like today's Artificial Intelligence: a discipline aimed at creating actual demonstrations of mechanized mental processes. But clockwork technology was not suitable for such a purpose. Therefore, this step could not be made until the middle of the twentieth century, when the electronic computer became available.

This is also why Cordemoy's argument was so plausible in his day. No one could foresee the virtually unlimited switching flexibility which would be introduced by Von Neumann's stored program computer. Software is mechanics which is capable of dynamic reconfiguration. Programs are virtual clockworks with self-modifying and self-extending capacities. Although these capacities are limited by the finiteness of the hardware on which the programs are implemented, in practice we can often ignore these limitations. Compared to a clockwork, the computer realizes a qualitatively superior complexity and flexibility. With this invention behind us, we must now forever view the limits of the mechanizable as unknown and open-ended.

(Talking Heads)

The first serious speech machines were developed by eighteenth-century automaton builders who were engaged in mechanical simulation of the bodily functions of man and beast. At this time, the sound of speech was not yet viewed as a phenomenon which could be analyzed and reconstructed. Speech simulation was imitation of the act of speaking. Artificial bodies were created, which could blow out air and thereby make the air vibrate; the fidelity of the artificial speech generated in this way depended on the accuracy with which the relevant features of the human body were reproduced.

Like human beings, these machines had 'vocal cords' which were set into vibration when air was pressed through. The precise functioning of the human vocal cords was not yet known at this time. To imitate them, the machine-builders used the principle of the harmonium: an air tube is closed off by a flexible metal tongue, which moves under pressure to let the air through and is consequently set into vibration. The tongue was often covered in leather to dampen the high tones slightly.

As with a reed organ, this vibration was then conveyed to the air in a resonance chamber -- which in this case was made to resemble the human mouth as much as possible. Depending on the exact shape of the resonance chamber (that is, the position of the mouth), various vowels could be generated. Depending on the way in which the air stream was started or stopped, or obstructed by constricting the outlet, various consonants could be formed.

<<illustration: machine from the "bachelor machines">>

As Cordemoy had argued already, independently functioning machines of this kind could only deliver a limited repertoire of texts. Because speech simulation proved far from easy, in practice this even came down to rather small numbers of words or sentences, which would be hardwired into the machine. For this reason, speech machines were often designed as instruments instead -- machines which could generate all the sounds that are needed to pronounce any given text, but which could only pronounce an actual text if operated by a technical expert who determined which sounds were produced at which moment. Descartes' solution, one might say: a mechanical machine driven by human consciousness; a body controlled by a mind.

In 1778, for example, Wolfgang von Kempelen designed a machine which directly imitated the functioning of the oral cavity. The operator squeezes a pair of bellows to press the air, via 'vocal cords', into a resonance chamber, which he modulates with both hands. The various vowels are created by changing the shape of the resonance chamber with one hand; the consonants are produced as the other hand opens or closes this chamber in various ways.

A lung and vocal-cord prosthesis, which makes it possible to use the hands as a mouth. Technological perversion of speech.

<<illustration: Von Kempelen's machine>>

The 'vowel organ'.5 The same principle, but in this case: a carrousel of different sound cavities. A fan of vowels. A laboratory instrument operated by a technician, by means of switches, wheels, foot pedals. As a result of the technician's actions, the vibrating air is sent to one resonance chamber or another, and such a chamber is opened or closed in a variety of ways. Thus, by a succession of separate interventions, the technician realizes, one by one, the phonetic elements of the language expression to be pronounced.

<<illustration: Van Brakel's machine>>

Human speech is a continuous process. In this mechanical simulation, there is no such continuity. What we hear is phonology: the discrete combinatorics of linguistics.

Joop van Brakel on the 'vowel organ': language shattered into meaningless fragments. Slapstick, merriment, music. Language regressing to animal sounds. Cackling, bleating, barking. ("There once was a time when all speech was song.")

The vowel organ has ingenious 'artificial vocal cords'. A hollow cylinder with a slit in it continually rotates within another hollow cylinder, also with a slit in it. The result: a slit-shaped opening is opened and closed continually. The air is pressed through this opening. If we use this technique to set the air in motion without providing a connection to an 'artificial oral cavity' in which the air can resonate, what we hear is a fart. Is that the sound that underlies all speaking?

<<illustration: Van Brakel>>

Other speech machines create an even greater separation between the operating technician and the material production of sound: they insert a keyboard interface. Abbé Mical's Têtes Parlantes (1783) and Joseph Faber's Euphonia (1840) belong in this category. Speech machines for entertainment. The designer also acted as operating technician, and as variety artist, ventriloquist: he put a puppet on stage, and tried to create the illusion that it really spoke.

Here, the laboratory apparatus has become a musical instrument, with an interface which enables the virtuoso performer to add natural dynamics and timing to the mechanical speech utterances, and to compensate as much as possible for the limitations of technology.

Thus, one of Mical's contemporaries writes about the Têtes Parlantes: "With a little practice and agility, we will be able to speak with the fingers as with the tongue, and we will be able to give the language of the heads the speed, the calm, and in short all the qualities that a language can possess which is not animated by passions."6 On the keyboard of the Têtes Parlantes, you present a text as you would play a musical score on a piano.

<<illustration: Faber or Mical>>

(Soft Machines)

Alexander Graham Bell stands at a turning point in the history of speech synthesis. When he was young, his father took him and his younger brother to an exhibition where they saw a replica of one of Von Kempelen's speech machines. Back home, the boys proceeded to build a similar speaking machine themselves. When, several years later, Bell invented the telephone, he introduced the technique that would determine the future of sound processing: the representation of sounds by means of electric signals. Bell also produced a detailed design, never implemented, for a device that would have been a mechanical Vocoder.

But his most curious contribution to artificial speech synthesis was another early feat. "Bell's youthful interest in speech production also led him to experiment with his pet Skye terrier. He taught the dog to sit up on his hind legs and growl continuously. At the same time, Bell manipulated the dog's vocal tract by hand. The dog's repertoire of sounds finally consisted of the vowels /a/ and /u/, the diphthong /ou/ and the syllables /ma/ and /ga/. His greatest linguistic accomplishment consisted of the sentence, 'How are you Grandmamma?' The dog apparently started taking a 'bread and butter' interest in the project and would try to talk by himself. But on his own, he could never do better than the usual growl."7

A related technology, with a cyberpunk slant, is due to Johannes Müller, the father of modern physiology. "His working method is clearly characterized by his orientation toward experiments on living or dead objects. Continuing the efforts of Liskovius, who in 1814 was the first to generate chest- and head-voice from the larynx of a corpse, he cut off the head of a corpse in such a way that the entire vocal apparatus and part of the trachea were preserved. By blowing air into the larynx of the corpse, Müller produced vocalic sounds which closely resembled human speech. By moving the lips, he even managed to generate some consonants."8

(To fake)

Hermann Helmholtz was a pupil of that same Müller. But his work in the field of speech synthesis was less physiologically and more acoustically oriented. In the second half of the nineteenth century, research into the phenomenon of sound had reached the stage where one could attempt to analyse the sounds of human speech into elementary components. To synthesize vowels, Helmholtz did not imitate the human body, but built up the sounds from elementary, sinusoidal components.

His synthesis machine consists of a battery of tuning forks equipped with resonance chambers, with frequencies in harmonic proportions. Driven by electromagnets, the tuning forks vibrate with perfect regularity at their fundamental frequencies. The volumes of the contributions from the different tuning forks can be varied by partly opening or closing their resonance chambers. Thus, sounds with different spectra can be composed, which bear resemblance to various vowels: Oo, Ee, Ah, Oh, Uh, Ih...
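Helmholtz' procedure can be sketched in a few lines of present-day code. The following is only an illustration under stated assumptions: each tuning fork is replaced by a sine function at a whole-number multiple of one fundamental frequency, and the opening of each resonance chamber by a number between 0 and 1. The function name, the sample rate and the amplitude lists are invented for this sketch; they are not measurements of real vowels.

```python
import math

def additive_vowel(fundamental_hz, amplitudes, sample_rate=8000, duration_s=0.05):
    """Sum harmonics of one fundamental, each with its own loudness.

    amplitudes[k] stands in for how far the resonance chamber of
    tuning fork k+1 is opened (0.0 = closed, 1.0 = fully open).
    """
    n_samples = int(sample_rate * duration_s)
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(
            a * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
            for k, a in enumerate(amplitudes)
        )
        signal.append(value)
    return signal

# Two hypothetical "chamber settings". The numbers are made up for
# illustration only; they do not describe any particular vowel.
dull = additive_vowel(110.0, [1.0, 0.5, 0.2, 0.05, 0.0, 0.0, 0.0, 0.0])
bright = additive_vowel(110.0, [1.0, 0.3, 0.2, 0.4, 0.6, 0.5, 0.2, 0.1])
```

Both signals have the same pitch; only the balance between the harmonics differs, which is what the adjustable resonance chambers controlled.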

<<Illustration: Helmholtz' machine>>

The same method of synthesis can be applied even more easily with modern electronic technology -- a technology which was developed for the reproduction and transmission of sound. The crucial invention which made electronic sound generation possible was the loudspeaker: the general purpose sound producer which can replicate the sound of an arbitrary event, without having to mimic its material structure.

The loudspeaker transforms arbitrary electric signals into material sound waves. This creates the possibility of treating electric signals as models of sound waves. In electronic technology, this is done by means of resistors, induction coils, radio tubes, transistors. Objects with a specific electronic behaviour are combined into circuits which generate the desired output patterns.

The two kinds of approach mentioned above in connection with mechanical sound synthesis can be applied in electronics as well. The structure and the components of a mechanical system that imitates the human larynx can systematically be transposed to the electronic domain; this will indeed result in a circuit with an output signal that corresponds to the vocal sound produced by the mechanical model. Translating Helmholtz' approach to the electronic realm is even simpler: replace his tuning forks with sine wave generators, and his adjustable resonance chambers with potentiometers.

<<illustration from Köster, p.239 (Paget), or alternatively Flanagan>>

Electronic simulation has a material form: a circuit consisting of identifiable components and connections. But on the outside, nothing seems to be happening. The clockwork stands still. It thinks.

The structure of the circuit corresponds to the mathematical analysis of a physical sound-generating process. The circuit is a materialized diagram. (A printed circuit board actually looks like that.9)

The computer is the next step in the development towards an increasingly abstract simulation. The hardware no longer has anything in common with the physics conjured up for the listener. The hardware even has a structure which is essentially incompatible with the origins of music. A computer really 'computes': it manipulates discrete symbols. Music, on the other hand, is generated by the resonance of continuous systems.

Digital sound simulation is two steps away from real sound: the electric signal driving the loudspeaker is represented in the computer as a sequence of discrete symbols that represent the amplitude variation in time, split up into small discrete steps. Thus, even the continuity of the electric signal is faked.
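The double discretization can be made concrete with a small sketch. It assumes a pure sine tone as the "electric signal"; the function name, the sample rate and the 8-bit resolution are arbitrary choices for this illustration, not properties of any particular system discussed here.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, n_samples, bits=8):
    """Turn a continuous sine wave into a list of integers.

    Time is chopped into steps of 1/sample_rate seconds, and each
    amplitude is rounded to one of a limited number of integer levels.
    """
    max_level = 2 ** (bits - 1) - 1          # 127 for 8 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                  # discrete step in time
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # value in [-1, 1]
        samples.append(round(amplitude * max_level))      # discrete amplitude
    return samples

# One period of a 1000 Hz tone, described by eight whole numbers.
codes = sample_and_quantize(freq_hz=1000, sample_rate=8000, n_samples=8)
```

What the computer stores is only the list of integers; the smooth curve between them exists nowhere in the machine.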

The operations on the symbolically represented signals largely correspond to the functioning of the components from electronic circuits -- but because these operations are now symbolically represented as well (installed as software in the computer), they can be applied with infinite flexibility, in every imaginable combination and sequence. Cordemoy's impossibility has come true: lifeless matter has escaped the rigidity of the clockwork.
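As a hedged illustration of this correspondence: the software counterpart of a resonant circuit is a small recursive calculation on successive samples. In the sketch below, a train of pulses stands in for the 'vocal cords' and two resonators stand in for the oral cavity. All names are invented for this example, and the centre frequencies and bandwidths are placeholder values, not data for any real voice.

```python
import math

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator: y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)      # pole radius < 1
    b1 = 2 * r * math.cos(2 * math.pi * center_hz / sample_rate)
    b2 = -r * r
    y1 = y2 = 0.0                    # the two previous output samples
    out = []
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def pulse_train(pitch_hz, sample_rate, n_samples):
    """One unit pulse per pitch period, zeros in between."""
    period = round(sample_rate / pitch_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

sample_rate = 8000
source = pulse_train(100, sample_rate, 800)       # the 'vocal cords'
# Placeholder resonance settings, chosen only for illustration.
voiced = resonator(resonator(source, 700, 100, sample_rate),
                   1200, 120, sample_rate)
```

Rewiring this "circuit" means changing a line of text: the resonators can be reordered, duplicated or retuned without touching any hardware.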

The flexible machine which can do anything is at the same time the enigmatic machine which shows nothing. The machine is motionless, so that we do not see anything happening. But neither does the wiring structure of the components reveal anything about the functions performed. This structure only says: calculations in progress.

<<illustration: computer>>

The flexibility of the software medium is virtually complete. All operations which can be described mathematically can be implemented. Even the fact that the execution of each operation takes a short, but not infinitely short, moment of time, and that very complex combinations of operations can therefore take a long time, is hardly a limitation anymore. This practical problem is solved by VLSI technology. It is often possible to develop special chips for sub-processes which take too much time: large-scale integrated electronic hardware, which is less flexible than software, but extremely fast.

Everything you can imagine you can do with software. That is what's interesting about A.I. and other experimental branches of computer science: we discover the limits of what we can imagine. Sound synthesis is a typical example of this: modern synthesizers can produce a tremendous richness of sound, but imitations of existing instruments still sound stylized. Where they do sound natural, this is because they are not synthesized on the basis of structural analysis, but on the basis of samples. In that case there is no imitation, but reproduction of a previously recorded sound. The best sounding synthesizers have a great deal in common with tape recorders. They are digital mellotrons.

Digital sound recording technology is now the technology with the highest accuracy. The basic methods of digital sound representation are thus completely adequate. The limitations of digital sound synthesis are solely due to the limitations of our understanding of the psychological structure of sound.

(Platonic People)

Because their speech was barely intelligible, there was not much use for the first electronic speech-synthesis systems. For example, you could not make them speak a complex text with unpredictable contents if you wanted the text to be understood by an audience.

These systems also sounded distinctly inhuman. The voice appears to be generated by an alien body which is not flesh and blood -- by the angular movements of the metal components of the prototypical robot. What you hear is a machine which, in its awkward mechanical way, tries to use the human means of communication. This behaviour evokes disturbing questions about the possibilities and the dangers of technology, about mind and matter, and the nature of human identity.

But current state-of-the-art software is different. A typical example is DECtalk.10 This program is the realization of Abbé Mical's wildest dreams. Têtes parlantes: not one, not two, but nine different ones; and all of them can moreover be modified and interpolated. The DECtalk manual presents their portraits and gives them names: Rough Rita, Frail Frank, Whispering Wendy, Huge Harry, Kit the Kid, Perfect Paul, Beautiful Betty, Uppity Ursula, and Doctor Dennis. Protagonists of a comic strip version of Peyton Place.

<<illustration: DECtalk voices>>

The input for programs such as DECtalk consists of discrete symbols. The program processes files that consist of sequences of phonemes. So there is no human control of timing and dynamics, as with the eighteenth-century machines which were operated by means of a keyboard. In spite of this, and even partly because of it, the output has greater continuity. The software does not only contain models of the signals that correspond to the individual phonemes, but also procedures for merging the successive signals seamlessly together.
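How such merging might work can be suggested with a deliberately simplified sketch. It is not DECtalk's actual procedure, which this article does not describe; it assumes that every phoneme is characterized by two resonance frequencies, and it glides from one pair of target values to the next by plain linear interpolation. The function name, the frame counts and the target values in TARGETS are all hypothetical.

```python
def interpolate_targets(phonemes, targets, frames_per_phoneme=10, transition_frames=4):
    """Return one pair of resonance frequencies per frame.

    Each phoneme is held steady for a while and then glides linearly
    towards the targets of the phoneme that follows it.
    """
    track = []
    steady = frames_per_phoneme - transition_frames
    for i, ph in enumerate(phonemes):
        current = targets[ph]
        # The last phoneme has no successor, so it glides to itself.
        nxt = targets[phonemes[i + 1]] if i + 1 < len(phonemes) else current
        track.extend([current] * steady)
        for k in range(transition_frames):
            w = (k + 1) / (transition_frames + 1)   # weight of the next phoneme
            track.append(tuple((1 - w) * c + w * n for c, n in zip(current, nxt)))
    return track

# Hypothetical target values in Hz, invented for illustration.
TARGETS = {"a": (700.0, 1200.0), "i": (300.0, 2300.0), "u": (300.0, 800.0)}
track = interpolate_targets(["a", "i", "u"], TARGETS)
```

The input is a sequence of discrete symbols, but the resulting track changes in small steps from frame to frame, which is the kind of continuity the keyboard-operated machines could only obtain from a skilled performer.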

Modern synthetic voices are perfectly intelligible. And because of a more accurate control of the spectrum of vowels, the distinctively metallic quality of the sound has disappeared. But nevertheless, no one would confuse their output with human speech. The synthetic voice is still inhuman, if only because of its uniformity.11

DECtalk's standard voice, Perfect Paul, is an abstract sounding voice, that of a newsreader. Neither machine, nor human being. This marks the birth of a new medium. Up until now, you could not listen to a text without listening to someone's body. The independent text, independent of the human body, was always the printed text. For the first time, language now has a sound independent of the body -- a sound that directly emanates from the linguistic system, from syntax and phonemes.

The next step in this development is foreshadowed by other DECtalk voices, such as Whispering Wendy and Huge Harry. These are more personal, but just as equable and imperturbable, smooth and continuous. Airbrush pinups. Platonic bodies.

Whispering Wendy's voice has a pure, clear sound, with very little substance -- like Marilyn Monroe's singing voice, or Brigitte Bardot's. The suggestion of a soft, supple, weightless body. Huge Harry is Wendy's macho counterpart. His voice is heavy and lustful. Not Elvis Presley yet, but not bad for a beginner.

The synthetic body has already become an erotic ideal. Look, for instance, at the use of classical statues in thirties' fashion photography: "The forms of high fashion assume the look of the statuesque, the hallowed, the classical. Living flesh has the smoothness, the soft luster of ancient marble. Stone, it almost seems, is as supple as flesh. Hoyningen-Huene makes an equation between living and not living bodies, and the equation enchants, for in his photographs the bodies that do not live are not dead. They are statues. His imagery argues that in the realm of fashion there is no death. To enter the fashionable instant is to live forever."12

The future of digital image- and sound-simulation: the smooth coolness of the statue in a naturally moving body, in a sensually modulating voice. Technology is heading slowly but surely toward increasingly perfect robot-porn. Live performers like Prince and Michael Jackson are already beginning to dissolve into their computer-animated images.

When Andy Warhol invented commercial telephone sex, he suggested in the same breath that it could best be done by robots: "A robot-computer to answer the phone, that would be great. It would do the job without emotion."13

(Epilogue by Ultra Violet)

"I think back to one of Andy's earliest paintings, compelling in its simplicity -- a starkly black-and-white six-foot-high Coca-Cola bottle, painted in oil on canvas in 1960. I think of the paintings of clean, shiny Campbell soup cans, the young, unlined, fresh-scrubbed faces of Marilyn Monroe, Jackie Onassis, Ingrid Bergman, so many others."

"Then gradually I begin to grasp what Andy was trying to say with all his babble about machines and sex. Where sex has turned repulsive and inhuman, machine sex beckons alluringly. Only in telephone sex, robot sex, computer sex, is there escape from ugliness and cruelty. Machine sex is the only kind left that is uncontaminated, antiseptic, clean, even a little mysterious [...].

Yes, here is still another of the endless paradoxes Andy strews along our paths. In sex, as in art, [...,] he reinvents shining, pristine, early morning purity. His kind, of course: on the surface, no deeper."14


English version: Olivier/Wylie/Scha


NOTES

1. Roland Barthes: "Le Grain de la Voix." In: L'obvie et l'obtus. Paris: Éditions du Seuil, 1982. [English translation: "The Grain of the Voice." In: The Responsibility of Forms. Critical essays on Music, Art and Representation. New York: Hill and Wang, 1985, pp. 269/270.]

2. Thomas Hobbes: Leviathan. 1651 [Harmondsworth, Middlesex: Penguin, 1968.]

3. G. de Cordemoy: Discours physique de la parole. Paris, 1666.

4. Julien Offray de la Mettrie: L'Homme machine. Leyden: Luzac, 1748.

5. This is a relatively recent machine (built at the Institute of Phonetic Sciences of the University of Amsterdam), but its method of operation definitely belongs to the eighteenth-century tradition.

6. "Avec un peu d'habitude et d'habileté, on pourra parler avec les doigts comme avec la langue, et on pourra donner au langage des têtes la rapidité, le repos et toute la physionomie enfin que peut avoir une langue qui n'est point animée par les passions." From a letter by Antoine de Rivarol, 1783 (Oeuvres complètes de Rivarol, Part III, p. 207. Paris, 1808.)

See: Jens-Peter Köster: Historische Entwicklung von Syntheseapparaten zur Erzeugung statischer und vokalartiger Signale nebst Untersuchungen zur Synthese deutscher Vokale. (Historical development of synthesis machines for generating static and vowel-like signals and research into the synthesis of German vowels.) Hamburg: Buske, 1973, p. 85. On p. 95, Köster also quotes another part of this letter: "If these heads were multiplied in Europe, they would raise terror in all those Swiss and Gascon language teachers, whose influence has infected all countries and who disfigure our language for the peoples who love it." Köster comments: "Here lie the roots of the use of technological tools in foreign language teaching."

7. James L. Flanagan: Speech Analysis Synthesis and Perception. Second Edition. Berlin: Springer, 1972, pp. 206/207.

8. Köster, op.cit., p. 149.

9. Cf. Dick Raaijmakers: "De kunst van het machine lezen." (The art of machine reading.) Raster, 6 (1978), pp. 6-53.

10. DECtalk was developed by Digital Equipment on the basis of MITalk. See: Jonathan Allen, M. Sharon Hunnicutt and Dennis Klatt: From text to speech: The MITalk system. Cambridge (UK): Cambridge University Press, 1987.

11. Speech technologists are doing their best to imitate human limitations and imperfections. Allen et al. (op.cit.), for example: "Some additional pauses are introduced in longer phrases and slow speaking rate so that the talker does not seem to have an inhuman supply of breath."

12. Carter Ratcliff: "Out of Time." Artforum International 30, 1 (September 1991), pp. 112-117.

13. Ultra Violet: Famous for 15 minutes. My years with Andy Warhol. New York: Avon Books, 1990, p. 163.

14. Ultra Violet: Op. cit., pp. 165/166.