Music and the Computer: Up-Ending the Family Tree

Keynote Address to the 1995 Conference of the Australian Computer Music Association

Introduction

Computer music is nearly forty years old. Electronic music is twice that old, dating back to the invention of the Theremin Vox. In that time, computer music has brought together many diverse disciplines, creating hybrids such as psycho-acoustics and algorithmic composition, as well as spawning its own diverse branches. These range from performance instruments to music printing, from MIDI sequencing to automatic transcription. Such diversification is an indication of the success of the field. Yet some categorical divisions arose as a result of philosophical schism, often in response to limitations in the technology of the day.

As this is my first trip to Australia, it seems fitting to flip things upside down and take a fresh look at some of these traditional divisions. Focusing on areas such as sound synthesis, performance, and the role of the composer, we will see which of the familiar constructs still apply and which may no longer serve our best interests. The intention here is not to survey all existing work--this would require more time than we have--but rather to discuss a few examples drawn from my own work and that of several of my colleagues in order to show that hybridization is still an active force throughout the computer music family tree.

Synthesis vs. Sampling

One of the earliest dichotomies in electronic music was between recording-based music (i.e. Musique Concrète) and pure synthesis. In today's terms, this expresses itself as sampling versus synthesis. The strength of the recording-based approach is that the source material is itself of great richness. However, this complexity is achieved at the expense of flexibility. While the recording accurately captures a snapshot of an instrument, as soon as it is placed in a musical context requiring variety of expression, it falls flat, no more than a cardboard cut-out with nothing behind it. In contrast, synthesis allows for great malleability but is hard-pressed to create realistic sounds.

If we examine the current crop of samplers and synthesizers, we find manufacturers sensing the limitations of the extreme positions and moving toward the center, creating hybrid instruments. So-called "samplers" are enhanced by synthesis-like techniques such as pitch-shifting, looping, dynamic filtering, amplitude enveloping and other modifications that move far away from the pure recording-based paradigm. In fact, a looped sample is nothing more than a synthesizer's wave table oscillator with a wave table much longer than the period of the fundamental frequency. Meanwhile, so-called "synthesizers" now incorporate sampled wave tables, which may then be used as (for example) FM modulation signals. The distinction between sampling and synthesis is then reduced to a matter of degree--how long is the wave table and what other techniques are applied?
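
To make the point concrete, here is a small sketch in Python (purely illustrative, not any particular instrument's engine) in which one table-reading loop serves either as a synthesizer oscillator or as a sampler voice, depending only on the length of the table and its loop points:

    import math

    def read_table(table, increment, num_samples, loop_start=0, loop_end=None):
        """Step through `table` by `increment` samples per output sample,
        wrapping back into [loop_start, loop_end) whenever the end of the
        loop region is passed."""
        loop_end = len(table) if loop_end is None else loop_end
        out, phase = [], 0.0
        for _ in range(num_samples):
            i = int(phase)
            frac = phase - i
            nxt = i + 1 if i + 1 < len(table) else loop_start
            out.append(table[i] + frac * (table[nxt] - table[i]))  # linear interpolation
            phase += increment
            while phase >= loop_end:
                phase -= (loop_end - loop_start)    # wrap within the loop region
        return out

    # One cycle of a sine wave -> a classic wave table oscillator at 440 Hz:
    cycle = [math.sin(2 * math.pi * n / 64) for n in range(64)]
    tone = read_table(cycle, increment=64 * 440 / 44100, num_samples=44100)
    # A long recording with a sustain loop would use the very same function,
    # e.g. read_table(recording, pitch_ratio, n, loop_start=9000, loop_end=21000).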

A closely-related issue is that of analysis-based synthesis methods versus those not based on analysis data. Analysis-based synthesis methods, including the phase vocoder and linear prediction, are similar to recording-based methods except that the recording is analyzed to convert it into a more pliable and meaningful parametric representation. These techniques offer great realism but require massive amounts of data that are difficult to manage and store. Also, in order to synthesize a particular musical phrase, it is necessary first to obtain a recording of that phrase; when the time comes to synthesize another phrase, a new recording must be obtained. In contrast, the non-analysis-based methods, such as FM, are much more flexible but do not provide the same level of realism.

In this area, we again find a plethora of hybrids emerging. These include Xavier Serra's deterministic-plus-stochastic decomposition technique, a variation of the phase vocoder that avoids the data-overload problem by separating the signal into a deterministic component, resynthesized with sine waves, and a stochastic component, resynthesized with filtered noise.
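
A rough Python sketch of the resynthesis half of such a model may help (the analysis that produces the partial tracks and the noise envelope is assumed to have been done already, and the particular numbers below are invented):

    import math, random

    SR = 44100

    def sines(partials, num_samples):
        """Deterministic part: a sum of sinusoids, one per tracked partial.
        Each partial is (frequency in Hz, amplitude)."""
        out = [0.0] * num_samples
        for freq, amp in partials:
            phase = 0.0
            for n in range(num_samples):
                out[n] += amp * math.sin(phase)
                phase += 2 * math.pi * freq / SR
        return out

    def filtered_noise(gain, num_samples):
        """Stochastic part: white noise through a crude one-pole low-pass."""
        out, y = [], 0.0
        for _ in range(num_samples):
            y = 0.9 * y + 0.1 * (2 * random.random() - 1)
            out.append(gain * y)
        return out

    # Resynthesis is simply the sum of the two components:
    det = sines([(440.0, 0.5), (880.0, 0.25), (1320.0, 0.1)], SR)
    sto = filtered_noise(0.05, SR)
    signal = [d + s for d, s in zip(det, sto)]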

Another hybridization technique involves extracting only certain aspects of the source signal and using a completely different synthesis method in the reconstruction phase. This can render the distinction between analysis-based and non-analysis-based synthesis extremely ambiguous. For example, consider a work of mine, "Impossible Animals", originally written for chorus and tape, but now existing in a number of versions for different instruments and tape. With a desire to explore the boundaries between nature and human expression, I began with a recording of a Winter Wren, a North-American bird with a rich warbled song. Using a frequency domain sinusoid tracking technique developed by Julius Smith, I extracted pitch and amplitude trajectories from the bird song recording. This information was then processed by a series of home-brew programs that segmented the trajectories into individual "chirps", performed time-varying time expansion, imposed an evolving pitch range modification, tuned the flat portions of the frequency trajectories to an underlying evolving harmonic background and, most revealingly, mapped the frequency trajectory onto an interpolated set of vowels for resynthesis using the Chant vocal synthesis technique of Xavier Rodet. Because the frequency-to-vowel mapping remains in effect even in rapid trills, the result is strangely evocative, as if the brain of a bird had been transplanted into the body of a wildly-gifted tenor.
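
The vowel-mapping step can be suggested by a small hypothetical sketch (the vowel set, formant values and pitch range here are invented; the actual programs and the Chant parameters they produced differ): as the extracted pitch moves through its range, formant targets are interpolated between two vowels, so a rapid trill becomes a rapid flicker of vowel color.

    VOWEL_LOW  = [700.0, 1220.0, 2600.0]   # rough /a/ formants in Hz (illustrative)
    VOWEL_HIGH = [300.0, 2300.0, 3000.0]   # rough /i/ formants in Hz (illustrative)

    def vowel_for_pitch(freq_hz, lo=400.0, hi=4000.0):
        """Interpolate formant targets between VOWEL_LOW (at pitch `lo`)
        and VOWEL_HIGH (at pitch `hi`) for a given extracted pitch."""
        t = min(max((freq_hz - lo) / (hi - lo), 0.0), 1.0)
        return [a + t * (b - a) for a, b in zip(VOWEL_LOW, VOWEL_HIGH)]

    # A fragment of an extracted pitch trajectory (Hz) during a trill:
    trajectory = [2100.0, 2600.0, 3100.0, 2800.0, 2200.0, 1800.0]
    formant_frames = [vowel_for_pitch(f) for f in trajectory]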

We can side-step the entire recording/synthesis trade-off by moving to a physical modeling synthesis technique such as waveguide synthesis. Here, instead of trying to match a desired waveform or spectrum, a particular sound is obtained by modeling the physics of the sound-production mechanism itself. Marvelous realism can now be combined with the malleability that is necessary to synthesize expression. Furthermore, this malleability is of exactly the right kind--the parameters of the synthesis technique are precisely those that go into the construction and performance of the real-world sound-production system. For example, a plucked string model such as that developed by Karplus, Strong, Smith and myself has both performance variables such as pick position and dynamic level, and instrument construction variables such as string stiffness and body shape. Such duplication of existing instrumental sounds is only the starting point. By extending the physical parameters far beyond their usual values, it is possible to distort and abstract the simulated physical mechanism, creating powerfully-expressive new instruments. The physical modeling approach assures that these imaginary constructs behave in ways that make intuitive sense, drawing as they do on our real-world experience.
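
The flavor of such models can be conveyed by a minimal Python sketch of the basic Karplus-Strong plucked string (the waveguide refinements mentioned above--pick position, string stiffness, body shape--would add further filters around this core loop):

    import random

    def pluck(frequency, duration, sample_rate=44100, loss=0.996):
        """A noise burst circulating in a delay line, low-pass filtered on
        each pass: the delay length sets the pitch, the loss the damping."""
        period = int(sample_rate / frequency)                      # "string length" in samples
        line = [2 * random.random() - 1 for _ in range(period)]    # the pluck excitation
        out = []
        for _ in range(int(duration * sample_rate)):
            first = line.pop(0)
            line.append(loss * 0.5 * (first + line[0]))            # averaging filter
            out.append(first)
        return out

    tone = pluck(440.0, 1.0)

The delay-line length is literally the length of the string in samples, and the loss in the averaging filter is its damping, which is precisely why the parameters of such a model feel physical rather than abstract.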

Tape Music vs. Keyboard Music

Let us now shift our attention to the relationship between performer, composer and computer, with music for tape or tape plus instruments on one side and music for live electronics or keyboard synthesizers on the other. The schism here is largely historical, a result of technological limitations of the day. Computers were not fast enough to do sophisticated synthesis or processing in real time. Composers tended to prefer non-real time systems, where they could achieve the sonic complexity they desired, while performers, for whom real time interaction was essential, were required to use the real time performance systems. The gulf was widened by sociological factors. Non-real time systems were developed and used primarily in academia and state-funded radio and research institutions to produce art music using large mainframe computers available only in such institutions. Real time systems were most often developed by commercial musical instrument manufacturers and used primarily in popular music in conjunction with low-cost personal computers. Software reflected the priorities of these groups. The non-real time software tended to favor power, generality and fine control of timbre, while real time software emphasized ease of use and performance control of musical gesture.

Advancing technology is serving to bridge this gulf. Personal computers are now far more powerful than the old mainframes. New commercial software is beginning to support true general-purpose synthesis. Commercial synthesizers are becoming more programmable. Academic research has found its way into industry, and commercial synthesizers and PCs are being extensively used in "serious" music, with composers exploring the interactive control possibilities of MIDI. General purpose synthesis and analysis systems can now run in real time using fast CPUs and DSPs. Modern hybrid software architectures, such as the NeXT Music Kit, subsume both the MIDI and Music5 paradigms into a higher-level object-oriented framework. It is no longer necessary to make an exclusive choice between fine gestural control and fine timbral control.

The tape versus live dichotomy is in reality a continuum. At one endpoint are pieces for tape and performers, with the instrumentalists slavishly synchronizing to the computer. At the other endpoint are pieces for keyboard synthesizer, where the roles are reversed, with the computer slavishly following the performer. The region in between, in which the computer and the performer form a more flexible relationship, is still largely unexplored.

The first step is to free the performer from the need to strictly follow a tape part, but without requiring a keyboard player to perform the entire electronic score. One way to accomplish this is for the computer to "listen" to the performer, either via MIDI or using pitch and amplitude detection on the audio signal itself. The computer maintains an internal representation of the score, tracks where in the score the performer is, and adjusts its own performance accordingly. A commercial product based on this approach has been released by Coda, based on work by Roger Dannenberg. Another approach is for a conductor to lead the electronics in a similar manner to the live performers, using a device such as the Buchla Lightning or the Mathews Radio Baton.
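
In spirit, though much simplified (Dannenberg's matcher is a far more robust dynamic-programming algorithm), the score-following idea looks something like this:

    def follow(score, played_pitch, position, window=3):
        """Return the new score position after hearing `played_pitch`.
        `score` is a list of expected MIDI note numbers."""
        for offset in range(window):
            i = position + offset
            if i < len(score) and score[i] == played_pitch:
                return i + 1        # matched: the accompaniment jumps to this point
        return position             # no match: treat it as an extra note and stay put

    score = [60, 62, 64, 65, 67, 69, 71, 72]    # a C major scale as a toy "score"
    position = 0
    for pitch in [60, 62, 65, 67]:              # the performer skips the E (64)
        position = follow(score, pitch, position)
    print(position)                             # -> 5: the follower kept up anyway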

The situation becomes particularly intriguing when such controllers are used as "virtual instruments." Here, a performer has the opportunity to use his or her familiar language of physical gestures--that of violin technique, for example--but with these gestures assigned a new meaning. The roles of composer and performer are redefined. Instead of the composer providing the performer with a description of the notes and the performer rendering an expressive reading, the composer is responsible for specifying the behavior of the virtual instrument, creating a virtual world or musical language in which the soloist is free to express him- or herself. It is up to the composer to determine where on the continuum between expression and improvisation to position the performer.

To give an idea of what is possible, let's take a close look at the extended cadenza from the "Statue of Zeus" movement of "The Seven Wonders of the Ancient World", a large-scale concerto I recently completed. The piece is scored for the Boie/Mathews Radio Drum--a variant on the Mathews Radio Baton--controlling the Yamaha Disklavier robotic grand piano and accompanied by an ensemble of eight plucked string and percussion instruments. The effect of the Drum/computer/piano combination is of a hybrid instrument that maps the gestures, techniques, idioms and cultural associations of percussion performance onto the piano mechanism, with its own history, expectations, and mechanical constraints. The goal was to allow the soloist freedom to realize the virtuosic potential of his instrument, while at the same time retaining sufficient influence over the musical material, so that the cadenza is guaranteed to fit within the context of the composed concerto form.

Here is a description of the algorithm devised in collaboration with the soloist, Andrew Schloss: The right mallet sends information to the computer whenever the performer strikes the Drum surface, while the left mallet reports its position continuously. The computer interprets the information--width (X), height (Y) and depth (Z)--to control its own note generation process that, in turn, controls the piano. As the left-hand Y value moves from the region furthest from the performer to the region closest to the performer, the piano changes from playing chords in whole notes to half notes, quarter notes, eighth notes, sixteenth notes and, finally, thirty-second notes. The spacing and number of notes of the chords also change; the greater the note's rhythmic value, the wider the chord spacing and the more notes per chord. At the same time, the right mallet can be used to introduce a new tempo in the manner of a metric modulation. The performer plays a new tempo with the right mallet, then presses a foot pedal which causes this pulse to be used as the new tempo for the left mallet's automatic material. On top of this, the right mallet's velocity determines the loudness of the notes played by both mallets. The X value of each mallet determines the transposition of the chord, where moving right on the Drum corresponds to higher transpositions. This configuration of the Drum/piano allows the soloist extensive improvisational freedom. However, it also constrains the possible piano sonority to the extent that the cadenza has a certain immutable character, no matter what the soloist plays.
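
A schematic Python sketch of the left-mallet mapping gives the flavor (the ranges, chord voicings and constants are invented for illustration; the concert software is considerably more elaborate and also handles the right mallet, the foot pedal and the metric modulation):

    RHYTHMS = [4.0, 2.0, 1.0, 0.5, 0.25, 0.125]    # whole note ... thirty-second note, in beats

    def left_mallet_step(x, y, tempo_bpm=60.0):
        """Continuous left-mallet position (0.0-1.0, y = 0.0 farthest away)
        -> one step of the automatic material: (seconds to next chord, notes)."""
        zone = min(int(y * len(RHYTHMS)), len(RHYTHMS) - 1)    # nearer = shorter values
        seconds = RHYTHMS[zone] * 60.0 / tempo_bpm
        size = len(RHYTHMS) - zone                     # longer values -> more notes per chord
        spacing = 3 + 2 * (len(RHYTHMS) - 1 - zone)    # ... and wider spacing (semitones)
        root = 36 + int(x * 48)                        # X position -> transposition
        chord = [root + i * spacing for i in range(max(size, 2))]
        return seconds, chord

    # e.g. the left mallet close to the performer and toward the right edge:
    print(left_mallet_step(x=0.8, y=0.95, tempo_bpm=96))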

The possibilities really open up when more than one virtual instrument is combined, as Schloss and I explored in "Wildlife", a duo for Zeta violin, Radio Drum, and NeXT and Macintosh computers. No longer is the violinist responsible only for the violin, while the drum is the exclusive province of the percussionist. Instead, the boundary between the instruments becomes permeable, to an extent that may be controlled as a compositional parameter. The environment can be further enriched by semi-autonomous computer processes. These seemingly-intelligent organisms may either support, contradict or ignore what the human performers are doing, adding their own peculiar brand of virtuosity and expression.

Engineer vs. Artist

Conceiving of music in this way leads us to question yet another traditional dichotomy: the distinction between engineer and artist. Computer music imposes great demands on a composer. In addition to writing music, he or she is called upon to build instruments, create expressive performances and now also to develop, rehearse and conduct interactive ensembles. Collaborations with sympathetic engineers can be invaluable. But to make the most of the computer medium, composers must also be willing to expand their expertise to include portions of the technical realm. Combining the technical and musical perspectives within a single mind allows for a much more fluid and organic compositional process. Luckily, learning to program computers is not particularly difficult and music programming is becoming easier, thanks to modern graphical programs like SynthBuilder, Kyma, Max and many others, applications that blur the distinction between the user and the programmer.

It is this quality of the computer--its programmability--that converts it from a quirky appliance to a tool for extending the mind. The computer reveals itself as simply a general-purpose programmable machine. The composer can choose to leave this programming to someone else or tackle it him- or herself. More realistically, he or she can mix these two approaches, using existing software where it suits the musical conception and writing new software when necessary.

The composer who does decide to program the computer gains the opportunity to tailor the language in which compositional ideas are expressed to the ideas themselves. This process is usually considered the realm of yet another branch of the computer music tree, algorithmic composition. But, with the exception of research into artificial intelligence, algorithmic composition can be seen as simply a means of expressing ideas in a compact manner, with the only limitation being the ingenuity and imagination of the composer.
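
As a trivial illustration of that compactness (generic, and not drawn from any particular piece), a few lines of code can state a musical idea--an accelerating arpeggio cycling through fourths--that would be tedious to specify note by note:

    def accelerating_arpeggio(root=48, steps=24, start_dur=0.5, shrink=0.92):
        """Return (MIDI note, duration in seconds) pairs: each note steps
        through a cycle of fourths, wrapping within two octaves, and is a
        bit shorter than the last."""
        notes, dur = [], start_dur
        for i in range(steps):
            notes.append((root + (i * 5) % 24, round(dur, 3)))
            dur *= shrink
        return notes

    phrase = accelerating_arpeggio()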

Computer Music vs. Music

We have arrived at the root of the computer music family tree and will attempt our most subversive up-ending. The traditional conservatory viewpoint sees computer music as a tiny twig on the giant tree of Western music composition, dwarfed by such massive limbs as orchestral composition, choral composition, etc. Yet, computer music is far more general than all these other media combined. While all string quartets are constrained by the physics of the instruments and the (admittedly, ever-extending) limits of human virtuosity, no such boundaries exist in computer music. Computer music is an unbounded realm of idea, constrained only by the physical limitations of the medium of projection, which may be a set of loudspeakers, a Disklavier, or even a video screen. The computer is not a medium, not a style, but an apprentice, wildly-gifted in some ways, hopelessly simple-minded in others. In the best computer music, the computer itself disappears, allowing the personality of the composer to shine through.

Although computer music as a separate discipline has served us well for many years, perhaps it has become too comfortable a niche. The time may come when you can go into a record store and no more expect to find a record bin marked "computer music" than a bin marked "electric guitar music". For now, may the kind of hybridization and cross-fertilization discussed here continue to produce vital works of art, science and technology.