(This post shares some of my thinking around my thesis subject at the Université du Québec à Montréal. It’s long (click on the link at the bottom to keep reading), but an easy read. I’d appreciate any comments.)
In “Perform or Else”, Jon McKenzie describes the relationship between cultural, organizational, and technological performance. Each type of performance has its own sphere of actualization: cultural performance represents performance art itself and the academic study of performance; organizational performance represents the corporate world and its demands on workers and systems to “perform”; and technological performance represents the machines whose performance we rely upon to conduct our daily business. McKenzie states that these three types of performance will create a synergy which will ensure that “performance will be to the 20th and 21st centuries what discipline was to the 18th and 19th, that is, an onto-historical formation of power and knowledge.” (McKenzie, “Perform or Else”, p. 18.) McKenzie argues that this unspoken command, “perform or else”, bears all the hallmarks of the speed and tension of contemporary existence: you must perform or you will be replaced.
In the period shortly after World War II, a court reporter by the name of Horace Webb responded to the “perform or else” challenge posed by the difficulty of creating accurate transcripts of court proceedings with the stenographic techniques of the time. Frustrated with the inefficiencies of stenographic court reporting, he developed the method known as voice writing, or verbatim reporting. This method has a unique invention at its core, a device called the stenomask. This object is a mask with a built-in microphone that fits over the lower portion of the face. Because the microphone is enclosed inside the specially designed mask, a reporter can speak without being heard by other people, and background noise is kept away from the microphone.
The stenomask was created in a true spirit of invention, responding to the inefficiency of existing court reporting methods. Though the stenomask proved to be an excellent way to solve the problems with typed transcription in court, it brought its own problems to the fore, particularly when the stenomask is made to work in conjunction with voice recognition software on a computer.
When the stenomask is connected to a computer, the computer interprets the incoming sound and attempts to translate the sounds into text. It reads every sound that comes in through the stenomask’s microphone as a potential word. Many hours must be dedicated to creating a “voice model” for each user of the computer. The creation of a voice model involves reading similar words and phrases over and over so that the computer can learn how an individual user enunciates.
Voice recognition software is notoriously flawed, and even with hours of voice training, the computer may still misinterpret what is read to it. The human voice is subject to many external factors, such as disease and fatigue, which cause variability in tone, accent, and stress. This variability makes a completely accurate voice model impossible to create; hence, some element of error will always be present when working with voice recognition software.
The stenomask itself, as a physical object, was created to perfect a method of audio capture, in particular, the capture of the human voice. It is a successful invention, in that it does exactly what it claims to do – separate the human voice from external noise for the purposes of dictation. It is when the stenomask is coupled with voice recognition software that the potential for error occurs. The digital interpretation of the captured human voice is the most vulnerable element of this system. As a response to the “perform or else” challenge, the stenomask succeeds, but the voice recognition software fails.
It is these vulnerabilities, in a system that was designed with hopes of being invulnerable, that make the system interesting. The persona of the typical user of this device, as a stoic protector of the “true” record, is also a point of interest. The one who uses the stenomask device is effectively silenced and somewhat removed from the local context, and yet holds immense power over the record. The trusted accomplice in this relationship, the computer, will often misinterpret what is being read to it, despite hours of training the voice recognition software to understand the speaker. This double-bind of potential error, the human conduit and the technological interpreter, becomes a fertile ground for exploration of the malleability of the spoken word.
The role of the stenomask reporter is to provide a completely accurate account of what was said in court. The role of the computer is to provide an accurate translation. How do human tendencies in speech, such as the embellishment of details when retelling a story, impact this notion of the incontestable record? In the case of the stenomask, the speed and flexibility of the oral is married to the permanence of the written, as speech is directly funneled into the computer to create a record. This would seem to be an ideal pairing; in reality, however, it is a constantly negotiable relationship, fraught with error on both the side of the oral and the written – i.e. the human and the computer.
This relationship between the human and the computer is based on a certain level of functionality on the software side and a certain level of virtuosity on the human side. It is in this relationship, bound together by a high level of skill, that the user of the stenomask really becomes a performer. Virtuosity with the stenomask device is an answer to the “perform or else” challenge.
An operator of a stenomask can be trained to re-voice everything they hear into a stenomask connected to a voice recognition system, producing a real-time text transcription. Because of the stenomask’s unique features, it is excellent for voice transcription in noisy environments, and for use in quiet environments such as a court or a classroom, where it is crucial that the operator’s voice doesn’t disturb the surroundings.
Horace Webb and two colleagues spent several years designing the stenomask and perfecting the voice writing method. Webb’s first “stenomask” experiments included enclosing a microphone in a cigar box, number two tomato cans, bottles, and boxes, made of various materials and ranging in size. He found his initial tests to be quite dismal in audio quality. Eventually Webb discovered that rags or cotton at the bottom of the box assisted in producing a better quality recording, because the reverberating sound waves needed to be dampened as soon as they struck the microphone.
Webb’s final stenomask prototype involved fitting a rubber facepiece salvaged from Air Force equipment over a “Royal Chef” coffee pot on sale at a large department store. His prototype worked and he set about attempting to use it in courtroom environments in Washington D.C.
At first, the unsightly device was roundly dismissed, despite Webb’s best efforts to demonstrate the superiority of this system over traditional transcription methods used in courtrooms. The pivotal moment for the stenomask occurred when Webb’s colleague, Frank Kenny, visited Newport, RI, the home of the Naval Justice School. At that point the Navy wished to evaluate and test all known systems of court reporting, and to set up a school to train naval personnel as court reporters. The Navy conducted its tests, and the stenomask proved to be the most effective method of court reporting, far outranking the others. The Navy adopted the method, and soon thereafter, the method known as voice writing began taking over a significant portion of court reporting in the United States.
McKenzie’s idea that performance, and the capability to perform, is really a potent combination of power and knowledge is analogous to the relationship that the stenomask-computer-user combination has with the world. The reporter holds the knowledge; the stenomask and the computer hold the power. Errors entering into this stenomask-computer-user relationship are extremely problematic.
From the perspective of a performance artist, the presence of errors in performance is equally problematic but accepted as a possibility. While there is not the weight of the public legal record to think about, there is the question of artistic integrity and personal reputation. The pressures that a performance artist faces when on stage are identical to the demands that a stenomask reporter faces when in court. The human at the centre of the system must rely on their virtuosity in performance to execute their function perfectly. Perfection is an impossible goal, but pressure remains to be as close to perfect as possible. In the case of the stenomask-computer-user, understanding your machines and working with them perfectly is critical to a successful formation of power and knowledge.
2 replies on “Technical Virtuosity, the Voice, and the Challenge of Performance”
One thing of interest for me in this is the nature of the courtroom as a performance arena. We have this strange idea that only a written record of the proceedings is "impartial" enough. We could just mic everyone up and do a multicamera shoot. It wouldn’t cost that much more than paying stenographers, and it would be totally accurate, but we don’t do it. We don’t even allow still cameras in the room. I like the fact that we send someone in to *draw* everything. Imagine if we still did all news like that? Maybe we should. In my day-to-day work, I often have to deal with live closed-captioning of TV shows (interestingly, I’ve noticed the credits now read "closed captioning performed by___"). In this case most of the technology is derived from court reporting, but the emphasis is on speed, not accuracy. The software that translates the phonetic steno keyboard into written English generates some pretty humorous stuff: my favorite was a person who said "I’m Ojibway"; the best the software could do with that was "I’m a Jew boy". Whoops.
It’s nice to hear what you’re working on. Very interesting post.