What I want is an electronic "instrument" (ie: some set of equipment) that lets me sing and, by also specifying chords, produces an ensemble sound from the sound of my voice and the chords, by adding in extra copies of my voice transposed to form the chord.
So, if I was singing a C, and specifying that the chord is a major chord and that I'm singing the bottom note, then the output would be the original vocal sound (C) plus two modified versions (singing E and G).
I think a basic setup could be something like this:
The Realtime Vocal Ensemble Effect has a digital audio input, a MIDI input, and a digital audio output. The digital audio input is modified according to control signals from the MIDI input, and the modified audio comes out as the output.
The singer/player needs an input device that can specify a relative chord. This chord information must include the notes that make up the relative chord, and an indication of which of those notes is being sung by the singer.
For example, the singer might specify that a Major 7 chord is being played, of which the singer is singing the third, ie: (1, 3, 5, 7), where the 3 is the note being sung.
A standard MIDI keyboard could be co-opted for this function. Note that the keys would no longer represent a series of fixed pitches; they would represent relative differences from the sung note.
The keyboard is made relative by redefining an absolute note (we will choose middle C because it is in the middle) as the note the singer is singing. Whatever the singer sings, that is represented on the keyboard as the key formerly known as middle C.
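Thus, the keys represent distance from the note the singer is singing: each key above middle C means a transposition up by that many semitones, and each key below it means a transposition down by that many semitones.
So, to specify that you want a root position major chord, where the singer’s voice is the root note, you would depress middle C and the E and G above it (0, +4 and +7 semitones).
To specify a major seventh, where the singer is singing the third, you would depress the A flat below middle C, middle C, and the E flat and G above it (-4, 0, +3 and +7 semitones).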
Using this method, the singer can hold any note, and change the keys being pressed to move the chord around their voice. Or, alternatively, the singer could choose a major chord, then glissando down, say, a 5th, and all the transposed voices would move with the singer.
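The relative keyboard can be sketched in a few lines of Python. This is only an illustration: the name relative_offsets is hypothetical, and it assumes the common convention that middle C is MIDI note number 60.

```python
MIDDLE_C = 60  # assumption: the common convention where middle C is MIDI note 60


def relative_offsets(pressed_notes):
    """Convert pressed MIDI note numbers into semitone offsets from the sung note."""
    return sorted(note - MIDDLE_C for note in pressed_notes)


# Root position major chord, singer on the root: middle C, E, G.
print(relative_offsets([60, 64, 67]))      # [0, 4, 7]

# Major seventh, singer on the third: A flat below middle C, middle C, E flat, G.
print(relative_offsets([56, 60, 63, 67]))  # [-4, 0, 3, 7]
```

An offset of 0 is the singer's own voice; every other offset is one extra transposed voice.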
Internally, the Realtime Vocal Ensemble Effect could look like this:
The transpositions are simple modules. They transpose a digital signal up or down in pitch. This could be specified in semitones. Each one is simply instructed “transpose up 4 semitones” or “transpose down 2 semitones”. The number of transpositions and their transposition settings are determined by the Transposition Controller.
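As a rough sketch of what "transpose up 4 semitones" means numerically, a semitone setting can be turned into a frequency ratio. This assumes twelve-tone equal temperament, which the design above does not specify, and the function name semitone_ratio is hypothetical. How a Transposition actually shifts the audio by that ratio in realtime is left open here.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones`, assuming 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


for shift in (4, -2, 12):
    # +4 semitones is a major third up, -2 a whole tone down, +12 an octave up.
    print(shift, round(semitone_ratio(shift), 4))
```

An octave up doubles every frequency in the signal, and smaller shifts scale it by a proportionally smaller factor.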
NOTE: Nowhere here does this mechanism attempt to determine the pitch of the sung note. This is never necessary. The singer sings whatever they desire, and, completely separately, the keys pressed on the keyboard instruct the Transposition Controller to make certain transpositions. The transpositions simply transpose digital audio up or down in pitch, *regardless of what the digital audio actually is*. So there is NO PITCH DETECTION REQUIRED OR DESIRED IN THIS MECHANISM.
The singer sings into the microphone, and uses the keyboard to specify the chord to create from their voice, in a relative fashion, as specified above. The digital audio of the voice and the MIDI note signals from the keyboard both come into the Realtime Vocal Ensemble Effect.
It is the job of the Transposition Controller to decode the MIDI notes into a set of relative pitches, and then modify the Transpositions accordingly.
It would work like this:
The transposition controller would work on key down / key up type messages.
When a key down message is received, the Transposition Controller calculates the relative distance from Middle C of the note, creates a new Transposition, configures it to use the calculated distance for transposing, and enables it.
When a key up message is received, the Transposition Controller calculates the relative distance from Middle C of the note, finds the existing Transposition for that relative distance, and deletes it.
Special case: Middle C
Middle C represents the non-transposed digital audio input. Key down means enable the pass thru channel, key up means disable it.
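The key down / key up handling described above could be sketched like this. It is a minimal illustration, not a finished design: the class and method names are hypothetical, middle C is assumed to be MIDI note 60, and each Transposition is represented only by its semitone offset rather than by a real audio module.

```python
MIDDLE_C = 60  # assumption: middle C is MIDI note number 60


class TranspositionController:
    """Tracks which transpositions are enabled, driven by key down / key up."""

    def __init__(self):
        self.transpositions = set()  # semitone offsets of the active Transpositions
        self.pass_through = False    # True while middle C is held

    def key_down(self, note):
        offset = note - MIDDLE_C
        if offset == 0:
            self.pass_through = True         # special case: enable the dry voice
        else:
            self.transpositions.add(offset)  # create and enable a Transposition

    def key_up(self, note):
        offset = note - MIDDLE_C
        if offset == 0:
            self.pass_through = False            # disable the dry voice
        else:
            self.transpositions.discard(offset)  # delete the matching Transposition

    def active_voices(self):
        """Offsets currently sounding, with 0 standing for the untransposed voice."""
        voices = set(self.transpositions)
        if self.pass_through:
            voices.add(0)
        return sorted(voices)


controller = TranspositionController()
for note in (60, 64, 67):          # hold middle C, E and G: a major chord
    controller.key_down(note)
print(controller.active_voices())  # [0, 4, 7]

controller.key_up(64)              # move the third down a semitone
controller.key_down(63)
print(controller.active_voices())  # [0, 3, 7]
```

Note that nothing in the sketch looks at the audio itself: the controller only reacts to keys, which matches the point that no pitch detection is needed.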
This tool is conceived as a performance instrument for singers. However, it can clearly be used in other ways.
Firstly, there is absolutely no reason to use it only for solo voice. It’s been conceived for solo voice, because voice is a relatively pitched instrument (in the way that a piano, for instance, is not). However, you could use this on any digital audio stream in realtime, probably with very interesting results.
Secondly, this could be used in a recording/composition setting, as a useful studio effect, for adding harmonies to audio. There’s no reason to confine it to a performance-only setting.