Electronic dance music (techno) is a style of music built on repeating rhythms and melodies, which lends itself well to automatic generation. Often, a DJ will mix and blend songs into each other, producing a multi-hour amorphous piece of music. It should be possible to produce something along those lines in real time, unassisted. The software will have to learn what the listener likes and dislikes, so that it can generate and play music without human intervention.
The mechanism behind all of this should be a genetic algorithm (GA) system. Traditional GA systems try to converge on a single specific solution to the problem at hand, for example the peak of a bell curve. This one will instead try to produce results in a region around the peak of the curve, or perhaps occasionally deviate from the curve entirely.
I will try to find the best balance between the various appropriate methods of producing child individuals (n-point crossover, mutation, etc.), and to find the best method of assigning fitness to the music pieces produced.
There would be different levels to the generation/representation of the music.
To generate a musical "song", you would need to determine its flow structure: for example, "intro", "part 1", "bridge", "part 2", "chorus", "solo", "ending", and so on. Each of those elements would contain a series of metameasures, probably a power of 2 (2, 4, 8, 16, 32, 64, etc.), as most dance music is arranged that way. Each metameasure would then be composed of a pattern of measures. Typically, these patterns in techno/dance music are either AAAB or ABAC for patterns of four measures, and AAAAAAAB or ABABABAC for patterns of eight measures.
Also at the song level are "transitions" between "songs". A transition is where two different styles of song, whether they differ in tempo, instrument tonality, or anything else, are connected seamlessly.
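The hierarchy described above could be sketched roughly as follows. This is only an illustration: the class names `Measure` and `Section` and the helper `expand_pattern` are assumptions made for the example, not part of any existing design.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    name: str  # hypothetical label such as "A", "B" or "C"


def expand_pattern(pattern, measures):
    """Turn a pattern string such as "AAAB" into an ordered list of measures."""
    return [measures[letter] for letter in pattern]


@dataclass
class Section:
    name: str                                         # e.g. "intro", "bridge"
    metameasures: list = field(default_factory=list)  # each is a list of Measure


# Three distinct measures, arranged into two four-measure metameasures.
measures = {letter: Measure(letter) for letter in "ABC"}
intro = Section("intro", [
    expand_pattern("AAAB", measures),
    expand_pattern("ABAC", measures),
])

order = ["".join(m.name for m in mm) for mm in intro.metameasures]
print(order)  # ['AAAB', 'ABAC']
```

Because a pattern only names which measure goes where, the same measure "A" is shared by every slot that refers to it, so evolving "A" changes every repetition at once.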
Each measure contains either 2, 4 or 8 beats, and also many channels. A channel is where an instrument is played. You could have one channel for the bass drum, one for the snare drum, one for the melody line, one for the sound effects, and so on. Each of these channels can be rendered on multiple "voices" in the sound card, or in the synthesizer, but that doesn't really matter to us.
Each channel should be stored separately, and some mechanism to join channels together should be available. By default, all of the percussion and drum-related channels should be associated; that way the algorithm can grab a block of these channels, drop them into the song being generated, and have a drum riff. The internal representation of a voice can be of many forms.
There are many ways that this can be done. One method might be to use a bitstring of 32 or so steps per beat. The number of beats per measure, measures per section, etc., can be stored in other bitstrings.
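A minimal sketch of the bitstring idea, assuming one bit per step where 1 means "trigger the instrument on this step". The function names are made up for the example, and a plain Python list stands in for the bitstring.

```python
STEPS_PER_BEAT = 32  # resolution suggested in the text


def empty_measure(beats):
    """One bit per step; 0 means silence."""
    return [0] * (beats * STEPS_PER_BEAT)


def set_hit(bits, beat, step=0):
    """Mark a hit at the given step within the given beat."""
    bits[beat * STEPS_PER_BEAT + step] = 1


def hit_positions(bits):
    """Return (beat, step) pairs for every set bit."""
    return [(i // STEPS_PER_BEAT, i % STEPS_PER_BEAT)
            for i, bit in enumerate(bits) if bit]


# A bass drum on every beat of a four-beat measure.
kick = empty_measure(4)
for beat in range(4):
    set_hit(kick, beat)

print(len(kick))            # 128
print(hit_positions(kick))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```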
The method that I think will work the best is to use a BNF tree and syntax to describe the music to be generated. Some research will have to be put into this to determine if it is in fact feasible. Some basic tests show that it very well might work.
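To give a feel for the grammar-based approach, here is a small sketch that expands a toy BNF-style grammar into a drum riff. The grammar itself (the `<measure>`, `<beat>`, `<hit>` and `<rest>` rules) is invented purely for illustration; a real grammar for this project would have to come out of the research mentioned above.

```python
import random

# Hypothetical toy grammar: each non-terminal maps to a list of productions.
GRAMMAR = {
    "<measure>": [["<beat>", "<beat>", "<beat>", "<beat>"]],
    "<beat>": [["<hit>", "<rest>"], ["<hit>", "<hit>"], ["<rest>", "<rest>"]],
    "<hit>": [["x"]],
    "<rest>": [["-"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminals."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    terminals = []
    for part in production:
        terminals.extend(expand(part, rng))
    return terminals


rng = random.Random(1)  # seeded only so that runs are repeatable
riff = "".join(expand("<measure>", rng))
print(riff)  # eight characters, each "x" (hit) or "-" (rest)
```

In a GA built on this idea, the individual would be the sequence of production choices rather than the finished riff, so crossover and mutation always yield music that is still valid under the grammar.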
This is the part that's the hardest to predict. Initial tests should be done with a simple user interface with 2 buttons, "I don't like this" and "Play this again later". The user will sit in front of the computer, which will be playing back rhythm riffs or melodies, and will act as the fitness function. If there is no user input, then the system should assume that the part it is playing is acceptable to the user.
Initially, for the basic population, each of the different segments of the music should be evolved independently. That is to say that "good" melodies, bass lines, snare drum riffs, etc. should be evolved independently, until the system produces acceptable channels of these instruments.
Once an acceptable number of these parts has been generated, the system will have to learn what combinations of them are acceptable, then what groupings of them are acceptable, and so on up through the hierarchy to the song level, as described above.
There should be some number of mutations within each run, so that there is not a single solution that the music will evolve to. It is okay for the generation to fall back on older patterns of music that worked well in this song, but it is also very important for there to be some variety within a song; otherwise, you will end up with the same "best" grouping of channels played over and over for five minutes. Although there are styles of techno dance music out there that follow this rule, they usually end up being quite boring to listen to.
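The two-button feedback scheme could be turned into fitness scores along these lines. The specific score changes in `SCORE_DELTA` are assumptions chosen for the example; finding values that actually work is part of the experiment.

```python
DISLIKE = "dislike"  # "I don't like this"
REPLAY = "replay"    # "Play this again later"
NO_INPUT = None      # the user pressed nothing

# Hypothetical score changes; silence counts as mild approval.
SCORE_DELTA = {DISLIKE: -2, REPLAY: 2, NO_INPUT: 1}


def update_fitness(fitness, riff_id, feedback):
    """Adjust the stored fitness of a riff after it has been played once."""
    fitness[riff_id] = fitness.get(riff_id, 0) + SCORE_DELTA[feedback]
    return fitness[riff_id]


fitness = {}
update_fitness(fitness, "riff-1", NO_INPUT)
update_fitness(fitness, "riff-1", REPLAY)
update_fitness(fitness, "riff-2", DISLIKE)
print(fitness)  # {'riff-1': 3, 'riff-2': -2}
```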
Along with random generation, fractal generation of the music should be explored.
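One starting point for the fractal approach is the dice-based scheme for approximating 1/f ("pink") sequences that Gardner's article attributes to Richard Voss. The sketch below is my own reading of that scheme, not a reference implementation: several dice are summed at every step, and die number i is re-rolled only when bit i of the step counter changes, so some dice change often and others rarely.

```python
import random


def voss_sequence(length, n_dice=3, sides=6, rng=None):
    """Generate a sequence of dice sums with 1/f-like correlation."""
    rng = rng or random.Random()
    dice = [rng.randint(1, sides) for _ in range(n_dice)]
    sums = [sum(dice)]
    for step in range(1, length):
        changed_bits = step ^ (step - 1)  # bits that flipped in the counter
        for i in range(n_dice):
            if changed_bits & (1 << i):
                dice[i] = rng.randint(1, sides)
        sums.append(sum(dice))
    return sums


# Each value could be mapped to a pitch or a velocity.
melody = voss_sequence(16, rng=random.Random(7))
print(melody)
```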
This is where the most experimenting and exploring will happen while the songs are being tested. Many different traditional crossover styles (single/multipoint, and uniform crossovers) should be applied on all levels of the stored music. That is to say that the crossovers are to be applied to each channel, measure, metameasure, and so on.
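For reference, the three traditional crossover styles mentioned above can be sketched as follows. The sketch assumes both parents are equal-length lists with at least two elements (for example, the bits of one channel); the function names are mine.

```python
import random


def single_point(a, b, rng):
    """Swap the tails of two parents at one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def n_point(a, b, n, rng):
    """Alternate between parents at n distinct random cut points."""
    cuts = sorted(rng.sample(range(1, len(a)), n))
    child1, child2 = [], []
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        first, second = (b, a) if swap else (a, b)
        child1 += first[prev:cut]
        child2 += second[prev:cut]
        swap, prev = not swap, cut
    return child1, child2


def uniform(a, b, rng):
    """Decide independently for each position which parent it comes from."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


rng = random.Random(0)
parent_a, parent_b = [0] * 8, [1] * 8
print(single_point(parent_a, parent_b, rng))
print(n_point(parent_a, parent_b, 2, rng))
print(uniform(parent_a, parent_b, rng))
```

The same functions could be applied at higher levels by passing lists of measures or metameasures instead of lists of bits.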
And, to reiterate what I said above, mutations are very important to have. Some tolerance for mutations (N mutations per part) must be found to keep the style of the music consistent while the mutations are applied to the music.
The music engine will have to be able to run while crossovers and mutations are occurring, for seamless playback and learning. The initial process of learning which channels sound good (before layering them together) does not have to work this way, however.
The goal here is to allow for future expansion. There should be "plug-ins" for inputs and outputs. You should be able to plug in a MIDI drum machine, and have the GA subsystem talk to it with no problem. You should be able to replace the GA subsystem with a music playback engine to play back pre-recorded sessions. There should also be some method for outputting to standard "sound wave" output files, so that good sessions can be put onto audio CDs.
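The "N mutations per part" tolerance could look like this for a bitstring part. This is a sketch under the assumption that a mutation is a single bit flip; other representations would need their own mutation operator.

```python
import random


def mutate(bits, n_mutations, rng):
    """Return a copy of bits with exactly n_mutations distinct bits flipped."""
    child = list(bits)
    for index in rng.sample(range(len(child)), n_mutations):
        child[index] ^= 1
    return child


rng = random.Random(42)
original = [1, 0, 0, 0, 1, 0, 0, 0]
mutated = mutate(original, 2, rng)
differences = sum(a != b for a, b in zip(original, mutated))
print(differences)  # 2
```

Capping the number of flips per part, rather than flipping each bit with some probability, gives a direct handle on how far a child may drift from its parent.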
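One possible shape for the plug-in boundary is sketched below. Every name here (`EventSource`, `OutputPlugin`, `RecordedSession`, `LogOutput`, `run`) is hypothetical, and the event format of (channel, pitch, velocity) is an assumption made to keep the example small.

```python
from abc import ABC, abstractmethod


class EventSource(ABC):
    """Anything that produces note events: the GA, or a recorded session."""

    @abstractmethod
    def events(self):
        """Yield (channel, pitch, velocity) tuples."""


class OutputPlugin(ABC):
    """Anything that consumes note events: a synthesizer, MIDI, a wave file."""

    @abstractmethod
    def note_on(self, channel, pitch, velocity):
        """Play or store one note event."""


class RecordedSession(EventSource):
    """Stands in for the GA subsystem by replaying stored events."""

    def __init__(self, recorded):
        self._recorded = list(recorded)

    def events(self):
        return iter(self._recorded)


class LogOutput(OutputPlugin):
    """Collects events in memory instead of making sound."""

    def __init__(self):
        self.received = []

    def note_on(self, channel, pitch, velocity):
        self.received.append((channel, pitch, velocity))


def run(source, outputs):
    """Send every event from the source to every output plug-in."""
    for channel, pitch, velocity in source.events():
        for output in outputs:
            output.note_on(channel, pitch, velocity)


session = RecordedSession([(0, 36, 100), (1, 38, 90)])
log = LogOutput()
run(session, [log])
print(log.received)  # [(0, 36, 100), (1, 38, 90)]
```

With this split, swapping the GA for a playback engine or adding a MIDI output means writing one new class rather than changing the engine.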
Proof-of-concept.
This should be the basic interface shell. It should handle
a single channel bass "kick" drum, with the most basic of user interfaces.
It should have a very simple sound generation engine, capable of playing back
simple sound samples at specified volumes. This will also be a good testbed
for the listener feedback. If the method described does not work well, then
other methods can be tried at this stage, before things get too complicated.
Remaining percussion.
Add in the other elements to the percussion session. This will also require
adding in the multi-channel fitness testing, to listen to different channels
simultaneously, or alone to determine their fitness; unlike the above step,
where there was one and only one channel playing back.
Melodic elements.
This will probably be one of the more complicated subsystems to build. It will
reference much of the research John A. Biles has done. This is where we will add in
the interface and code for the melody generator and fitness interface. There
will have to be some sort of allowance for the user to select musical key,
chord progression, and so on. Or, if it were to be simplified, the user could
just select between "happy" and "sad" music, and it will automatically generate
melodies in Major and Minor keys, and so on.
This will use some simple, proprietary interface to the internal music synthesizer
already written for the above two phases to play back the instrument used for
the melodic exploration.
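The simplified "happy"/"sad" selection could be sketched as a lookup from mood to scale. The interval patterns are the standard major and natural minor scales; the function name and the choice of MIDI note 60 (middle C) as the default root are assumptions for the example.

```python
# Semitone offsets from the root note.
SCALES = {
    "happy": [0, 2, 4, 5, 7, 9, 11],  # major scale
    "sad": [0, 2, 3, 5, 7, 8, 10],    # natural minor scale
}


def scale_pitches(mood, root=60):
    """Return the pitches of one octave of the scale for the given mood."""
    return [root + offset for offset in SCALES[mood]]


print(scale_pitches("happy"))  # [60, 62, 64, 65, 67, 69, 71]
print(scale_pitches("sad"))    # [60, 62, 63, 65, 67, 68, 70]
```

The melody generator would then pick notes only from the returned pitches, so that every evolved melody stays in key.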
Input and Output plug-ins.
This is where the MIDI interface and code APIs will be developed for future
expansion of the sound of the system. At the very least, a MIDI output
type should be added, so that external tone generators can be added onto the
system.
John A. Biles. GenJam web resource. http://www.it.rit.edu/~jab/GenJam.html
John A. Biles. GenJam: A Genetic Algorithm for Generating Jazz Solos. In Proceedings of the 1994 International Computer Music Conference, ICMA, San Francisco, 1994.
John A. Biles and William Eign. GenJam Populi: Training an IGA via Audience-Mediated Performance. In Proceedings of the 1995 International Computer Music Conference, ICMA, San Francisco, 1995.
John A. Biles, Peter G. Anderson, Laura W. Loggi. Neural Network Fitness Functions for a Musical IGA. In Proceedings of the International ICSC Symposium on Intelligent Industrial Automation (IIA96) and Soft Computing (SOCO96), March 26-28, Reading, UK, ICSC Academic Press, pp. B39-B44.
Martin Gardner. White and brown music, fractal curves and one-over-f fluctuations. Scientific American, 238(4), pp. 16-27, 1978.
John A. Biles. Composing with Sequences paper. http://www.it.rit.edu/~jab/Fibo98/
Claus-Dieter Schulz. The Fractal Music Project web resource. http://www-ks.rus.uni-stuttgart.de/people/schulz/fmusic/.