2+ years ago, when toying with new and personally exciting ways of making music, I would have laughed if you'd told me that Google and Nvidia would provide such incredible studio assets. A lot of this is probably because the music production industry has been incredibly underwhelming from a technological point of view over the last decade, and I don't blame them. Why spend money on R&D and introduce a learning curve when you can serve up a stylish recreation that will sit next to four other recreations of the same classic synthesizer? After all, there is no real push to innovate anymore. In the early days of the DAW, there was a climb to support more tracks, sample more realistically, and match (in hopes of replacing) the reverbs and tones of expensive rack gear. So companies once on the cutting edge have taken the path of least resistance: iPad apps and retro synthesizer clones. Take the Yamaha Montage, for example. Anyone who understands the basics of FM synthesis can glance at the specs and see that this thing goes where no VST plugin or synthesizer has gone before. But despite its amazing power for the price, it's rare to see in the wild. I don't have Yamaha's sales figures, but I fear this might be a lesson for them to dump their resources into making apps or MIDI controllers in the future. I hope not.
This isn't to say there isn't interesting stuff coming out here and there. There certainly is. But it's a far cry from the awe-inspiring 1990s and 2000s, when the announcement of new gear would inspire a whirlwind of personal ideas. So I'm on my own, kind of.
My interest in software/hardware and development isn't some kind of selfless journey to keep music production technology moving forward, but rather a natural (if not selfish) way to keep myself entertained and fulfilled as a producer. My last release was made mostly on buggy, 32-bit homemade software, and before that, I was so sick of looking at a DAW session that I just wrote piano songs day in and day out.
Last year I got obsessed with three things: modular synthesizers, robotics, and neural networks. When Google dropped the source code for Deep Dream, I began to fantasize not about how this could make music production more efficient, but about how it could be exploited and fucked with. Parag Mital made an excellent Kadenze course that allowed my caveman brain to comprehend cutting-edge software development, and I fell deep into the endeavor of using artificial intelligence to make melodies and sounds that would be new to my ears.
As with anything, you have to do some boring stuff before you can go crazy. I had my computer "learn" from just about everything Bach ever wrote and compose an original (thanks to the Classical Archives). Then, using TensorFlow, I went on a bender MIDI-fying a large majority of my music and attempting to recreate myself, both melodically and rhythmically, until I enjoyed listening to the results. While this locked in some distant-future plans for an A.I. alias that automatically writes music, I'm more immediately fascinated with neural synthesis, which, not surprisingly, Google beat me to by about four weeks with NSynth, a project strikingly close in concept.
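If you're curious what that looks like in practice, here's a minimal sketch of the general shape of the MIDI experiment, not my actual pipeline: it assumes a hypothetical midi/ folder, leans on the pretty_midi library for parsing, and trains a tiny next-note LSTM with tf.keras. Every path and hyperparameter is made up for illustration.

```python
# A minimal sketch, not my actual pipeline: flatten MIDI files into pitch
# sequences and train a small LSTM to predict the next note. The "midi/"
# folder, layer sizes, and epoch counts are invented for illustration.
import glob

import numpy as np
import pretty_midi
import tensorflow as tf

SEQ_LEN = 32

def midi_to_pitches(path):
    """Return note pitches from one MIDI file, ordered by start time."""
    midi = pretty_midi.PrettyMIDI(path)
    notes = [n for inst in midi.instruments if not inst.is_drum for n in inst.notes]
    notes.sort(key=lambda n: n.start)
    return [n.pitch for n in notes]

# Build (input sequence -> next pitch) training pairs from every file.
xs, ys = [], []
for path in glob.glob("midi/*.mid"):                 # hypothetical folder
    pitches = midi_to_pitches(path)
    for i in range(len(pitches) - SEQ_LEN):
        xs.append(pitches[i:i + SEQ_LEN])
        ys.append(pitches[i + SEQ_LEN])
xs, ys = np.array(xs), np.array(ys)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(128, 64),              # 128 possible MIDI pitches
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(128, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(xs, ys, epochs=20, batch_size=64)

# Generate: seed with a real phrase, then repeatedly predict and append.
seed = list(xs[0])
for _ in range(64):
    probs = model.predict(np.array([seed[-SEQ_LEN:]]), verbose=0)[0].astype(np.float64)
    seed.append(int(np.random.choice(128, p=probs / probs.sum())))
print(seed[SEQ_LEN:])
```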
If you know what a vocoder is, then I can offer not so much an explanation of how this works, but a metaphor for the result. Your average vocoder has a carrier (a synthesizer oscillator) and a modulator (usually a human voice). In modern vocoders, when you carefully allow the modulator to alter the fast Fourier transform spectrum of the carrier, it sounds like your synthesizer is speaking (Daft Punk is probably the most notable act to use this in popular music, but far from the first). A limitation that most artists likely don't even think about is that you're restricted to one carrier and one modulator; otherwise you'll just get nonsense. But there are certainly plenty of casual uses of shooting non-vocal elements into the modulator input of a vocoder; my favorite and most obvious example is Datassette's Flechte.
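To make the metaphor a little more concrete, here's a rough Python sketch of the FFT flavor of this: the modulator's frame-by-frame magnitude spectrum shapes the carrier's spectrum. The filenames are placeholders, it assumes mono 16-bit WAVs at the same sample rate, and it collapses a real vocoder's filter bank into one spectral multiply, so treat it as an illustration rather than a finished effect.

```python
# A rough sketch of the vocoder metaphor: the modulator's magnitude spectrum
# shapes the carrier's spectrum, frame by frame. Filenames are placeholders;
# assumes mono 16-bit WAV files at the same sample rate.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

rate, carrier = wavfile.read("saw_carrier.wav")        # synth oscillator
_, modulator = wavfile.read("voice_modulator.wav")     # e.g. a voice
n = min(len(carrier), len(modulator))
carrier = carrier[:n].astype(np.float64)
modulator = modulator[:n].astype(np.float64)

# Short-time Fourier transforms of both signals.
_, _, C = stft(carrier, fs=rate, nperseg=1024)
_, _, M = stft(modulator, fs=rate, nperseg=1024)

# Keep the carrier's spectrum and phase, but scale each time/frequency bin by
# the modulator's (normalized) energy envelope.
env = np.abs(M) / (np.abs(M).max() + 1e-9)
_, out = istft(C * env, fs=rate, nperseg=1024)

out = out / (np.abs(out).max() + 1e-9)                 # normalize to -1..1
wavfile.write("vocoded.wav", rate, (out * 32767).astype(np.int16))
```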
So first, let's give the neural network some stuff to study (dare I say, modulators). I'm including short clips of much, much larger data sets (and no, that word choice wasn't on purpose, how about that).
Here’s some mandolin and marimbas. It’s my melody, with a totally chaotic rhythm:
Now, we need some form of rhythm. Nothing too boring, and nothing too exciting (since the above data set is chaotic). So here's an annoying-sounding Goldilocks:
Next, I don't want this to sound like just a bunch of stupid drum machine sounds merged with mandolins and marimbas, so I'm going to pump in a saw wave. A bonus to having a single monophonic tone is that I can run it through a low-pass filter and modulate the cutoff, which lets me control the intensity of the final output:
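For anyone who wants to hear that idea without leaving Python, here's roughly what a single saw wave pushed through a swept low-pass filter looks like; the pitch, sweep range, and one-pole filter are stand-ins I invented for this sketch, not the settings on the actual clip.

```python
# A sketch of the carrier idea: one monophonic saw wave through a low-pass
# filter whose cutoff is swept over time, so one control sets how bright or
# intense the output gets. Frequencies and sweep shape are invented.
import numpy as np
from scipy.io import wavfile
from scipy.signal import sawtooth

rate = 44100
t = np.arange(rate * 4) / rate                 # 4 seconds
saw = sawtooth(2 * np.pi * 110.0 * t)          # A2 saw wave

# Cutoff sweeps from 200 Hz up to 4 kHz and back over the clip.
cutoff = 200 + 3800 * (0.5 - 0.5 * np.cos(2 * np.pi * t / 4))

# One-pole low-pass with a time-varying coefficient.
out = np.zeros_like(saw)
y = 0.0
for i in range(len(saw)):
    alpha = 1.0 - np.exp(-2 * np.pi * cutoff[i] / rate)
    y += alpha * (saw[i] - y)
    out[i] = y

wavfile.write("filtered_saw.wav", rate, (out * 32767).astype(np.int16))
```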
And finally, with a lot of fine tuning, we have the A.I. spectral firstborn of these data sets:
You were probably expecting something way cooler, weren't you? That's okay. I'm just happy that I had never heard a breathing, organic arpeggiator sound like this until I ran this session, one of many exploratory trips into the wild world of sound synthesis now available to anyone willing to get past a few learning curves.
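I'm keeping the exact recipe vague, so here's only one plausible, heavily simplified reading of the NSynth-flavored idea for anyone who wants to poke at it themselves: train a small autoencoder on magnitude-spectrum frames from two sources, then decode a blend of their latent codes. The filenames, layer sizes, and crude phase-less resynthesis below are all assumptions for the sake of the sketch, not a description of what produced the clip above.

```python
# One plausible reading of the neural-synthesis idea, NOT the actual recipe:
# learn a compressed code for spectral frames from two sources, then decode a
# 50/50 blend of their codes. Everything here is an illustrative assumption.
import numpy as np
import tensorflow as tf
from scipy.io import wavfile
from scipy.signal import stft, istft

def magnitude_frames(path, nperseg=1024):
    """Return (sample rate, frames x frequency-bin magnitude matrix)."""
    rate, x = wavfile.read(path)
    _, _, Z = stft(x.astype(np.float64), fs=rate, nperseg=nperseg)
    return rate, np.abs(Z).T

rate, a = magnitude_frames("mandolin_marimba.wav")   # hypothetical "modulator" set
_, b = magnitude_frames("saw_carrier.wav")           # hypothetical "carrier" set
norm = max(a.max(), b.max())
a, b = a / norm, b / norm
frames = np.concatenate([a, b])

bins = frames.shape[1]
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(bins,)),
    tf.keras.layers.Dense(16, activation="relu"),    # tiny latent code
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(bins, activation="relu"),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(frames, frames, epochs=50, batch_size=64)

# Blend the two sources in latent space, then decode back to a spectrogram.
n = min(len(a), len(b))
mix = 0.5 * encoder.predict(a[:n]) + 0.5 * encoder.predict(b[:n])
spec = decoder.predict(mix).T * norm

# Crude resynthesis: treat the magnitudes as a zero-phase complex spectrogram.
_, out = istft(spec.astype(np.complex128), fs=rate, nperseg=1024)
out = out / (np.abs(out).max() + 1e-9)
wavfile.write("spectral_firstborn.wav", rate, (out * 32767).astype(np.int16))
```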
In other words, I’m sorry that my next album won’t be dubstep wobbles with eastern/ethnic instrument loops plopped on top. Like the aforementioned music production industry, I’d probably make a hell of a lot more money on the path of least resistance.
By the way, I might be the only recording artist doing this (though maybe not!), but I'm far from the only person diving into neural synthesis. Here are some links:
A great article: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
A profound dissertation by a fellow who trained a neural network by having it watch Blade Runner, then had it recreate the film from memory: http://research.gold.ac.uk/19559/1/Autoencoding_Video_Frames.pdf