2018-12-12

It has been over three years since I last looked at this, but I finally decided to pick it up again because I have a few days off work. I have spent most of the time so far reading through the scattered online resources I could find, trying to understand how speech production is described by the source-filter model. While I have gained a bit more understanding of the theory behind it, I am still far from the goal I want to reach.
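
For my own reference, the core idea as I understand it: (voiced) speech is an excitation signal shaped by the resonances of the vocal tract, which in the frequency domain is just a product,

$$S(z) = E(z)\,V(z),$$

where $E$ is the source (a glottal pulse train, or noise for unvoiced sounds like /s/) and $V$ is the vocal-tract filter, whose resonant peaks are the formants.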

I did a fair amount of refactoring as well. Instead of main.cpp containing the bulk of the implementation, there is now something that is beginning to look like a speech synthesis class. I also made it possible to generate white noise in addition to the sawtooth pulse, and made it easier to reconfigure things such as the fundamental frequency, the vibrato, and the filter parameters on the fly. main.cpp is now a simple shell that generates an audio file just as before, except that it has no notion whatsoever of what the implementation is doing (as well it should not). The shell writes a proper Wave file now, rather than a raw dump of 32-bit floats, so the output is a bit easier to listen to.
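
To give an idea of the shape it is taking, here is a minimal sketch (names made up, not the actual class): a source generator with runtime-settable parameters, which main.cpp can drive without knowing any internals.

```cpp
#include <cmath>
#include <cstdlib>

class SpeechSynth {
public:
    enum class Source { Sawtooth, WhiteNoise };

    void setSource(Source s)               { source_ = s; }
    void setFundamental(float hz)          { f0_ = hz; }
    void setVibrato(float hz, float depth) { vibratoHz_ = hz; vibratoDepth_ = depth; }

    // Produce the next raw source sample; the formant filters would
    // then run on top of this.
    float nextSample() {
        if (source_ == Source::WhiteNoise)
            return 2.0f * std::rand() / RAND_MAX - 1.0f;
        // Sawtooth pulse whose fundamental wobbles sinusoidally (vibrato).
        double f = f0_ * (1.0 + vibratoDepth_ * std::sin(2.0 * kPi * vibratoPhase_));
        vibratoPhase_ += vibratoHz_ / kSampleRate;
        vibratoPhase_ -= std::floor(vibratoPhase_);
        phase_ += f / kSampleRate;
        phase_ -= std::floor(phase_);               // wrap to [0, 1)
        return static_cast<float>(2.0 * phase_ - 1.0);
    }

private:
    static constexpr double kPi = 3.14159265358979323846;
    static constexpr double kSampleRate = 44100.0;
    Source source_ = Source::Sawtooth;
    float f0_ = 110.0f, vibratoHz_ = 5.0f, vibratoDepth_ = 0.01f;
    double phase_ = 0.0, vibratoPhase_ = 0.0;
};
```

The filter parameters get similar setters. The Wave container happily stores the same 32-bit floats (format tag 3, IEEE float), so the new output really is just a header in front of the old raw dump.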

A very interesting discovery I made just a few minutes ago is that I was doing the filtering all wrong: I was running the filters in parallel when I should have been running them in cascade. That is, I was previously applying each filter to the original source and then summing the filters' outputs, when instead the output of the first filter should be the input of the second filter, and so forth. It makes a huge difference!
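
In code the difference is almost nothing, which is probably why I got it wrong. A sketch of the two topologies (the resonator here is the textbook Klatt two-pole band-pass, not necessarily what my filters look like):

```cpp
#include <cmath>
#include <vector>

// One formant resonator: a two-pole band-pass filter.
struct Resonator {
    Resonator(double freq, double bandwidth, double sampleRate) {
        static constexpr double kPi = 3.14159265358979323846;
        c_ = -std::exp(-2.0 * kPi * bandwidth / sampleRate);
        b_ = 2.0 * std::exp(-kPi * bandwidth / sampleRate)
                 * std::cos(2.0 * kPi * freq / sampleRate);
        a_ = 1.0 - b_ - c_;
    }
    double tick(double x) {
        double y = a_ * x + b_ * y1_ + c_ * y2_;
        y2_ = y1_;
        y1_ = y;
        return y;
    }
    double a_ = 0, b_ = 0, c_ = 0, y1_ = 0, y2_ = 0;
};

// What I had (wrong): every filter sees the raw source, outputs summed.
double runParallel(std::vector<Resonator>& formants, double x) {
    double sum = 0.0;
    for (auto& f : formants) sum += f.tick(x);
    return sum;
}

// What I should have had (right): each filter feeds the next.
double runCascade(std::vector<Resonator>& formants, double x) {
    for (auto& f : formants) x = f.tick(x);
    return x;
}
```

From what I have read, parallel formant banks do get used in real synthesizers (Klatt's has both branches), but they need carefully chosen per-formant gains; naively summing equal-weight outputs distorts the spectrum, whereas the cascade gets the relative formant amplitudes roughly right for free on voiced sounds.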

Here is what it sounds like when done in parallel:

Listen

Very close to what we had before, except that the code no longer performs the pitch bend in the middle of the audio. I also made it a bit quieter (it was hurting my head).

And here is what it sounds like when running the filters in cascade:

Listen

Now it *almost* sounds like a person! Hurray!