Identifying a Tone (Sine Wave) in iOS with the Accelerate Framework

Introduction

If you just want the code:

ToneReceiver.m – The finished code for this post

Handshake – The complete project

I found a lot of people had the same question I did: “How do you identify a frequency in <insert language here>?” The answer was usually the same: use a fast Fourier transform. It’s even built into iOS.

Fast Fourier Transforms

While I’ve learned more physics programming for my phone than I ever used in college, I did take physics classes. Fast Fourier transforms never came up in any of my math classes, though; they’re primarily used in electrical engineering.

The Wikipedia entry lost me. I figured out how to use them before I reasoned out how they (probably) work. What I did work out is that, with the Accelerate framework, I could feed vDSP_fft_zrip an array of samples and get back an array of intensities at particular frequency ranges. The maximum value in that array corresponds to the strongest frequency range.

A quick note on the Accelerate framework

The Accelerate framework has a number of functions for digital signal processing. For speed, these are plain C functions rather than Objective-C methods, so calling them means working with C types and pointers. The fast Fourier transform functions are just some of what the framework provides: there are functions for taking integrals, derivatives, the Fourier transforms I’m using, and other processing that I’ve yet to learn.
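
To use any of them, the project needs to link against Accelerate.framework, and the class needs the umbrella header. The snippets below assume this import.

#import <Accelerate/Accelerate.h>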

Packing the Data

The Fourier functions operate on split complex arrays, while the microphone provides plain real samples, so the data has to be converted and packed before the transform can run.

// Convert the microphone data (16-bit integers) to floats
// (inSamples and numSamples come from the capture callback in the Recording section below)
float *samples = malloc(numSamples * sizeof(float));
vDSP_vflt16((short *)inSamples, 1, samples, 1, numSamples);

// Convert the real data to complex data
// 1. Populate *window with the values for a Hamming window function
float *window = (float *)malloc(sizeof(float) * numSamples);
vDSP_hamm_window(window, numSamples, 0);

// 2. Window the samples
vDSP_vmul(samples, 1, window, 1, samples, 1, numSamples);

// 3. Define the split complex buffer
//    (halfSamples = numSamples / 2, defined in the setup code below)
COMPLEX_SPLIT A;
A.realp = (float *) malloc(halfSamples * sizeof(float));
A.imagp = (float *) malloc(halfSamples * sizeof(float));

// 4. Pack the samples: even-indexed samples go to realp, odd-indexed to imagp
vDSP_ctoz((COMPLEX*)samples, 2, &A, 1, numSamples/2);

vDSP_fft_zrip is an in-place function: the results overwrite the input buffer, and everything works best when the number of samples is a power of two. The iPhone takes 44,100 samples per second, and 1024 is a nice power of two, so each 1024-sample buffer covers about 1/43 of a second and each frequency bin is about 43 Hz wide (44,100 / 1024 ≈ 43). That means I can identify which 43 Hz bucket holds the strongest frequency. Not good enough for a tuner, but good enough to communicate. If I wanted greater resolution I could just take more samples per buffer. For this proof of concept, a 43 Hz resolution is enough.

// Setup the FFT
// 1. Set up the radix (the exponent: numSamples = 2^fftRadix)
int fftRadix = log2(numSamples);
int halfSamples = (int)(numSamples / 2);
// 2. And set up the FFT (the setup object can be reused for every buffer)
FFTSetup setup = vDSP_create_fftsetup(fftRadix, FFT_RADIX2);
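
One bit of housekeeping the snippets above leave out: when the receiver is torn down, the setup object and the malloc’d buffers need to be released. Roughly (the exact placement depends on how the class manages its buffers):

// Clean up when the receiver is done with the FFT
vDSP_destroy_fftsetup(setup);
free(A.realp);
free(A.imagp);
free(samples);
free(window);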

And at the heart of the function, perform the fast Fourier transform.

// Perform a forward FFT using fftSetup and A
// Results are returned in A
vDSP_fft_zrip(setup, &A, 1, fftRadix, FFT_FORWARD);
      
// Convert COMPLEX_SPLIT A result to magnitudes
float amp[numSamples];
amp[0] = A.realp[0]/(numSamples*2);
      
// Find the max
int maxIndex = 0;
float maxMag = 0.0;
      
// We can't detect anything reliably above the Nyquist frequency,
// which is bin numSamples / 2, so only the first halfSamples bins are scanned
for(int i=1; i<halfSamples; i++)
{
   amp[i] = A.realp[i]*A.realp[i] + A.imagp[i]*A.imagp[i];
   if (amp[i] > maxMag)
   {
      maxMag = amp[i];
      maxIndex = i;
   }
}
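
Converting the winning bin back to an actual frequency is just maxIndex times the bin width (the sample rate divided by numSamples). As an aside, Accelerate can also do the max search itself; a small sketch using the variables above:

// Each bin is sampleRate / numSamples wide (about 43 Hz for 1024 samples at 44.1 kHz)
float binWidth = 44100.0f / numSamples;
float detectedFrequency = maxIndex * binWidth;

// Alternatively, vDSP_maxvi finds the maximum value and its index in a single call
// (this scans from bin 0; skip the DC term by passing amp + 1 and adding 1 to maxBin)
float maxValue = 0.0f;
vDSP_Length maxBin = 0;
vDSP_maxvi(amp, 1, &maxValue, &maxBin, halfSamples);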

Recording

Apple provides a delegate that is called when the audio buffer is full. This class just has to implement AVCaptureAudioDataOutputSampleBufferDelegate.
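
For context, the interface is small; something along these lines (this is my reconstruction, not a copy of ToneReceiver.h):

// Sketch of the receiver's interface; not copied from the project
#import <AVFoundation/AVFoundation.h>

@interface ToneReceiver : NSObject <AVCaptureAudioDataOutputSampleBufferDelegate>

// Held as a property so the session isn't deallocated while recording
@property (nonatomic, strong) AVCaptureSession *captureSession;

// Receives didReceiveTone: (protocol sketched in the Sending the Result section)
@property (nonatomic, weak) id<ToneReceiverDelegate> delegate;

- (void)start;

@end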

-(void)start
{
   AVAudioSession *session = [AVAudioSession sharedInstance];
   [session setActive:YES error:nil];
   
   self.captureSession = [[AVCaptureSession alloc] init];
   AVCaptureDevice *device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
   AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:device error:NULL];
 
   [self.captureSession addInput:input];
   
   AVCaptureAudioDataOutput *output = [[AVCaptureAudioDataOutput alloc] init];
   dispatch_queue_t queue = dispatch_queue_create("Sample callback", DISPATCH_QUEUE_SERIAL);
   [output setSampleBufferDelegate:self queue:queue];
   [self.captureSession addOutput:output];
   
   [self.captureSession startRunning];
}

And what gets called when the buffer is full:

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
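
The body of that callback (not shown above) has to pull the raw samples out of the CMSampleBuffer before any of the FFT code can run. A minimal sketch, assuming the default 16-bit linear PCM capture format; -processSamples:count: is a hypothetical helper standing in for the packing and FFT code:

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
   // Get a pointer to the raw PCM data inside the sample buffer
   CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
   size_t totalLength = 0;
   char *inSamples = NULL;
   CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &totalLength, &inSamples);

   // Hand the 16-bit samples to the FFT code shown earlier
   // (-processSamples:count: is a hypothetical helper, not the project's actual method)
   CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer);
   [self processSamples:(short *)inSamples count:(int)numSamples];
}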

And a Note on a Hack

I had a problem getting the sample buffer size to be consistent. The first time I started recording I’d get 4096 samples per buffer, and all subsequent times I’d get 1024. For consistency I implemented a bit of a hack: spin a capture session up and immediately tear it down once before the real recording starts.

- (void)totalHackToGetAroundAppleNotSettingIOBufferDuration
{
   self.captureSession = [[AVCaptureSession alloc] init];
   AVCaptureDevice *device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
   AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:device error:NULL];
   
   [self.captureSession addInput:input];
   
   AVCaptureAudioDataOutput *output = [[AVCaptureAudioDataOutput alloc] init];
   dispatch_queue_t queue = dispatch_queue_create("Sample callback", DISPATCH_QUEUE_SERIAL);
   [output setSampleBufferDelegate:self queue:queue];
   [self.captureSession addOutput:output];
   
   [self.captureSession startRunning];
   [self.captureSession stopRunning];
}
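
The non-hack route (which the method name itself hints at) would presumably be asking the audio session for a specific IO buffer duration before starting the capture session. I haven’t verified that this removes the need for the warm-up, so treat it as a sketch:

// Request roughly 1024 samples per callback: 1024 / 44100 ≈ 0.023 seconds
// (untested alternative to the warm-up hack above)
NSError *error = nil;
[[AVAudioSession sharedInstance] setPreferredIOBufferDuration:(1024.0 / 44100.0)
                                                        error:&error];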

Sending the Result

It’s kind of anticlimactic after the trouble of figuring everything out, but sending the result back is a perfect case for a simple delegate.

NSNumber* toSend = [[NSNumber alloc] initWithInt:maxIndex];
      
if (self.delegate)
{
   [self.delegate didReceiveTone:toSend];
}
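
The delegate protocol behind that call only needs the one method; something like this (the protocol name is my guess, the method comes from the call above):

// Protocol name is assumed; only the method below is required
@protocol ToneReceiverDelegate <NSObject>

- (void)didReceiveTone:(NSNumber *)tone;

@end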

Where I Screwed Up

I screwed up enough times making this class that the mistakes have enough content for a blog post of their own. So they’re going to get one.

Future Work

This code identifies the maximum frequency bucket across the entire range that the iPhone can receive. For a proof of concept that’s fine, but in production I’d likely look for local maximums across the frequency range that I was interested in and compare those to the maximums over the rest of the data.
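
A first step in that direction might be restricting the search to the near-ultrasonic band the tone generator uses (above 19 kHz, as described in the companion post). A sketch, reusing amp, numSamples, and halfSamples from the receiver code above:

// Sketch: only consider bins at or above 19 kHz
float binWidth = 44100.0f / numSamples;     // about 43 Hz per bin
int firstBin = (int)(19000.0f / binWidth);  // about bin 441
int maxIndex = firstBin;
for (int i = firstBin + 1; i < halfSamples; i++)
{
   if (amp[i] > amp[maxIndex])
   {
      maxIndex = i;
   }
}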

I’d also like to remove that hack, but again this is a proof of concept.

Playing a Pure Tone (Sine Wave) in iOS

Introduction

If you just want the code:

ToneGenerator.m – The finished class for this post

HandShake – The complete project

Most programming languages have some variant of: Beep(frequency, duration);

Objective-C does not. Part of this is the nature of iOS devices: an interruption can happen at any time, meaning there’s no way to guarantee that the call will have enough time to complete. To produce a tone on demand, the programmer must fill the audio buffer with the tone data, and the device plays that data when it can.

Audio Player Setup

I got most of the setup code from http://christianfloisand.wordpress.com/2013/07/30/building-a-tone-generator-for-ios-using-audio-units/. An explanation is also provided there so there’s no need to restate it here.
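
For orientation, the heart of that setup is creating a RemoteIO output unit and pointing its render callback at the RenderTone function below; the stream format and AudioUnitInitialize steps are covered at the link. A condensed sketch (the structure is mine, not necessarily the project’s):

// Sketch of the createToneUnit step; see the linked post for the full version
AudioComponentDescription desc = {0};
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_RemoteIO;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;

AudioComponent component = AudioComponentFindNext(NULL, &desc);
AudioComponentInstanceNew(component, &_toneUnit);

// Point the unit's render callback at RenderTone, passing this object as the refCon
AURenderCallbackStruct callback;
callback.inputProc = RenderTone;
callback.inputProcRefCon = (__bridge void *)self;
AudioUnitSetProperty(_toneUnit, kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input, 0, &callback, sizeof(callback));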

Rendering the Audio Data

Forgetting my high school physics completely, I expected a constant frequency to need constant audio data. I expected a tone of 440 Hz to look something like “[440, 440, 440, 440, 440]”.

A pure tone is a pure sine wave, and generating one is easy using the sine function. I started with code from http://www.cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html, but had to update it for portability (see “Where I Screwed Up #2”).

OSStatus RenderTone(void                       *inRefCon,
                    AudioUnitRenderActionFlags *ioActionFlags,
                    const AudioTimeStamp       *inTimeStamp,
                    UInt32                     inBusNumber,
                    UInt32                     inNumberFrames,
                    AudioBufferList            *ioData)
{
   // Fixed amplitude is good enough for our purposes
   const double amplitude = 1;

   // Get the tone parameters out of the class
   ToneGenerator *toneGenerator = (__bridge ToneGenerator*)inRefCon;
   double theta = toneGenerator->_theta;
   double frequency = toneGenerator->_frequency;

   double theta_increment = 2.0 * M_PI * frequency / SAMPLE_RATE;

   // This is a mono tone generator so we only need the first buffer
   const int channel = 0;
   Float32 *buffer = (Float32 *)ioData->mBuffers[channel].mData;

   // Generate the samples
   for (UInt32 frame = 0; frame < inNumberFrames; frame++)
   {
      buffer[frame] = sin(theta) * amplitude;

      theta += theta_increment;
      if (theta > 2.0 * M_PI)
      {
         theta -= 2.0 * M_PI;
      }
   }

   // Store the theta back in the object
   toneGenerator->_theta = theta;

   return noErr;
}

Originally I didn’t save theta as a class variable. The result was that the wave never completed a cycle and instead of a beep it sounded like a click.

Playing the Tone

The class now has enough in it to start playing a tone.

   [self.audioSession setActive:true error:nil];
   self.frequency = frequency;
   // Create the audio unit as shown above
   [self createToneUnit];
   
   // Start playback
   AudioOutputUnitStart(_toneUnit);

To keep track of time, I cheated and just slept the thread using [NSThread sleepUntilDate:date].
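
A hypothetical wrapper showing how that fits together (the method names here are placeholders, not the project’s actual selectors):

// Sketch: play-for-duration wrapper; -play: and -stop are placeholder names
- (void)playFrequency:(double)frequency forDuration:(NSTimeInterval)duration
{
   [self play:frequency];   // the start code shown above
   [NSThread sleepUntilDate:[NSDate dateWithTimeIntervalSinceNow:duration]];
   [self stop];             // the teardown code shown below
}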

To stop playing, all that’s necessary is tearing down the tone unit.

   self.frequency = 0;
   self.theta = 0;
   AudioOutputUnitStop(self.toneUnit);
   AudioUnitUninitialize(self.toneUnit);
   AudioComponentInstanceDispose(self.toneUnit);
   self.toneUnit = nil;

For completeness, the stop handler and interruption handlers should do the same thing.

Where I Screwed Up #1

When I settled on my naive encoding, I figured I’d just assign a tone per bucket based on the ASCII value of the character to send. Since I wanted them all to be inaudible, I’d make sure all sent tones were above 19 kHz.

frequency = 19000 + (43 * (int)charToSend);

I didn’t realize until later that the lowest ASCII value I was sending was the ‘.’ character (ASCII 46), with a corresponding frequency of 19,000 + 43 × 46 = 20,978 Hz. As far as I can tell, that’s 978 Hz above the rated maximum of my iPhone’s speaker. It actually still works for values under ‘5’ (ASCII 53), but I shouldn’t expect that.

It works for the proof of concept though. Production uses would require a better encoding scheme, for this and other reasons.

Where I Screwed Up #2

A lot of the code samples I used were written before automatic reference counting. In the beginning the code crashed after 1/43 of a second (one 1024-sample buffer at 44,100 Hz).

Through some trial and error I got to the point where one entire tone would play, but all future tones wouldn’t play correctly. My cats really hated these problems.

Automatic reference counting turned out to be the culprit. I wasn’t correctly casting the ToneGenerator in the RenderTone function, so theta wasn’t properly saved and new tone units wouldn’t use fresh variables. The solution turned out to be the __bridge cast shown in the RenderTone function above.

Resources

I found a number of helpful blog posts along the way; the two I leaned on most are linked inline above.

Where’d Matt Go?

When I started this blog, I intended to alternate between process reflections and technical posts. That fell apart.

The falling apart was precipitated by a team assignment change. I didn’t change companies (I didn’t have to), but as often happens in a software company, I changed technology stacks. I went from a web team on the Microsoft stack to an iOS team writing in Objective-C.

Very few workplaces exemplify “learn or die” as much as a tech company. I’m always happy to learn a new skill, but the time had to come from somewhere. Instead of reflecting, I binge-watched Stanford’s iOS lecture series. Instead of pushing the limits of CSS, I spent my time writing practice iPhone apps. When I just about had a handle on Objective-C, Apple made an unexpected announcement.

It wasn’t until I watched Google I/O that I realized I learned enough to write a demo app. Google announced an innovation that seemed worth porting to iOS: pairing devices using nearly ultrasonic tones.

To prevent what happened last time, I’m calling my shot:

  1. Part 1: Playing a pure tone (sine wave) in iOS
  2. Part 2: Identifying a frequency in iOS using the Accelerate Framework
  3. Part 3: Troubleshooting EXC_BAD_ACCESS and memory leaks in Xcode
  4. Part 4: Putting it all together

The writing for computers is done. Writing for people soon to come.