ASF Performance
The other day I noticed in my news feed that SparkFun had a great post about the "performance" of the Arduino. While the post itself was a great start, the comments really added to the conversation. The short version is that they tested toggling a pin using the Arduino digitalWrite command. A user who simply reads the specs of the ATmega328P (the Arduino's microcontroller) could easily take the 16 MHz clock speed and the estimate of 1 instruction per clock cycle and conclude that the pin should toggle at close to 8 MHz. There are several reasons the real number isn't what you would think:
- The C compiler adds additional instructions that may not be needed.
- Multiple checks and other actions are performed before the pin state is set.
- Managing the loop takes instructions of its own.
In their tests the actual result was 117 kHz, about 1/68th of that 8 MHz estimate. At a 16 MHz clock, that works out to roughly 137 clock cycles, and so on the order of 130 assembly language instructions, for each on/off cycle of the pin.
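The arithmetic behind that estimate can be sketched in a few lines of C meant for a desktop compiler, not for the microcontroller. The function name `cycles_per_period` is made up for this illustration; the only inputs are the 16 MHz clock and the 117 kHz measurement quoted above.

```c
#include <assert.h>

/* Clock cycles that elapse during one full on/off period of the pin.
 * Hypothetical helper for illustration; not part of Arduino or ASF. */
double cycles_per_period(double clock_hz, double toggle_hz)
{
    assert(toggle_hz > 0.0);   /* guard against dividing by zero */
    return clock_hz / toggle_hz;
}
```

Calling `cycles_per_period(16e6, 117e3)` gives roughly 137 clock cycles per on/off cycle. Since some AVR instructions take more than one cycle, the instruction count is somewhat lower than the cycle count.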
Before we get much further I'd like to clarify that this isn't a real-world use case. For the simple action of toggling a pin at any speed, the job is generally given to a built-in timer. In the case of the XMega, we previously covered the ability to output the actual oscillator speed on a couple of supported pins (how many and which ones depends on the XMega model).
The real point here is to examine the Atmel Software Framework (ASF) on the XMega and compare its overhead to the Arduino's. We can also explore some techniques to improve the situation.
The first step is "New Project ..", "GCC C ASF Board Project", etc. For these tests we are going to set the clock speed to match the Arduino, i.e. 16 MHz. See this post and, based on your hardware, use either the external crystal or the internal 32 MHz clock; both will require the use of the PLL to achieve 16 MHz.
I'm going to compromise a little on the ASF code location standards and put everything in 'main.c', to be clear about the code being tested. The clock code remains where it is supposed to be, because that won't be changing over the course of these tests. My code in 'main.c' now looks like this:
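    #include <asf.h>

    /* Pin toggled by the test loop, and pin that outputs the system clock. */
    #define TOGGLE_PIN IOPORT_CREATE_PIN(PORTD, 6)
    #define FREQ_OUT   IOPORT_CREATE_PIN(PORTD, 7)

    int main (void)
    {
        ioport_set_pin_dir(TOGGLE_PIN, IOPORT_DIR_OUTPUT);
        ioport_set_pin_dir(FREQ_OUT, IOPORT_DIR_OUTPUT);
        /* Route the system clock out on pin D7 so it can be measured. */
        PORTCFG.CLKEVOUT = PORTCFG_CLKOUT_PD7_gc;
        sysclk_init();

        for (;;)
        {
            /* The test itself: drive the pin high, then low. */
            ioport_set_pin_level(TOGGLE_PIN, true);
            ioport_set_pin_level(TOGGLE_PIN, false);
        }
    }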
In this code you can see the extra 3 lines of initialization that I use to output the actual clock frequency on pin D7. This is for a quick validation of the operating frequency and should have no impact on the performance test in the for loop.
The actual clock frequency output is 15,999,883 Hz, or effectively 16 MHz.
The frequency counter shows the toggle speed on pin D6 as 2.6 MHz. The oscilloscope shows 3 MHz.
This is a great result but not exactly what I was expecting. My expectation was probably 1 MHz or less. So now let's work on some theories as to why.
The two pieces of test equipment show different values, which isn't usually a good sign. In this case we can assume that they use different methods for counting cycles.
Let's talk about what the signal on the pin should look like. In its simplest form the code is:
Loop -> High -> Low -> Loop -> High -> Low
The Loop step is really a low. Even without knowing how many instructions are being executed, it's still reasonable to say that the pin will be low for about twice as long as it's high. So this isn't really the type of wave that a frequency counter is good at counting.
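That reasoning can be sketched as a small piece of desktop C (it is not meant to run on the XMega). It assumes, as the paragraph above does, that the loop, the set, and the clear each take about the same amount of time. `struct step` and `duty_cycle` are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One step of the loop: the level the pin holds and for how many clock cycles. */
struct step {
    int level;        /* 1 = high, 0 = low */
    unsigned cycles;  /* assumed duration of this step */
};

/* Fraction of one pass through the loop that the pin spends high (0.0 to 1.0). */
double duty_cycle(const struct step *steps, size_t count)
{
    unsigned high = 0;
    unsigned total = 0;

    for (size_t i = 0; i < count; i++) {
        total += steps[i].cycles;
        if (steps[i].level)
            high += steps[i].cycles;
    }
    assert(total > 0);  /* at least one step must take time */
    return (double)high / (double)total;
}
```

Feeding it the three steps low, high, low at one unit each gives a duty cycle of one third: high for one unit, low for two.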
Since the results weren't what I was expecting I changed the for loop to see what would happen.
    for (;;)
    {
        ioport_toggle_pin_level(TOGGLE_PIN);
    }
This code should give us equal high and low times. Now both the oscilloscope and the frequency counter show 2 MHz even. So less code gives a lower toggle speed?
This is only a theory, and I'm sure someone smarter than me can confirm it or explain it better. When the two pin writes occur inside the loop, the counting algorithms measure the time between the highs; since the wave isn't symmetrical, the result leans toward the smaller of the two values.
Improve Performance
Originally I was going to play with the code and see if it could be optimized to toggle the pin any faster. While researching the unexpected behavior I opened the '\Output\*.lss' file, which contains the assembly source code with the C source code as comments. What I found was well-optimized code: 4 assembly instructions for the two-line set/clear loop and 3 assembly instructions for the single-line toggle loop. In both cases one of the instructions is a jump, which is defined to take 2 clock cycles. I don't know if it could be done better. I couldn't do any better by manually assigning registers, and code readability is too valuable for me to give up by messing with it.
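Those instruction counts suggest one possible reading of the earlier measurements. The sketch below is desktop C, not XMega code, and it rests on an assumption the listing file would need to confirm: every instruction other than the jump takes a single clock cycle, so the two-line loop costs 5 cycles per pass and the toggle loop costs 4. The function names are invented for this illustration.

```c
#include <assert.h>

/* Output frequency when one pass of the loop produces a full high/low
 * period (the set-then-clear version). */
double set_clear_freq_hz(double clock_hz, unsigned cycles_per_pass)
{
    assert(cycles_per_pass > 0);
    return clock_hz / cycles_per_pass;
}

/* Output frequency when each pass flips the pin only once, so a full
 * period needs two passes (the toggle version). */
double toggle_freq_hz(double clock_hz, unsigned cycles_per_pass)
{
    assert(cycles_per_pass > 0);
    return clock_hz / (2.0 * cycles_per_pass);
}
```

Under that assumption, `set_clear_freq_hz(16e6, 5)` gives 3.2 MHz and `toggle_freq_hz(16e6, 4)` gives 2 MHz. The second figure matches the 2 MHz measured for the toggle loop, and the first is close to the 3 MHz the oscilloscope showed for the two-line loop. If that holds, the single-line version is slower simply because it needs two trips around the loop to produce one complete cycle on the pin.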