Increase MSP432™ SPI Performance – Part 1
I am currently working with a 240×320 pixel QVGA display connected via SPI. While writing this article I realized that this was a bad idea.
Consider the following calculation:
- 240 x 320 pixels = 76,800 pixels
- 16 bits (color mode) * 76,800 pixels = 1,228,800 bits per frame
- MSP432P401R SPI max frequency = 24 MHz
- 24,000,000 / 1,228,800 = 19.53125
This gives a maximum full-screen rate of roughly 20 fps, which is too slow for the human eye.
OK, for now I can’t break this limit, but I can try to reach it.
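The calculation above can be written as a small helper. This is only a sketch: the function name `max_fullscreen_fps` is made up for illustration, and it ignores display command bytes and any gaps between transfers, so it gives an upper bound rather than a measured value.

```c
#include <stdint.h>

/* Upper bound on full-screen refreshes per second over a serial link.
   Hypothetical helper: ignores command bytes and gaps between bytes. */
static double max_fullscreen_fps(uint32_t width, uint32_t height,
                                 uint32_t bits_per_pixel, uint32_t spi_hz)
{
    /* 64-bit intermediate so larger displays cannot overflow */
    uint64_t bits_per_frame = (uint64_t)width * height * bits_per_pixel;
    return (double)spi_hz / (double)bits_per_frame;
}
```

Calling `max_fullscreen_fps(240, 320, 16, 24000000u)` reproduces the 19.53125 from the list.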
Try One: use eUSCI ISR handler
void UCIA0IsrHandler(void)
{
    switch (UCIA0IV) {
    case 0x0004: // UCTXIFG
        if (fill_count & 0x001) {
            UCIA0TXBUF = fill_data_h;
        } else {
            UCIA0TXBUF = fill_data_l;
        }
        fill_count--;
        if (fill_count) {
            return;
        }
        UCIA0IE &= ~(UCTXIE);
        break;
    }
}
What is the result? A bad one. I tried to count the CPU steps until the next byte is pushed to the transmit buffer.
The disassembler shows 14 steps. With ISR entry and exit I think I’m over 16, which leaves a gap between bytes, so the transmission runs slower than 20 fps.
There are ways to optimize this code, but for me 16 steps between interrupts are too few to be worth the effort.
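To see how a per-byte overhead turns into a lower frame rate, here is a rough model. It assumes the core runs at 48 MHz (the article does not state the CPU clock, so this is my assumption) and that each byte costs either the time on the wire or the CPU time, whichever is larger. All names are hypothetical.

```c
#include <stdint.h>

#define CPU_HZ      48000000u            /* assumption: core clock, not stated in the article */
#define SPI_HZ      24000000u            /* SPI clock from the calculation above */
#define FRAME_BYTES (240u * 320u * 2u)   /* 16 bits per pixel */

/* CPU cycles that pass while one byte is shifted out on the wire. */
static uint32_t cycles_per_byte_on_wire(void)
{
    return (uint32_t)(((uint64_t)CPU_HZ * 8u) / SPI_HZ);
}

/* Frame rate when the CPU needs cpu_cycles_per_byte to feed each byte.
   Simplified model: the slower of the CPU and the wire sets the pace. */
static double effective_fps(uint32_t cpu_cycles_per_byte)
{
    uint32_t wire = cycles_per_byte_on_wire();
    uint32_t per_byte = (cpu_cycles_per_byte > wire) ? cpu_cycles_per_byte : wire;
    return (double)CPU_HZ / ((double)per_byte * (double)FRAME_BYTES);
}
```

Under these assumptions the wire allows 16 cycles per byte, so any handler that needs more than that, including interrupt entry and exit, pulls the rate below the 19.5 fps limit.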
Try Two: Synchronized transmission
#define SendSync(data) \
    UCIA0TXBUF = data; \
    while (UCIA0STAT & UCBUSY);

void Fill(uint16_t color, uint32_t count32)
{
    uint8_t a = color >> 8;
    uint8_t b = color;
    uint8_t slow = count32 & 0x07;
    if (slow) {
        do {
            SendSync(a); SendSync(b);
        } while (--slow);
    }
    // x8 unrolled
    int count = count32 >> 3;
    if (count) {
        do {
            SendSync(a); SendSync(b);
            SendSync(a); SendSync(b);
            SendSync(a); SendSync(b);
            SendSync(a); SendSync(b);
            SendSync(a); SendSync(b);
            SendSync(a); SendSync(b);
            SendSync(a); SendSync(b);
            SendSync(a); SendSync(b);
        } while (--count);
    }
}
This results in 8 steps for each byte, and with a bit of logic you reach the 20 fps.
What comes next? Decreasing the code size. This can be done in assembler.
The next part will, I hope, contain the Fill function in asm.
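The splitting into a remainder loop and blocks of eight pixels can be checked on a PC before going to the target. The sketch below replaces the register access with a counting stand-in. `mock_send_sync`, `mock_sent`, `mock_last` and `fill_mock` are hypothetical names for this test only, and the inner `for` loop stands in for the hand-unrolled block.

```c
#include <stdint.h>
#include <stddef.h>

/* Host-side stand-in for the SPI transmit path (hypothetical). On the
   target this would write the TX buffer and poll the busy flag. */
static size_t  mock_sent;      /* number of bytes "transmitted" */
static uint8_t mock_last[2];   /* last two bytes, to check hi/lo order */

static void mock_send_sync(uint8_t data)
{
    mock_last[0] = mock_last[1];
    mock_last[1] = data;
    mock_sent++;
}

/* Same splitting as Fill(): remainder pixels first, then blocks of 8. */
static void fill_mock(uint16_t color, uint32_t count32)
{
    uint8_t  hi     = (uint8_t)(color >> 8);
    uint8_t  lo     = (uint8_t)(color & 0xFFu);
    uint32_t rest   = count32 & 0x07u;   /* pixels that do not fill a block */
    uint32_t blocks = count32 >> 3;      /* full blocks of 8 pixels */

    while (rest--) {
        mock_send_sync(hi);
        mock_send_sync(lo);
    }
    while (blocks--) {
        for (int i = 0; i < 8; i++) {    /* unrolled by hand on the target */
            mock_send_sync(hi);
            mock_send_sync(lo);
        }
    }
}
```

For a full screen of 76,800 pixels the stand-in should count 153,600 bytes, always in high-byte, low-byte order.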
First summary
At first I would say: what the heck, TI, you build a 32-bit ARM core on top of the MSP430 platform, but there is only an 8-bit SPI interface. Why not add a transfer mask that makes the number of bits per transfer variable? In my case I could use a 16-bit mask until the remaining pixel count is divisible by two, and then ship 32 bits at a time.
This could make the ISR approach more efficient, because it creates a gap of 64 CPU steps between interrupts, which is enough to do other stuff.
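Where the 64 steps come from can be shown with a one-line calculation. This is a sketch that assumes a 48 MHz core clock (my assumption, not stated in the article) and the 24 MHz SPI clock from the first calculation. The wider word sizes are the wished-for feature, not something this SPI peripheral offers, and `cpu_cycles_per_word` is a hypothetical name.

```c
#include <stdint.h>

/* CPU cycles that pass while one SPI word of word_bits is on the wire. */
static uint32_t cpu_cycles_per_word(uint32_t cpu_hz, uint32_t spi_hz,
                                    uint32_t word_bits)
{
    return (uint32_t)(((uint64_t)cpu_hz * word_bits) / spi_hz);
}
```

With those clocks an 8-bit word leaves 16 cycles between interrupts, a 16-bit word 32, and a 32-bit word 64.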