and saving ~1k memory
by limiting the `#pragma GCC optimize (3)` optimisation to `ultralcd_st7920_u8glib_rrd.h`. These optimisation was and is not done for all the other displays, is the reason for the big additionally use of memory, because the complete 'ultralcd.cpp' and 'dogm_lcd_implementation.h' was optimised (sadly i did not observe a change in speed).
Unrolling the loop in `ST7920_SWSPI_SND_8BIT()`, what i expected the optimiser to do, by hand, saved some speed by eliminating the loop variable (i) compares and increases. Every CPU cycle in this loop costs at least 0.5ms per display update because it's executed more than 1k times/s.
The delays are now pre-filled with the calculated values for 4.5V driven ST7920.
A way to simply add __your__ timing into the configuration was made.
At 4.5V
1.) The CLK signal needs to be at least 200ns high and 200ns low.
2.) The DAT pin needs to be set at least 40ns before CLK goes high and must stay at this value until 40ns after CLK went high.
A nop takes one processor cycle.
For 16MHz one nop lasts 62.5ns.
For 20MHz one not lasts 50ns.
To fulfill condition 1.) we need 200/62.5 = 3.2 => 4 cycles (200/50 = 4 => 4). For the low phase, setting the pin takes much longer. For the high phase we (theoretically) have to throw in 2 nops, because changing the CLK takes only 2 cycles.
Condition 2.) is always fulfilled because the processor needs two cycles (100 - 125ns) for switching the CLK pin.
Needs tests and feedback.
Especially i cant test 20MHz, 3DRAG and displays supplied wit less than 5V.
Are the delays right? Please experiment with longer or shorter delays. And give feedback.
Already tested are 5 displays with 4.9V - 5.1V at 16MHz where no delays are needed.