And why not drop the slow cluster approach and do this on an NVIDIA Jetson TK1 instead, using GPGPU acceleration on its 192 CUDA cores, at a fraction of the cost and power consumption?
When the A83T has to run multithreaded workloads, it draws a lot of power, and heat dissipation becomes a problem: http://linux-sunxi.org/Banana_Pi_M3#Sudden_shut_offs_.2F_maximum_consumption_.2F_cooling_vs._consumption