We present a new GPU-based gravitational octree code that adopts block time steps. The code optimizes itself adaptively by monitoring the execution time of each function. It achieves a 3–5 fold speed-up over the shared time step method, and its average performance is 10–30% of the theoretical peak performance.
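As a rough illustration of why block time steps can reduce work relative to a shared time step, the sketch below assumes the conventional power-of-two hierarchy, in which each particle's step is rounded down to dt_max / 2^k. The function names `quantize_step` and `count_updates`, the `max_level` cap, and the sample step sizes are hypothetical and are not taken from the code described here. The sketch counts particle updates only, so it says nothing about tree construction or GPU execution costs.

```python
def quantize_step(dt_desired, dt_max, max_level=20):
    """Return level k so that dt_max / 2**k is the largest allowed step <= dt_desired.

    Assumption: steps are restricted to dt_max divided by a power of two.
    The level is capped at max_level.
    """
    k = 0
    while dt_max / 2 ** k > dt_desired and k < max_level:
        k += 1
    return k


def count_updates(levels):
    """Count particle updates over one dt_max interval.

    Returns (block, shared): updates with per-particle block steps, and
    updates when every particle uses the smallest step (shared time step).
    """
    deepest = max(levels)
    n_ticks = 2 ** deepest  # number of smallest steps that fit in dt_max
    block = 0
    for tick in range(1, n_ticks + 1):
        for k in levels:
            stride = 2 ** (deepest - k)  # ticks between updates at level k
            if tick % stride == 0:       # particle is active at this tick
                block += 1
    shared = n_ticks * len(levels)
    return block, shared


if __name__ == "__main__":
    desired = [1.0, 0.6, 0.3, 0.1]  # hypothetical per-particle step sizes
    levels = [quantize_step(dt, dt_max=1.0) for dt in desired]
    block, shared = count_updates(levels)
    print(levels, block, shared)
```

In this toy setup only the particles whose step boundary coincides with the current tick are updated, so the update count depends on how the particles are distributed over levels rather than on the single smallest step.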