Writings

This multi-post series covers what I learned while trying to optimize the inference latency of the recent Diffusion Policy paper out of Toyota Research Institute (TRI). I dive into the intricacies of GPU architecture and apply what I learned to speed up the U-Net from the TRI paper.