At Google I/O Extended Newport Beach, I watched Gemma 4's response time drop from 147 seconds to a few seconds. No new hardware was involved, just five inference optimization techniques stacked on top of each other. Here is what each one actually does, in plain language.