I hope you’re right about seeing that jump to doing more on GPUs, and I hope so for a very specific reason: I work for Peakstream (www.peakstreaminc.com), and we write software that schedules large matrix calculations on GPUs.
Our trick to good performance in these operations is pretty simple: do large, SIMD-type operations, and then have what amounts to a JIT compiler to get everything scheduled on one or more GPUs and/or CPUs. Well, okay, it’s simple to say. Writing it takes a bit more work.
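
To make the "record big operations first, compile later" idea concrete, here is a toy sketch in plain Python. It is not Peakstream's API or implementation: the names `Lazy`, `_emit`, and `evaluate` are invented for illustration, and the "JIT" just generates one fused Python loop rather than GPU code. It only shows the general shape: array operations build an expression tree, and nothing runs until the whole expression is handed to a compile step.

```python
# Hypothetical sketch only. Lazy, _emit and evaluate are invented names,
# and the generated "kernel" is ordinary Python, not GPU code.

class Lazy:
    """A node in a deferred elementwise expression over equal-length vectors."""

    def __init__(self, op, args):
        self.op = op      # "leaf", "+", or "*"
        self.args = args  # for a leaf: (list_of_values,)

    @staticmethod
    def leaf(values):
        return Lazy("leaf", (list(values),))

    def __add__(self, other):
        return Lazy("+", (self, other))   # record the op, compute nothing

    def __mul__(self, other):
        return Lazy("*", (self, other))


def _emit(node, leaves):
    """Return the per-element expression as source text; collect input data."""
    if node.op == "leaf":
        leaves.append(node.args[0])
        return f"a{len(leaves) - 1}[i]"
    left = _emit(node.args[0], leaves)
    right = _emit(node.args[1], leaves)
    return f"({left} {node.op} {right})"


def evaluate(node):
    """Stand-in for the JIT step: build one fused loop for the whole tree, run it."""
    leaves = []
    body = _emit(node, leaves)
    n = len(leaves[0])
    if any(len(v) != n for v in leaves):
        raise ValueError("all inputs must have the same length")
    params = ", ".join(f"a{k}" for k in range(len(leaves)))
    source = f"def kernel({params}):\n    return [{body} for i in range({n})]\n"
    namespace = {}
    exec(source, namespace)               # "compile" the generated kernel
    return namespace["kernel"](*leaves)   # a real system would pick a device here


x = Lazy.leaf([1.0, 2.0, 3.0])
y = Lazy.leaf([4.0, 5.0, 6.0])
result = evaluate(x * y + x)   # one pass over the data, no temporary for x * y
```

The point of deferring is that the compile step sees the whole expression at once, so it can fuse `x * y + x` into a single pass instead of materializing an intermediate array for `x * y`. Deciding which GPU or CPU runs the result is the hard part, and this sketch leaves it out entirely.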