How ‘slimmed-down’ large language models can reduce AI’s environmental and energy footprint

New research led by Professor Samin Aref (MIE) proposes better methods for reducing the environmental and energy impact of generative artificial intelligence systems such as large language models (LLMs).
The research highlights the effectiveness of quantization, a technique that compresses LLMs by reducing the numerical precision of their parameters, so that the models use less energy while keeping their performance almost intact.
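To make the idea concrete, here is a minimal Python sketch of symmetric uniform quantization, one common way to lower parameter precision. It is an illustration, not the method from the papers. Each weight is divided by a single shared scale, rounded to the nearest integer level, and clipped to the signed range that the chosen bit width allows. Smaller weights mean less data to store and move, which is one reason lower precision can save energy. The function names and the 7-billion-parameter model size used in the storage example are hypothetical.

```python
def quantize_symmetric(weights, bits):
    """Quantize floats to signed integer levels that share one scale factor.

    Hypothetical helper: levels lie in [-qmax, qmax] with qmax = 2**(bits-1) - 1,
    so 3 bits gives 7 levels (-3..3) and 2 bits gives 3 levels (-1, 0, 1).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest level, then clip to the allowed range.
    levels = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return levels, scale


def dequantize(levels, scale):
    """Map integer levels back to approximate float weights."""
    return [level * scale for level in levels]


def weight_storage_gb(num_params, bits):
    """Approximate weight storage in gigabytes, ignoring scale-factor overhead."""
    return num_params * bits / 8 / 1e9


weights = [0.42, -1.3, 0.05, 0.9, -0.27]
levels, scale = quantize_symmetric(weights, bits=3)
print(levels)                          # [1, -3, 0, 2, -1]
print(dequantize(levels, scale))       # approximations of the original weights
print(weight_storage_gb(7e9, 16))      # 14.0
print(weight_storage_gb(7e9, 3))       # 2.625
```

At 2 bits the same hypothetical model would need about 1.75 GB for its weights; real deployments add some overhead for storing the scale factors.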
Two conference papers, one of which won a best paper award, detail these findings and explore two compression methods: partial retraining and distribution alignment training.
Partial retraining compresses LLMs from 16-bit precision to 3-bit and 2-bit precision while minimizing performance loss through a specialized regularization term.
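The papers' specialized regularization term is not described here, so the following Python sketch shows only a generic stand-in that is often used in quantization-aware training: a penalty that grows as weights drift away from the nearest level of the low-bit grid. Adding such a penalty to the task loss during retraining nudges the weights toward values that survive rounding to 3 or 2 bits. The function names, the squared-distance penalty, and the weighting factor `lam` are all assumptions for illustration.

```python
def nearest_level(w, scale, bits):
    """Round a weight to the closest value on a symmetric low-bit grid, clipped to range."""
    qmax = 2 ** (bits - 1) - 1
    return max(-qmax, min(qmax, round(w / scale))) * scale


def quantization_penalty(weights, scale, bits):
    """Hypothetical regularizer: sum of squared distances to the nearest grid level."""
    return sum((w - nearest_level(w, scale, bits)) ** 2 for w in weights)


def regularized_loss(task_loss, weights, scale, bits, lam=0.1):
    """Task loss plus a weighted penalty; lam trades accuracy against quantizability."""
    return task_loss + lam * quantization_penalty(weights, scale, bits)


on_grid = [0.5, -1.0, 0.0, 1.5]    # exact multiples of scale=0.5 within the 3-bit range
off_grid = [0.6, -0.9, 0.2, 1.4]   # each sits between two grid levels
print(quantization_penalty(on_grid, scale=0.5, bits=3))   # 0.0
print(quantization_penalty(off_grid, scale=0.5, bits=3))  # roughly 0.07
print(regularized_loss(2.0, off_grid, scale=0.5, bits=3, lam=0.1))
```

Weights that already sit on the grid add nothing to the loss, so the penalty only pushes on parameters that would otherwise lose accuracy when rounded.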
Distribution alignment training improves the recovery of performance lost to quantization by up to 20.37%, achieving better trade-offs between compression and accuracy.
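One plausible reading of distribution alignment, and the assumption behind this Python sketch, is a distillation-style objective: the quantized model is trained so that its next-token probability distributions match those of the original full-precision model, measured here with the Kullback-Leibler divergence averaged over token positions. The papers' exact objective may differ, and the function names and example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def alignment_loss(teacher_logits, student_logits):
    """Mean KL divergence between full-precision (teacher) and quantized (student)
    next-token distributions, with one logit vector per token position."""
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


teacher = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]   # full-precision model, 2 positions
student = [[1.6, 0.7, -0.8], [0.2, 0.9, 0.4]]   # quantized model, same positions
print(alignment_loss(teacher, teacher))  # 0.0
print(alignment_loss(teacher, student))  # a small positive value
```

Under this assumption, minimizing the loss pulls the quantized model's predictions back toward the original model's, which is one way compression losses could be partly recovered.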
