The Rising Importance of Inference in Large-Scale Machine Learning Systems
In the ML development cycle, the experimentation phase involves designing and evaluating model architectures and training methods, which requires substantial computational power. At Meta, the median training experiment consumes up to 1.5 GPU-days and the 99th percentile up to 24 GPU-days, while large-scale models exceed 500 GPU-days. Once a promising model is selected, it moves to inference, where it serves trillions of predictions daily to billions of users. The compute cycles devoted to inference have now surpassed those devoted to training, underscoring inference's growing importance in large-scale ML systems.
Carbon Footprint in Industry-Scale ML
The carbon footprint of industry-scale machine learning (ML) models, such as those deployed at Meta, comprises both operational and embodied emissions. The operational footprint varies across models such as Meta's Transformer-based Universal Language Model (LM) and its recommendation models (RM1–RM5), with inference often dominating; for the LM, inference accounts for 65% of operational emissions versus 35% for training. Even with renewable energy offsets, the embodied carbon cost of AI hardware (such as GPUs) remains significant, accounting for roughly 50% of the total footprint. Continuous optimization has improved power efficiency, reducing operational emissions by about 20% every six months.
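The accounting above can be sketched as a small calculation. This is an illustrative model only, not Meta's actual methodology: it splits a hypothetical total footprint using the approximate shares quoted in the text (~50% embodied, 65%/35% inference/training within operational emissions) and projects operational emissions under the assumption that the 20% efficiency gain compounds every six months.

```python
# Illustrative carbon-accounting sketch using the approximate shares
# quoted in the text; all totals and function names are hypothetical.

def carbon_breakdown(total_kg_co2e: float,
                     embodied_share: float = 0.50,       # ~50% embodied (hardware)
                     inference_share_of_op: float = 0.65):  # LM: 65% inference
    """Split a total footprint into embodied, training, and inference parts."""
    embodied = total_kg_co2e * embodied_share
    operational = total_kg_co2e - embodied
    inference = operational * inference_share_of_op
    training = operational - inference
    return {"embodied": embodied, "training": training, "inference": inference}

def operational_after(initial_kg_co2e: float, years: float,
                      gain_per_half_year: float = 0.20) -> float:
    """Project operational emissions, assuming a 20% power-efficiency gain
    compounds every six months with workload held constant."""
    periods = years * 2
    return initial_kg_co2e * (1 - gain_per_half_year) ** periods

# Hypothetical model with a 1,000 kg CO2e total footprint:
breakdown = carbon_breakdown(1000.0)
# → embodied 500, inference 325, training 175 (kg CO2e)
operational_now = breakdown["training"] + breakdown["inference"]
projected = operational_after(operational_now, years=2)  # four 20% improvements
```

Note that the projection only covers operational emissions; the embodied cost is fixed at manufacture time, which is why it dominates as efficiency improves.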
The environmental impact of AI development highlights the need for sustainable practices, particularly in reducing both operational and embodied carbon costs. As AI continues to grow, optimizing energy usage and leveraging renewable energy will be critical to curbing its footprint and enabling more sustainable innovation across the industry.