The Great 8-bit Controversy in Artificial Intelligence

Artificial intelligence (AI) has become an integral part of our everyday lives, with AI-powered services and products seeing a huge surge in demand. This has been especially true for large language models like ChatGPT and image generation tools like Stable Diffusion. However, this rise in popularity has also led to a closer examination of the computational and environmental costs, particularly in the area of deep learning.

The main factors contributing to the high costs of deep learning are the size and complexity of the models, the type of processor used, and the way data is represented. Over the past decade, the size of AI models has been growing rapidly, with compute requirements doubling every 6 to 10 months. While processor power has improved, it hasn’t kept pace with the rising costs of the latest AI models. This has prompted researchers to explore ways to optimize data representation, as choosing the right data type can significantly affect a model’s power consumption, accuracy, and throughput. However, the ideal data type for AI depends on whether you’re in the training phase or the inference phase of deep learning.

Finding the Balance: Bit by Bit

To make AI more efficient, one approach is to reduce the number of bits used to represent the data, a process known as quantization. By lowering the number of bits, you not only make the model smaller but also reduce computation time, which in turn reduces the power needed for processing. This is an important technique for anyone working on efficient AI systems.
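
As a minimal sketch of the idea, the snippet below quantizes a tensor to 8-bit integers and back using a single per-tensor scale factor (plain NumPy; the function names and scaling scheme are illustrative, not tied to any particular framework):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric quantization: map float values onto the int8 range [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0          # one scale factor for the whole tensor
    x_q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return x_q, scale

def dequantize(x_q: np.ndarray, scale: float) -> np.ndarray:
    """Map the int8 codes back to approximate float values."""
    return x_q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
w_q, scale = quantize_int8(weights)
w_hat = dequantize(w_q, scale)
print("max quantization error:", np.abs(weights - w_hat).max())
```

The quantized tensor occupies a quarter of the memory of the FP32 original, at the cost of the rounding error printed at the end.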

AI models are typically trained using 32-bit floating point (FP32) data, but it turns out that not all 32 bits are necessary to maintain accuracy. Using 16-bit floating point (FP16) data types has shown promise, leading to efforts to find the minimum number of bits required for a model to remain accurate. Google developed the 16-bit brain float (BF16) format, which keeps FP32's full exponent range in 16 bits, and models deployed for inference are often quantized further, to 8-bit floating point (FP8) or integer (INT8) data types. There are two main methods for quantizing a neural network: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Both aim to improve computational efficiency, memory usage, and energy consumption, but they differ in when quantization is applied and how it affects model accuracy.
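
Before looking at those two methods, here is a quick illustration of what trimming bits does to a single value; it uses PyTorch's dtypes purely for convenience, and the constant is arbitrary:

```python
import torch

x = torch.tensor(3.141592653589793, dtype=torch.float32)

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    y = x.to(dtype)
    # Cast back up to float64 only so we can print the stored value at full precision.
    print(f"{str(dtype):18s} -> {y.to(torch.float64).item():.10f}")
```

Each narrower format stores the same value with fewer correct digits; the question quantization asks is how few digits a given model can tolerate.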

Post-Training Quantization (PTQ) is applied after a model has been trained with higher-precision data (like FP32 or FP16). This process reduces the model’s precision by converting its weights and activations to lower-precision formats, like FP8 or INT8. While PTQ is relatively simple to implement, it can result in accuracy loss, particularly in low-precision formats, as the model wasn’t trained to handle these quantization errors.
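
As one concrete example of PTQ, recent PyTorch versions ship a dynamic post-training quantization utility; the toy model below is a stand-in, and the exact module path may differ across versions:

```python
import torch
import torch.nn as nn

# A small stand-in model, already "trained" in FP32.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Post-training dynamic quantization: Linear weights are stored as INT8,
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])
```

No retraining is involved, which is what makes PTQ cheap to apply and also what leaves it exposed to accuracy loss at very low precision.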

Quantization-Aware Training (QAT), on the other hand, incorporates quantization during the training process itself, allowing the model to adapt to lower precision. By simulating quantized operations during training, the model learns to handle the reduced precision more effectively. While QAT typically results in better accuracy than PTQ, it requires changes to the training process and can be more complex to implement.
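
The core trick behind QAT can be sketched with "fake" quantization in the forward pass and a straight-through estimator so gradients still flow; this is a conceptual illustration, not a production QAT pipeline:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize-dequantize x in the forward pass; pass gradients straight through."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # Straight-through estimator: forward uses the quantized value,
    # backward treats the rounding as the identity function.
    return x + (x_q - x).detach()

w = torch.randn(16, 16, requires_grad=True)
loss = fake_quantize(w).sum()
loss.backward()
print(w.grad.abs().mean())  # gradients flow despite the rounding in the forward pass
```

Because the model sees quantization noise throughout training, its weights settle into values that survive the later conversion to true low-precision arithmetic.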

The 8-bit Debate

In the AI industry, two primary data types have emerged as contenders for quantization: INT8 and FP8. Different hardware vendors have shown strong preferences for one or the other. In mid-2022, Graphcore and AMD proposed an IEEE standard for FP8, and shortly after, Intel, Nvidia, and Arm joined with similar proposals. Other companies, including Qualcomm and Untether AI, have also explored the merits of FP8 versus INT8. While the debate is ongoing, the choice between these two data types largely depends on the specific AI model and the hardware used.

Integer vs. Floating Point

The distinction between floating point and integer data types lies in how they represent numbers. Floating point data types are used to represent real numbers, including both integers and fractions, and can be written in scientific notation, with a mantissa and exponent.

Integer data types, on the other hand, represent whole numbers only, with no fractional part. This difference means that, for the same bit width, floating point numbers span a far wider dynamic range, while integers cover a narrower range but represent it with uniform precision.
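
To make the difference concrete, the snippet below unpacks an FP32 value into its sign, exponent, and mantissa fields (standard IEEE-754 layout) and contrasts it with the flat two's-complement layout of an INT8 value:

```python
import struct

def fp32_fields(value: float):
    """Split an IEEE-754 single-precision float into sign, exponent, and mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign     = bits >> 31
    exponent = (bits >> 23) & 0xFF        # 8 exponent bits, bias 127
    mantissa = bits & 0x7FFFFF            # 23 mantissa (fraction) bits
    return sign, exponent, mantissa

print(fp32_fields(-6.25))   # (1, 129, 4718592) -> -1.5625 * 2^(129-127) = -6.25

# An int8 is just 8 two's-complement bits: range -128..127, step size exactly 1.
print((-6).to_bytes(1, "big", signed=True))
```

The exponent field is what buys floating point its range: the same mantissa can be scaled across many orders of magnitude, whereas an integer's step size never changes.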

Integer vs. Floating Point for Training

During the training phase of deep learning, the focus is on optimizing the model’s parameters, and this requires a higher dynamic range to accurately propagate gradients and achieve convergence. As such, floating point representations like FP32, FP16, and even FP8 are preferred during training to maintain a sufficient range.
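
A typical way this plays out in practice is mixed-precision training: master weights stay in FP32 while most of the forward and backward math runs in FP16 or BF16. A rough sketch using PyTorch's automatic mixed precision follows; the model, data, and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# BF16 keeps FP32's exponent range; FP16 needs loss scaling to avoid gradient underflow.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=(amp_dtype == torch.float16))

x = torch.randn(32, 512, device=device)
y = torch.randint(0, 10, (32,), device=device)

for _ in range(10):
    optimizer.zero_grad()
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()   # scale the loss so small FP16 gradients survive
    scaler.step(optimizer)
    scaler.update()

print(f"final loss: {loss.item():.4f}")
```

The scaler and the FP32 master copy of the weights exist precisely because gradients need more dynamic range than a narrow format can provide on its own.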

Integer vs. Floating Point for Inference

The inference phase is about efficiently applying the trained model to new data. In this phase, the focus shifts to minimizing computational complexity, memory usage, and energy consumption. This is where lower-precision data types like INT8 and FP8 come into play. For real-time applications and mobile services, the smaller INT8 data type is often the best choice, as it reduces memory and compute time while still offering enough accuracy for effective results.
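
A back-of-the-envelope calculation shows why the bit width matters so much for deployment (the parameter count is illustrative):

```python
params = 7_000_000_000            # e.g. a 7-billion-parameter model (illustrative)
bytes_per = {"FP32": 4, "FP16/BF16": 2, "INT8/FP8": 1}

for name, size in bytes_per.items():
    print(f"{name:10s}: {params * size / 1e9:6.1f} GB of weights")
# FP32 ~28 GB, FP16 ~14 GB, INT8 ~7 GB: 8-bit storage quarters the memory footprint.
```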

FP8 and INT8 for Inference

FP8 is becoming more widely adopted, and major hardware vendors and cloud providers are incorporating it into their deep learning platforms. There are several versions of FP8, with varying trade-offs between precision and dynamic range. FP8 E3M4, for example, has a smaller dynamic range but higher precision, while FP8 E4M3 offers a greater dynamic range by sacrificing some precision. FP8 E5M2 has the highest dynamic range, making it ideal for training, which requires a larger range.
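
The range-versus-precision trade-off between these variants can be made concrete with a generic IEEE-style calculation; note that the FP8 formats actually shipped by vendors (for example the E4M3 variant used by Nvidia) reserve fewer special encodings and therefore reach somewhat larger maxima than these idealized numbers:

```python
def fp_format_stats(exp_bits: int, man_bits: int):
    """Idealized IEEE-style stats for a floating point format with 1 sign bit."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias        # top exponent code reserved for inf/NaN
    max_normal = 2 ** max_exp * (2 - 2 ** -man_bits)
    rel_step = 2 ** -man_bits                   # spacing between adjacent mantissa values
    return max_normal, rel_step

for name, (e, m) in {"E3M4": (3, 4), "E4M3": (4, 3), "E5M2": (5, 2)}.items():
    max_normal, rel_step = fp_format_stats(e, m)
    print(f"FP8 {name}: max normal ≈ {max_normal:g}, relative precision ≈ {rel_step:g}")
```

Each exponent bit moved over from the mantissa multiplies the representable range while coarsening the spacing between representable values, which is exactly the trade-off the three variants embody.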

INT8, by contrast, has a smaller dynamic range but finer, uniform precision within it: 1 sign bit and 7 value bits, with no exponent field at all. Whether FP8 or INT8 is better for a specific model depends on the hardware and the performance goals. Research from Untether AI suggests that FP8 outperforms INT8 in terms of accuracy, performance, and efficiency on their hardware. Qualcomm, on the other hand, has found that FP8's accuracy gains do not justify its loss in efficiency relative to INT8 on their hardware.

Ultimately, the decision of which data type to use for quantization in inference depends on several factors: the model’s requirements, the hardware capabilities, and the trade-offs between accuracy and efficiency.
