IBM Launches New Generation Mainframe: 7.5 Times Higher AI Performance Than Previous Generation
Introduction
IBM has launched its latest generation mainframe, the IBM z17, continuing the IBM Z series tradition of security and reliability for mission-critical workloads. Through the newly designed Telum II processor and the Spyre AI accelerator card, AI capabilities are deeply integrated into the system architecture. IBM z17 targets the growing enterprise demand for generative AI (GenAI), large language models (LLMs), and predictive AI, delivering 7.5 times higher AI performance than the previous-generation z16 while showing strong potential in transaction processing, fraud detection, and business insights.
This article examines, across three dimensions (technical architecture, functional implementation, and application scenarios), how IBM z17 combines hardware and software innovation to deliver efficient, secure, and energy-saving AI computing, and how it reshapes the value of the mainframe in the era of the “AI-defined enterprise.” The launch of IBM z17 represents not only a technological breakthrough but also a direct response to customer needs, and its market performance will be worth watching.
Section 1: IBM z17 Technical Architecture
Telum II Processor: The Core of AI and High-Performance Computing

The core of IBM z17 is the Telum II processor, which made its debut at the 2024 Hot Chips conference. Manufactured by Samsung on a 5-nanometer process, it delivers significant performance improvements over the first-generation Telum processor used in z16.

Hardware Specification Upgrades

- Telum II maintains an eight-core design but increases the clock frequency from 5GHz to 5.5GHz, enhancing single-thread performance.
- Per-core level 2 (L2) cache is expanded to 36MB; together with additional on-chip L2 caches, including one dedicated to the DPU, the chip carries ten 36MB L2 caches in total.
- Virtual level 3 cache (L3) and level 4 cache (L4) increase to 360MB and 2.88GB respectively, with overall cache capacity increasing by 40% compared to the previous generation, significantly improving data access efficiency.
- The built-in second-generation on-chip AI accelerator delivers 24 trillion operations per second (24 TOPS) and adds support for the INT8 data type, optimizing AI inference so it can run in parallel with high-load enterprise workloads (a generic illustration of INT8 quantization follows this list).
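To show why INT8 support matters for inference, here is a minimal, hardware-agnostic sketch of symmetric INT8 quantization in Python. It illustrates the general technique only, not IBM's on-chip implementation; the per-tensor scaling scheme is a simplifying assumption.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of FP32 values to INT8."""
    scale = np.abs(weights).max() / 127.0                      # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values from the INT8 codes."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    # INT8 cuts memory traffic to a quarter of FP32 at a small accuracy cost.
    print("max abs error:", float(np.abs(w - w_hat).max()))
    print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)
```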
On-Chip DPU Innovation

- Telum II integrates an on-chip Data Processing Unit (DPU) for the first time. The DPU contains four processing clusters, each with eight programmable microcontrollers, plus an I/O acceleration manager, and is designed to reduce the main processor's burden in data-intensive AI tasks.
- The DPU optimizes data flow through independent L1 cache and request managers, directly connecting the main processor with PCIe architecture, reducing data transfer overhead, increasing throughput and energy efficiency, and providing support for efficient operation of the Spyre AI accelerator.
- In its maximum configuration, z17 supports 32 Telum II processors and 12 I/O expansion drawers with a total of 192 PCIe slots, greatly expanding the system's I/O capacity.
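As a software-level analogy for what such an offload engine does (not a model of the actual hardware data path), the sketch below hands I/O-bound reads to a small worker pool so the main thread is free to keep computing; all names and timings are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_record(key: str) -> str:
    """Stand-in for an I/O-bound read that an offload engine would handle."""
    time.sleep(0.01)  # simulated I/O latency
    return f"record-{key}"

def score(record: str) -> float:
    """Stand-in for the compute the main cores continue to perform."""
    return float(len(record))

if __name__ == "__main__":
    keys = [str(i) for i in range(32)]
    # The pool ("offload engine") runs reads concurrently; the main thread
    # ("processor core") consumes and scores results as they complete.
    with ThreadPoolExecutor(max_workers=4) as io_pool:
        futures = [io_pool.submit(fetch_record, k) for k in keys]
        total = sum(score(f.result()) for f in futures)
    print(total)
```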
Liquid Cooling Technology
- z17 adopts a liquid cooling solution, moving from traditional distilled water to a new coolant that simplifies maintenance and improves heat-dissipation efficiency.
- This design supports high-density deployment of dual-chip modules (DCM), ensuring system stability under high loads.
Spyre AI Accelerator Card: A Tool for Expanding AI Capabilities

- The Spyre AI accelerator card, a highlight of z17, plugs into the system through PCIe slots and provides dedicated computing resources for generative AI and LLMs, with up to 32 cores per card.
- Its architecture is similar to that of the Telum II on-chip AI accelerator, with a power consumption of 75W and 128GB of memory.
- The system supports up to 48 Spyre cards running in clusters of 8 cards, with an 8-card combination providing 1TB of memory and 256 accelerator cores, significantly enhancing AI processing capability.
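The aggregate cluster figures follow directly from the per-card numbers quoted above; the short calculation below simply reproduces them.

```python
# Per-card figures quoted above.
CORES_PER_CARD = 32
MEMORY_GB_PER_CARD = 128
CARDS_PER_CLUSTER = 8
MAX_CARDS = 48

cluster_cores = CARDS_PER_CLUSTER * CORES_PER_CARD          # 256 accelerator cores
cluster_memory_gb = CARDS_PER_CLUSTER * MEMORY_GB_PER_CARD  # 1024 GB, i.e. ~1 TB
max_clusters = MAX_CARDS // CARDS_PER_CLUSTER               # 6 clusters of 8 cards

print(cluster_cores, cluster_memory_gb, max_clusters)
```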

Design Goals
- Spyre cards are specifically designed to handle complex AI models such as LLMs, supporting model fine-tuning and even training tasks, allowing enterprises to keep data local to meet high security requirements.
- Working in conjunction with Telum II, Spyre realizes the concept of “integrated AI”: combining multiple models to improve prediction accuracy and reduce false positives, as sketched below.
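A minimal sketch of the multi-model idea, under simplifying assumptions: a fast first-stage model scores every transaction, and only borderline cases are escalated to a heavier second model. This is a generic two-stage pattern, not IBM's actual pipeline; the scoring functions and thresholds are hypothetical.

```python
def fast_score(txn: dict) -> float:
    """Lightweight first-stage model, the kind of small model an in-line accelerator might run."""
    # Hypothetical hand-rolled score for illustration only.
    return min(1.0, txn["amount"] / 10_000) * (0.5 if txn["known_merchant"] else 1.0)

def heavy_score(txn: dict) -> float:
    """Placeholder for a larger model, e.g. one served from dedicated accelerator cards."""
    return 0.9 if txn["amount"] > 9_000 and not txn["known_merchant"] else 0.1

def ensemble_decision(txn: dict, low: float = 0.3, high: float = 0.8) -> bool:
    """Flag as fraud only when the combined models agree the risk is high."""
    s1 = fast_score(txn)
    if s1 < low:
        return False                  # clearly benign: no second model needed
    if s1 > high:
        return True                   # clearly risky: flag immediately
    return heavy_score(txn) > 0.5     # borderline: escalate to the heavier model

if __name__ == "__main__":
    print(ensemble_decision({"amount": 9_500, "known_merchant": False}))  # True
    print(ensemble_decision({"amount": 120, "known_merchant": True}))     # False
    print(ensemble_decision({"amount": 5_000, "known_merchant": False}))  # escalated, False
```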
Application Scenarios
- In the financial sector, Spyre cards can support real-time fraud detection; in enterprise management, they can be used for document summarization, code generation, and other GenAI tasks.
Other System Design Highlights
- z17’s motherboard adopts a complex PCB structure with more than 50 layers, improving signal integrity and reliability while supporting high-density memory module deployment.
- Telum II achieves multi-chip interconnection through Symmetric Multi-Processing (SMP) cable connectors, allowing up to 32 processors to work together as a powerful computing cluster.
- z17 also optimizes data processing in hybrid cloud environments by integrating modern data access methods and NoSQL databases, providing broader data sources for AI applications.
Section 2: Functional Implementation and Application Scenarios
Enhanced Real-time Response Capabilities
- IBM z17 significantly improves real-time response capability by embedding AI inference capabilities into transaction processing workflows.
- Telum II’s AI accelerator can perform over 450 billion inference operations daily, with latency as low as 1 millisecond, improving AI inference throughput by 50% compared to z16.
- In fraud detection scenarios, z17 can score 100% of transactions in real time, greatly reducing the rate of missed detections.
- z17 combines the strengths of predictive AI and GenAI. In insurance, for example, structured claims data extracted from DB2 databases can be combined with LLM analysis of unstructured text such as claim reasons, with both feeding a predictive model to improve results (a minimal sketch of this pipeline follows this list).
- Spyre cards further expand generative AI capabilities, supporting complex tasks such as chatbot management and medical image analysis.
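The insurance example above can be read as a simple pipeline: structured claim fields plus an LLM-derived feature from the free-text claim reason feed a single predictive model. The sketch below shows only that shape; summarize_claim_reason is a hypothetical stand-in for an LLM call, and the fields, weights, and risk terms are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    claim_id: str
    amount: float        # structured field, e.g. pulled from a DB2 table
    prior_claims: int    # structured field
    reason_text: str     # unstructured free text

def summarize_claim_reason(text: str) -> float:
    """Hypothetical LLM step: condense free text into a risk feature in [0, 1]."""
    risky_terms = ("water damage", "total loss", "theft")
    hits = sum(term in text.lower() for term in risky_terms)
    return min(1.0, hits / len(risky_terms))

def predict_payout_risk(claim: Claim) -> float:
    """Toy predictive model combining structured and LLM-derived features."""
    text_feature = summarize_claim_reason(claim.reason_text)
    return min(1.0, 0.4 * text_feature
                    + 0.4 * min(claim.amount / 50_000, 1.0)
                    + 0.2 * min(claim.prior_claims / 5, 1.0))

if __name__ == "__main__":
    claim = Claim("C-1001", 42_000.0, 2, "Total loss after water damage in the basement")
    print(round(predict_payout_risk(claim), 3))
```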
Software Ecosystem Empowerment
- The z/OS 3.2 operating system, planned for release in the third quarter of 2025, fully supports hardware-accelerated AI, providing operational AI insights to optimize system management.
- New native support for NoSQL databases and hybrid cloud data processing allows AI to mine more enterprise data to generate predictive business insights.
- watsonx Code Assistant for Z provides developers with code auto-completion and optimization suggestions, improving development efficiency.
- watsonx Assistant for Z is integrated into Z Operations Unite, using real-time system data to provide AI-driven event detection and solutions.
- IBM Z Operations Unite, to be released in May 2025, integrates operational logs in OpenTelemetry format and uses AI to accelerate anomaly detection and shorten problem resolution time; it can also work with IBM Concert to enable intelligent operations (a generic illustration of anomaly detection over such logs follows this list).
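As a generic illustration of spotting anomalies in OpenTelemetry-style log records (not a description of Z Operations Unite's actual analytics), the sketch below flags latency outliers with a simple z-score; the record fields, sample values, and threshold are assumptions.

```python
import json
import statistics

# OpenTelemetry-style log records; the field names and values are illustrative assumptions.
RAW_LOGS = """
{"timestamp": "2025-05-01T10:00:00Z", "severity_text": "INFO", "body": "txn ok", "attributes": {"latency_ms": 3.1}}
{"timestamp": "2025-05-01T10:00:01Z", "severity_text": "INFO", "body": "txn ok", "attributes": {"latency_ms": 2.8}}
{"timestamp": "2025-05-01T10:00:02Z", "severity_text": "INFO", "body": "txn ok", "attributes": {"latency_ms": 3.0}}
{"timestamp": "2025-05-01T10:00:03Z", "severity_text": "WARN", "body": "txn slow", "attributes": {"latency_ms": 41.7}}
"""

def latency_anomalies(raw: str, z_threshold: float = 1.5):
    """Return records whose latency deviates strongly from the mean (loose threshold for a tiny sample)."""
    records = [json.loads(line) for line in raw.strip().splitlines()]
    latencies = [r["attributes"]["latency_ms"] for r in records]
    mean, stdev = statistics.mean(latencies), statistics.pstdev(latencies)
    return [r for r, x in zip(records, latencies)
            if stdev > 0 and abs(x - mean) / stdev > z_threshold]

if __name__ == "__main__":
    for rec in latency_anomalies(RAW_LOGS):
        print(rec["timestamp"], rec["body"])
```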
Comprehensive Security and Resilience Upgrades
- Integrates HashiCorp’s Vault technology to support credential and key management in hybrid cloud environments, ensuring the security of critical workloads (a minimal client-side sketch follows this list).
- IBM Threat Detection for z/OS uses Telum II and natural language processing technology to discover and classify sensitive data in real-time and detect potential threats.
- The 10th-generation IBM Storage DS8000 that accompanies z17 provides a modular architecture and optimized data performance, ensuring the agility and security of mission-critical workloads.
- On the development and support side, IBM Technology Lifecycle Services provides tailored support to optimize system performance and reduce the risk of interruptions.
- IBM worked closely with customers, research institutions, and software teams; z17’s five-year development produced more than 300 patent filings, reflecting a customer-driven R&D strategy.
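For the HashiCorp Vault integration mentioned in this list, here is a minimal client-side sketch using the open-source hvac Python library. The Vault address, token, and secret path are placeholders, and the snippet illustrates generic Vault usage rather than the specific z17 integration.

```python
import hvac  # open-source HashiCorp Vault client library for Python

# Placeholder address and token; a hardened deployment would use an
# auth method such as AppRole or certificates rather than a raw token.
client = hvac.Client(url="https://vault.example.com:8200", token="s.example-token")

if client.is_authenticated():
    # Read a secret from the KV v2 engine, e.g. a database credential
    # stored at a hypothetical path.
    secret = client.secrets.kv.v2.read_secret_version(path="apps/payments/db")
    db_password = secret["data"]["data"]["password"]
    print("retrieved credential for the payments service")
```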
Conclusion
The launch of IBM z17 marks the transformation of the mainframe from a traditional transaction processing platform into an AI-driven computing platform. Through the collaborative innovation of the Telum II processor and the Spyre AI accelerator card, z17 combines high performance with high reliability in its technical architecture and, through deep optimization of its functionality, meets enterprises’ diverse needs in AI inference, generative AI, and data security.
From real-time fraud detection to document summarization to hybrid cloud data processing, z17 provides more than 250 AI use cases for industries such as finance, insurance, and retail.