
Liquid AI Launches New “Liquid” Edge Model Hyena Edge, Testing Shows Both Efficiency and Quality Surpass Transformer

Introduction to Liquid AI’s Latest Innovation

Remember Liquid AI, the company that previously introduced the Liquid Foundation Model (LFM)? Just months after receiving investment from AMD, the MIT spinoff has unveiled another breakthrough.

On April 25, Liquid AI officially released a new AI architecture called “Hyena Edge” designed for edge devices. Unlike mainstream Transformer-based model architectures, Hyena Edge is a convolution-based multi-hybrid model specifically optimized for edge devices like smartphones.

“Artificial intelligence is rapidly becoming ubiquitous, from large-scale cloud deployments to resource-constrained edge devices such as smartphones and laptops,” stated Liquid AI science team members Armin Thomas, Stefano Massaroli, and Michael Poli in their research report. “Despite impressive progress, most small models optimized for edge deployment, such as SmolLM2, Phi models, and Llama 3.2 1B, primarily rely on Transformer architectures based on attention operators.”

Superior Performance for Edge Computing

While these traditional architectures feature parallel computation and efficient kernels, they still face efficiency bottlenecks on edge devices. As a Liquid architecture, Hyena Edge inherently possesses advantages in computational efficiency, making it highly suitable for edge deployment. According to Liquid AI, Hyena Edge has demonstrated performance surpassing Transformer baselines in both computational efficiency and model quality during real hardware testing.

They tested Hyena Edge on the Samsung Galaxy S24 Ultra, with results showing the model outperforming powerful Transformer-based benchmark models across multiple key metrics.

In terms of efficiency, Hyena Edge demonstrated faster prefill and decoding latency. Particularly for sequences exceeding 256 tokens, decoding and prefill latency improved by up to 30%. Notably, its prefill latency for short sequence lengths also outperformed Transformer baselines, which is crucial for responsive device applications. Regarding memory usage, Hyena Edge utilized less memory across all tested sequence lengths.

Impressive Quality Benchmarks

In terms of model quality, after training on 100 billion tokens, Hyena Edge performed excellently across various common language modeling benchmarks, including Wikitext, Lambada, Hellaswag, Winogrande, Piqa, Arc-easy, and Arc-challenge. For example, perplexity on Wikitext decreased from 17.3 to 16.2, on Lambada from 10.8 to 9.4, accuracy on PiQA improved from 71.1% to 72.3%, on Hellaswag from 49.3% to 52.8%, and on Winogrande from 51.4% to 54.8%.

“These results indicate that the efficiency improvements do not come at the cost of prediction quality—a common trade-off in many edge-optimized architectures,” the research team noted.

The STAR Framework: Technical Innovation Behind Hyena Edge

The core technology behind Hyena Edge is the team's previously proposed STAR (Synthesis of Tailored Architectures) framework and its optimization techniques. The core idea of STAR is to combine evolutionary algorithms with the mathematical theory of Linear Input-Varying systems (LIVs) to efficiently explore the vast neural network architecture space and automatically synthesize architectures tailored to specific objectives (low latency, small memory footprint, high model quality, small parameter count, and so on), with several objectives optimized simultaneously.

Unlike traditional methods that rely on human experience and intuition for model design, or automated searches within limited spaces, STAR provides a more comprehensive solution. LIV theory is a key theoretical foundation that can uniformly describe and generalize various computational units common in deep learning, including various attention variants, linear recurrent networks, convolutional networks, and other structured operators. Based on LIV theory, STAR constructs a novel, hierarchical architecture search space.
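The unifying idea of LIV theory is that many common sequence operators can be written as a linear map whose matrix depends on the input, y = T(x)·x. The following is a minimal illustrative sketch (not Liquid AI's code) showing that single-head causal attention fits this form with an input-varying T, while a short causal convolution is the input-invariant special case where T is a banded Toeplitz matrix:

```python
import numpy as np

def attention_operator(x):
    # Causal single-head self-attention as an input-varying matrix T(x):
    # T = row-normalized exp(x x^T / sqrt(d)) under a causal mask; y = T @ x.
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    mask = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    T = np.exp(scores - scores.max(axis=1, keepdims=True))
    T /= T.sum(axis=1, keepdims=True)
    return T @ x

def conv_operator(x, kernel):
    # A short causal convolution: T is a fixed banded Toeplitz matrix
    # built from the kernel, independent of the input.
    L = x.shape[0]
    T = np.zeros((L, L))
    for i in range(L):
        for j, k in enumerate(kernel):
            if i - j >= 0:
                T[i, i - j] = k
    return T @ x

x = np.random.randn(8, 4)
print(attention_operator(x).shape, conv_operator(x, [0.5, 0.3, 0.2]).shape)
```

Seen this way, attention variants, convolutions, and linear recurrences all live in one operator family, which is what lets STAR search over them uniformly.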

[Figure: overview of the STAR framework]

Evolutionary Architecture Development

Under this framework, a model architecture is encoded as a “genome.” The genome captures multiple levels of the architecture, from low-level featurization methods and operator structures (defining how tokens and channels mix) up to the top-level backbone (defining how LIV units connect and combine). The genome design is hierarchical and modular.

Subsequently, STAR applies the principles of evolutionary algorithms to iteratively optimize these architecture genomes. This primarily includes evaluation (measuring architecture performance based on preset objectives), recombination (combining features of excellent parent architectures), and mutation (introducing random changes to explore new architectures). The framework supports multi-objective optimization, capable of simultaneously considering potentially conflicting metrics such as model quality, parameter count, inference cache size, and latency, to find architectural solutions that achieve a good balance among these objectives.
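The loop described above can be sketched in a few lines. Everything here is illustrative (the genome fields, the toy scoring, and the simple ranked selection are assumptions, not Liquid AI's implementation; STAR's actual selection is multi-objective):

```python
import random

def random_genome():
    # A genome encodes choices at several levels of the architecture.
    return {
        "operator": random.choice(["hyena_full", "hyena_x", "hyena_y", "gqa"]),
        "kernel_len": random.choice([3, 7, 15, 31, 63, 128]),
        "depth": random.randint(8, 24),
    }

def evaluate(genome):
    # Stand-in for real measurements (quality, latency); toy proxy only.
    quality = -genome["depth"] * 0.1 + (genome["operator"] == "hyena_y")
    latency = genome["depth"] * (1.5 if genome["operator"] == "gqa" else 1.0)
    return quality, latency

def recombine(a, b):
    # Recombination: each field inherited from one of two parents.
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(g):
    # Mutation: re-randomize one field to explore new architectures.
    g = dict(g)
    key = random.choice(list(g))
    g[key] = random_genome()[key]
    return g

def evolve(pop_size=16, generations=24):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        # Simple scalarized ranking for the sketch; STAR itself balances
        # several potentially conflicting objectives at once.
        ranked = sorted(pop, key=lambda g: (-evaluate(g)[0], evaluate(g)[1]))
        parents = ranked[: pop_size // 2]
        children = [mutate(recombine(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return pop

print(len(evolve()))
```

The population size (16) and generation count (24) match the numbers Liquid AI reports for the Hyena Edge search.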

Practical Implementation and Results

In designing Hyena Edge, the Liquid AI team applied the STAR framework. They started with an initial population of 16 candidate architectures and conducted 24 generations of evolutionary iterations. Their search space was designed to be very rich, including variants of multiple convolution operators, primarily inspired by the Hyena architecture:

  • Hyena (Full): Contains convolution operations in both the gating mechanism and within Hyena internal convolutions.
  • Hyena-X: Excludes internal convolutions.
  • Hyena-Y: Excludes convolutions in the feature groups (gates).

In addition to these three main Hyena types, the search space also varied the length of the learned short explicit convolution kernels (from 3 to 128), yielding 18 different convolution operators in total. It further included grouped-query attention variants (with different numbers of KV heads), SwiGLU variants (with different internal widths), and other common Transformer components.
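The convolutional part of the search space can be enumerated directly as the cross product of the three Hyena variants and a set of kernel lengths. The specific six lengths below are an assumption; the report only gives the 3-to-128 range and the total of 18 operators:

```python
from itertools import product

hyena_types = ["hyena_full", "hyena_x", "hyena_y"]
# Assumed six kernel lengths spanning the reported 3..128 range,
# chosen so that 3 types x 6 lengths = 18 operators.
kernel_lengths = [3, 7, 15, 31, 63, 128]

conv_operators = [f"{t}_k{k}" for t, k in product(hyena_types, kernel_lengths)]
print(len(conv_operators))  # 18
```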

During the evolutionary process, the STAR framework continuously optimized the architecture population toward the Efficiency-Quality Frontier by analyzing the initial latency and memory usage of each candidate architecture on the Samsung S24 Ultra, combined with model perplexity performance during training.

Interestingly, as the evolutionary process progressed and architectures approached the optimal efficiency-quality boundary, STAR noticeably favored selecting Hyena-Y type convolutions. This indicates that Hyena-Y convolutions achieved a superior balance between latency, memory, and model quality.
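Driving a population toward an efficiency-quality frontier amounts to keeping the non-dominated candidates: those that no other candidate beats on every metric at once. A minimal sketch of that filter (the metric tuples are illustrative, all lower-is-better):

```python
def dominates(a, b):
    # a dominates b if it is no worse on every metric and strictly
    # better on at least one (all metrics lower-is-better here).
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    # Keep candidates not dominated by any other candidate.
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# (perplexity, latency_ms, memory_mb) -- illustrative numbers only
cands = [(16.2, 80, 900), (17.3, 70, 950), (16.5, 95, 880), (18.0, 90, 1000)]
print(pareto_front(cands))  # the fourth candidate is dominated and drops out
```

Candidates off the front are discarded each generation, so surviving genomes trade off latency, memory, and quality rather than winning on a single axis.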

Based on this finding, the final Hyena Edge architecture starts from a GQA-Transformer++ baseline and replaces some of its GQA operators with the Hyena-Y gated convolutions selected and optimized by STAR.

Future Prospects and Open Source Plans

As the benchmark tests indicate, Hyena Edge maintains high model quality while improving efficiency, which is an important feature for edge device applications with limited performance and resources.

Liquid AI has stated plans to open-source a series of foundation models, including Hyena Edge, in the coming months, with the goal of building AI systems that can adapt to various environments from cloud to edge. Beyond the models themselves, the design methodology they’ve demonstrated may be even more worthy of our anticipation.

References

  • https://www.liquid.ai/research/convolutional-multi-hybrids-for-edge-devices
  • https://arxiv.org/abs/2411.17800
  • https://venturebeat.com/ai/liquid-ai-is-revolutionizing-llms-to-work-on-edge-devices-like-smartphones-with-new-hyena-edge-model/