Stanford AI Index Report 2025: US AI Private Investment 12 Times Higher Than China, GPT-3.5 Level Model Inference Costs Drop 280-Fold
Executive Summary: AI Gap Narrowing as Innovation Accelerates
On April 7, the Stanford Institute for Human-Centered Artificial Intelligence (HAI) released the “2025 Artificial Intelligence Index Report.” Multiple data points indicate that the AI gap between China and the United States is narrowing, while AI technological innovation is progressing several times faster than in the previous decade.
The comprehensive report spans nearly 450 pages, with these key highlights:
Global AI Investment Landscape Shows Dramatic Growth
The report points out that both the US and China experienced tremendous growth in AI investments over the past year. US private AI investment reached $109.1 billion, nearly 12 times that of China ($9.3 billion) and 24 times that of the UK ($4.5 billion). Global generative AI startup funding reached $33.9 billion, an 18.7% increase compared to 2023.
AI Models Becoming Visibly More Efficient
Driven by efficient smaller models, inference costs for GPT-3.5 level models have decreased 280-fold from November 2022 to October 2024, with hardware costs declining by 30% annually.
Industry Leads Academia in AI Research
The report indicates that in 2024, almost 90% of notable AI models came from industry, compared to 60% in 2023. Despite the continuous expansion in model parameter sizes, the performance gap between models is narrowing: data shows that the performance difference between the world’s top AI model and the 10th-ranked model shrank from 11.9% to 5.4% within a year.
Additionally, last year’s index report highlighted a significant performance gap between closed-source and open-source LLMs, but this year, that gap has narrowed to just 1.7%.
AI Infrastructure Making Remarkable Progress
According to the report, AI performance per dollar has improved significantly. The inference cost for an AI model equivalent to GPT-3.5 dropped from $20.00 per million tokens in November 2022 to just $0.07 per million tokens in October 2024 (Gemini-1.5-Flash-8B), a reduction of more than 280-fold in roughly two years.
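As a quick sanity check, the fold reduction follows directly from the two price points quoted above:

```python
# Sanity check of the cost-reduction figure from the two quoted price points.
cost_nov_2022 = 20.00  # USD per million tokens, GPT-3.5, November 2022
cost_oct_2024 = 0.07   # USD per million tokens, Gemini-1.5-Flash-8B, October 2024

fold_reduction = cost_nov_2022 / cost_oct_2024
print(f"{fold_reduction:.0f}-fold")  # ~286, i.e. "more than 280-fold"
```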
Epoch estimates that hardware costs for fixed performance levels decrease by 30% annually, making AI training increasingly economical and scalable, contributing to model improvements. The report also states that machine learning (ML) hardware energy efficiency has significantly improved over time, increasing by approximately 40% annually.
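The two annual rates compound quickly; a minimal sketch using the Epoch estimates above (the three-year horizon is illustrative, not from the report):

```python
# Compound a 30% annual hardware-cost decline and a 40% annual
# energy-efficiency gain over a few years (illustrative horizon).
cost, efficiency = 1.0, 1.0
for year in range(1, 4):
    cost *= 1 - 0.30        # hardware ~30% cheaper each year
    efficiency *= 1 + 0.40  # ML hardware ~40% more energy-efficient each year
    print(f"year {year}: cost {cost:.2f}x, efficiency {efficiency:.2f}x")
```

Over three years, cost falls to about a third of the original while energy efficiency nearly triples, which is why fixed-performance training keeps getting cheaper.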
According to Epoch AI data, the enterprise sector contributed 55 notable AI models in 2024, while academia did not produce any notable models that year. It’s worth noting that the number of models resulting from enterprise-academic collaboration continues to grow. Over the past decade, the proportion of notable AI models originating from industry has shown a steady upward trend, reaching 90.2% by 2024.
In 2024, the major contributing organizations were OpenAI (7 models), Google (6), and Alibaba (4). Since 2014, Google has ranked first with 186 notable models, followed by Meta (82) and Microsoft (39). Among academic institutions, Carnegie Mellon University (25), Stanford University (25), and Tsinghua University (22) have had the most prominent model output since 2014.
As model parameter counts grow, the scale of AI system training data is expanding in parallel. Meta’s flagship large language model Llama 3.1, launched in summer 2024, broke the 15 trillion token threshold for training data.
According to Epoch AI research, the size of large language model training datasets approximately doubles every 8 months. This exponential growth trend, combined with increasing model complexity, continuously pushes the boundaries of AI performance.
Epoch estimates that the computing power for important AI models doubles approximately every 5 months, datasets double every 8 months, and energy consumption increases annually, a trend that has been particularly significant over the past five years.
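A doubling time of d months translates into an annual growth factor of 2**(12/d); a small helper makes the comparison between the two trends explicit:

```python
# Convert a doubling time in months into an equivalent annual growth factor.
def annual_growth(doubling_months: float) -> float:
    return 2 ** (12 / doubling_months)

print(f"compute:  {annual_growth(5):.1f}x per year")  # doubles every 5 months
print(f"datasets: {annual_growth(8):.1f}x per year")  # doubles every 8 months
```

Compute doubling every 5 months works out to roughly 5x growth per year, while dataset doubling every 8 months is closer to 3x per year.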
OpenAI’s current state-of-the-art GPT-4o foundation model required 38 billion petaFLOPs of training compute.
This resource threshold makes it difficult for academia to compete, resulting in industry’s continued dominance in cutting-edge AI research. Although the gap narrowed slightly this year (last year’s AI Index report first pointed out this trend), this division continues.
DeepSeek V3 and the US-China AI Computing Power Gap
The December 2024 release of the DeepSeek V3 model garnered widespread attention for its core breakthrough: achieving top-tier performance while requiring significantly fewer computational resources than most mainstream large language models. A comparison of training compute for notable machine learning models from the US and China reveals a key trend: top US AI models generally require far more computing power than their Chinese counterparts.
According to Epoch AI data:
- China’s leading language models’ training compute has maintained growth of approximately 3x per year since late 2021
- The rest of the world has maintained a growth rate of 5x per year since 2018
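The divergence between those two growth rates compounds over time; a minimal sketch, which assumes the two curves start from parity (an illustrative simplification, not a figure from the report):

```python
# How a 3x-vs-5x annual compute growth rate compounds into a widening gap.
# Starting from parity is an illustrative assumption, not report data.
cn_rate, world_rate = 3.0, 5.0
for years in (1, 2, 3):
    gap = (world_rate / cn_rate) ** years
    print(f"after {years} year(s): {gap:.1f}x relative compute gap")
```

Even with equal starting points, the ratio grows by 5/3 each year, reaching roughly 4.6x after three years.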
This gap reflects the differing approaches to AI development in the two countries: Chinese teams focus more on algorithm efficiency optimization, while their international counterparts tend to drive performance breakthroughs through computing power. However, it’s worth noting that DeepSeek V3’s success suggests that computational efficiency improvements might become a new track in the future AI race.
However, the AI Index data also confirms recent industry speculations: model training costs are showing a significant upward trend.
In 2024, Llama 3.1-405B, one of the few models with estimable costs, had a training cost of about $170 million. The steep rise in training costs is driven primarily by computational requirements:
- Training cost correlates directly with the amount of compute a model consumes
- Compute demands for frontier models are growing exponentially, so training costs grow with them
- Meanwhile, intensified competition has led to less disclosure of training processes, making costs increasingly difficult to estimate
Performance Gap Between Large Models Narrowing
In early January 2024, leading closed-source models outperformed top open-source models by 8.0%. By February 2025, this gap had narrowed to 1.7%.
This rapid progress is primarily due to Meta’s summer release of Llama 3.1 and subsequently launched high-performance open-source models, such as DeepSeek’s V3 version.
The following figure shows an overview of the top ten models on the Chatbot Arena leaderboard as of January 2025. Notably, in 2023, the Elo skill rating gap between the top model and the tenth-ranked model was 11.9%. By 2025, this gap had narrowed to just 5.4%.
Although the introduction of reasoning mechanisms such as chain-of-thought significantly improved large language model (LLM) performance, these systems still have key limitations:
Reliability Defects
- Cannot reliably solve problems verifiable through logical reasoning (such as arithmetic operations, task planning, etc.)
- Perform particularly poorly when faced with instances beyond the scale of training data
Application Constraints
- These defects seriously undermine assessments of system trustworthiness
- They limit the feasibility of deployment in high-risk scenarios (such as financial decisions and medical diagnoses)
OpenAI’s o1 model completed only 23.6% of complex instances requiring at least 20 steps to solve in the PlanBench test.
Planning is essentially a combinatorial optimization problem, and the time required to solve long-sequence problems necessarily grows super-linearly. This characteristic explains the performance limitations of current models on complex planning tasks.
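The super-linear blow-up is easy to see: even just counting the possible orderings of n plan steps gives factorial growth, a simplified proxy for the full planning search space:

```python
import math

# Factorial growth of possible step orderings: a simplified proxy for why
# long-horizon planning problems overwhelm brute-force search.
for n in (5, 10, 20):
    print(f"{n} steps: {math.factorial(n):,} possible orderings")
```

At 20 steps, the scale PlanBench uses for its hard instances, the ordering count alone exceeds 2 quintillion.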
Multimodal AI and Video Generation Breakthroughs
Early text-to-video models showed potential but had obvious flaws: poor image quality, no audio support, and very short clips (typically only 2-4 seconds). In 2024, the field saw major breakthroughs as several tech giants successively released new-generation video generation systems. Specific advances include:
- Video duration extended from seconds to 20-second level
- Resolution achieving high-definition (HD) standards
- Generated content expanding from 2D to 3D domains
- Together, these advances mark text-to-video generation’s entry into the practical application stage
Humanoid Robots Reach New Milestones
2024 became a key turning point in humanoid robot development, with robots possessing human-like form and bionic functions achieving multiple breakthroughs. Innovative enterprises represented by Figure AI introduced a new generation of general-purpose humanoid robots, Figure 02, with technological features including complex task execution, intelligent interaction, and support for “voice-reasoning-voice” closed-loop operations.
Alongside the AutoRT system, DeepMind released two further innovative platforms: ALOHA (A Low-cost Open-source Hardware System for Bimanual Teleoperation) and DemoStart. Of these, the ALOHA Unleashed version achieved a major breakthrough in fine robot manipulation, demonstrating human-level fine motor control for the first time and showing the engineering feasibility of combining large models with imitation learning.
AI Investment Scale Growing
Total AI investment increased to $252.3 billion in 2024, a 25.5% increase from 2023. Over the past decade, AI-related investment has grown nearly 13-fold.
The figure below shows the trend of global enterprise AI investment from 2013 to 2024, covering mergers and acquisitions, minority equity, private investment, and public offerings.
Between 2023 and 2024, global private investment in AI increased by 44.5%, the first year-over-year growth since 2021.
In 2024, the generative AI sector attracted $33.9 billion in investment, an 18.7% increase from 2023, reaching more than 8.5 times the investment scale of 2022. Notably, generative AI investment in 2024 accounted for more than one-fifth of total AI-related private investment.
The number of AI companies receiving funding in 2024 jumped to 2,049, an 8.4% increase from the previous year. Among them, the number of newly funded startups in the generative AI field increased significantly—a total of 214 startups received funding throughout the year, a substantial increase from 179 in 2023 and 31 in 2019.
Data from 2024 shows the United States leading with $109.1 billion in investment, China ($9.3 billion) ranking second at just 8.5% of US investment, and the United Kingdom ($4.5 billion) ranking third, with an investment scale equivalent to 4.1% of the US.
The three most concentrated areas of investment in 2024 were:
- AI infrastructure/research/governance ($37.3 billion)
- Data management and processing ($16.6 billion)
- Healthcare ($11.0 billion)
The outstanding performance in AI infrastructure, research, and governance areas is mainly due to large investments received by leading companies focused on AI application development, such as OpenAI, Anthropic, and xAI.
Enterprise AI Adoption Accelerating
McKinsey’s latest report shows:
- Overall AI adoption jumped from 55% in 2023 to 78% in 2024, meaning 78% of surveyed enterprises have applied AI in at least one business function.
- Generative AI adoption grew explosively, reaching 71% in 2024, more than double the 33% reported a year earlier, when the technology first appeared as a survey item.
- Enterprise AI applications have achieved dual benefits of cost reduction and revenue increase, with the most significant cost-saving areas being: service operations (49% of surveyed enterprises reporting effectiveness), supply chain and inventory management (43%), and software engineering (41%).
Microsoft’s latest workplace research shows: in routine office tasks, document editing efficiency improved by 10-13%, and email processing time was reduced by 11%. Professional positions saw more significant improvements—security analysts completed tasks 23% faster with 7% improved accuracy.
Sales teams improved response speed by 39% while increasing conversion rates by 25%.
In scientific research, material discovery rates improved by 44.1%, patent applications increased by 39.4%, and product prototype output increased by 17.2%.
These data confirm the dual value of AI in improving work quality and efficiency.
AI for Science Still Has an Extremely High Ceiling
In 2024, AI-driven research harvested the highest honors—two Nobel Prizes were awarded for breakthrough achievements in artificial intelligence.
Google DeepMind’s Demis Hassabis and John Jumper received the Nobel Prize in Chemistry for their pioneering work on protein structure prediction with AlphaFold. The latest AlphaFold 3 moves beyond single-protein structure prediction to precise modeling of protein interactions with key biomolecules (DNA, RNA, ligands, antibodies).
John Hopfield and Geoffrey Hinton received the Physics Prize for their foundational contributions to neural networks.
AI assists medical and biological research. Researchers using directed evolution methods have demonstrated that large language models can generate protein sequences that outperform traditional algorithms in both synthetic and experimental fitness landscapes.
The generative AI model ProGen, by designing functional protein sequences, highlights the potential of AI-assisted protein engineering. Similarly, Transformer-based models like ProtT5 use deep learning to directly predict protein function and interactions from sequence data, driving computational biology development.
The expansion of public databases is crucial for AI applications in protein science, as high-quality, large-scale datasets enable AI models to train based on diverse biological sequences, enhancing predictive capabilities. The number of entries in various public protein science databases has continued to grow since 2019. However, ensuring data quality and avoiding model bias remains an ongoing challenge.
Imaging and multimodal AI are also driving scientific discovery. Advances in cryo-electron microscopy, high-throughput fluorescence microscopy, and whole-slide imaging technologies enable scientists to resolve structures at atomic, subcellular, and tissue levels with high precision, revealing new mechanisms of complex biological processes.
With the rise of high-throughput microscopy technology, vision-language models and emerging vision-omics foundation models have become research hotspots. The number of microscopy foundation models continues to increase with technological development: optical microscopy models doubled from 4 to 8 in 2024.
In AI-driven protein research in biological sciences in 2024, functional prediction (8.4%) ranks first, followed by structural prediction (7.6%) and protein-drug interactions (3.0%).
Enthusiasm for developing LLM agents for biological tasks has increased.
As the application value of AI systems in scientific fields (especially biology) becomes increasingly apparent, how to design intelligent language models capable of calling tools to solve complex tasks becomes a key challenge. Aviary provides a structured framework for this, specifically training language agents to address three high-difficulty scientific tasks:
- DNA manipulation (for molecular cloning)
- Scientific question answering (through scientific literature retrieval)
- Protein stability engineering
In the Aviary environment, the baseline model Claude 3.5 Sonnet performed poorly because it could not access external tools, while models integrated into the Aviary agent framework significantly outperformed the baseline on almost all tasks.
The research reveals two key conclusions:
- Although general LLMs perform well in most scientific tasks, fine-tuning models with domain expert knowledge can further improve performance;
- Accelerating AI-driven scientific innovation depends not only on model scale but also on capability expansion through external tool interaction—this “agent-based AI” is becoming a new paradigm.
In terms of training data requirements, the clinical LLM GatorTron (82 billion tokens) uses far fewer tokens than Llama 3 (15 trillion), while the imaging model RadImageNet (16 million image-equivalent tokens) trains on just 1/375 of DALL-E’s 6 billion.
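The 1/375 ratio follows directly from the quoted token counts:

```python
# Verify the 1/375 ratio between RadImageNet and DALL-E training tokens.
dalle_tokens = 6_000_000_000     # DALL-E, 6 billion tokens
radimagenet_tokens = 16_000_000  # RadImageNet, 16 million image-equivalent tokens
print(dalle_tokens // radimagenet_tokens)  # prints 375
```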
Furthermore, AI applications in clinical scenarios show enormous potential. The o1 model, recently tested by Microsoft and OpenAI, set a new record with 96.0% accuracy (a 5.8 percentage point improvement over 2023).
LLM clinical knowledge capabilities continue to improve (especially o1 equipped with real-time reasoning), but hallucinations and inconsistent multilingual performance issues remain.
In diagnostic reasoning, physicians assisted by GPT-4 achieved a diagnostic accuracy of 76%, only slightly higher than the traditional-tools group (74%); GPT-4’s independent diagnostic accuracy, however, reached 92%, a 16 percentage point improvement over physicians without AI assistance (Figure 5.4.6). Despite AI’s excellent standalone performance, diagnostic time was not significantly reduced. Going forward, workflow restructuring, user training, and interface design will be needed to translate isolated model advantages into clinical collaborative efficiency.
Over the past five years, attention to ethical issues in medical artificial intelligence has increased year by year. From 2020 to 2024, the number of publications related to ethics and medical artificial intelligence increased fourfold.
Public More Optimistic About AI Era
In 2024, 67% of respondents indicated they “have a good understanding of what AI is,” and 66% believed “AI will profoundly change their daily lives in the near future.”
The proportion of the global population believing AI-driven products and services “have more benefits than drawbacks” rose slightly from 52% in 2022 to 55% in 2024.
In surveys about AI products and services, Chinese respondents had the highest awareness, trust, and enthusiasm for AI on average. 80% of respondents indicated “these products and services make them excited.” In comparison, only 58% of US respondents believed “AI will profoundly change life in the next 3-5 years.”
Over the past year, the proportion of respondents who “trust that companies using AI will protect personal data” decreased by 3 percentage points, and those who “trust that AI won’t discriminate against or show bias toward any group” decreased by 2 percentage points.
However, there are significant regional differences in public opinion.
Respondents in Asia and Latin America are more inclined to believe AI has more benefits than drawbacks. In contrast, respondents in Europe and English-speaking countries are more skeptical. Only 46% of UK respondents and 39% of US respondents believe AI has more benefits than drawbacks.
This year’s Ipsos survey added questions about “how AI affects current work.” 60% of respondents believe “AI may change the way they work in the next five years,” and 36% (more than one-third) believe “AI may replace existing jobs in the next five years.”
Generation Z (67%) and Millennials are more likely than Generation X and Baby Boomers (49%) to agree that “AI will change the way they work.” From 2023 to 2024, the proportion of all generations agreeing with this view increased, with Millennials and Baby Boomers showing the largest increase, possibly indicating a convergence of awareness across generations.