In a significant advancement in artificial intelligence, researchers from Stanford University and the University of Washington have developed “s1,” a large language model (LLM) that rivals leading models like OpenAI’s GPT, Google’s Gemini, and DeepSeek’s R1, but at a fraction of the cost. While traditional models often require millions of dollars and extensive computational resources for training, the s1 model was trained in under 30 minutes using 16 Nvidia H100 GPUs, costing approximately $20 to $50 in cloud compute credits. 
The s1 model employs a method called “test-time scaling,” which enhances the model’s reasoning capabilities during inference by allowing it to allocate more computational resources when generating responses. This approach contrasts with traditional models that primarily focus on extensive training phases to improve performance. Additionally, s1 uses a technique known as “budget forcing”: when the model tries to end its reasoning, the word “Wait” is appended to its output, prompting it to continue thinking and to double-check and refine its answers.
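For readers curious how this looks in practice, here is a minimal sketch of the budget-forcing idea. It is not the authors' implementation: the `toy_generate` function, the `</think>` marker, and the `min_continuations` parameter are all illustrative stand-ins for a real LLM decoding loop.

```python
# Hedged sketch of "budget forcing": if the model tries to stop reasoning
# before a minimum number of continuations, suppress its end-of-thinking
# marker and append "Wait" so it keeps going and re-checks its work.

END_OF_THINKING = "</think>"  # hypothetical marker; real models vary


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM call: emits one reasoning step, then
    signals that it wants to stop thinking."""
    return " ...reasoning step... " + END_OF_THINKING


def budget_forced_generate(prompt: str, min_continuations: int = 2) -> str:
    """Decode with budget forcing: force at least `min_continuations`
    extra rounds of reasoning before the model may stop."""
    trace = ""
    forced = 0
    while True:
        chunk = toy_generate(prompt + trace)
        if END_OF_THINKING in chunk and forced < min_continuations:
            # Strip the stop marker and append "Wait" to nudge the
            # model into continuing and double-checking its answer.
            trace += chunk.replace(END_OF_THINKING, "") + "Wait,"
            forced += 1
            continue
        trace += chunk
        break
    return trace
```

Calling `budget_forced_generate("What is 2+2?")` would produce a reasoning trace containing two forced "Wait," continuations before the model is finally allowed to stop.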
In benchmark tests, s1 has demonstrated performance comparable to leading models. For instance, it outperformed OpenAI’s o1-preview by up to 27% on competition math questions. This achievement underscores the potential for developing high-performing AI models with significantly reduced financial and computational investments, challenging the prevailing notion that advanced AI development necessitates substantial resources.
The success of s1, along with models like DeepSeek’s R1, indicates a shift towards more cost-effective AI development methodologies. These approaches not only make advanced AI research more accessible but also prompt a reevaluation of resource allocation in AI development, potentially democratizing the field and fostering innovation across various sectors.
We will of course keep a watchful eye on s1 and similar developments. It is an exciting development on many levels. In the meantime, for those who are technically inclined, here is a great in-depth overview of s1.
Keep a lookout for the next edition of AI Uncovered!
Follow our social channels for more AI-related content: LinkedIn; Twitter (X); Bluesky; Threads; and Instagram.
The following report provides insights into s1 and DeepSeek-R1 that you may find valuable:
From Brute Force to Brain Power: How Stanford's s1 Surpasses DeepSeek-R1
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5130864
Here are the first five sentences from the abstract:
Large Language Models (LLMs) are increasingly adept at complex reasoning, yet many state-of-the-art approaches rely on massive datasets and extensive reinforcement learning (RL) pipelines. In contrast, Stanford's s1 introduces a streamlined, data-efficient method that surpasses previous open-source and open-weights reasoning models, most notably DeepSeek-R1, using only a tiny fraction of the data and compute. A core innovation of s1 is its "s1K" dataset, a meticulously curated set of 1,000 high-quality, step-by-step reasoning examples drawn from challenging math, logic, and science problems. Fine-tuning on this compact dataset required only minutes of GPU time, demonstrating unprecedented sample- and cost-efficiency. A second breakthrough is s1's inference-time "budget forcing" mechanism, which allows controllable test-time scaling.
19 pages