Alibaba's 'ZeroSearch' can cut AI search training costs by 88%, company claims

Alibaba says AI-generated search results could not only reduce reliance on Google's APIs, but also cut training costs by up to 88%.

May 13, 2025 - 19:18

  • Alibaba's ZeroSearch can generate training material for its AI
  • Cost savings of up to 88% are possible
  • The tech requires additional GPUs

Alibaba's Tongyi Lab has found a way to train AI search models without using real search engines, an approach it says can reduce search training costs by up to 88% compared with commercial search APIs such as Google's.

In a paper titled "Incentivize the Search Capability of LLMs without Searching," Alibaba explains how the technique uses simulated, AI-generated documents to mimic real search engine outputs.
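
The basic mechanism can be pictured in a few lines of code: instead of calling a paid search API during training, the model being trained queries a second LLM that writes plausible-looking snippets for each query. The SimulationLLM class and simulated_search function below are illustrative stand-ins for that idea, not Alibaba's actual ZeroSearch code.

```python
# Minimal sketch: replace a commercial search API with an LLM that fabricates
# search-style documents. Class, function, and prompt wording are assumptions.

class SimulationLLM:
    """Stand-in for a locally hosted model fine-tuned to imitate search
    snippets; in practice this would call an inference server."""

    def generate(self, prompt: str) -> str:
        # Placeholder output; a real simulation LLM would return varied,
        # query-specific pseudo-documents.
        return "\n\n".join(
            f"[snippet {i + 1}] plausible text answering the query"
            for i in range(5)
        )


def simulated_search(llm: SimulationLLM, query: str, k: int = 5) -> list[str]:
    """Return k AI-generated pseudo-documents instead of real search results."""
    prompt = (
        f"Write {k} short web snippets a search engine might return "
        f"for the query: {query!r}"
    )
    return llm.generate(prompt).split("\n\n")[:k]


docs = simulated_search(SimulationLLM(), "who won the 2022 world cup")
print(len(docs), "simulated documents retrieved")
```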

Interestingly, Alibaba's researchers also note that using simulated documents can actually improve the quality of training, because "the quality of documents returned by search engines is often unpredictable" and risks introducing noise into the training process.

Alibaba will train AI search models on AI-generated documents

"The primary difference between a real search engine and a simulation LLM lies in the textual style of the returned content," the researchers wrote. ZeroSearch can also gradually degrade the quality of documents in order to simulate increasingly challenging retrieval scenarios.

The key benefit of the technique, of course, is the significant cost saving. Training with ZeroSearch's 14B model costs around $70.80 per 64,000 queries, compared with around $586.70 via Google's APIs. Costs are even lower for the 7B and 3B models, at $35.40 and $17.70 per 64,000 queries respectively, and all three ZeroSearch models and the Google API method take the same amount of time.
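
Those figures line up with the headline claim: working from the per-64,000-query costs quoted above, the 14B setup comes to roughly $0.0011 per query versus about $0.0092 via Google's API, a saving of about 88%.

```python
# Quick check of the reported numbers for the 14B model versus Google's API.

queries = 64_000
zerosearch_14b = 70.80   # USD per 64,000 queries (article figure)
google_api = 586.70      # USD per 64,000 queries (article figure)

print(f"ZeroSearch 14B: ${zerosearch_14b / queries:.4f} per query")
print(f"Google API:     ${google_api / queries:.4f} per query")
print(f"Saving: {1 - zerosearch_14b / google_api:.1%}")  # ~87.9%, i.e. the ~88% claim
```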

However, Alibaba acknowledged that its ZeroSearch method requires one, two, or four A100 GPUs, whereas the Google API approach requires none, which could have a negative sustainability impact in the form of additional energy consumption and emissions.

"Our approach has certain limitations. Deploying the simulated search LLM requires access to GPU servers. While more cost-effective than commercial API usage, this introduces additional infrastructure costs," the researchers concluded.

Still, reducing reliance on expensive, gated platforms such as Google's search APIs could help democratize AI development even further.
