Chinese AI startup Z.ai, formerly known as Zhipu AI, has announced the immediate release of GLM-5.2, a 753-billion-parameter open-weights language model built specifically for long-horizon autonomous coding tasks. The model is available now on Hugging Face, the Z.ai API, and over 20 third-party coding environments. Subscription plans start at $12.60 per month.
On several industry-standard benchmarks, GLM-5.2 outscores OpenAI’s GPT-5.5, which costs roughly six times as much per million input and output tokens combined. The release lands at a moment when enterprises are growing increasingly wary of relying on proprietary Western AI models, after the Trump administration’s export control directive last week blocked foreign nationals from using Anthropic’s new Claude Fable 5 model, prompting Anthropic to take the model entirely offline for all users.
For technical decision-makers, that kind of regulatory volatility is a real operational risk. GLM-5.2’s MIT license and open weights offer a concrete alternative: download the model, run it on your own infrastructure, and bypass geographic restrictions and vendor lock-in entirely.
The cost gap with proprietary models is significant. GLM-5.2 is priced at $1.40 per million input tokens and $4.40 per million output tokens. GPT-5.5, by comparison, costs $5.00 per million input tokens and $30.00 per million output tokens. For one million input tokens plus one million output tokens, that is a combined $35.00 versus GLM-5.2’s $5.80. Claude Opus 4.8 is also far more expensive, at $30.00 combined. For enterprises running large volumes of coding tasks, that difference adds up fast.
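To make the gap concrete, here is a minimal sketch that applies the quoted per-million-token rates to a monthly workload. The token volumes and the `workload_cost` helper are illustrative assumptions, not figures or code from Z.ai or OpenAI.

```python
# USD per million tokens, as quoted in the article.
PRICES = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the listed per-million-token rates."""
    rates = PRICES[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical monthly volume (an assumption, not a published figure).
INPUT_TOKENS = 2_000_000_000
OUTPUT_TOKENS = 500_000_000

glm_cost = workload_cost("GLM-5.2", INPUT_TOKENS, OUTPUT_TOKENS)
gpt_cost = workload_cost("GPT-5.5", INPUT_TOKENS, OUTPUT_TOKENS)

print(f"GLM-5.2: ${glm_cost:,.2f}")
print(f"GPT-5.5: ${gpt_cost:,.2f}")
print(f"Ratio:   {gpt_cost / glm_cost:.1f}x")
```

At this input-heavy mix the bill works out to about $5,000 versus $25,000, a ratio near 5x. The multiple depends on the input-to-output mix, because the price gap is narrower on input tokens (about 3.6x) than on output tokens (about 6.8x).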
The model’s architecture includes a new optimization Z.ai calls IndexShare. In large language models, recalculating attention across long documents is computationally expensive. IndexShare shares a single indexer across each group of four sparse attention layers, which reduces per-token compute by a factor of 2.9 at the maximum 1-million-token context length. The model also has an upgraded Multi-Token Prediction layer for speculative decoding, which improves accepted token length by up to 20% during inference.
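The article gives no implementation details for IndexShare, so the following is only a toy illustration of the general idea: run the token-selection step once per group of four layers and reuse the result, instead of running it in every layer. Random scores stand in for a learned indexer, and every name here (`select_top_k`, `run_layers`, `GROUP_SIZE`) is hypothetical rather than part of Z.ai's code.

```python
import random

GROUP_SIZE = 4  # from the article: one indexer shared per four sparse layers


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring context tokens."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def run_layers(num_layers, context_len, k, share):
    """Count indexer calls and record which tokens each layer attends to."""
    rng = random.Random(0)
    calls = 0
    selected = None
    per_layer = []
    for layer in range(num_layers):
        # With sharing on, only the first layer of each group runs the indexer.
        if not share or layer % GROUP_SIZE == 0:
            # Stand-in for a learned indexer scoring every context token.
            scores = [rng.random() for _ in range(context_len)]
            selected = select_top_k(scores, k)
            calls += 1
        per_layer.append(selected)
    return calls, per_layer


calls_shared, layers_shared = run_layers(8, 100, 10, share=True)
calls_plain, _ = run_layers(8, 100, 10, share=False)
print(f"indexer calls with sharing: {calls_shared}, without: {calls_plain}")
```

The sketch counts indexer calls only. It does not model attention itself, so it does not reproduce the 2.9x per-token figure quoted for the full model.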
Z.ai also added selectable thinking modes. Users can choose between “Max” mode, which pushes for peak reasoning quality, and “High” mode, which cuts token output roughly in half while sacrificing only a few benchmark points. That trade-off matters for latency-sensitive applications where generating 85,000 output tokens per task, as Max mode does, would be impractical.
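A rough back-of-the-envelope sketch shows why the mode choice matters. The 85,000-token figure and the $4.40 output price come from the article; the halved figure for High mode is an approximation of "roughly in half", and the decode speed is a purely hypothetical serving rate chosen for illustration.

```python
OUTPUT_PRICE_PER_M = 4.40                # USD per million output tokens (from the article)
MAX_MODE_TOKENS = 85_000                 # output tokens per task in Max mode (from the article)
HIGH_MODE_TOKENS = MAX_MODE_TOKENS // 2  # "roughly in half": an approximation
DECODE_TOKENS_PER_SEC = 50.0             # hypothetical serving speed, not a Z.ai figure


def task_estimate(output_tokens, tokens_per_sec=DECODE_TOKENS_PER_SEC):
    """Return (output cost in USD, decode time in minutes) for one task."""
    cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    minutes = output_tokens / tokens_per_sec / 60
    return cost, minutes


for mode, tokens in (("Max", MAX_MODE_TOKENS), ("High", HIGH_MODE_TOKENS)):
    cost, minutes = task_estimate(tokens)
    print(f"{mode}: {tokens:,} tokens, ${cost:.3f} per task, {minutes:.1f} min to decode")
```

Under these assumptions the per-task output cost stays well under a dollar in either mode, so the main penalty of Max mode is generation time rather than price.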
On benchmarks, GLM-5.2 holds up well against much more expensive competition. Here’s how it compares on key evaluations:
- SWE-bench Pro: GLM-5.2 scored 62.1, ahead of GPT-5.5 (58.6) and its predecessor GLM-5.1 (58.4)
- FrontierSWE (Dominance): GLM-5.2 hit 74.4%, beating GPT-5.5 (72.6%) and coming close to Claude Opus 4.8 (75.1%)
- MCP-Atlas (tool use): GLM-5.2 scored 77.0, ahead of GPT-5.5 (75.3) and just behind Claude Opus 4.8 (77.8)
- Humanity’s Last Exam (with tools): GLM-5.2 reached 54.7, beating GPT-5.5 (52.2) and trailing Claude Opus 4.8 (57.9)
- PostTrainBench: GLM-5.2 scored 34.3% against GPT-5.5’s 25.0% on extended multi-hour engineering workloads
- SWE-Marathon: GLM-5.2 hit 13.0% against GPT-5.5’s 12.0%
The model does trail on Terminal-Bench 2.1, scoring 81.0 against GPT-5.5’s 84.0 and Claude Opus 4.8’s 85.0. But it comfortably beats Google’s Gemini 3.1 Pro, which scored 74.0 on that test. On the crowdsourced Design Arena benchmark, GLM-5.2 took first place with an Elo score of 1360, beating even Anthropic’s Claude Fable 5.
The MIT license is arguably the most consequential part of this release. Unlike many open-weight models that ship with restrictive acceptable-use policies or commercial limitations, MIT places almost no conditions on use. Z.ai’s documentation explicitly states the license guarantees “no regional limits” and “technical access without borders.” Engineering teams can download the weights, fine-tune the model for their specific use case, and run it on their own servers, paying only for compute. No royalties, no governance restrictions, no vendor relationship to manage.
That point is not lost on the developer community. On X, AI observer Lisan al Gaib (@scaling01) argued that “frontier labs are absolutely scamming you on API pricing,” noting that massive open models like GLM-5.2 and DeepSeek-V4-Pro charge a fraction of what Anthropic and OpenAI charge, and suggesting that the leading proprietary labs may be operating at over 90% margins.
To support developer adoption, Z.ai also launched the GLM Coding Plan, which is designed for agentic development workflows rather than simple chat use. It supports out-of-the-box integration with Claude Code, OpenClaw, Cline, Kilo Code, Crush, and Factory, among others. Pricing tiers, billed annually, are:
- Lite: $12.60 per month, for lightweight work on small repositories
- Pro: $50.40 per month, with 5x the usage of Lite for mid-sized projects
- Max: $112.00 per month, with 20x the Lite usage and dedicated resources during peak hours
For long-context workloads, Z.ai also offers cached input at $0.26 per million tokens, with a limited-time offer for free cached input storage.
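As a sketch of what the cached rate could mean for an agent that re-sends the same large context on every turn, the snippet below compares input costs with and without caching. The billing model is an assumption (first turn at the full rate, every later turn a complete cache hit, storage free under the limited-time offer), and the session size is hypothetical.

```python
INPUT_PRICE_PER_M = 1.40   # USD per million uncached input tokens (from the article)
CACHED_PRICE_PER_M = 0.26  # USD per million cached input tokens (from the article)


def session_input_cost(context_tokens: int, turns: int, use_cache: bool) -> float:
    """Input cost in USD for re-sending the same context on every turn.

    Assumes the first turn is billed at the full rate and, with caching on,
    every later turn is a full cache hit. New tokens added per turn are ignored.
    """
    if turns < 1:
        return 0.0
    later_rate = CACHED_PRICE_PER_M if use_cache else INPUT_PRICE_PER_M
    first = context_tokens * INPUT_PRICE_PER_M
    later = context_tokens * later_rate * (turns - 1)
    return (first + later) / 1_000_000


# Hypothetical session: an 800K-token repository context over 20 agent turns.
uncached = session_input_cost(800_000, 20, use_cache=False)
cached = session_input_cost(800_000, 20, use_cache=True)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

For this hypothetical session the input bill drops from about $22.40 to about $5.07. Real savings depend on how the provider counts cache hits and on how much new context each turn adds.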
Developer reception has been fast. The Kilo Code team confirmed day-one integration, posting on X: “GLM-5.2 runs in Kilo Code on day one. The 1M context window and Max effort mode are both live. Point your config at it and go!” Cline IDE called it “the first open-weights model to cross 80% on Terminal-Bench” and described it as “frontier-level” performance at a fraction of the cost. Eigent AI also tested it on complex multi-step agentic tasks and noted improvements in long-horizon planning.
Z.ai’s release fits into a broader pattern of Chinese AI labs shipping competitive open-weight models at prices that challenge the business model of proprietary Western competitors. DeepSeek, Qwen, MiniMax, and now Z.ai have each released models that perform near or at the frontier while costing significantly less to run. With regulatory uncertainty now adding a new dimension to the build-versus-buy calculation, enterprises have more reason than ever to take open-weight alternatives seriously.




