Nebius, the Amsterdam-based technology company, has officially released the Nebius Token Factory.
It is a production inference platform designed to deploy and optimize open-source and custom AI models at enterprise scale.
Built on Nebius’s AI Cloud 3.0 “Aether” infrastructure, it unifies fine-tuning, inference, and access management with enterprise-grade security and 99.9% uptime.
The newly introduced Token Factory supports more than 40 open models, including DeepSeek, NVIDIA Nemotron, and OpenAI’s GPT-OSS, and lets customers host their own.
It offers sub-second latency, autoscaling throughput, and transparent cost-per-token tracking.
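To illustrate what serving a hosted open model looks like in practice, here is a minimal sketch that assumes Token Factory keeps the OpenAI-compatible chat-completions interface that Nebius AI Studio has exposed; the base URL, model identifier, and environment variable are illustrative assumptions, not confirmed product details.

```python
import os
from openai import OpenAI  # standard OpenAI Python client, pointed at a custom base URL

# Assumption: an OpenAI-compatible endpoint, as in Nebius AI Studio.
# The base URL, model name, and env var below are placeholders for the sketch.
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.environ["NEBIUS_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # one of the 40+ hosted open models (identifier assumed)
    messages=[{"role": "user", "content": "Summarize our Q3 support tickets in three bullets."}],
    max_tokens=256,
)

print(response.choices[0].message.content)

# Each response reports token usage, which is the kind of per-request data
# that transparent cost-per-token tracking would be built on.
print(response.usage.prompt_tokens, response.usage.completion_tokens)
```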
The platform simplifies the post-training lifecycle, turning open-source model weights into optimized, production-ready systems while cutting inference costs by nearly 70%. Built-in fine-tuning and distillation pipelines enable faster, more effective adaptation of large models to specific business data.
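As a rough sketch of that post-training flow, the snippet below assumes a fine-tuning surface that mirrors the OpenAI-style files and jobs API; the endpoint, file name, and model identifier are hypothetical, and the actual Token Factory workflow may differ.

```python
import os
from openai import OpenAI

# Assumption: the fine-tuning flow mirrors the OpenAI-style jobs API;
# the base URL, file, and model identifier below are illustrative only.
client = OpenAI(
    base_url="https://api.studio.nebius.com/v1/",
    api_key=os.environ["NEBIUS_API_KEY"],
)

# Upload domain data (a JSONL file of chat-formatted examples) for adaptation
# of an open model to business-specific data.
train_file = client.files.create(
    file=open("support_tickets.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job against one of the hosted open models (identifier assumed).
job = client.fine_tuning.jobs.create(
    model="deepseek-ai/DeepSeek-V3",
    training_file=train_file.id,
)
print(job.id, job.status)

# Once the job completes, the resulting checkpoint would be served through the same
# chat-completions endpoint shown earlier, closing the post-training-to-inference loop.
```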
Early adopters, including Prosus, Higgsfield AI, and Hugging Face, reported significant performance and cost gains. Prosus achieved cost reductions of up to 26× while handling 200 billion tokens per day, and Hugging Face highlighted the platform’s scalability for developers.
With built-in SSO, team management, auditing, and compliance certifications (SOC 2 Type II, HIPAA, ISO 27001), Nebius Token Factory provides transparent, secure, and efficient AI operations for enterprise teams.
Availability
Nebius Token Factory is available now, and existing Nebius AI Studio users are being upgraded to it automatically.