Apertus has crossed one million downloads since its September 2025 release. Most Swiss companies still route their prompts through ChatGPT. Here is the alternative.
Key Takeaways
Apertus is a fully open-source Swiss LLM in two sizes, 8B and 70B parameters, released under Apache 2.0 by ETH Zurich, EPFL, and the Swiss National Supercomputing Centre. It was trained on the Alps supercomputer, the eighth most powerful in the world, on 100% carbon-neutral electricity. Its training corpus, architecture, and weights are fully documented, making it the only major LLM that meets the EU AI Act's transparency requirements out of the box. Apertus can be self-hosted on Swiss sovereign infrastructure with full data residency, audit logs, and zero data egress, using a private inference stack such as vLLM behind a RAG pipeline.
Why Apertus Matters for Regulated Swiss Sectors
The strategic vulnerability of depending on foreign AI providers has moved from theoretical risk to policy priority. In February 2025, the Federal Council ratified the Council of Europe Framework Convention on Artificial Intelligence, making Switzerland the first non-EU state to formally commit to the treaty's data governance and transparency standards. The Federal Department of Justice and Police is expected to release a consultation draft for domestic AI legislation by end of 2026, with alignment to the EU AI Act's risk-based framework widely anticipated.
Financial regulators have moved faster. FINMA's December 2024 guidance on AI in supervised institutions established that banks and insurers bear full responsibility for AI-driven decisions, including those made by third-party models hosted outside Switzerland. For wealth managers, private banks, and cantonal institutions handling client data subject to banking secrecy, routing prompts through US-hosted infrastructure creates compliance exposure that no contractual clause can fully mitigate.
The Federal Administration itself changed course in February 2026, issuing a directive to reduce dependence on individual foreign technology suppliers across critical digital infrastructure. The message to the private sector is clear: sovereign alternatives are no longer a niche preference but a risk management baseline.
What Running Apertus Privately Actually Means
Running Apertus privately means downloading the model weights to infrastructure you control, deploying an inference engine to serve requests, and routing your applications to that local endpoint rather than an external API. The 8B parameter version runs on a single high-end GPU; the 70B version requires a multi-GPU setup, typically four NVIDIA H100s or the newer RTX PRO 6000 Blackwell-class cards, with sufficient VRAM and interconnect bandwidth.
The de facto standard for open-source LLM inference in 2026 is vLLM, a high-throughput serving engine that handles batching, memory management, and quantization. Most enterprise deployments pair vLLM with a reverse proxy, rate limiting, and authentication layer before exposing the model to internal applications.
There is an important distinction between using Apertus through a managed service, such as Swisscom's Swiss AI platform, and deploying it inside your own perimeter. Managed services reduce operational burden but still involve data leaving your environment, even if it stays within Switzerland. For institutions handling client data, patient records, or classified government information, the only fully sovereign option is deployment on infrastructure where no third party has administrative access to the runtime environment.
The Four-Layer Sovereign Deployment Pattern
A sovereign Apertus deployment comprises four layers, each with distinct choices that affect compliance posture and operational complexity.
Infrastructure: Swiss sovereign cloud providers such as Hidora or Infomaniak offer GPU instances with contractual guarantees of Swiss data residency and no foreign parent company. Alternatively, on-premise deployment in your own data centre or colocation facility provides maximum control but requires in-house hardware expertise.
Inference: vLLM is the standard choice for serving Apertus at enterprise scale, with support for continuous batching and PagedAttention. For smaller deployments or edge cases, llama.cpp offers lower resource requirements at the cost of throughput.
Orchestration: A RAG pipeline with a private knowledge base allows Apertus to answer questions grounded in your internal documents. A smart router that directs simple queries to Apertus and complex reasoning tasks to a sovereignly accessed frontier model optimizes both cost and capability.
Access: Role-based access control, API key management, and audit logging ensure that every prompt and response is traceable, a requirement for FINMA-supervised entities and any organization anticipating EU AI Act compliance audits.
Where Apertus Fits, and Where It Does Not
Apertus excels in use cases where compliance constraints outweigh raw capability requirements. It handles multilingual content with native support for German, French, Italian, Romansh, and Swiss German dialects, making it well suited for internal knowledge retrieval, document summarization, and first-draft generation in regulated Swiss contexts. Its fully documented training corpus means you can answer auditor questions about what data influenced the model's outputs, a transparency standard no closed-source frontier model can match.
Apertus is not yet at frontier-model performance on complex multi-step reasoning, advanced code generation, or tasks requiring the latest world knowledge. For these edge cases, the realistic enterprise pattern in 2026 is a hybrid setup: route the majority of queries to Apertus for cost, speed, and sovereignty, and escalate only the most demanding requests to a frontier model accessed through a compliant gateway. This architecture captures most of the productivity benefit while containing the compliance exposure.
How HybridLLM Operationalizes This
HybridLLM is a Swiss-hosted enterprise AI workspace designed for organizations that cannot afford ambiguity about where their data goes. It integrates Apertus and other locally hosted models behind a private RAG pipeline, with persistent knowledge bases, audit logs, and role-based access. The platform supports hybrid routing, allowing teams to use Apertus for the bulk of their queries while accessing frontier models through a sovereignty-preserving gateway when needed.
Hosted on Swiss sovereign infrastructure via Hidora, HybridLLM holds Swiss Made Software certification with the Swiss Hosting, Swiss Digital Services, and Swiss Digital Services with AI labels. No customer data is used for model training, and each client operates in a fully isolated environment with no cross-tenant data sharing.
Frequently Asked Questions
Is Apertus production-ready for Swiss banks and healthcare providers? Yes. Apertus is deployed in production by Swiss financial institutions and research hospitals. Its Apache 2.0 license permits commercial use without royalty, and its documented training corpus satisfies the provenance requirements that regulators increasingly expect.
What infrastructure do I need to run the 70B version? The 70B model requires approximately 140GB of VRAM for full-precision inference, typically four NVIDIA H100 80GB GPUs with NVLink interconnect. Quantized versions can run on less hardware at a modest quality tradeoff.
Can Apertus run fully air-gapped? Yes. Once the model weights are downloaded, Apertus requires no external network connectivity. It can operate in classified environments, isolated networks, or facilities with strict data egress controls.
How does Apertus compare to Mistral or Llama for Swiss compliance? Mistral and Llama are trained by French and American companies respectively, with less transparent training data documentation. Apertus is the only major open-weight model developed entirely within Switzerland, trained on documented Swiss supercomputing infrastructure, with full weight and corpus transparency.
Is Apertus compatible with the EU AI Act transparency requirements? Yes. The EU AI Act requires providers of general-purpose AI models to document training data, energy consumption, and model capabilities. Apertus publishes all of this information, making it the most compliance-ready open model for EU and Swiss deployments.
How long does a private Apertus deployment take with HybridLLM? A typical first use case is operational within one to three weeks, including knowledge base integration, access controls, and audit logging configuration. A one-month proof of concept is available before any long-term commitment.
Talk to a Sovereign AI Architect
Book a 30-minute working session to map your Apertus deployment to your compliance perimeter. Contact us at [email protected] or request a call through our contact page.