Skymizer HTX301: The PCIe Card That Runs 700B LLMs at Just 240W
Tags
- AI
- LLM
- hardware
- inference
- on-prem-ai
- Skymizer
- NVIDIA

A New Player Enters the AI Inference Arena
While NVIDIA and AMD continue their power-hungry race for AI dominance, a Taiwanese company called Skymizer quietly stepped onto the stage with a card that flips the conversation. Their new HTX301 is not another GPU — it's a purpose-built PCIe AI accelerator designed for one job: running massive language models locally, without the cluster, the cooling tower, or the electric bill.
What's Inside the Card
The HTX301 is built on Skymizer's HyperThought platform, powered by a next-generation LPU (Language Processing Unit) IP. Each PCIe card packs six HTX301 chips working in concert, paired with up to 384 GB of memory — enough to hold a 700-billion-parameter model end-to-end on a single card.
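A quick back-of-the-envelope check shows what that capacity claim implies. The sketch below assumes weight-only storage in decimal gigabytes, with no KV cache or runtime overhead. The function name and the bit widths tried are illustrative assumptions, not published Skymizer figures.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: parameters * bits / 8."""
    # 1e9 parameters * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8

CARD_MEMORY_GB = 384  # top configuration quoted in the article

# Assumed bit widths for illustration; the article does not state the format used
for bits in (16, 8, 4):
    size = weight_footprint_gb(700, bits)
    verdict = "fits" if size <= CARD_MEMORY_GB else "does not fit"
    print(f"700B @ {bits}-bit: {size:.0f} GB -> {verdict} in {CARD_MEMORY_GB} GB")

# Largest average bit width at which 700B parameters still fit in 384 GB
max_bits = CARD_MEMORY_GB * 8 / 700
print(f"max bits per weight: {max_bits:.2f}")
```

Under these assumptions, a 700B model only fits at roughly 4 bits per weight (about 350 GB), which leaves around 34 GB for the KV cache and everything else.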
Skymizer made a deliberate choice here: instead of expensive HBM or GDDR7, the card uses standard LPDDR4 and LPDDR5 DRAM. That decision keeps the bill of materials and the power envelope dramatically lower, while still feeding the chip enough bandwidth for inference workloads.
The Numbers That Matter
- Memory: Up to 384 GB (LPDDR4 / LPDDR5)
- Max model size: 700 billion parameters on a single card
- Throughput: 240 tokens/sec on Llama 2 7B prefill (single chip), scaling to 1,200 tokens/sec across multiple chips
- Power: ~240W
- Weight compression: beats open-source llama.cpp by 9% to 17.8%
- KV-cache compression: perplexity loss ranging from under 0.06% to 3.52%
- Scalability: from 1 chip / 32 GB up to 6 chips / 384 GB, supporting models from 4B to 700B parameters
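The throughput figures above can be turned into a rough scaling-efficiency estimate. The chip count behind the 1,200 tokens/sec figure is not stated, so the sketch below tries both five and six chips as assumptions. The function name is illustrative.

```python
SINGLE_CHIP_TPS = 240   # Llama 2 7B prefill on one chip (quoted figure)
MULTI_CHIP_TPS = 1200   # "multiple chips" figure; chip count is not stated

def scaling_efficiency(chips: int) -> float:
    """Observed speedup divided by the ideal linear speedup for `chips` chips."""
    speedup = MULTI_CHIP_TPS / SINGLE_CHIP_TPS
    return speedup / chips

# Assumed chip counts: the source only says "multiple chips"
for chips in (5, 6):
    print(f"{chips} chips: {scaling_efficiency(chips):.1%} of linear scaling")
```

If the 1,200 tokens/sec number comes from a full six-chip card, it corresponds to about 83% of linear scaling. If it comes from five chips, the scaling would be perfectly linear.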
The Question Everyone Will Ask
If a 240W card can run a 700-billion-parameter model, why does NVIDIA's flagship RTX PRO 6000 Blackwell still draw 600W and only carry 96 GB of VRAM?
The Answer Is in What Each Card Was Built For
The two cards aren't actually competitors — they're built for different problems, and that's exactly why the comparison is so revealing.
| Specification | Skymizer HTX301 | NVIDIA RTX PRO 6000 Blackwell |
|---|---|---|
| Memory | Up to 384 GB LPDDR4/5 | 96 GB GDDR7 ECC |
| Max model size (single card) | 700B parameters | ~70B parameters (Q4) |
| Power | ~240W | 600W |
| Architecture | LPU (inference-only) | Blackwell GPU (general-purpose) |
| Primary use case | LLM inference | Training + inference + rendering + simulation |
| Approx. price | TBD (workstation-tier) | $8,500 – $9,200 |
The RTX PRO 6000 Blackwell is a generalist. With 24,064 CUDA cores, fifth-generation Tensor Cores with FP4 support, and the full Blackwell architecture, it's the gold standard for studios that need AI training, 3D rendering, scientific simulation, and inference, all from one card. That versatility comes at a cost in power and memory capacity, which is why it sits at 600W and 96 GB.
The HTX301 does only one thing: it runs LLM inference. By stripping out training capability, graphics pipelines, and general-purpose compute, Skymizer can spend its silicon and power budget on memory capacity and decode efficiency. That's how it fits a 700B model on a single card at less than half the power.
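The trade-off can be put in numbers using only the memory and power figures from the comparison table. This is a minimal sketch: the dictionary layout and the GB-per-watt metric are illustrative choices, and the HTX301 values are vendor claims rather than measurements.

```python
# Memory and power figures as quoted in the comparison table
cards = {
    "Skymizer HTX301": {"memory_gb": 384, "power_w": 240},
    "NVIDIA RTX PRO 6000 Blackwell": {"memory_gb": 96, "power_w": 600},
}

def gb_per_watt(spec: dict) -> float:
    """Memory capacity available per watt of board power."""
    return spec["memory_gb"] / spec["power_w"]

for name, spec in cards.items():
    print(f"{name}: {gb_per_watt(spec):.2f} GB/W")

htx, rtx = cards["Skymizer HTX301"], cards["NVIDIA RTX PRO 6000 Blackwell"]
power_ratio = htx["power_w"] / rtx["power_w"]      # HTX301 power relative to RTX
memory_ratio = htx["memory_gb"] / rtx["memory_gb"]  # HTX301 memory relative to RTX
print(f"power: {power_ratio:.0%} of the RTX, memory: {memory_ratio:.0f}x the RTX")
```

On these quoted specs, the HTX301 offers roughly ten times the memory capacity per watt. The metric says nothing about throughput or memory bandwidth, which is where independent benchmarks would matter.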
What This Means for On-Prem AI
For companies that want to run their own copilots, code assistants, or RAG pipelines on private infrastructure — without paying cloud inference bills that scale with usage — a 240W card with 384 GB of memory is a fundamentally different economic proposition. It collapses what used to require a multi-GPU cluster into a single PCIe slot.
Skymizer plans to demo the HTX301 at Computex. Until independent benchmarks land, the claims should be treated as exactly that — claims. But if the numbers hold up under real-world load, this is the kind of card that quietly redraws the map of where serious AI workloads can live.
For the time being, NVIDIA still owns training, rendering, and the ecosystem. The HTX301 isn't trying to win those battles. It's making a different bet: that for inference, the future belongs to specialized silicon, not to the most powerful GPU you can fit in a slot.