Skymizer HTX301: The PCIe Card That Runs 700B LLMs at Just 240W

A New Player Enters the AI Inference Arena

While NVIDIA and AMD continue their power-hungry race for AI dominance, a Taiwanese company called Skymizer quietly stepped onto the stage with a card that flips the conversation. Their new HTX301 is not another GPU — it's a purpose-built PCIe AI accelerator designed for one job: running massive language models locally, without the cluster, the cooling tower, or the electric bill.

What's Inside the Card

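The HTX301 is built on Skymizer's HyperThought platform, powered by a next-generation LPU (Language Processing Unit) IP. Each PCIe card packs six HTX301 chips working in concert, paired with up to 384 GB of memory. That is enough to hold a 700-billion-parameter model end-to-end on a single card, provided the weights are compressed to roughly 4 bits each: 700 billion parameters × 0.5 bytes ≈ 350 GB, whereas 8-bit weights alone would need about 700 GB.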

Skymizer made a deliberate choice here: instead of expensive HBM or GDDR7, the card uses standard LPDDR4 and LPDDR5 DRAM. That decision keeps both the bill of materials and the power envelope far below what an HBM or GDDR7 design would need. The trade-off is lower peak bandwidth, and Skymizer is betting that it is still enough to feed the chips for inference workloads.
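Why bandwidth is the number to watch: when a model generates text one token at a time (decode), every new token typically requires streaming the active weights out of DRAM, so memory bandwidth, not compute, sets the ceiling. The sketch below is a back-of-the-envelope bound under that assumption only. The function name is hypothetical, and the 100 GB/s figure is an illustrative placeholder, not a published HTX301 specification.

```python
# Rough ceiling on single-stream decode speed, ASSUMING decode is
# memory-bandwidth-bound: each generated token reads all active weights once.
# KV-cache reads and compute time are ignored, so real throughput is lower.

def decode_tokens_per_sec(active_params: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/sec for one sequence (hypothetical helper)."""
    bytes_per_token = active_params * bits_per_weight / 8  # weight bytes streamed per token
    return bandwidth_gb_s * 1e9 / bytes_per_token

if __name__ == "__main__":
    # Illustrative only: a 7B dense model at 4-bit on an ASSUMED 100 GB/s of DRAM bandwidth.
    print(f"{decode_tokens_per_sec(7e9, 4, 100):.1f} tokens/s per stream")
```

Under those assumptions the bound is about 29 tokens/s per stream. Serving many requests in a batch reuses each weight read across sequences, which is how aggregate throughput can climb well above this single-stream ceiling.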

The Numbers That Matter

Memory: Up to 384 GB (LPDDR4 / LPDDR5)
Max model size: 700 billion parameters on a single card
Throughput: 240 tokens/sec on Llama 2 7B prefill (single chip), scaling to 1,200 tokens/sec across multiple chips
Power: ~240W
Weight compression: outperforms open-source llama.cpp by 9% to 17.8%, according to Skymizer
KV-cache compression: perplexity loss ranging from under 0.06% to 3.52%
Scalability: from 1 chip / 32 GB up to 6 chips / 384 GB, supporting models from 4B to 700B parameters
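Most of these figures reduce to simple arithmetic. The sketch below is an illustrative calculation, not Skymizer's tooling. It computes how much memory the weights alone need at several bit widths, and how a "perplexity loss" percentage is usually derived. It assumes perplexity is the exponential of the mean per-token negative log-likelihood (NLL) and that the loss is reported as a percent increase over the uncompressed model; Skymizer has not said exactly how its figures were measured. Both function names are hypothetical.

```python
import math

def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params * bits_per_weight / 8 / 1e9

def relative_perplexity_loss(nll_base: list[float], nll_compressed: list[float]) -> float:
    """Percent increase in perplexity, ASSUMING perplexity = exp(mean per-token NLL)."""
    ppl_base = math.exp(sum(nll_base) / len(nll_base))
    ppl_comp = math.exp(sum(nll_compressed) / len(nll_compressed))
    return (ppl_comp - ppl_base) / ppl_base * 100

if __name__ == "__main__":
    card_gb = 384  # maximum card memory quoted in the article
    for bits in (16, 8, 4):
        need = weight_footprint_gb(700e9, bits)
        print(f"700B @ {bits}-bit: {need:,.0f} GB -> fits in {card_gb} GB: {need <= card_gb}")
```

At 4 bits the weights of a 700B model take about 350 GB, leaving roughly 34 GB for the KV cache and activations. That tight margin is presumably why the weight- and KV-cache compression numbers matter so much to the single-card claim.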

The Question Everyone Will Ask

If a 240W card can run a 700-billion-parameter model, why does NVIDIA's flagship RTX PRO 6000 Blackwell still draw 600W and only carry 96 GB of VRAM?

The Answer Is in What Each Card Was Built For

The two cards aren't actually competitors — they're built for different problems, and that's exactly why the comparison is so revealing.

Specification	Skymizer HTX301	NVIDIA RTX PRO 6000 Blackwell
Memory	Up to 384 GB LPDDR4/5	96 GB GDDR7 ECC
Max model size (single card)	700B parameters	~70B parameters (FP8), ~150B (Q4)
Power	~240W	600W
Architecture	LPU (inference-only)	Blackwell GPU (general-purpose)
Primary use case	LLM inference	Training + inference + rendering + simulation
Approx. price	TBD (workstation-tier)	$8,500 – $9,200

The RTX PRO 6000 Blackwell is a generalist. With 24,064 CUDA cores, fifth-generation Tensor Cores with FP4 support, and the full Blackwell architecture, it's the gold standard for studios that need AI training, 3D rendering, scientific simulation, and inference, all from one card. That versatility costs power, and its fast GDDR7 memory buys bandwidth at the expense of capacity and power efficiency, which is why it sits at 600W with 96 GB.

The HTX301 does only one thing: it runs LLM inference. By stripping out training capability, graphics pipelines, and general-purpose compute, Skymizer can spend its silicon and power budget on memory capacity and token-generation efficiency. Combined with dense LPDDR memory and aggressive weight compression, that focus is how it fits a 700B model on a single card at less than half the power (~240W versus 600W).

What This Means for On-Prem AI

For companies that want to run their own copilots, code assistants, or RAG pipelines on private infrastructure — without paying cloud inference bills that scale with usage — a 240W card with 384 GB of memory is a fundamentally different economic proposition. It collapses what used to require a multi-GPU cluster into a single PCIe slot.
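Power draw is one piece of that economic picture that is easy to quantify. The sketch below estimates a year of electricity for each card at its rated power. The $0.15/kWh price and the full-load duty cycle are assumptions chosen for illustration, rated board power is a ceiling rather than measured average draw, and host-system and cooling overhead are ignored. The helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365

def annual_energy_cost(watts: float, usd_per_kwh: float, utilization: float = 1.0) -> float:
    """Electricity cost (USD) of running a card for one year.

    ASSUMES the card draws `watts` for `utilization` fraction of every hour;
    excludes the host system and cooling.
    """
    kwh = watts / 1000 * HOURS_PER_YEAR * utilization
    return kwh * usd_per_kwh

if __name__ == "__main__":
    price = 0.15  # ASSUMED electricity price, USD per kWh
    for name, watts in (("HTX301", 240), ("RTX PRO 6000 Blackwell", 600)):
        print(f"{name}: ${annual_energy_cost(watts, price):,.2f} per year at full load")
```

Under these assumptions the gap is roughly $470 per card per year; scale `utilization` down for machines that sit idle part of the day.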

Skymizer plans to demo the HTX301 at Computex. Until independent benchmarks land, the claims should be treated as exactly that — claims. But if the numbers hold up under real-world load, this is the kind of card that quietly redraws the map of where serious AI workloads can live.

For now, NVIDIA still owns training, rendering, and the ecosystem. The HTX301 isn't trying to win those battles. It's making a different bet: that for inference, the future belongs to specialized silicon, not to the most powerful GPU you can fit in a slot.