AI Infrastructure for the Agentic Era

Make AI ASICs the
Universal Substrate
for Agentic Workloads

We enable AI ASICs to accelerate the full toolchain of agentic AI—not just inference—eliminating multi-hardware orchestration and delivering order-of-magnitude gains in serving efficiency.

10× faster end-to-end
CPU off the critical path
Unified execution substrate

Agentic AI is Bottlenecked by
Infrastructure, Not Intelligence

The industry is shifting from single-model serving to multi-step agentic execution, yet the underlying infrastructure is still built around model computation alone.

Heterogeneous Stack Tax

Today's agentic systems split execution across CPUs, GPUs, and AI ASICs. Every device hop adds latency, bandwidth cost, and engineering complexity.

Latency Compounds

Multi-step agents call tools dozens of times per query. Each CPU round-trip adds milliseconds that compound into seconds of end-to-end latency.
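Back-of-envelope arithmetic makes the compounding concrete. The step count and per-hop costs below are illustrative assumptions, not measurements:

```python
# Hypothetical numbers for illustration: a 30-step agent loop where every
# tool call costs a device -> CPU -> device round trip.
steps = 30
inference_ms = 50        # assumed per-step model forward pass
cpu_roundtrip_ms = 15    # assumed transfer + host-side tool execution

with_hops = steps * (inference_ms + cpu_roundtrip_ms)
without_hops = steps * inference_ms

print(with_hops - without_hops)  # 450 ms of pure orchestration overhead
```

Even modest per-hop costs turn into hundreds of milliseconds once the loop runs dozens of times.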

Wasted Compute Potential

AI ASICs offer massive throughput—matrix engines, vector units, high-bandwidth memory—yet today's stacks exercise them only for model forward passes, leaving that capacity idle during tool execution.

Today's Agentic Serving
CPU: Orchestration + Tools
AI ASIC: Inference only
CPU: Crypto / Data processing
Multiple device hops, CPU on the critical path, data-movement overhead

vs

CROSS
AI ASIC: Inference + Tools + Crypto
Unified execution, no device hops, data stays on-chip

Run the Full Agent Loop
Where the Data Already Lives

We build the compiler and runtime framework that maps non-AI workloads onto AI ASICs, turning them into a general-purpose agentic serving substrate.

01

Workload Analysis

We systematically profile the agentic tool loop to identify compute-intensive operations that can benefit from ASIC acceleration—cryptographic primitives, data transforms, structured parsing, and more.

02

Compilation Framework

Our compiler maps non-AI operations to ASIC hardware primitives—converting high-precision modular arithmetic to INT8 matrix multiplies, aligning memory layouts to eliminate fine-grained shuffles, and scheduling operations for maximum throughput.
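As a rough illustration of the first transformation, here is a toy Python sketch. The names, limb width, and operand size are our assumptions for exposition, not the CROSS compiler: a wide modular multiply is split into 8-bit limbs whose convolution is exactly a matrix-vector product, the shape an INT8 matrix engine executes natively.

```python
# Illustrative sketch (assumed parameters, not the CROSS compiler itself):
# lower a wide modular multiply to the matrix-vector shape an INT8 matrix
# engine consumes, by splitting operands into 8-bit limbs.
LIMB_BITS, N_LIMBS = 8, 8   # 8-bit limbs, 64-bit operands (assumptions)

def to_limbs(x):
    return [(x >> (LIMB_BITS * i)) & 0xFF for i in range(N_LIMBS)]

def mulmod_via_matmul(a, b, q):
    """Compute a*b mod q as a Toeplitz matrix-vector product over limbs."""
    la, lb = to_limbs(a), to_limbs(b)
    # T[r][c] = la[r - c]: each row gathers the limb pairs for one digit.
    T = [[la[r - c] if 0 <= r - c < N_LIMBS else 0 for c in range(N_LIMBS)]
         for r in range(2 * N_LIMBS)]
    digits = [sum(T[r][c] * lb[c] for c in range(N_LIMBS))  # the "matmul"
              for r in range(2 * N_LIMBS)]
    # Recombine the wide digit accumulators, then reduce mod q at the end.
    return sum(d << (LIMB_BITS * r) for r, d in enumerate(digits)) % q

q = (1 << 61) - 1
a, b = 123456789, 987654321
assert mulmod_via_matmul(a, b, q) == (a * b) % q
```

On real hardware the Toeplitz rows become tiled INT8 matmuls with wide accumulators; the carry propagation and reduction are what the compiler schedules around.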

03

Unified Runtime

A lightweight runtime orchestrates both AI inference and tool execution on the same accelerator, keeping intermediate data on-chip and removing the CPU from the critical path of multi-step agent workflows.
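In pseudocode terms, the control flow looks roughly like the toy loop below. All names are illustrative stand-ins, not the CROSS runtime API: both the model step and the tool kernels are callables resident on the same device, so intermediates never cross back to the host.

```python
# Toy sketch of a unified runtime loop (names are illustrative, not the
# CROSS API): the model step and the tool kernels are device-resident, so
# intermediate state never leaves the accelerator's memory.
def run_agent_loop(model_step, tool_kernels, prompt, max_steps=8):
    state = {"context": prompt}          # stand-in for on-chip buffers
    for _ in range(max_steps):
        action = model_step(state)       # inference on the ASIC
        if action["kind"] == "final":
            return action["answer"]
        # Tool runs on the same device: no CPU hop, no data movement.
        state["context"] += tool_kernels[action["tool"]](action["args"])
    return state["context"]

# Minimal usage with stub kernels:
def model_step(state):
    if "4" in state["context"]:
        return {"kind": "final", "answer": state["context"]}
    return {"kind": "tool", "tool": "add", "args": (2, 2)}

tools = {"add": lambda args: str(args[0] + args[1])}
print(run_agent_loop(model_step, tools, "sum="))  # prints "sum=4"
```

The point of the sketch is the shape of the loop: because every branch dispatches to the same device, the CPU appears nowhere on the per-step path.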

04

Cloud Integration

Drop-in integration with existing cloud AI infrastructure. Compatible with TPUs, Trainium, MAIA, and similar platforms. No hardware modifications required—pure software unlock of latent ASIC capability.

Unlocking AI ASIC Capabilities
Beyond AI Inference

AI ASICs should not be defined by their original purpose, but by their computational capabilities—high-throughput matrix engines, vector units, and efficient coarse-grained data movement.

Multi-Layer Compilation Stack

Five-layer framework: Packing, Mapping, Scheduling, Decomposing, and Binding—systematically transforming arbitrary compute workloads into ASIC-native operations.

Homomorphic Encryption Kernel Library

Production-ready implementations of HE operations (CKKS): multiplication, rotation, rescaling, basis conversion, and NTT acceleration—all running natively on AI ASICs.

View on GitHub →
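For readers unfamiliar with the NTT at the core of such kernels, here is a minimal educational sketch with toy parameters (q = 17, n = 8). This is illustrative only, not the library's TPU implementation: the butterfly structure below is what a compiler can regroup into the matrix operations an ASIC executes.

```python
# Educational sketch of a radix-2 NTT (toy modulus q = 17, size n = 8;
# not the library's TPU kernels).
def ntt(a, q, root):
    """Iterative Cooley-Tukey NTT over Z_q; root is a primitive n-th
    root of unity mod q, with n = len(a) a power of two."""
    n = len(a)
    a = a[:]
    j = 0
    for i in range(1, n):            # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:               # butterfly passes
        w_len = pow(root, n // length, q)
        for start in range(0, n, length):
            w = 1
            for k in range(length // 2):
                u = a[start + k]
                v = a[start + k + length // 2] * w % q
                a[start + k] = (u + v) % q
                a[start + k + length // 2] = (u - v) % q
                w = w * w_len % q
        length <<= 1
    return a

def naive_ntt(a, q, root):
    """Reference O(n^2) transform for checking the fast version."""
    n = len(a)
    return [sum(a[j] * pow(root, j * k, q) for j in range(n)) % q
            for k in range(n)]

# 9 has multiplicative order 8 mod 17, i.e. a primitive 8th root of unity.
coeffs = [1, 2, 3, 4, 0, 0, 0, 0]
assert ntt(coeffs, 17, 9) == naive_ntt(coeffs, 17, 9)
```

In production the butterflies are batched across many residue polynomials at once, which is what makes matrix engines a natural fit.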

Zero Knowledge Proof Primitive Acceleration Library

Accelerates ZKP primitives on AI ASICs—enabling proof generation and verification workloads to run on the same unified substrate alongside AI inference and HE operations.

View on GitHub →

Cryptography on AI ASICs:
CROSS Framework

Our first proof-of-concept: running the full CKKS homomorphic encryption stack natively on Google TPUs. Presented at ASPLOS 2026 and published at HPCA 2026.

CROSS: CKKS Homomorphic Encryption on TPU
Homomorphic Encryption

Full CKKS scheme: encode, encrypt, compute, decrypt—all on TPU

TPU-Native Execution

Matrix engines (MXU) run cryptographic kernels via BAT transformation

Verified Correctness

Bit-exact match with OpenFHE, interoperable ciphertext format

Order-of-Magnitude Speedup

Dramatically faster than CPU-based HE for privacy-preserving AI

Publications & Presentations

They Optimize Coordination.
We Optimize Execution.

Existing solutions improve the software layer around agents. We go one layer deeper—making the full agentic compute run natively on AI infrastructure.

Company | Layer | What They Do
LangChain / LangGraph | Control Plane | Durable execution, memory, streaming, human-in-the-loop orchestration
LlamaIndex | Control Plane | Event-driven, async-first workflow engine for multi-step agents
CrewAI / Composio | Control Plane | Multi-agent collaboration, tool integrations, governance
OpenAI / Anthropic / Cloud | Control Plane | SDKs, managed services for tools, sessions, tracing, deployment
CROSS | Execution Substrate | Move the full agent loop onto AI ASICs: inference + tools + crypto on unified hardware

Key Insight: Everyone else optimizes how agents are coordinated. We optimize where the agentic computation runs. These approaches are complementary: better orchestration on top of our execution substrate yields compounding gains.

Built for Cloud Providers
With In-House AI Chips

Google Cloud

TPU v4 / v5e / v6e

AWS

Trainium / Inferentia

Microsoft

MAIA 100

Meta

MTIA

Cloud vendors are the first to feel the pain of agentic serving—they must support increasingly complex workflows where the bottleneck is no longer just model inference, but the surrounding tools and system functions.

Deep Expertise in
Hardware-Software Co-Design

Our team brings together expertise in AI accelerator architecture, compiler design, cryptography, and cloud infrastructure from leading research institutions and industry.

Jianming Tong

Co-Founder

Jingtian Dang

Co-Founder

Ready to Accelerate
Your Agentic Infrastructure?

We're working with cloud providers to bring unified agentic serving to production. Let's talk about how CROSS can transform your AI ASIC fleet.