EMARQUE.AI
Resources

AI Infrastructure Glossary

Plain-English definitions of the technical terms used across EMARQUE product and solution pages. Aimed at procurement, finance, and IT decision-makers — the people who sign the purchase order — not just the ML engineers who specify it.

GPU + memory

The compute and memory architecture inside every AI server. These terms appear on every EMARQUE product page.

GPU (Graphics Processing Unit)
A processor with thousands of small cores optimised for parallel matrix arithmetic, the operation that dominates neural network training and inference. EMARQUE deploys NVIDIA RTX 5090 (consumer), RTX 6000 Ada / RTX PRO 6000 Blackwell (workstation), and DGX-class (datacentre) GPUs depending on workload and budget.
HBM3e (High Bandwidth Memory 3 extended)
Stacked DRAM mounted on the same package as the GPU die, delivering several times the memory bandwidth of a GDDR7 card (roughly 3–5× when H200- and B200-class GPUs are compared with an RTX 5090). Used on NVIDIA H200, B200, B300, GB200, and GB300; the earlier H100 uses HBM3. The 'e' variant adds up to about 50% more capacity per stack vs HBM3. Required for the largest models because bandwidth, not capacity, is usually the inference bottleneck.
GDDR7
The seventh-generation graphics DDR memory standard, used on consumer and workstation GPUs (RTX 5090, RTX PRO 6000 Blackwell). Slower than HBM3e but far cheaper and available on cards that fit a standard workstation chassis. Sufficient for inference on 7B–70B models; HBM-class becomes necessary above that.
VRAM (Video RAM)
The memory pool on a GPU, either GDDR7 or HBM3e depending on tier. The single most important spec for AI workloads: a model that doesn't fit in VRAM either runs slowly via CPU spillover or fails to load. A 4-bit quantized 70B model needs ~40 GB; the same model at 16-bit precision needs ~140 GB for the weights alone.
PCIe Gen5
The fifth-generation PCI Express bus standard, doubling Gen4's bandwidth to about 64 GB/s in each direction per ×16 slot. Required to keep modern GPUs fed with data from CPU and storage. EMARQUE AI Server uses Gen5 throughout for GPU, NVMe, and networking.
NVLink
NVIDIA's proprietary high-bandwidth interconnect between GPUs in the same chassis, with roughly 7–14× the bandwidth of a PCIe Gen5 ×16 slot (900 GB/s per GPU on 4th-generation NVLink, 1.8 TB/s on 5th-generation). Allows two or more GPUs to share memory coherently so a model larger than one GPU's VRAM can run as if on a single device. Present on DGX systems and HGX-based OEM servers; absent on PCIe-only RTX PRO 6000 SE configurations.
NVLink-C2C (NVLink Chip-to-Chip)
NVIDIA's coherent CPU-to-GPU interconnect introduced on Grace platforms (DGX Spark, DGX Station, GB200, GB300). The CPU and GPU share a single memory address space at NVLink bandwidth — eliminating the PCIe data-copy step that bottlenecks traditional CPU+GPU systems. Roughly equivalent to Apple's unified memory but at server scale.
NVL72
An NVIDIA rack-scale platform pairing 72 GPUs with 36 CPUs in a single liquid-cooled cabinet, connected by NVLink switching (5th-generation on GB200 and GB300). The whole rack is presented to software as one large NVLink domain that behaves like a single giant GPU, with hundreds of terabytes per second of aggregate memory bandwidth. EMARQUE sells GB300 NVL72 and Vera Rubin NVL72 as turnkey AI factory systems.
MIG (Multi-Instance GPU)
NVIDIA's hardware feature that partitions a single physical GPU into up to 7 isolated logical instances, each with its own memory, compute, and cache. Lets one card serve multiple tenants or workloads with full isolation — useful for multi-tenant inference where each tenant has predictable but small VRAM needs.
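
The VRAM figures quoted in this section follow from simple arithmetic: parameter count times bits per weight. The sketch below is a rough illustration only. The helper names and the 15% allowance for KV cache and runtime overhead are assumptions made for this example, not an EMARQUE sizing rule; real overhead varies with context length and concurrency.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def vram_estimate_gb(params_billion: float, bits_per_weight: int,
                     overhead: float = 0.15) -> float:
    """Weights plus an ASSUMED fractional allowance for KV cache and runtime."""
    return weights_gb(params_billion, bits_per_weight) * (1 + overhead)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"70B at {bits}-bit: weights {weights_gb(70, bits):.0f} GB, "
              f"with allowance ~{vram_estimate_gb(70, bits):.0f} GB")
```

Under these assumptions a 70B model holds 35 GB of weights at 4-bit (about 40 GB with the allowance) and 140 GB of weights at 16-bit, which is why the same model can need a workstation in one case and a multi-GPU server in the other.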

CPU + system

The host platform around the GPUs — CPU, memory, storage, and power.

EPYC
AMD's server CPU family. EMARQUE AI Server uses EPYC 9745 / 9755 (Zen 5 'Turin', 128 cores) for high core count and PCIe Gen5 lane availability — important when feeding 8 GPUs simultaneously.
Threadripper PRO
AMD's workstation CPU family: fewer cores than the largest EPYC server chips, while keeping full ECC memory support and 128 PCIe lanes. EMARQUE AI PRO 500 builds on Threadripper PRO 7975WX / 7995WX for single-workstation deployments where multi-GPU PCIe routing is the bottleneck.
Xeon Platinum
Intel's flagship server CPU line. The Emerald Rapids generation (Xeon Platinum 8570) is the Intel-side option on EMARQUE AI Server for buyers standardised on Intel platforms. Performance is comparable to EPYC in most AI host scenarios; the choice is often driven by the existing fleet rather than benchmarks.
ECC memory (Error-Correcting Code memory)
Server-grade DRAM that detects and corrects single-bit errors automatically. Required for 24/7 production deployments where a memory glitch can corrupt a model or crash a long training run. EMARQUE AI PRO 500 and AI Server use ECC throughout; AI Work 100 uses consumer non-ECC for cost.
RDIMM (Registered DIMM)
ECC server memory with an extra register chip that improves stability at high capacity. Allows up to 2 TB of RAM per server. Used on EMARQUE AI Server and DGX-class systems.
NVMe Gen5
NVMe SSDs on a PCIe Gen5 ×4 link, delivering up to about 14 GB/s of sequential reads per drive. Required to feed the GPU complex during dataset loading and checkpoint saves. EMARQUE configurations use Gen5 NVMe in RAID 0 or 1 for primary storage.
RDMA (Remote Direct Memory Access)
A networking capability that lets one server read or write another server's RAM without involving the remote host's CPU. Cuts inter-node latency to single-digit microseconds, which is essential for multi-node distributed training. Implemented on InfiniBand and on RoCE (RDMA over Converged Ethernet).
InfiniBand HDR / NDR
High-throughput networking standard used to scale AI training across many servers. HDR runs at 200 Gbps per link; NDR at 400 Gbps. EMARQUE configures NDR InfiniBand on multi-server AI Factory deployments where training jobs span multiple GB300 NVL72 racks.
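
To put the link speeds quoted in this glossary on one scale, the sketch below converts them to GB/s and computes a best-case transfer time for a 140 GB checkpoint (the 16-bit weights of a 70B model). The rates are the peak figures from the entries above; the dictionary and function names are illustrative, and real transfers run slower than peak.

```python
# Peak link rates in GB/s, taken from the glossary entries.
# Network rates are quoted in Gbps, so divide by 8 to get GB/s.
LINK_GB_PER_S = {
    "NVMe Gen5 drive": 14.0,
    "InfiniBand HDR (200 Gbps)": 200 / 8,
    "InfiniBand NDR (400 Gbps)": 400 / 8,
    "PCIe Gen5 x16 (one direction)": 64.0,
}


def transfer_seconds(size_gb: float, link_gb_per_s: float) -> float:
    """Best-case time to move size_gb over a link running at its peak rate."""
    return size_gb / link_gb_per_s


if __name__ == "__main__":
    checkpoint_gb = 140.0  # 16-bit weights of a 70B model
    for name, rate in LINK_GB_PER_S.items():
        print(f"{name}: {transfer_seconds(checkpoint_gb, rate):.1f} s")
```

The comparison shows why storage and networking are specified alongside the GPUs: a single Gen5 drive is the slowest hop in this list, which is one reason drives are striped in RAID 0 when load time matters.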

Models + inference

The software side — model architectures, quantization, fine-tuning, and serving.

LLM (Large Language Model)
A neural network trained on large text corpora to generate or understand language. EMARQUE focuses on open-weight LLMs that can be deployed on-prem: Llama, DeepSeek, Mistral, Qwen, GPT-OSS, and customer fine-tunes.
RAG (Retrieval-Augmented Generation)
A pattern where an LLM is given relevant excerpts from the customer's documents at query time, instead of relying only on what was in its training data. Allows the model to answer questions about private corporate data without retraining. The dominant on-prem use case EMARQUE deploys.
Fine-tuning
Continuing to train a pre-trained base model on a smaller, domain-specific dataset so it learns the customer's vocabulary, style, or task. EMARQUE supports full fine-tuning (updates all weights) and parameter-efficient methods like LoRA / QLoRA (trains a small adapter only).
LoRA (Low-Rank Adaptation)
A fine-tuning technique that freezes the base model weights and trains a small adapter alongside, typically 0.1–1% the size of the base model. That cuts the number of trainable parameters by 100× or more, which sharply reduces training memory and cost, and lets one base model carry multiple swappable adapters (one per task or customer).
QLoRA
LoRA combined with 4-bit quantization of the base model weights during training. Cuts the memory needed for the base weights to roughly a quarter of 16-bit, enabling fine-tuning of 70B-class models on a single 48 GB workstation GPU like the RTX 6000 Ada. EMARQUE AI PRO 500 is sized for QLoRA workflows.
Quantization
Compressing model weights from 16-bit to 8-bit, 4-bit, or smaller. A 4-bit quantized 70B model fits in ~40 GB of VRAM vs ~140 GB at 16-bit, making it deployable on a workstation instead of a server. The accuracy trade-off is modest and often goes unnoticed in production, but it is worth validating on the customer's own tasks.
Inference
Running a trained model to produce predictions or text — the production phase as opposed to training. Inference is typically the bulk of an on-prem deployment's GPU-time; sizing for inference (latency, throughput, concurrency) drives most EMARQUE hardware recommendations.
vLLM
An open-source inference server optimised for LLM serving — implements PagedAttention, continuous batching, and KV-cache management to deliver 2–4× higher throughput than naive PyTorch serving. EMARQUE pre-installs vLLM on every system that ships.
TGI (Text Generation Inference)
Hugging Face's inference server, an alternative to vLLM. Slightly easier ops integration; comparable throughput on most workloads. EMARQUE supports either depending on customer preference and existing tooling.
Triton
NVIDIA's general-purpose inference server. Multi-framework (PyTorch, TensorFlow, ONNX, TensorRT) and used when a deployment needs to serve mixed model types from one endpoint. Most appropriate for vision + voice + LLM combined services.
TFLOPS / PFLOPS (Tera- / Peta-FLoating-point Operations Per Second)
Theoretical peak compute throughput. TFLOPS = trillion floating-point operations per second; PFLOPS = thousand TFLOPS. Useful for first-pass comparison between GPUs, but quoted figures vary with numeric precision (FP16, FP8, FP4) and sparsity, and real inference performance depends as much on memory bandwidth, batching, and software stack. Numbers in EMARQUE spec sheets are NVIDIA-published.
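
The "0.1–1% of the base model" figure in the LoRA entry can be checked with a few lines of arithmetic. LoRA leaves a weight matrix W frozen and learns two thin matrices B and A whose product is added to it. The sketch below counts the parameters this adds to one square matrix; the hidden size of 4096 and the ranks shown are illustrative assumptions, not the settings of any particular model or EMARQUE configuration.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    W stays frozen; LoRA learns B (d_out x rank) and A (rank x d_in),
    so the effective weight is W + B @ A.
    """
    return rank * (d_out + d_in)


def lora_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter size as a fraction of the frozen matrix it sits beside."""
    return lora_params(d_out, d_in, rank) / (d_out * d_in)


if __name__ == "__main__":
    d = 4096  # illustrative hidden size, not tied to a specific model
    for r in (4, 8, 16):
        print(f"rank {r}: {lora_params(d, d, r):,} params, "
              f"{lora_fraction(d, d, r):.2%} of the base matrix")
```

For these example dimensions the adapter stays between roughly 0.2% and 0.8% of the matrix it adapts. Because adapters are this small, many of them can be stored and swapped over one shared base model.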

Software + platform

The systems software that runs underneath an on-prem AI deployment.

DGX OS
NVIDIA's curated Ubuntu-based OS shipped on every DGX system — pre-configured with drivers, CUDA, NCCL, Docker, and the NVIDIA AI Enterprise software stack. EMARQUE delivers DGX systems on stock DGX OS with optional customer-specific application layers added on top.
CUDA (Compute Unified Device Architecture)
NVIDIA's programming platform that lets software use GPU compute. The major AI frameworks (PyTorch, TensorFlow, JAX) all rely on CUDA when running on NVIDIA GPUs. Specific version compatibility matters: EMARQUE pins CUDA and driver versions to validated combinations and ships systems with the pin documented.
NCCL (NVIDIA Collective Communications Library)
GPU-to-GPU communication library used by distributed training. Implements collective operations (all-reduce, broadcast, all-gather) over NVLink and InfiniBand. Required for any multi-GPU training job; EMARQUE validates NCCL throughput as part of acceptance testing on AI Server and DGX configurations.
Kubernetes
Container orchestration platform. EMARQUE supports Kubernetes deployments on AI Server and DGX systems where customers want pod-based GPU allocation, autoscaling, and lifecycle management. Most lightweight deployments don't need it; production multi-team deployments usually do.
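
The collective operations named in the NCCL entry are easier to picture with a toy model. The sketch below simulates what all-reduce and all-gather compute, using plain Python lists to stand in for per-GPU buffers. It runs in a single process and does not use or resemble the NCCL API; the function names are hypothetical and chosen only for this illustration.

```python
from typing import List


def all_reduce_sum(per_gpu: List[List[float]]) -> List[List[float]]:
    """Simulate all-reduce (sum): every rank ends with the element-wise total."""
    total = [sum(values) for values in zip(*per_gpu)]
    return [list(total) for _ in per_gpu]


def all_gather(per_gpu: List[List[float]]) -> List[List[float]]:
    """Simulate all-gather: every rank ends with all ranks' data concatenated."""
    gathered = [x for shard in per_gpu for x in shard]
    return [list(gathered) for _ in per_gpu]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # one gradient list per GPU
    print(all_reduce_sum(grads))
    print(all_gather(grads))
```

In distributed training, all-reduce is the step that combines gradients so every GPU applies the same update. How fast it completes depends on the interconnect, which is why NCCL throughput is a meaningful acceptance test.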
Talk to EMARQUE

Tell us about your workload.

Model size, concurrency, latency budget, deployment site. EMARQUE returns a quote in MYR within one Malaysian business day, sized to the workload — not the salesperson’s quota.

  1. Key Account Manager: +6012 627 2280
  2. Request for Quotation: business@emarque.co