Question 1

What is GPU (Graphics Processing Unit)?

Accepted Answer

A processor with thousands of small cores optimised for parallel matrix arithmetic — the operation that dominates neural network training and inference. EMARQUE deploys NVIDIA RTX 5090 (consumer), RTX PRO 6000 Blackwell (workstation), and DGX-class (datacentre) GPUs depending on workload and budget.

Question 2

What is HBM3e (High Bandwidth Memory 3 extended)?

Accepted Answer

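Stacked DRAM mounted directly next to the GPU die, delivering up to 4–5× the memory bandwidth of GDDR7 (for example, about 8 TB/s on a B200 versus about 1.8 TB/s on an RTX 5090). Used on NVIDIA H200, B200, B300, GB200, and GB300 GPUs; the H100 uses the previous-generation HBM3. The 'e' variant adds faster signalling and up to 50% more capacity per stack vs HBM3. Required for the largest models because memory bandwidth, not capacity, is usually the bottleneck when generating tokens.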
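
A back-of-the-envelope estimate makes the bandwidth point concrete. The sketch below assumes single-sequence decoding must stream every weight from VRAM once per generated token. It ignores KV-cache traffic, compute time, and software overhead, so it gives an optimistic upper bound. The bandwidth figures in the usage lines (about 1.8 TB/s for a GDDR7 card, about 8 TB/s for an HBM3e datacentre GPU) are illustrative assumptions, not EMARQUE benchmarks.

```python
def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                            bandwidth_tb_s: float) -> float:
    """Optimistic upper bound on single-sequence decode speed (tokens/s).

    Assumes every generated token reads all model weights from VRAM once,
    and ignores KV-cache reads, compute time, and kernel overheads.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes


# Illustrative: a 70B model quantized to 4 bits (0.5 bytes per parameter, ~35 GB of weights).
for label, bw in [("GDDR7-class, ~1.8 TB/s", 1.8), ("HBM3e-class, ~8 TB/s", 8.0)]:
    print(f"{label}: <= {max_decode_tokens_per_s(70, 0.5, bw):.0f} tokens/s per sequence")
```

The bound scales linearly with bandwidth. For memory-bound decoding, a roughly 4× bandwidth gap therefore becomes roughly a 4× gap in per-sequence generation speed.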

Question 3

What is GDDR7?

Accepted Answer

The seventh-generation graphics DDR memory standard, used on consumer and workstation GPUs (RTX 5090, RTX PRO 6000 Blackwell). Slower than HBM3e but far cheaper and available on cards that fit a standard workstation chassis. Sufficient for inference on 7B–70B models; HBM-class becomes necessary above that.

Question 4

What is VRAM (Video RAM)?

Accepted Answer

The memory pool on a GPU, either GDDR7 or HBM3e depending on tier. It is the single most important spec for AI workloads: a model that doesn't fit in VRAM either runs slowly via CPU spillover or fails to load. A 4-bit quantized 70B model needs ~40 GB. The same model at 16-bit precision (FP16/BF16) needs ~140 GB, before counting the KV cache.
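
A rough VRAM budget is weights plus KV cache. The sketch below is a simplified estimator with two parts: bytes per parameter by data type, and a KV-cache term for a transformer with grouped-query attention. The shape in the usage lines (80 layers, 8 KV heads, head dimension 128) is an assumed Llama-3-70B-style configuration. The estimate ignores activations, quantization scales, and framework overhead. Those extras help explain why real 4-bit 70B deployments land nearer ~40 GB than the 35 GB of raw 4-bit weights.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int, tokens: int,
                bytes_per_elem: float = 2.0) -> float:
    """KV cache for `tokens` cached tokens: one key and one value vector per layer and KV head."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1e9


for dtype in ("fp16", "int4"):
    print(f"70B weights in {dtype}: {weight_memory_gb(70, dtype):.0f} GB")
# Assumed Llama-3-70B-style shape, 8,192 tokens of context stored in 16-bit.
print(f"KV cache: {kv_cache_gb(80, 8, 128, 8192):.2f} GB")
```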

Question 5

What is PCIe Gen5?

Accepted Answer

The fifth-generation PCI Express bus standard, doubling Gen4's bandwidth to about 64 GB/s in each direction per ×16 slot (roughly 128 GB/s bidirectional). Required to keep modern GPUs fed with data from CPU and storage. EMARQUE AI Server uses Gen5 throughout for GPU, NVMe, and networking.
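
The ~64 GB/s figure follows from the link parameters. PCIe Gen5 signals at 32 GT/s per lane (Gen4 at 16 GT/s), and Gen3 through Gen5 use 128b/130b encoding. The helper below is a minimal sketch that ignores packet and protocol overhead, so measured throughput is somewhat lower. The 35 GB model size in the usage line is an illustrative assumption.

```python
def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int) -> float:
    """Raw one-direction bandwidth of a PCIe Gen3-Gen5 link in GB/s.

    128b/130b encoding carries 128 payload bits per 130 bits on the wire;
    packet headers and flow control (not modelled) reduce this further.
    """
    return gt_per_s * lanes * (128 / 130) / 8  # one bit per transfer per lane; /8 for bytes


gen4 = pcie_bandwidth_gb_s(16, 16)
gen5 = pcie_bandwidth_gb_s(32, 16)
print(f"Gen4 x16: {gen4:.1f} GB/s, Gen5 x16: {gen5:.1f} GB/s per direction")
# Time to copy ~35 GB of weights (roughly a 4-bit 70B model) from host to GPU at the raw Gen5 rate.
print(f"35 GB over Gen5 x16: {35 / gen5:.2f} s")
```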

Question 6

What is NVLink?

Accepted Answer

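NVIDIA's proprietary high-bandwidth interconnect between GPUs in the same server (or, on NVL72 systems, the same rack). It offers roughly 7× (Hopper) to 14× (Blackwell) the bandwidth of a PCIe Gen5 ×16 link. GPUs can read and write each other's memory directly. A model larger than one GPU's VRAM can therefore be split across several GPUs with far less communication overhead than over PCIe. Present on DGX systems and HGX-based OEM servers; absent on PCIe-only RTX PRO 6000 SE configurations.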

Question 7

What is NVLink-C2C (NVLink Chip-to-Chip)?

Accepted Answer

NVIDIA's coherent CPU-to-GPU interconnect introduced on Grace platforms (DGX Spark, DGX Station, GB200, GB300). The CPU and GPU share a single memory address space over an NVLink-class link. This removes the explicit PCIe data-copy step that bottlenecks traditional CPU+GPU systems. Conceptually similar to Apple's unified memory, but at server scale.

Question 8

What is NVL72?

Accepted Answer

An NVIDIA rack-scale platform pairing 72 GPUs with 36 CPUs in a single liquid-cooled cabinet, all connected through NVLink switches. NVIDIA presents the whole rack to software as one giant GPU. On GB200 and GB300 NVL72, fifth-generation NVLink provides about 130 TB/s of aggregate GPU-to-GPU bandwidth. EMARQUE sells GB300 NVL72 and Vera Rubin NVL72 as turnkey AI factory systems.

Question 9

What is MIG (Multi-Instance GPU)?

Accepted Answer

NVIDIA's hardware feature that partitions a single physical GPU into up to 7 isolated logical instances, each with its own memory, compute, and cache. Lets one card serve multiple tenants or workloads with full isolation — useful for multi-tenant inference where each tenant has predictable but small VRAM needs.

Question 10

What is EPYC?

Accepted Answer

AMD's server CPU family. EMARQUE AI Server uses the 128-core 5th-generation 'Turin' parts, EPYC 9755 (Zen 5) or EPYC 9745 (Zen 5c), for high core count and PCIe Gen5 lane availability (128 lanes per socket). Both matter when feeding 8 GPUs simultaneously.

Question 11

What is Threadripper PRO?

Accepted Answer

AMD's workstation CPU family: fewer cores than EPYC server chips, but with full ECC memory support and 128 PCIe Gen5 lanes. EMARQUE AI PRO 500 is built on Threadripper PRO 7975WX / 7995WX for single-workstation deployments where PCIe lane count is the constraint, so each GPU can get a full ×16 link.

Question 12

What is Xeon Platinum?

Accepted Answer

Intel's flagship server CPU line. The 5th-generation 'Emerald Rapids' Xeon Platinum 8570 is the Intel-side option on EMARQUE AI Server for buyers standardised on Intel platforms. Performance is comparable to EPYC in most AI host scenarios; the choice is often driven by the existing fleet rather than by benchmarks.

Question 13

What is ECC memory (Error-Correcting Code memory)?

Accepted Answer

Server-grade DRAM that automatically corrects single-bit errors and, with the usual SECDED scheme, also detects double-bit errors. Required for 24/7 production deployments where a memory glitch can corrupt a model or crash a long training run. EMARQUE AI PRO 500 and AI Server use ECC throughout; AI Work 100 uses consumer non-ECC memory for cost.
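
The classic Hamming(7,4) code illustrates the single-bit correction idea. It protects 4 data bits with 3 parity bits. This is a toy sketch for intuition only: ECC DIMMs use wider codes, typically SECDED, which also detect (but cannot correct) double-bit errors.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7 = p1 p2 d1 p4 d2 d3 d4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, syndrome). Syndrome 0 = no error, else the flipped position."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos  # XOR of the positions of all set bits; 0 for a valid codeword
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a single-bit memory upset at position 5
print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)
```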

Question 14

What is RDIMM (Registered DIMM)?

Accepted Answer

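ECC server memory with an extra register chip that buffers command and address signals between the memory controller and the DRAM chips. This reduces electrical load, so more and larger modules run stably. Allows up to 2 TB of RAM per server. Used on EMARQUE AI Server and DGX-class systems.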

Question 15

What is NVMe Gen5?

Accepted Answer

NVMe SSDs on PCIe Gen5 links, delivering up to 14 GB/s per drive. Required to feed the GPU complex during dataset loading and checkpoint saves. EMARQUE configurations use Gen5 NVMe in RAID 0 (striped, for throughput) or RAID 1 (mirrored, for redundancy) for primary storage.
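
The RAID choice changes effective write throughput. The idealised sketch below assumes every drive sustains the same rate, RAID 0 striping scales perfectly, and RAID 1 writes at single-drive speed. Real Gen5 drives usually write slower than their headline ~14 GB/s figure, and filesystem and controller overheads are ignored. The 140 GB checkpoint (16-bit weights of a 70B model, without optimizer state) and the drive counts are illustrative assumptions.

```python
def checkpoint_write_seconds(checkpoint_gb: float, drive_gb_s: float,
                             drives: int, raid_level: int) -> float:
    """Idealised time to write a checkpoint to an NVMe array."""
    if raid_level == 0:
        effective_gb_s = drive_gb_s * drives  # striping spreads writes across every drive
    elif raid_level == 1:
        effective_gb_s = drive_gb_s           # mirroring writes a full copy to each drive
    else:
        raise ValueError("only RAID 0 and RAID 1 are modelled here")
    return checkpoint_gb / effective_gb_s


print(f"RAID 0 over 4 drives: {checkpoint_write_seconds(140, 14, 4, 0):.1f} s")
print(f"RAID 1 over 2 drives: {checkpoint_write_seconds(140, 14, 2, 1):.1f} s")
```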

Question 16

What is RDMA (Remote Direct Memory Access)?

Accepted Answer

Network protocol that lets one server read or write another server's RAM without involving the host CPU. Cuts inter-node latency to single-digit microseconds — essential for multi-node distributed training. Implemented on InfiniBand and on RoCE (RDMA over Converged Ethernet).

Question 17

What is InfiniBand HDR / NDR?

Accepted Answer

High-throughput networking standard used to scale AI training across many servers. HDR runs at 200 Gbps per link; NDR at 400 Gbps. EMARQUE configures NDR InfiniBand on multi-server AI Factory deployments where training jobs span multiple GB300 NVL72 racks.
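
Link speed translates directly into gradient-synchronisation time. In a ring all-reduce, each node sends and receives about 2(N−1)/N times the gradient size, which gives the bandwidth-only estimate below. This is a simplified sketch. It assumes one link per node, no latency or congestion, and the ring algorithm (NCCL can also choose tree-based algorithms). The 7B-parameter, 16-bit gradient size and the 8-node cluster are illustrative assumptions.

```python
def ring_allreduce_seconds(grad_gb: float, nodes: int, link_gbit_s: float) -> float:
    """Bandwidth-only estimate of a ring all-reduce across `nodes` nodes.

    Each node transfers 2 * (nodes - 1) / nodes * grad_gb over its link.
    """
    link_gb_s = link_gbit_s / 8  # Gbit/s -> GB/s
    return 2 * (nodes - 1) / nodes * grad_gb / link_gb_s


grad_gb = 7e9 * 2 / 1e9  # 7B parameters x 2 bytes (16-bit gradients) = 14 GB
for name, speed in (("HDR", 200), ("NDR", 400)):
    print(f"{name}: {ring_allreduce_seconds(grad_gb, 8, speed):.2f} s per all-reduce")
```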

Question 18

What is LLM (Large Language Model)?

Accepted Answer

A neural network trained on large text corpora to generate or understand language. EMARQUE focuses on open-weight LLMs that can be deployed on-prem: Llama, DeepSeek, Mistral, Qwen, GPT-OSS, and customer fine-tunes.

Question 19

What is RAG (Retrieval-Augmented Generation)?

Accepted Answer

A pattern where relevant excerpts from the customer's documents are retrieved (usually by similarity search) and given to an LLM at query time, instead of relying only on what was in its training data. Allows the model to answer questions about private corporate data without retraining. It is the dominant on-prem use case EMARQUE deploys.
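
At its core, RAG is retrieve-then-prompt. So that it runs with the standard library alone, the sketch below uses simple bag-of-words cosine similarity. Production systems typically use a neural embedding model and a vector database instead. The documents, query, and prompt wording are hypothetical, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")


docs = [  # hypothetical internal documents, already split into chunks
    "Expense claims must be submitted within 30 days of purchase.",
    "The VPN requires multi-factor authentication for all staff.",
    "Annual leave requests need manager approval two weeks in advance.",
]
print(build_prompt("How many days do I have to submit expense claims?", docs, k=1))
```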

Question 20

What is Fine-tuning?

Accepted Answer

Continuing to train a pre-trained base model on a smaller, domain-specific dataset so it learns the customer's vocabulary, style, or task. EMARQUE supports full fine-tuning (updates all weights) and parameter-efficient methods like LoRA / QLoRA (train only a small adapter).

Question 21

What is LoRA (Low-Rank Adaptation)?

Accepted Answer

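A fine-tuning technique that freezes the base model weights and trains a small adapter alongside them. The adapter is a pair of low-rank matrices added to selected weight matrices, typically 0.1–1% the size of the base model. Gradients and optimizer state are kept only for the adapter, so training memory drops sharply (the LoRA paper reports about 3× less GPU memory for GPT-3 175B). One base model can also carry multiple swappable adapters (one per task or customer).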
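
Concretely, LoRA leaves a frozen weight matrix W untouched and learns two small matrices, A (r × d_in) and B (d_out × r). It adds (alpha / r) · B A x to the layer's output. The pure-Python sketch below shows three things: the forward pass, the trainable-parameter fraction for one layer, and merging the adapter back into W for serving. The 4096 × 4096 layer and rank 16 are illustrative assumptions, and the tiny matrices exist only to make the arithmetic checkable.

```python
Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: list[float],
                 alpha: float, rank: int) -> list[float]:
    """h = W x + (alpha / r) * B (A x); only A and B are trained."""
    scale = alpha / rank
    base = matvec(W, x)               # frozen pre-trained path
    update = matvec(B, matvec(A, x))  # low-rank adapter path
    return [h + scale * u for h, u in zip(base, update)]


def merge(W: Matrix, A: Matrix, B: Matrix, alpha: float, rank: int) -> Matrix:
    """Fold the adapter into the base weight: W' = W + (alpha / r) * B A."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]


def lora_param_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Trainable adapter parameters r * (d_out + d_in) relative to d_out * d_in."""
    return rank * (d_out + d_in) / (d_out * d_in)


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 x 3 frozen weight
A = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # r x d_in, r = 2
B = [[1.0, 0.0], [0.0, 2.0]]            # d_out x r
x = [1.0, 1.0, 1.0]
print(lora_forward(W, A, B, x, alpha=2, rank=2))  # [7.0, 3.0]
print(matvec(merge(W, A, B, alpha=2, rank=2), x))  # same result after merging
print(f"{lora_param_fraction(4096, 4096, 16):.2%} of a 4096 x 4096 layer")
```

In the LoRA paper, B starts at zero, so the adapted model initially behaves exactly like the base model. Merging lets a single-adapter deployment serve with no extra inference cost.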

Question 22

What is QLoRA?

Accepted Answer

LoRA combined with 4-bit quantization of the frozen base model weights during training. Storing the base weights in 4 bits instead of 16 cuts their memory to roughly a quarter. This enables fine-tuning of 70B-class models on a single workstation GPU like the 96 GB RTX PRO 6000 Blackwell. EMARQUE AI PRO 500 is sized for QLoRA workflows.

Question 23

What is Quantization?

Accepted Answer

Storing model weights at lower precision: 8-bit, 4-bit, or smaller instead of 16-bit. A 4-bit quantized 70B model fits in ~40 GB of VRAM vs ~140 GB at 16-bit precision, making it deployable on a workstation instead of a server. The accuracy trade-off is modest and often unnoticeable in production, though it should be checked on the target task.
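
The simplest form is symmetric round-to-nearest ("absmax") quantization. Pick a scale so the largest-magnitude weight maps to the top integer level, then round every weight to the nearest level. The sketch below shows that idea for a single group of weights. Production methods such as GPTQ, AWQ, or QLoRA's NF4 are more sophisticated and typically keep one scale per small block of weights. The sample weights are made up.

```python
def quantize_symmetric(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


weights = [0.12, -0.50, 0.33, 0.05, -0.21]
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    err = max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
    print(f"{bits}-bit: codes={q}, max abs error={err:.4f}")
```

Each weight's rounding error is at most half a step (scale / 2). Fewer bits mean a coarser step and a larger error, which is the accuracy trade-off described above.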

Question 24

What is Inference?

Accepted Answer

Running a trained model to produce predictions or text — the production phase as opposed to training. Inference is typically the bulk of an on-prem deployment's GPU-time; sizing for inference (latency, throughput, concurrency) drives most EMARQUE hardware recommendations.

Question 25

What is vLLM?

Accepted Answer

An open-source inference server optimised for LLM serving. It implements PagedAttention, continuous batching, and KV-cache management, delivering several times the throughput of naive PyTorch serving. The vLLM authors report 2–4× gains even over earlier optimised serving systems. EMARQUE pre-installs vLLM on every system that ships.
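
The core PagedAttention idea is to store each sequence's KV cache in fixed-size blocks allocated on demand. A per-sequence block table tracks them, instead of a single contiguous region reserved for the maximum context. The toy allocator below illustrates that bookkeeping only. It is not vLLM's code or API, it stores no actual tensors, and the 16-token block size is an assumption chosen for the example.

```python
class PagedKVCache:
    """Toy KV-cache block allocator in the spirit of PagedAttention (bookkeeping only)."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size                  # tokens per block (assumed value)
        self.free_blocks = list(range(num_blocks))    # physical block ids not in use
        self.block_tables: dict[str, list[int]] = {}  # sequence -> its physical blocks, in order
        self.lengths: dict[str, int] = {}             # sequence -> tokens stored so far

    def append_token(self, seq_id: str) -> bool:
        """Reserve a KV slot for one more token; return False if the cache is full."""
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:             # no block yet, or the last block is full
            if not self.free_blocks:
                return False                          # a real scheduler would preempt or queue
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return True

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(20):
    cache.append_token("req-a")  # 20 tokens occupy 2 blocks, not a max-length reservation
print(cache.block_tables["req-a"], len(cache.free_blocks))
```

Each sequence wastes at most one partially filled block. Far more concurrent sequences therefore fit in the same VRAM, which is what gives continuous batching room to work.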

Question 26

What is TGI (Text Generation Inference)?

Accepted Answer

Hugging Face's inference server, an alternative to vLLM. Often simpler to integrate for teams already using the Hugging Face ecosystem; throughput is comparable on most workloads. EMARQUE supports either depending on customer preference and existing tooling.

Question 27

What is Triton?

Accepted Answer

NVIDIA's general-purpose inference server (Triton Inference Server, not to be confused with the Triton GPU programming language). Multi-framework (PyTorch, TensorFlow, ONNX, TensorRT) and used when a deployment needs to serve mixed model types from one endpoint. Most appropriate for combined vision, voice, and LLM services.

Question 28

What is TFLOPS / PFLOPS (Tera- / Peta-Floating-point Operations Per Second)?

Accepted Answer

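Theoretical peak compute throughput. TFLOPS = trillion floating-point operations per second; PFLOPS = a thousand TFLOPS. Useful for first-pass comparison between GPUs, but only at the same precision (FP16, FP8, FP4) and with the same sparsity assumption. Real inference performance depends as much on memory bandwidth, batching, and software stack. Numbers in EMARQUE spec sheets are NVIDIA-published peak figures.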
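
A simple roofline-style estimate shows why peak TFLOPS alone can mislead. During decoding, each generated token costs roughly 2 FLOPs per parameter. The weights, however, are read from memory only once per step and are shared across the whole batch. The sketch below takes the slower of the compute time and the memory time. It ignores KV-cache traffic and assumes ideal utilisation. The 1,000 TFLOPS / 8 TB/s device is a hypothetical round-number GPU, not a specific product.

```python
def decode_step(params_billion: float, batch: int, peak_tflops: float,
                bandwidth_tb_s: float, bytes_per_param: float = 2.0) -> tuple[float, str]:
    """Return (milliseconds per decode step, limiting resource) for one batched step."""
    params = params_billion * 1e9
    compute_s = 2 * params * batch / (peak_tflops * 1e12)          # ~2 FLOPs per parameter per token
    memory_s = params * bytes_per_param / (bandwidth_tb_s * 1e12)  # weights streamed once per step
    bound = "compute" if compute_s > memory_s else "memory"
    return max(compute_s, memory_s) * 1e3, bound


for batch in (1, 32, 256):
    ms, bound = decode_step(70, batch, peak_tflops=1000, bandwidth_tb_s=8)
    print(f"batch {batch:>3}: {ms:6.2f} ms/step, {batch / ms * 1e3:8.0f} tokens/s, {bound}-bound")
```

With these assumed numbers the crossover is about 125 concurrent sequences. Below it, step time is set by bandwidth, extra batch is nearly free, and peak TFLOPS barely matters. Above it, the FLOPS figure becomes the limit.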

Question 29

What is DGX OS?

Accepted Answer

NVIDIA's curated Ubuntu-based OS shipped on every DGX system — pre-configured with drivers, CUDA, NCCL, Docker, and the NVIDIA AI Enterprise software stack. EMARQUE delivers DGX systems on stock DGX OS with optional customer-specific application layers added on top.

Question 30

What is CUDA (Compute Unified Device Architecture)?

Accepted Answer

NVIDIA's programming platform that lets software use GPU compute. The major AI frameworks (PyTorch, TensorFlow, JAX) run on NVIDIA GPUs through CUDA. Specific version compatibility matters: EMARQUE pins CUDA and driver versions to validated combinations and ships systems with the pin documented.

Question 31

What is NCCL (NVIDIA Collective Communications Library)?

Accepted Answer

GPU-to-GPU communication library used by distributed training. Implements collective operations (all-reduce, broadcast, all-gather) over NVLink, PCIe, and InfiniBand. It is the standard communication backend for multi-GPU training on NVIDIA hardware. EMARQUE validates NCCL throughput as part of acceptance testing on AI Server and DGX configurations.
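
NCCL's ring all-reduce (it also implements tree algorithms) runs in two phases. First, a reduce-scatter leaves each GPU owning the full sum of one chunk. Then an all-gather circulates those finished chunks around the ring. The in-process Python simulation below shows the data movement for N ranks. It is a teaching sketch, not NCCL's API, and plain lists stand in for GPU buffers.

```python
def ring_allreduce(buffers: list[list[float]]) -> list[list[float]]:
    """Simulate a ring all-reduce (sum) across len(buffers) ranks."""
    n = len(buffers)
    length = len(buffers[0])
    # Each rank's vector is split into n contiguous chunks.
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(b) for b in buffers]

    def chunk(rank: int, idx: int) -> list[float]:
        lo, hi = bounds[idx]
        return data[rank][lo:hi]  # slicing copies, so each send is a snapshot

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) to rank r + 1,
    # which adds it in. Afterwards rank r holds the full sum of chunk (r + 1) mod n.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n, chunk(r, (r - step) % n)) for r in range(n)]
        for dst, idx, payload in sends:
            lo = bounds[idx][0]
            for k, v in enumerate(payload):
                data[dst][lo + k] += v

    # Phase 2, all-gather: finished chunks travel around the ring, overwriting stale copies.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, chunk(r, (r + 1 - step) % n)) for r in range(n)]
        for dst, idx, payload in sends:
            lo = bounds[idx][0]
            data[dst][lo:lo + len(payload)] = payload
    return data


ranks = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
for r, buf in enumerate(ring_allreduce(ranks)):
    print(r, buf)  # every rank ends with [111, 222, 333, 444, 555]
```

Each rank sends only 2(N−1)/N of its buffer in total, regardless of N. This is why the ring algorithm makes good use of link bandwidth.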

Question 32

What is Kubernetes?

Accepted Answer

Container orchestration platform. EMARQUE supports Kubernetes deployments on AI Server and DGX systems where customers want pod-based GPU allocation, autoscaling, and lifecycle management. Most lightweight deployments don't need it; production multi-team deployments usually do.
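
With the NVIDIA device plugin or GPU Operator installed, GPUs appear to Kubernetes as the extended resource nvidia.com/gpu, and a pod requests them through its resource limits. The sketch below builds such a Pod manifest as a Python dict and prints it as JSON, which kubectl apply -f - accepts on stdin. The pod name and the CUDA image tag are placeholder assumptions.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Minimal Pod that requests `gpus` whole GPUs and runs nvidia-smi once."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": name,
                "image": image,
                "command": ["nvidia-smi"],
                # GPUs are requested via limits; the scheduler only places the pod
                # on a node that has this many unallocated GPUs.
                "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
            }],
        },
    }


# Placeholder name and image tag; pipe the output to: kubectl apply -f -
print(json.dumps(gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04", 1), indent=2))
```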
