Key Responsibilities
- Consultative Solution Design: Partner with clients to understand their specific Gen AI workloads, dissecting business goals to propose tailored on-premise infrastructure solutions (Compute, Storage, and Networking).
- Technical Architecture & Collaboration: Act as the bridge between the client and our internal Engineering/R&D teams to architect robust AI clusters. Translate client requirements into technical specifications and feasible system designs.
- Infrastructure Sizing & Proposal: Lead the creation of technical proposals, including the Bill of Materials (BOM), capacity planning (storage/compute sizing), and total cost of ownership (TCO) analysis.
- On-Premise Deployment & Integration: Oversee and assist with hardware installation, rack configuration, and software-stack deployment for high-performance AI systems and storage servers.
- Technical Troubleshooting: Diagnose complex interoperability issues between AI accelerators (GPUs), storage fabrics, and software layers with the assistance of the Engineering team.
- Documentation & Knowledge Transfer: Maintain detailed documentation of solution architectures, proof-of-concept (PoC) results, and technical resolutions.
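The capacity-planning work above usually starts from back-of-the-envelope sizing. A minimal sketch in Python, assuming mixed-precision training with Adam (roughly 16 bytes of model state per parameter, an assumed rule of thumb) and 80 GB GPUs (H100/A100-class); the function name and overhead factor are illustrative, not a standard formula:

```python
import math

def training_gpus_needed(params_billion: float,
                         gpu_mem_gb: float = 80.0,
                         bytes_per_param: int = 16,
                         overhead: float = 1.2) -> int:
    """Rough minimum GPU count to hold training state in memory.

    bytes_per_param=16 assumes fp16 weights and gradients plus fp32
    master weights and Adam moments; overhead=1.2 pads for
    activations and fragmentation. Both values are assumptions.
    """
    # billions of params * bytes/param conveniently equals gigabytes
    state_gb = params_billion * bytes_per_param * overhead
    return math.ceil(state_gb / gpu_mem_gb)

print(training_gpus_needed(70))  # 70B-parameter model -> 17 GPUs
```

A real proposal would also factor in the parallelism strategy (tensor/pipeline/data), sequence length, and batch size; this only bounds the memory footprint.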
Requirements
- Education: Bachelor's degree or equivalent in Computer Science, Data Science, Computer Engineering, or a related field.
- Gen AI & AI Server Knowledge: Solid understanding of the Generative AI landscape (LLMs and multi-modal models) and the High-Performance Computing (HPC) infrastructure required to train and run them.
- Communication: Ability to articulate complex architectural concepts (e.g., cluster networking, storage throughput) to both C-level executives (in plain terms) and IT Directors (in technical depth).
- Hardware Fluency: Deep familiarity with server components, including Server Motherboards, Enterprise CPUs (AMD EPYC/Intel Xeon), Data Center GPUs (NVIDIA H100/A100/L40S), High-speed RAM, and PCIe/NVLink interconnects.
- Storage Expertise: Proven understanding of storage requirements for AI, including the differences between Block, File, and Object storage and the importance of IOPS/throughput in model training.
- Problem Solving: Strong analytical skills to troubleshoot bottlenecks in hardware performance or software compatibility.
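The IOPS/throughput concern above can be made concrete with a checkpoint-sizing sketch (assuming 2 bytes per parameter for fp16/bf16 weights only, no optimizer state, and a 60-second write window; all three figures are assumed targets, not standards):

```python
def checkpoint_bandwidth_gbs(params_billion: float,
                             bytes_per_param: int = 2,
                             window_s: float = 60.0) -> float:
    """Sustained write bandwidth (GB/s) needed to land a full
    checkpoint within window_s seconds. Assumes weights only at
    bytes_per_param each -- both are assumptions."""
    size_gb = params_billion * bytes_per_param  # billions * bytes = GB
    return size_gb / window_s

# A 70B-parameter model -> a 140 GB checkpoint
print(round(checkpoint_bandwidth_gbs(70), 2))  # ~2.33 GB/s sustained
```

Numbers like this are what drive the choice between a single NAS head and a parallel file system in the storage sections below.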
Technical Skills
1. AI Server & Compute Infrastructure:
- GPU Architecture: Knowledge of multi-GPU configurations and GPU interconnect topologies (e.g., NVLink, PCIe).
- Cluster Management: Familiarity with HPC scheduling tools (Slurm) or container orchestration (Kubernetes/K8s) for AI workloads.
- Linux Mastery: Advanced Linux command-line proficiency (RHEL, Ubuntu Server), including kernel tuning and driver installation (NVIDIA drivers, CUDA Toolkit).
2. Storage Server & Data Management:
- High-Performance Storage: Understanding of NVMe and NVMe-oF (NVMe over Fabrics) for low-latency data access.
- File Systems: Familiarity with parallel file systems used in AI (e.g., Lustre, GPFS/IBM Spectrum Scale, BeeGFS) or high-performance NAS (ZFS).
- Object Storage: Knowledge of S3-compatible object storage for large datasets (e.g., MinIO, Ceph).
- RAID & Data Protection: Configuration of HW/SW RAID (0, 1, 5, 6, 10) for redundancy and performance optimization.
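As a quick reference for the RAID levels listed above, usable capacity can be sketched as follows (a simplified model: it ignores hot spares, filesystem overhead, and controller specifics; the function name is illustrative):

```python
def raid_usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity for common RAID levels (simplified model)."""
    if level == "0":                        # pure striping, no redundancy
        return drives * drive_tb
    if level == "1":                        # n-way mirror: one drive's worth
        return drive_tb
    if level == "5" and drives >= 3:        # one drive of distributed parity
        return (drives - 1) * drive_tb
    if level == "6" and drives >= 4:        # two drives of parity
        return (drives - 2) * drive_tb
    if level == "10" and drives >= 4 and drives % 2 == 0:
        return (drives // 2) * drive_tb     # striped mirror pairs
    raise ValueError(f"unsupported: RAID {level} with {drives} drives")

print(raid_usable_tb("6", 12, 7.68))  # e.g., 12x 7.68 TB NVMe drives
```

The capacity/redundancy trade-off this encodes (RAID 0 fastest but unprotected, RAID 6 surviving two drive failures) is exactly the discussion a BOM proposal has to settle.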
3. DevOps & MLOps:
- Docker/Containerization (building and deploying AI containers).
- Basic understanding of CI/CD pipelines for model deployment.