MaiStorage

Senior Solution Consultant, Generative AI / Storage Solution

  • Posted 14 hours ago

Job Description

Key Responsibilities

  • Consultative Solution Design: Partner with clients to understand their specific Gen AI workloads, dissecting business goals to propose tailored on-premise infrastructure solutions (Compute, Storage, and Networking).
  • Technical Architecture & Collaboration: Act as the bridge between the client and our internal Engineering/R&D teams to architect robust AI clusters. Translate client requirements into technical specifications and feasible system designs.
  • Infrastructure Sizing & Proposal: Lead the creation of technical proposals, including Bill of Materials (BOM), capacity planning (storage/compute sizing), and total cost of ownership (TCO) analysis.
  • On-Premise Deployment & Integration: Oversee and assist in the hardware installation, rack configuration, and software stack deployment of high-performance AI systems and storage servers.
  • Technical Troubleshooting: Diagnose complex interoperability issues between AI accelerators (GPUs), storage fabrics, and software layers with the assistance of the Engineering team.
  • Documentation & Knowledge Transfer: Maintain detailed documentation of solution architectures, proof-of-concept (PoC) results, and technical resolutions.

Requirements

  • Education: Bachelor's degree or equivalent in Computer Science, Data Science, Computer Engineering, or a related field.
  • Gen AI & AI Server Knowledge: Solid understanding of the Generative AI landscape (LLMs and multi-modal models) and the High-Performance Computing (HPC) infrastructure required to train and run them.
  • Communication: Ability to articulate complex architectural concepts (e.g., cluster networking, storage throughput) to both non-technical C-level executives and technical IT Directors.
  • Hardware Fluency: Deep familiarity with server components, including Server Motherboards, Enterprise CPUs (AMD EPYC/Intel Xeon), Data Center GPUs (NVIDIA H100/A100/L40S), High-speed RAM, and PCIe/NVLink interconnects.
  • Storage Expertise: Proven understanding of storage requirements for AI, including differences between Block, File, and Object storage, and the importance of IOPS/Throughput in model training.
  • Problem Solving: Strong analytical skills to troubleshoot bottlenecks in hardware performance or software compatibility.

Technical Skills

1. AI Server & Compute Infrastructure:

  • GPU Architecture: Knowledge of Multi-GPU configurations.
  • Cluster Management: Familiarity with HPC scheduling tools (Slurm) or container orchestration (Kubernetes/K8s) for AI workloads.
  • Linux Mastery: Advanced Linux command line proficiency (RHEL, Ubuntu Server), including kernel tuning and driver installation (NVIDIA Drivers, CUDA Toolkit).

2. Storage Server & Data Management:

  • High-Performance Storage: Understanding of NVMe and NVMe-oF (NVMe over Fabrics) for low-latency data access.
  • File Systems: Familiarity with Parallel File Systems used in AI (e.g., Lustre, GPFS/IBM Spectrum Scale, BeeGFS) or high-performance NAS (ZFS).
  • Object Storage: Knowledge of S3-compatible object storage for large datasets (e.g., MinIO, Ceph).
  • RAID & Data Protection: Configuration of HW/SW RAID (0, 1, 5, 6, 10) for redundancy and performance optimization.

3. DevOps & MLOps:

  • Docker/Containerization (building and deploying AI containers).
  • Basic understanding of CI/CD pipelines for model deployment.

Job ID: 137848467