[Remote] Forward Deployed Engineer: AI + HPC

Work from home Full-time role Hiring

Note: This is a remote role open to candidates in the USA. Cedana is a company focused on maximizing AI and HPC cluster utilization and reliability. As a Forward Deployed Engineer, you will lead technical engagements with customers, deploy Cedana's solutions in varied environments, and optimize platform performance.

Responsibilities

  • Engineer solutions at client sites: Lead customer integrations. Install, configure, and deploy Cedana into SLURM, Kubernetes, and Dynamo environments
  • Drive product innovation from the field: Identify technical gaps while embedded with clients, then provide product feedback for new capabilities that become core product features
  • Measure and optimize platform performance: Measure reliability, throughput, and performance using our internal tools. Design and implement policy-based migration automations to improve all three
  • Own critical deployments: Ensure our platform performs reliably for clients' critical operations, debugging issues across the full stack. Debug install issues against unfamiliar customer infrastructure, and escalate to engineering when necessary
  • Improve scalability: Build and own the internal installation playbook so that the second customer in each segment is onboarded faster than the first
  • Respect our customers: Understand how to make their lives easier and minimize their time and overhead

Skills

  • Team management experience. Requires strong project and time management skills, delivering milestones on time, and effective
  • 3-10 years of software engineering experience with a track record of configuring and managing SLURM deployments
  • A multi-month enterprise or research deployment you led end-to-end, from scoping through signoff. You write effective status updates that keep your team informed and on schedule
  • Production experience standing up SLURM in a customer or research environment. You've configured slurmctld, slurmdbd, accounting, cgroup integration, and GPU resource selection
  • Strong Linux fundamentals: systemd, cgroups v2, namespaces, networking, filesystems, firewalls, kernel module loading, and PAM session modules. You can read strace and dmesg output and form a hypothesis
  • Experience with Kubernetes operations including operators, CRDs, CNIs, device plugins, and node-level debugging. You've debugged a controller in production even if you haven't written one from scratch
  • Experience in an HPC integrator field team
  • Client-facing technical experience working directly with customers
  • Background in national lab user services or university research computing
  • You've developed SLURM plug-ins, and understand their architecture and how they fit into the overall platform
  • Familiarity with CRIU, container runtimes, GPU driver internals, distributed training stacks
  • Hands-on with NVIDIA Dynamo, Determined, Ray, Kueue, KServe, or comparable AI orchestration
  • Contributed to open-source schedulers or job systems (SLURM, Flux, Torque, PBS)
  • A passion for debugging a weird cgroup issue at 11pm just as much as writing a clean install playbook the next morning

Benefits

  • 100% covered medical, dental, and vision insurance for employees and families
  • Unlimited PTO policy
  • 401(k) plan

Company Overview

  • Cedana is VMware for GPUs. We enable enterprises to orchestrate and operationalize intelligence precisely, reliably, and efficiently. The company was founded in 2023 and is headquartered in New York, New York, USA, with a workforce of 2-10 employees. Its website is https://www.cedana.ai.