RMACC 2026 has ended
Tuesday, May 12
 

2:30pm MDT

Promoting Cloud Native Development and Deployment at NCAR
Tuesday May 12, 2026 2:30pm - 3:00pm MDT
Modern research environments face increasing demands for agility and reproducibility, yet traditional, monolithic software architectures often stand in the way. This talk explores the transition to cloud-native computing, using containers and Kubernetes to create portable, consistent computational infrastructure that decouples applications from the underlying hardware. By adopting these technologies, research teams can ensure their workflows operate identically across diverse environments, from local development machines to production-grade clusters, thereby eliminating the notorious "works on my machine" problem.

The National Center for Atmospheric Research (NCAR) has deployed an internal Kubernetes-based platform called CIRRUS and has been working to further modernize application development and deployment by promoting CI, CD, and GitOps practices. By implementing automated testing and declarative infrastructure management, teams can drastically reduce manual errors and accelerate the deployment of new methodologies. Ultimately, embracing these practices empowers researchers to focus on scientific discovery rather than logistical bottlenecks.
Simplot A

3:15pm MDT

Accelerating Innovation with Google Cloud’s AI Infrastructure
Tuesday May 12, 2026 3:15pm - 4:15pm MDT
This presentation provides a technical overview of machine learning infrastructure on Google Cloud Platform (GCP), focused on hardware and operational efficiency.
We will discuss evaluating hardware accelerators based on specific workload requirements and take a deeper dive into Google Tensor Processing Units (TPUs), specifically examining their architecture for large-scale matrix operations and the optimization of FLOPS per dollar. We will then look at other key considerations for running ML lab environments in GCP. Topics include:
  • Infrastructure Selection: An overview of GCP’s ML-optimized compute and storage offerings.
  • Operational Management: Strategies for capacity planning, cost control, and maximizing goodput.
  • Frameworks and Libraries: Tools enabling model training and serving.

Simplot A

4:30pm MDT

CANCELLED - Redesigning the Hellgate HPC: Lessons in overcoming growing pains and purgatory on mid-sized clusters.
Tuesday May 12, 2026 4:30pm - 5:00pm MDT
Over the past four years, the University of Montana Hellgate HPC has rapidly grown from a ragtag collection of individual lab systems into the largest computing resource at our institution, comprising roughly 3,000 compute cores, 130 GPUs, and over 200 users. However, this growth has exceeded the scale of our original HPC design, with consequences for stability, performance, and user experience. We addressed these issues by revisiting our stateless provisioning strategies, network/hardware topology, authentication flow, head node stack, and administrative standards of practice. We discuss our experiences in employing Ansible and Git to create scalable infrastructure and sustainable SOPs, publishing a living user documentation base, and redesigning the physical layout, along with the obstacles, decisions, and surprises we encountered along the way.
Simplot A