As demand for AI-enabled research continues to grow, institutions across higher education are facing a common challenge: how to develop the skilled workforce needed to operate advanced computing infrastructure and support researchers using AI tools. This session will highlight two complementary workforce development models designed to address this need. The RMACC Student System Administrator Cohort provides hands-on training in research computing operations, bringing together students from multiple institutions to gain practical experience supporting production high-performance computing environments. In parallel, the AI Unlocked workshop series—developed through the National Artificial Intelligence Research Resource Pilot—introduces researchers and technical staff to applied AI workflows and access to national-scale computing resources through national and regional training events. Panelists will discuss lessons learned in designing these programs, approaches for scaling training beyond a single institution, and how regional and national initiatives can work together to build sustainable AI workforce pipelines.
This presentation explores the mutually reinforcing relationship between artificial intelligence (AI) and high-performance computing (HPC) through applied research, education, and workforce development initiatives centered on Anvil, an NSF-funded national advanced computing resource. We highlight a 2025 NSF Research Experiences for Undergraduates (REU) case study in which undergraduate researchers used large language models (LLMs) to improve Anvil’s user support infrastructure by automatically generating FAQs from historical support tickets, demonstrating AI for Anvil. At the same time, Anvil’s scalable, production-grade environment enabled realistic AI model training and evaluation, illustrating Anvil for AI. Beyond research, the session addresses ethical AI implementation frameworks, governance considerations, and classroom integration. We also discuss expanding K-12 and educator outreach, including AI-enhanced CyberSafe Heroes and Code Explorers summer camps, and a new K-12 teacher in-service focused on practical AI in the classroom. Together, these efforts demonstrate a sustainable model for responsible AI adoption that spans national cyberinfrastructure, undergraduate research, and early-pipeline education.
Evolutionary Computation (EC) techniques, including Genetic Algorithms, Evolution Strategies, and Genetic Programming, have long demonstrated strong performance on complex, non-convex optimization problems. Despite their inherent parallelism, however, their deployment at exascale supercomputing levels remains relatively underexplored. In this paper, we present a comprehensive study of EC applications on modern supercomputing architectures, emphasizing massively parallel and hybrid implementations, and propose a scalable framework that accelerates evolutionary processes on heterogeneous computing resources by integrating multi-core CPUs and GPUs. We evaluate the framework on a suite of large-scale, real-world optimization problems, including the Traveling Salesman Problem, hyperparameter optimization for deep neural networks, and neural architecture search; experimental results demonstrate significant improvements in scalability, convergence speed, and solution quality over traditional implementations.
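To make the parallelism concrete, below is a minimal sketch (not the paper's framework) of a genetic algorithm whose fitness evaluations are farmed out across CPU cores with Python's `multiprocessing`; the random TSP instance, the swap-mutation operator, and all parameters are illustrative assumptions.

```python
import random
from multiprocessing import Pool

# Hypothetical TSP instance: random city coordinates (illustrative only).
CITIES = [(random.random(), random.random()) for _ in range(50)]

def tour_length(tour):
    """Fitness: total Euclidean length of a closed tour (lower is better)."""
    return sum(
        ((CITIES[a][0] - CITIES[b][0]) ** 2 + (CITIES[a][1] - CITIES[b][1]) ** 2) ** 0.5
        for a, b in zip(tour, tour[1:] + tour[:1])
    )

def mutate(tour):
    """Swap two random cities: a minimal mutation operator."""
    i, j = random.sample(range(len(tour)), 2)
    child = list(tour)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(generations=100, pop_size=64, workers=8):
    population = [random.sample(range(len(CITIES)), len(CITIES)) for _ in range(pop_size)]
    with Pool(workers) as pool:
        for _ in range(generations):
            # Fitness evaluation is embarrassingly parallel: farm it out to workers.
            scores = pool.map(tour_length, population)
            ranked = [t for _, t in sorted(zip(scores, population))]
            elite = ranked[: pop_size // 4]
            # Refill the population by mutating elites (selection + mutation only).
            population = elite + [mutate(random.choice(elite)) for _ in range(pop_size - len(elite))]
    return min(population, key=tour_length)

if __name__ == "__main__":
    best = evolve()
    print(f"best tour length: {tour_length(best):.3f}")
```

On a cluster, the same pattern scales by swapping the process pool for MPI ranks or GPU-batched evaluation, which is where the heterogeneous-resource framework comes in.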
Quantum computing holds immense promise for achieving computational gains once thought unattainable, given the physical limits of conventional (so-called classical) compute resources. Certain classes of problems could benefit greatly from the speedups quantum computing systems obtain by exploiting quantum mechanical properties, including superposition, entanglement, and interference. During this talk, we'll consider realistic applications in this pre-fault-tolerant era, strategies for integrating quantum resources with HPC, and approaches for facilitating research use cases. Symposium members are encouraged to contribute efforts made by their HPC departments, from both emulation/simulation and hardware-integration perspectives; the presentation is intended as an opportunity to explore ideas, share experiences, and gather knowledge to advance quantum-centric HPC in our region.
National organizations that support research computing, such as CASC and CaRCC, tend to focus on the needs of large universities. This talk presents a CaRCC project oriented toward smaller institutions.
Achieving efficient utilization of shared compute resources is a primary objective of HPC providers, maximizing value for both researchers and institutions. While providers leverage workload managers to allocate resources efficiently across many users over time, scheduling alone cannot prevent allocated resources from sitting idle or underutilized in terms of raw compute and memory. A common cause is misconfigured workload parameters: a user may unintentionally request too many resources, or resources of the wrong type. Between long wait times and the often-opaque nature of batch execution, users may “fire-and-forget” their batch workloads and remain unaware of serious inefficiencies. To address this issue, we have developed a real-time workload monitoring and alerting system that rapidly informs users and HPC administrators of inefficient workloads, even while those workloads are still running. We will present the architecture of our system, which includes components from Slurm, Prometheus, VictoriaMetrics, PostgreSQL, and CHPC software. We will also provide a data-driven analysis of the results we have achieved with the system, along with lessons learned and our future roadmap.
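As a rough illustration of the alerting idea (not CHPC's actual implementation), the sketch below polls a Prometheus instance for running jobs with low recent CPU utilization; the endpoint URL, the `slurm_job_cpu_utilization` metric name, the `jobid` label, and the 25% threshold are all assumptions to replace with your site's exporter details.

```python
import requests

PROM_URL = "http://prometheus.example.edu:9090/api/v1/query"  # hypothetical endpoint
CPU_THRESHOLD = 0.25  # flag jobs using under 25% of allocated CPU (illustrative)

def low_cpu_jobs():
    """Return (jobid, utilization) pairs for running jobs below the threshold.

    `slurm_job_cpu_utilization` is a stand-in metric name; substitute whatever
    your Slurm/cgroup exporter actually publishes.
    """
    query = f"avg_over_time(slurm_job_cpu_utilization[15m]) < {CPU_THRESHOLD}"
    resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return [(r["metric"].get("jobid", "?"), float(r["value"][1])) for r in results]

if __name__ == "__main__":
    for jobid, util in low_cpu_jobs():
        # A production system would email the user or post to a dashboard here.
        print(f"job {jobid}: mean CPU utilization {util:.0%} over the last 15m")
```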
The changing funding environment necessitates changes in how we provide and charge for research computing services. In this talk, we'll go over options under discussion in Utah, including some level of operational re-charge, subscription plans, compute as a service with several priority levels, and persistent services on VMs. We hope to spark discussion among attendees and invite them to share their own thoughts and experiences.
Workflow management systems like Nextflow are increasingly popular among researchers building computational pipelines, but their default configurations rarely account for the realities of shared HPC clusters. Left untuned, these tools can flood schedulers with thousands of short-lived jobs, request resources they never use, or create bursty submission patterns that degrade cluster performance for all users. This presentation examines Nextflow resource management on SLURM clusters with a focus on the concerns that matter most to HPC operators: scheduler interaction, fair-share impact, resource efficiency, and cluster-wide utilization. Using a computationally demanding genome alignment pipeline as an example, we'll explore how executor configuration, process-level resource directives, and monitoring strategies affect not just individual pipeline performance but overall cluster health. We'll cover common anti-patterns we've encountered—over-provisioned memory requests, runaway task submissions, poor locality awareness—and the configuration and design patterns that prevent them. Whether you're supporting researchers who use workflow managers or evaluating how to integrate them into your site's policies and documentation, the goal is to give you practical knowledge for keeping these tools running well on shared infrastructure.
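One lightweight way to catch the over-provisioned-memory anti-pattern is to audit Nextflow's own trace output after a run. The sketch below is a hedged example, assuming a trace file written with `-with-trace` and `trace.raw = true` (so sizes are plain bytes), configured to include the `name`, `memory`, and `peak_rss` columns; the column names and the 50% waste threshold are assumptions to adapt to your site.

```python
import csv

TRACE = "trace.txt"     # produced by `nextflow run ... -with-trace`
WASTE_RATIO = 0.5       # flag tasks using under half their requested memory

def flag_overprovisioned(path=TRACE):
    """Yield (task, requested, peak, ratio) for memory-wasteful tasks."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            try:
                requested = int(row["memory"])    # bytes, with trace.raw = true
                peak = int(row["peak_rss"])
            except (KeyError, ValueError):
                continue  # tasks without memory accounting, or other columns
            if requested and peak / requested < WASTE_RATIO:
                yield row["name"], requested, peak, peak / requested

if __name__ == "__main__":
    for name, req, peak, ratio in flag_overprovisioned():
        print(f"{name}: requested {req/2**30:.1f} GiB, "
              f"peaked at {peak/2**30:.1f} GiB ({ratio:.0%})")
```

Feeding findings like these back into process-level resource directives is exactly the tuning loop the session advocates.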
The release of Snakemake 8 and 9 introduced breaking changes that forced HPC-dependent bioinformatics workflows to fundamentally rethink their cluster integration strategy — most notably, the removal of the long-standing `--cluster` flag in favor of a modern executor plugin interface. This talk walks through the real-world migration of DETECT, a simulation-based de novo mutation detection pipeline, from Snakemake 7 to Snakemake 9 on a SLURM HPC cluster. We cover the technical challenges encountered along the way: replacing inline cluster submission strings with the `snakemake-executor-plugin-slurm`, restructuring resource declarations into profile-based `set-resources` blocks, implementing automatic partition selection, and adding submission rate limiting to prevent SLURM socket timeouts at scale. Beyond the executor change, the migration prompted a broader modernization — containerizing GATK with Apptainer to eliminate conda instability, resolving Python environment path issues in shell rules, and building operational tooling for log analysis and interactive setup. Attendees will leave with a concrete migration roadmap and reusable patterns applicable to migrating any Snakemake 7 workflow to Snakemake 8 or 9 running on SLURM.
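To illustrate one of these patterns, automatic partition selection can be expressed directly in a Snakefile (which is Python-based) by making the resource a callable. The sketch below is illustrative only, assuming the `snakemake-executor-plugin-slurm` resource names (`slurm_partition`, `mem_mb`, `runtime`) and hypothetical partition names, memory cutoffs, and a placeholder `run_alignment` command; it is not DETECT's actual configuration.

```python
# In a Snakefile (Snakemake >= 8 with snakemake-executor-plugin-slurm).
# Partition names and memory cutoffs are hypothetical; adjust to your cluster.

def pick_partition(wildcards, attempt):
    """Escalate to a bigger partition on retry instead of failing outright."""
    mem_gb = 16 * attempt          # e.g. 16 GB on first try, 32 GB on retry
    return "highmem" if mem_gb > 16 else "general"

rule align_genome:
    input:
        "reads/{sample}.fastq.gz",
    output:
        "aligned/{sample}.bam",
    retries: 2
    resources:
        slurm_partition=pick_partition,
        mem_mb=lambda wildcards, attempt: 16000 * attempt,
        runtime=240,               # minutes
    shell:
        "run_alignment {input} > {output}"  # placeholder command
```

The same values can instead live in a profile's `set-resources:` block, which keeps the Snakefile cluster-agnostic — the approach the migration ultimately adopted.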
Are you struggling to provide comprehensive software documentation to users of your HPC system, or want to make your documentation easier to maintain? Join CU Boulder Research Computing (CURC)’s User Support Team for a demonstration of the ACCESS Software Documentation Service (SDS), a package designed to automate the creation and maintenance of software documentation for HPC clusters! We’ll demo CURC’s implementation of the SDS, discuss how to implement it at your own site, and answer any questions you may have.
In this session we'll survey the landscape of LLMs and generative AI and look at the Hugging Face Transformers library for working with LLMs. The session also includes a Jupyter Notebook lab that takes attendees through using Falcon-7B for inference, memory-efficient fine-tuning, and retrieval-augmented generation (RAG).
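As a taste of the lab content, here is a minimal, hedged inference sketch with the Transformers library; the dtype and generation settings are illustrative defaults, and the lab's actual notebook may differ (for instance, by adding quantization for memory-efficient fine-tuning).

```python
# Minimal Falcon-7B inference with Hugging Face Transformers (illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # the base Falcon-7B checkpoint on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32 on supported GPUs
    device_map="auto",           # spread layers across available devices
)

prompt = "Explain what a batch scheduler does on an HPC cluster."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```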
Language models use word-level embeddings that are trained on text under a pre-train, then fine-tune, training and evaluation regime. In this presentation, we will see how these embeddings can be enriched with visual knowledge as they are pre-trained and fine-tuned on multiple linguistic tasks. Knowledge of Python is needed; experience with torch and huggingface helps, but is not required.
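To make the idea concrete, here is a small, hedged PyTorch sketch of one way visual knowledge can be injected: aligning word embeddings with projected image features via a contrastive loss. This illustrates the general technique, not the presenter's specific method; the dimensions, batch size, temperature, and stand-in data are all assumptions.

```python
# Toy sketch of "visually enriching" word embeddings: project image features
# into the embedding space and pull matched word/image pairs together.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim, image_dim = 10_000, 256, 512

word_emb = nn.Embedding(vocab_size, embed_dim)   # the embeddings to enrich
img_proj = nn.Linear(image_dim, embed_dim)       # maps image features in

def visual_alignment_loss(word_ids, image_feats):
    """InfoNCE-style loss: each word should match its own image feature."""
    w = F.normalize(word_emb(word_ids), dim=-1)        # (B, D)
    v = F.normalize(img_proj(image_feats), dim=-1)     # (B, D)
    logits = w @ v.t() / 0.07                          # similarity matrix
    targets = torch.arange(word_ids.size(0))           # diagonal = positives
    return F.cross_entropy(logits, targets)

# One illustrative step on random stand-in data.
opt = torch.optim.Adam(list(word_emb.parameters()) + list(img_proj.parameters()), lr=1e-4)
word_ids = torch.randint(0, vocab_size, (32,))
image_feats = torch.randn(32, image_dim)   # e.g. features from a CNN or ViT
loss = visual_alignment_loss(word_ids, image_feats)
loss.backward()
opt.step()
print(f"alignment loss: {loss.item():.3f}")
```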
This presentation will provide an overview of the Slurm HPC dashboard developed and used at ASU to monitor cluster utilization, GPU resources, and overall system health. I will explain how the dashboard supports day-to-day operations, demonstrate recently added features, and discuss the roadmap for future development as we continue expanding its capabilities to meet growing demand.