DCC Partitions
SLURM partitions are separate queues that divide a cluster's nodes into groups based on specific attributes. Each partition has its own constraints that control which jobs can run in it. The DCC has many partitions, and access to each is restricted based on your SLURM account. Users may have multiple SLURM accounts and must specify the correct account to gain access to restricted partitions.
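If you are not sure which SLURM accounts you have, you can query the accounting database from a login node. This is a minimal sketch, assuming `sacctmgr` is available to regular users; the exact columns shown may differ:

```bash
# List the SLURM accounts (associations) tied to your user.
sacctmgr show associations user=$USER format=Account,Partition,QOS
```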
General use DCC Partitions
- common for jobs that will run on the DCC core nodes
- gpu-common for jobs that will run on DCC GPU nodes
- scavenger for jobs that will run on lab-owned nodes in “low priority” (kill and requeue preemption).
- scavenger-gpu for GPU jobs that will run on lab-owned nodes in “low priority” (kill and requeue preemption)
- courses and courses-gpu are special partitions used to support Duke courses. You must have an `account=coursess25` to access these partitions. If you are only part of the DCC through a course, you will not be able to use the common or gpu-common partitions.
- interactive for debugging and testing scripts. This should be limited to short interactive sessions or short-duration tests and is not intended for use in general analyses. Default resources are low. See below for partition limits per user.
Note: If a partition is not specified, the default partition is the common partition.
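As an illustration, a minimal batch script that selects a partition and account might look like the following. The resource values are placeholders, and `youraccount` stands in for whatever SLURM account grants you access (for course users, `coursess25`):

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=common         # omit to fall back to the default (common) partition
#SBATCH --account=youraccount      # required for restricted partitions, e.g. coursess25
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Replace with your actual workload.
hostname
```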
Duke Compute Account Limits
These limits are subject to change.
| Duke Compute Cluster Partition Limits | Default Value |
|---|---|
| Max running memory per user account | 1.5 TB |
| Max running CPUs per user account | 400 |
| Max queued jobs per user per account | 400 |
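To see how your running jobs stack up against these per-account limits, you can list them with their CPU and memory requests; this is a sketch using standard `squeue` format fields:

```bash
# Show your running jobs with job ID, partition, account, CPUs, requested memory, and elapsed time.
squeue -u $USER -t RUNNING -o "%.12i %.14P %.12a %.5C %.10m %.10M"
```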
Partition Limits Per User
| Partition Name | Max CPU | Max Memory (GB) |
|---|---|---|
| interactive | 10 | 64 |
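For example, a short interactive session within these limits could be requested with `srun`; adjust the CPU, memory, and time values to your needs while staying under the caps above:

```bash
# Start an interactive shell on the interactive partition (2 CPUs, 8 GB, 2 hours).
srun -p interactive --cpus-per-task=2 --mem=8G --time=02:00:00 --pty bash -i
```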
Configuration
Hardware
| Partition Name | Number of Nodes | Processors | RAM (GB) | GPUs | Max Walltime (days-hours:minutes:seconds) |
|---|---|---|---|---|---|
| common | 57 | 4844 | 37791 | -- | 90-00:00:00 |
| gpu-common | 32 | 424 | 3354 | 34 | 2-00:00:00 |
| scavenger | 88 | 7502 | 48078 | -- | 90-00:00:00 |
| scavenger-gpu | 187 | 3643 | 24283 | 215 | 7-00:00:00 |
| courses | 50 | 4200 | 23334 | -- | 7-00:00:00 |
| courses-gpu | 10 | 840 | 4666 | 20 | 2-00:00:00 |
| compalloc | 4 | 496 | 3975 | -- | 30-00:00:00 |
| interactive | 61 | 5340 | 42673 | -- | 1-00:00:00 |
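These figures can also be pulled live from SLURM. The sketch below prints node count, CPUs per node, memory per node, and the walltime limit for a few partitions; note that `sinfo` prints one line per partition and node-state group, so a partition may appear more than once:

```bash
# Partition, node count, CPUs per node, memory per node (MB), and max walltime.
sinfo -p common,gpu-common,scavenger,scavenger-gpu -o "%.15P %.7D %.6c %.9m %.12l"
```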
GPUs
| GPU Type | GPU Full Name | VRAM (GB) | GPUs per Node | Number of Nodes with GPU Config | Partitions |
|---|---|---|---|---|---|
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 1 | 7 | gpu-common |
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 3 | 1 | gpu-common |
| 5000_ada | nvidia_rtx_5000_ada_generation | 32 | 1 | 24 | gpu-common |
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 1 | 67 | scavenger-gpu |
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 4 | 2 | scavenger-gpu |
| 5000_ada | nvidia_rtx_5000_ada_generation | 32 | 1 | 4 | scavenger-gpu |
| 6000_ada | nvidia_rtx_6000_ada_generation | 48 | 1 | 24 | scavenger-gpu |
| 6000_ada | nvidia_rtx_6000_ada_generation | 48 | 4 | 1 | scavenger-gpu |
| a5000 | nvidia_rtx_a5000 | 24 | 1 | 68 | scavenger-gpu |
| a5000 | nvidia_rtx_a5000 | 24 | 4 | 3 | scavenger-gpu |
| a6000 | nvidia_rtx_a6000 | 48 | 1 | 8 | scavenger-gpu |
| p100 | tesla_p100-pcie-16gb | 16 | 2 | 10 | scavenger-gpu |
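To request a specific GPU type from this table, pass it to `--gres`. A sketch, assuming your SLURM account has access to the scavenger-gpu partition (remember that scavenger jobs can be preempted):

```bash
# Interactive session with one RTX A5000 on scavenger-gpu.
srun -p scavenger-gpu --gres=gpu:a5000:1 --cpus-per-task=4 --mem=16G --time=04:00:00 --pty bash -i

# Equivalent batch directives; use --gres=gpu:1 to accept any available GPU type.
#   #SBATCH --partition=scavenger-gpu
#   #SBATCH --gres=gpu:a5000:1
```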
Useful SLURM commands for resources
These commands can be used to view cluster resources.
General partition info
```
scontrol show partition partitionName
```
This will give you general information about a specific partition.
Example for the common partition:
```
(base) rm145@dcc-login-02 ~ $ scontrol show partition common
PartitionName=common
   AllowGroups=ALL DenyAccounts=courses,coursess25,panlab AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=90-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
   Nodes=dcc-comp-[01-10],dcc-core-[01-12,14-41,43-49]
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=GANG,REQUEUE
   State=UP TotalCPUs=4844 TotalNodes=57 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRES=cpu=4844,mem=38698635M,node=57,billing=4844
```
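If you only need a field or two from this output, it can be filtered with `grep`; for example, to check just the walltime limit and total resources of a partition:

```bash
# Show only the MaxTime and TRES lines for the common partition.
scontrol show partition common | grep -E 'MaxTime|TRES'
```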
Specific GPU info per partition
```
scontrol show node `sinfo -h -r -p partitionName -o %N` | grep Gres | sort | uniq -c
```
Example for GPUs in the gpu-common partition:
```
rm145@dcc-login-05 ~ $ scontrol show node `sinfo -h -r -p gpu-common -o %N` | grep Gres | sort | uniq -c
      7    Gres=gpu:2080:1
      1    Gres=gpu:2080:3
     24    Gres=gpu:5000_ada:1
```