DCC Partitions
SLURM partitions are separate queues that divide a cluster's nodes into groups based on specific attributes. Each partition has its own constraints that control which jobs can run in it. The DCC has many partitions, and access to each is restricted based on your SLURM account. Users may have multiple SLURM accounts and must specify the correct account to gain access to restricted partitions.
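If you are not sure which SLURM accounts you have, you can query the accounting database from a login node. This is a minimal sketch, assuming `sacctmgr` is available to regular users; the exact columns shown may differ:

```bash
# List the SLURM accounts (associations) tied to your user.
sacctmgr show associations user=$USER format=Account,Partition,QOS
```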
General use DCC Partitions
- common for jobs that will run on the DCC core nodes
- gpu-common for jobs that will run on DCC GPU nodes
- scavenger for jobs that will run on lab-owned nodes in “low priority” (kill and requeue preemption).
- scavenger-gpu for GPU jobs that will run on lab-owned nodes in “low priority” (kill and requeue preemption)
- courses and courses-gpu are special partitions used to support Duke courses. You must have an `account=coursess25` to access these partitions. If you are only part of the DCC through a course, you will not be able to use the common or gpu-common partitions.
- interactive for debugging and testing scripts. This should be limited to short interactive sessions or short-duration tests and is not intended for use in general analyses. Default resources are low. See below for partition limits per user.
Note: If a partition is not specified, the default partition is the common partition.
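As an illustration, a minimal batch script that selects a partition and account might look like the following. The resource values are placeholders, and `youraccount` stands in for whatever SLURM account grants you access (for course users, `coursess25`):

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=common         # omit to fall back to the default (common) partition
#SBATCH --account=youraccount      # required for restricted partitions, e.g. coursess25
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Replace with your actual workload.
hostname
```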
Duke Compute Account Limits
These limits are subject to change.
| Duke Compute Cluster Partition Limits | Default Value |
|---|---|
| Max running memory per user account | 1.5 TB |
| Max running CPUs per user account | 400 |
| Max queued jobs per user per account | 400 |
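To see how your running jobs stack up against these per-account limits, you can list them with their CPU and memory requests; this is a sketch using standard `squeue` format fields:

```bash
# Show your running jobs with job ID, partition, account, CPUs, requested memory, and elapsed time.
squeue -u $USER -t RUNNING -o "%.12i %.14P %.12a %.5C %.10m %.10M"
```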
Partition Limits Per User
| Partition Name | Max CPU | Max Memory (GB) |
|---|---|---|
| interactive | 10 | 64 |
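For example, a short interactive session within these limits could be requested with `srun`; adjust the CPU, memory, and time values to your needs while staying under the caps above:

```bash
# Start an interactive shell on the interactive partition (2 CPUs, 8 GB, 2 hours).
srun -p interactive --cpus-per-task=2 --mem=8G --time=02:00:00 --pty bash -i
```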
Configuration
Hardware
| Partition Name | Number of Nodes | Processors | RAM (GB) | GPUs | Max Walltime (days-hours:minutes:seconds) |
|---|---|---|---|---|---|
| common | 57 | 4844 | 37791 | -- | 90-00:00:00 |
| gpu-common | 32 | 424 | 3354 | 34 | 2-00:00:00 |
| scavenger | 88 | 7502 | 48078 | -- | 90-00:00:00 |
| scavenger-gpu | 187 | 3643 | 24283 | 215 | 7-00:00:00 |
| courses | 50 | 4200 | 23334 | -- | 7-00:00:00 |
| courses-gpu | 10 | 840 | 4666 | 20 | 2-00:00:00 |
| compalloc | 4 | 496 | 3975 | -- | 30-00:00:00 |
| interactive | 61 | 5340 | 42673 | -- | 1-00:00:00 |
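These figures can also be pulled live from SLURM. The sketch below prints node count, CPUs per node, memory per node, and the walltime limit for a few partitions; note that `sinfo` prints one line per partition and node-state group, so a partition may appear more than once:

```bash
# Partition, node count, CPUs per node, memory per node (MB), and max walltime.
sinfo -p common,gpu-common,scavenger,scavenger-gpu -o "%.15P %.7D %.6c %.9m %.12l"
```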
GPUs
| GPU Type | GPU Full Name | VRAM (GB) | GPUs per Node | Number of Nodes with GPU Config | Partitions |
|---|---|---|---|---|---|
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 1 | 7 | gpu-common |
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 3 | 1 | gpu-common |
| 5000_ada | nvidia_rtx_5000_ada_generation | 32 | 1 | 24 | gpu-common |
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 1 | 67 | scavenger-gpu |
| 2080 | nvidia_geforce_rtx_2080_ti | 11 | 4 | 2 | scavenger-gpu |
| 5000_ada | nvidia_rtx_5000_ada_generation | 32 | 1 | 4 | scavenger-gpu |
| 6000_ada | nvidia_rtx_6000_ada_generation | 48 | 1 | 24 | scavenger-gpu |
| 6000_ada | nvidia_rtx_6000_ada_generation | 48 | 4 | 1 | scavenger-gpu |
| a5000 | nvidia_rtx_a5000 | 24 | 1 | 68 | scavenger-gpu |
| a5000 | nvidia_rtx_a5000 | 24 | 4 | 3 | scavenger-gpu |
| a6000 | nvidia_rtx_a6000 | 48 | 1 | 8 | scavenger-gpu |
| p100 | tesla_p100-pcie-16gb | 16 | 2 | 10 | scavenger-gpu |
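To request a specific GPU type from this table, pass it to `--gres`. A sketch, assuming your SLURM account has access to the scavenger-gpu partition (remember that scavenger jobs can be preempted):

```bash
# Interactive session with one RTX A5000 on scavenger-gpu.
srun -p scavenger-gpu --gres=gpu:a5000:1 --cpus-per-task=4 --mem=16G --time=04:00:00 --pty bash -i

# Equivalent batch directives; use --gres=gpu:1 to accept any available GPU type.
#   #SBATCH --partition=scavenger-gpu
#   #SBATCH --gres=gpu:a5000:1
```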
Useful SLURM commands for resources
These commands can be used to view cluster resources.
General partition info
```
scontrol show partition partitionName
```
This will give you general information about a specific partition.
Example for the common partition:
```
(base) rm145@dcc-login-02 ~ $ scontrol show partition common
PartitionName=common
   AllowGroups=ALL DenyAccounts=courses,coursess25,panlab AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO ExclusiveTopo=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=90-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
   Nodes=dcc-comp-[01-10],dcc-core-[01-12,14-41,43-49]
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=GANG,REQUEUE
   State=UP TotalCPUs=4844 TotalNodes=57 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRES=cpu=4844,mem=38698635M,node=57,billing=4844
```
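If you only need a field or two from this output, it can be filtered with `grep`; for example, to check just the walltime limit and total resources of a partition:

```bash
# Show only the MaxTime and TRES lines for the common partition.
scontrol show partition common | grep -E 'MaxTime|TRES'
```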
Specific GPU info per partition
```
scontrol show node `sinfo -h -r -p partitionName -o %N` | grep Gres | sort | uniq -c
```
Example for GPUs in the gpu-common partition:
```
rm145@dcc-login-05 ~ $ scontrol show node `sinfo -h -r -p gpu-common -o %N` | grep Gres | sort | uniq -c
      7    Gres=gpu:2080:1
      1    Gres=gpu:2080:3
     24    Gres=gpu:5000_ada:1
```