Difference between revisions of "Research Computing Infrastructure"

From edegan.com

Revision as of 19:00, 21 September 2020

Note: Those with access can view more details and help guides on my infrastructure from the Administration page. Readers may also be interested in the page on Addressing Ubuntu NVIDIA Issues.

DIGITS DevBox (Bastard)

Full article: DIGITS DevBox

Our DIGITS DevBox, affectionately named after Lois McMaster Bujold's fifth God, has a Xeon E5-2620 v3 processor, 256GB of DDR4 RAM, two GPUs (one Titan RTX and one Titan Xp) with room for two more, a 500GB SSD (mounting /), and an 8TB RAID 5 array cached with bcache on a 512GB M.2 drive (mounting the /bulk share, which is available over Samba). It runs Ubuntu 18.04, CUDA 10.0, cuDNN 7.6.1, Anaconda3-2019.03, Python 3.7, TensorFlow 1.13, DIGITS 6, and other useful machine learning tools and libraries.

See also Using the DevBox, which provides examples of how to connect to and use our GPU compute machine.
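
As a quick check that both GPUs are visible once you are logged in, something like the following sketch may help. It assumes the NVIDIA driver's nvidia-smi tool is on the PATH and uses its standard --query-gpu and --format options. The function names (parse_gpu_csv, list_gpus) are made up for this illustration and are not taken from the Using the DevBox page.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma so the memory field is always the tail.
        name, _, memory = line.rpartition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,  # needs Python 3.7 or later
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for name, memory in list_gpus():
        print(f"{name}: {memory}")
```

On a machine without the NVIDIA driver, list_gpus() simply returns an empty list rather than raising.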

Father and Mother

Father is our Windows Server 2019 machine, which provides bulk storage on a RAID array and Remote Desktop Protocol (RDP) based computing and applications. Mother is our Linux server, running Ubuntu 20.04. It provides the main structured-data research computing environment (through PostgreSQL 12) and runs the Apache2 web server, which makes it our public-facing research computing platform.

Full article: Research Computing Hardware

Full article: Research Computing Configuration

The component lists for Father and Mother, our current Research Computing Hardware, are provided below. Finding parts that work nicely together can be a challenge, and these do. Both machines use lots of common components: the same Supermicro boards, the same RAM, the same drives (more or less), and so on. The boards were chosen because they support dual Intel Xeon Scalable CPUs on socket LGA 3647 and DDR4 at 2666MHz, have NVMe connections for the solid state drives (provided you remember to buy the OCuLink cables!), and have room for multiple GPUs using 16 lanes of PCI-E 3.0 (though the BIOS of this board seems to prevent the GPUs from working). The chips all have fast enough clock speeds to match the RAM, and sufficient lanes for the drives and GPUs. Each machine has a RAID 10 array, made up initially of four 6TB NAS drives, which sits in a hot-swappable bay.
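
To make the RAID 10 sizing concrete, here is a minimal sketch of the usable-capacity arithmetic. RAID 10 mirrors drives in pairs and stripes across the pairs, so half the raw capacity is usable. The function name is hypothetical, and the figure ignores filesystem overhead and the difference between TB and TiB.

```python
def raid10_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID 10 array: half of the raw capacity."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drive_count * drive_tb / 2


# Four 6TB drives, as in each machine's initial array.
print(raid10_usable_tb(4, 6))  # 12.0
```

Adding drives two at a time grows the array by one drive's worth of usable space per pair.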

See Research Computing Configuration for how we set them up.

Father's (RDP) Hardware Components

The RDP server has dual 12-core CPUs. We compromised on clock speed to save on price, but this is a good "all-purpose" configuration. The OS lives on the 400GB NVMe SSD. Currently this box has 512GB of DDR4-2666 RAM, but it is expandable to 1TB. The board supports 2TB, but that requires 128GB sticks, which are currently prohibitively expensive.

Quantity Part
1 Supermicro Motherboard MBD-X11DAI-N-O Xeon Dual Socket S3647 C621 Max.2TB PCI Express EATX (MBD-X11DAI-N-O)
2 Intel CD8067303405900 Xeon Gold 6126, 12 Cores, 2.6 GHz, 19.25 MB Cache, DDR4 up to 2666 MHz, 125W TDP - OEM
1 512GB (8x64GB) DDR4-2666MHz PC4-21300 4Rx4 288-Pin 1.2V ECC Load Reduced LRDIMM Memory by NEMIX RAM
1 Intel 750 Series 2.5" 400GB PCI-Express 3.0 x4 MLC Internal Solid State Drive (SSD) SSDPE2MW400G4X1
2 Noctua NH-D9 DX-3647 4U Premium CPU Cooler for Intel Xeon LGA3647
1 EVGA SuperNOVA 1600 T2 220-T2-1600-X1 80+ TITANIUM 1600W Fully Modular EVGA ECO Mode Includes FREE Power On Self Tester Power Supply
4 WD Red 6TB NAS Hard Disk Drive - 5400 RPM Class SATA 6Gb/s 64MB Cache 3.5 Inch - WD60EFRX
1 Rosewill RSV-L4000 - 4U Rackmount Server Case / Chassis - 8 Internal Bays, 7 Cooling Fans Included
1 Rosewill RSV-SATA-Cage-34 - Hard Disk Drives - Black, 3 x 5.25" to 4 x 3.5" Hot-Swap - SATA III / SAS - Cage
1 ASUS 24X DVD Burner - Bulk 24X DVD+R 8X DVD+RW 8X DVD+R DL 24X DVD-R 6X DVD-RW 16X DVD-ROM 48X CD-R 24X CD-RW 48X CD-ROM Black SATA Model DRW-24B1ST/BLK/B/AS - OEM
1 Rosewill RDRD-11003 2.5" SSD / HDD Mounting Kit for 3.5" Drive Bay with 60mm Fan
1 Arctic Silver 5 High-Density Polysynthetic Silver Thermal Compound AS5-3.5G
1 AmazonBasics Wired Keyboard and Wired Mouse Bundle Pack
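
The RAM in the list above carries two labels, DDR4-2666 and PC4-21300, which describe the same speed in different units. As a rough sketch (the function name is hypothetical), the peak transfer rate of one module is its transfer rate multiplied by the width of the standard 64-bit DIMM data bus:

```python
def peak_module_bandwidth_mb_s(transfers_mt_s, bus_width_bits=64):
    """Theoretical peak bandwidth of one memory module, in MB/s."""
    return transfers_mt_s * bus_width_bits // 8


# DDR4-2666 on a 64-bit bus.
print(peak_module_bandwidth_mb_s(2666))  # 21328
```

Vendors round this figure to get the PC4-21300 label. It is a theoretical per-module peak, not a measured throughput for either machine.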

Mother's (Database Server) Hardware Components

The database server has a single 4-core 3.6GHz Skylake chip, as clock speed matters much more than core count in this set-up. The OS lives on a 400GB NVMe SSD and the PostgreSQL installation lives on the 1.2TB NVMe SSD. The 12TB RAID 10 array is for deep bulk storage. Because we only have a single CPU on the board, we are maxed out at 512GB with eight 64GB sticks of DDR4-2666.

Quantity Part
1 Supermicro Motherboard MBD-X11DAI-N-O Xeon Dual Socket S3647 C621 Max.2TB PCI Express EATX (MBD-X11DAI-N-O)
1 Intel Xeon Scalable Gold 5122 SkyLake 4-Core 3.6 GHz (3.7 GHz Turbo) LGA 3647 105W BX806735122 Server Processor
1 512GB (8x64GB) DDR4-2666MHz PC4-21300 4Rx4 288-Pin 1.2V ECC Load Reduced LRDIMM Memory by NEMIX RAM
1 Intel 750 Series 2.5" 400GB PCI-Express 3.0 x4 MLC Internal Solid State Drive (SSD) SSDPE2MW400G4X1
1 Intel 750 Series 2.5" 1.2TB PCI-Express 3.0 x4 MLC Internal Solid State Drive (SSD) SSDPE2MW012T4X1
1 Noctua NH-D9 DX-3647 4U Premium CPU Cooler for Intel Xeon LGA3647
1 EVGA SuperNOVA 1600 T2 220-T2-1600-X1 80+ TITANIUM 1600W Fully Modular EVGA ECO Mode Includes FREE Power On Self Tester Power Supply
4 WD Red 6TB NAS Hard Disk Drive - 5400 RPM Class SATA 6Gb/s 64MB Cache 3.5 Inch - WD60EFRX
1 Rosewill RSV-L4000 - 4U Rackmount Server Case / Chassis - 8 Internal Bays, 7 Cooling Fans Included
1 Rosewill RSV-SATA-Cage-34 - Hard Disk Drives - Black, 3 x 5.25" to 4 x 3.5" Hot-Swap - SATA III / SAS - Cage
1 ASUS 24X DVD Burner - Bulk 24X DVD+R 8X DVD+RW 8X DVD+R DL 24X DVD-R 6X DVD-RW 16X DVD-ROM 48X CD-R 24X CD-RW 48X CD-ROM Black SATA Model DRW-24B1ST/BLK/B/AS - OEM
1 Rosewill RDRD-11003 2.5" SSD / HDD Mounting Kit for 3.5" Drive Bay with 60mm Fan
1 Arctic Silver 5 High-Density Polysynthetic Silver Thermal Compound AS5-3.5G
1 AmazonBasics Wired Keyboard and Wired Mouse Bundle Pack
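
The memory ceilings quoted for the two machines follow from how many CPU sockets are populated, since each CPU drives its own set of DIMM slots. The sketch below assumes 8 usable slots per populated CPU, which is inferred from the 8x64GB kit maxing out the single-CPU machine rather than taken from the board's manual. The function name is hypothetical.

```python
def max_memory_gb(cpus_populated, stick_gb, slots_per_cpu=8):
    """Maximum RAM when every slot of each populated CPU holds one stick.

    slots_per_cpu=8 is an assumption inferred from the parts lists.
    """
    return cpus_populated * slots_per_cpu * stick_gb


print(max_memory_gb(1, 64))  # 512   (Mother: one CPU)
print(max_memory_gb(2, 64))  # 1024  (Father: two CPUs, i.e. 1TB)
```

Under that assumption, Mother could only take more RAM by adding a second CPU or moving to larger sticks.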