DIGITS DevBox

Revision as of 12:17, 13 May 2019

Documentation

The documentation from NVIDIA is here:

Hardware specs from other builds:

The best instructions that I could find:

Some firms, including Lambda Labs and Bizon-tech, are selling variants of them, but the details on their specs are limited (the motherboard and configuration details are missing entirely):

Unfortunately, the form to get help from NVIDIA is closed:

At around $15k (the Lambda variants go from $10k to $23k), buying one is prohibitive for most people, but the parts cost is perhaps $5k now for the original spec.

Hardware

We mostly followed the original hardware spec from NVIDIA, updating the capacity of the drives and other minor things, as we had many of these parts available as salvage from other boxes. We did, however, have to buy the ASUS X99-E WS motherboard (as well as some new drives) just for this project.

We opted to use a Xeon E5-2620 v3 processor, rather than the Core i7-5930K (which we did have available). Both support 40 PCIe lanes and mount in the LGA 2011-v3 socket, and both have 6 cores, 15MB caches, etc. The i7 has a faster clock speed, but the Xeon takes registered (buffered) ECC DDR4 RDIMMs, which means we can put 256GB on the board (8 x 32GB), rather than just 64GB. For the GPUs, we have a TITAN RTX and an older TITAN Xp available to start, and we can add a 1080 Ti later, or buy some additional GPUs if needed. We also put the whole thing in a Rosewill RSV-L4000 case.

Quantity Part
1 ASUS X99-E WS/USB 3.1 LGA 2011-v3 Intel X99 SATA 6Gb/s USB 3.1 USB 3.0 CEB Intel Motherboard
1 Intel Haswell Xeon E5-2620 v3, 6 cores @ 2.4GHz, 6 x 256KB L2 cache, 15MB L3 cache, socket LGA 2011-v3
8 Crucial DDR4 RDIMM, 2133MHz, Registered (buffered) and ECC, 32GB
1 NVIDIA TITAN RTX DirectX 12 900-1G150-2500-000 SB 24GB 384-Bit GDDR6 HDCP Ready Video Card
1 NVIDIA TITAN Xp Graphics Card (900-1G611-2530-000)
1 SAMSUNG 970 EVO PLUS 500GB Internal Solid State Drive (SSD) MZ-V7S500B/AM
1 Samsung 850 EVO 500GB 2.5-Inch SATA III Internal SSD (MZ-75E500/EU)
3 WD Red 4TB NAS Hard Disk Drive - 5400 RPM Class SATA 6Gb/s 64MB Cache 3.5 Inch - WD40EFRX
1 DVDRW: Asus 24x DVD-RW Serial-ATA Internal OEM Optical Drive DRW-24B1ST
1 EVGA SuperNOVA 1600 T2 220-T2-1600-X1 80+ TITANIUM 1600W Fully Modular EVGA ECO Mode Power Supply
1 Rosewill RSV-L4000 - 4U Rackmount Server Case / Chassis - 8 Internal Bays, 7 Cooling Fans Included
1 Rosewill RSV-SATA-Cage-34 - Hard Disk Drives - Black, 3 x 5.25" to 4 x 3.5" Hot-Swap - SATA III / SAS - Cage
1 Rosewill RDRD-11003 2.5" SSD / HDD Mounting Kit for 3.5" Drive Bay w/ 60mm Fan
3 Corsair ML120 PRO LED CO-9050043-WW 120mm Blue LED 120mm Premium Magnetic Levitation PWM Fan
2 ARCTIC F8 PWM Fluid Dynamic Bearing Case Fan, 80mm PWM Speed Control, 31 CFM at 22dBA

Old notes on a prior look at a GPU Build are on the wiki too.

There wasn't anything particularly noteworthy about the hardware build. The GPUs need to go in slots 1 and 3, which means they sit tight against each other. I put the TITAN Xp in slot 1 (and plugged the monitor into its HDMI port), because then the fans for the TITAN RTX (which I expect will get heavier use) are in the clear.

The initial BIOS boot was weird - the machine ran at full power for a short period then powered off multiple times before finally giving a single system beep and loading the BIOS. It may have been memory checking or some such.

BIOS

The machine boots to BIOS. I checked the following and made these changes:

  • The GPUs are being recognized - see the tool section!
  • All of the SATA drives are being recognized
  • Enabled hot-swap on the three hard disks
  • Set the fans to PWM, which drastically cuts down the noise, and set the lower thresholds to 200 (not that it seemed to matter, as they seem to be idling at around 1k)
  • Set the OS type to other OS rather than Windows, and set enhanced mode to disabled
  • Deleted the PK to disable secure boot
  • Changed the boot order to be CD first (not as UEFI), and then the Samsung 850


Notes:

  • We will build the RAID 5 array in software, rather than using the X99 chipset RAID through the BIOS
  • The m.2 drive is visible in the BIOS and will be used as a cache for the RAID 5 array (using bcache)

Software

Main OS Install

Install Ubuntu 18.04 (note that the original DIGITS DevBox ran 14.04) from a freshly burnt DVD. Every 18.04 image is an LTS release, so there is no separate non-LTS version to choose.

Choose the first network hardware option and make sure that the second (right most) network port is connected to a DHCP broadcasting router.

Under partitions:

  1. Put one large partition, formatted as ext4, mounted as /, bootable on the 850
  2. Partition each SATA drive as RAID
  3. Put one large partition, formatted as ext4, not mounted on the 970 (for later)
  4. Put software RAID5 over the 3 SATA drives, format the RAID as ext4 and mount as /bulk
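
Once the system is up, it is worth confirming that the software RAID 5 array assembled with all three members. The following is a minimal sketch, not part of the original procedure: md_summary is a hypothetical helper name, and it assumes the array shows up in /proc/mdstat in the usual format (an "mdN : active raid5 ..." line followed by a status line ending in something like [UUU]).

```shell
#!/bin/sh
# md_summary (hypothetical helper): read /proc/mdstat-style text on stdin and
# print one line per array as "<name> <level> <state>".
# State is "ok" when every member slot shows U, and "degraded" when any
# slot shows _ (a missing or failed member).
md_summary() {
  awk '
    /^md[0-9]+ :/ {
      name = $1
      level = "unknown"
      # The RAID level is the first field that looks like raidN or linear.
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^(raid[0-9]+|linear)$/) { level = $i; break }
      }
      next
    }
    name != "" && match($0, /\[[U_]+\]/) {
      flags = substr($0, RSTART + 1, RLENGTH - 2)
      state = (index(flags, "_") > 0) ? "degraded" : "ok"
      print name, level, state
      name = ""
    }
  '
}

# On the box itself (assumes the installer created the array):
#   md_summary < /proc/mdstat
```

A freshly created RAID 5 array may still be resyncing, which this sketch does not report; cat /proc/mdstat shows the progress directly.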

Install SSH and Samba. When prompted, install the bootloader to the MBR at the front of the 850.

After a reboot, the screen freezes. Either edit the bootloader entry to add nomodeset to the kernel command line (see https://www.pugetsystems.com/labs/hpc/The-Best-Way-To-Install-Ubuntu-18-04-with-NVIDIA-Drivers-and-any-Desktop-Flavor-1178/#step-4-potential-problem-number-1), or just SSH onto the box.
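
If the nomodeset option needs to survive reboots until the NVIDIA driver is installed, one approach is to add it to the GRUB defaults. This is a sketch under assumptions rather than what was done on this box: it assumes GRUB is the bootloader and that the defaults live in /etc/default/grub, and add_nomodeset is a hypothetical helper name. The function only prints the modified file, so the result can be reviewed before anything is overwritten.

```shell
#!/bin/sh
# add_nomodeset (hypothetical helper): print a copy of a GRUB defaults file
# with nomodeset appended to GRUB_CMDLINE_LINUX_DEFAULT.
# Lines that already contain nomodeset are left unchanged.
add_nomodeset() {
  sed -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/{/nomodeset/!s/"[[:space:]]*$/ nomodeset"/;}' "$1"
}

# Possible usage on the box, as root, after reviewing the output
# (paths are the Ubuntu defaults and are assumptions here):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_nomodeset /etc/default/grub.bak > /etc/default/grub
#   update-grub
```

The option should be removed again once the proprietary driver is in place.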

Run as root:

apt-get update
apt-get dist-upgrade

Give the box a reboot for safety.
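
Rather than rebooting on principle, it is possible to ask the system whether the upgrade wants one. The sketch below assumes the usual Ubuntu convention of a flag file at /var/run/reboot-required, which may not be present on a minimal install; needs_reboot is a hypothetical helper name.

```shell
#!/bin/sh
# needs_reboot (hypothetical helper): succeed if the reboot flag file exists.
# The default path is the conventional Ubuntu location (an assumption here);
# a different path can be passed as the first argument.
needs_reboot() {
  [ -e "${1:-/var/run/reboot-required}" ]
}

# Possible usage after apt-get dist-upgrade:
#   if needs_reboot; then echo "reboot required"; fi
```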

Check the release (it reports the LTS version, since 18.04 is itself an LTS release):

lsb_release -a
 No LSB modules are available.
 Distributor ID: Ubuntu
 Description:    Ubuntu 18.04.2 LTS
 Release:        18.04
 Codename:       bionic
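
The same check can be scripted. This is a small sketch that assumes the Description line ends in "LTS" on LTS releases, as it does in the output above; is_lts is a hypothetical helper name.

```shell
#!/bin/sh
# is_lts (hypothetical helper): read `lsb_release -a` style output on stdin
# and succeed if the Description line ends in "LTS".
is_lts() {
  grep -Eq '^Description:[[:space:]].*LTS[[:space:]]*$'
}

# Possible usage on the box:
#   lsb_release -a 2>/dev/null | is_lts && echo "LTS release"
```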

Video Drivers

Check that the hardware is being seen:

lspci -vk

05:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN Xp] (rev a1) (prog-if 00 [VGA controller])
       Subsystem: NVIDIA Corporation GP102 [TITAN Xp]
       Flags: bus master, fast devsel, latency 0, IRQ 72, NUMA node 0
       Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
       Memory at c0000000 (64-bit, prefetchable) [size=256M]
       Memory at d0000000 (64-bit, prefetchable) [size=32M]
       I/O ports at d000 [size=128]
       Expansion ROM at 000c0000 [disabled] [size=128K]
       Capabilities: [60] Power Management version 3
       Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
       Capabilities: [78] Express Legacy Endpoint, MSI 00
       Capabilities: [100] Virtual Channel
       Capabilities: [250] Latency Tolerance Reporting
       Capabilities: [128] Power Budgeting <?>
       Capabilities: [420] Advanced Error Reporting
       Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
       Capabilities: [900] #19
       Kernel driver in use: nouveau
       Kernel modules: nvidiafb, nouveau

06:00.0 VGA compatible controller: NVIDIA Corporation Device 1e02 (rev a1) (prog-if 00 [VGA controller])
       Subsystem: NVIDIA Corporation Device 12a3
       Flags: fast devsel, IRQ 24, NUMA node 0
       Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
       Memory at a0000000 (64-bit, prefetchable) [size=256M]
       Memory at b0000000 (64-bit, prefetchable) [size=32M]
       I/O ports at c000 [size=128]
       Expansion ROM at f9000000 [disabled] [size=512K]
       Capabilities: [60] Power Management version 3
       Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
       Capabilities: [78] Express Legacy Endpoint, MSI 00
       Capabilities: [100] Virtual Channel
       Capabilities: [250] Latency Tolerance Reporting
       Capabilities: [258] L1 PM Substates
       Capabilities: [128] Power Budgeting <?>
       Capabilities: [420] Advanced Error Reporting
       Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
       Capabilities: [900] #19
       Capabilities: [bb0] #15
       Kernel modules: nvidiafb, nouveau

This looks good. The second card is the Titan RTX (see https://devicehunt.com/view/type/pci/vendor/10DE/device/1E02).

Currently we are using the nouveau driver for the Xp, and have no driver loaded for the RTX.
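
To get that summary without reading the full listing, the lspci output can be reduced to one line per card. This is a minimal sketch: gpu_drivers is a hypothetical helper name, and it assumes the cards are listed as "VGA compatible controller" (some NVIDIA cards appear as "3D controller" instead and would be missed).

```shell
#!/bin/sh
# gpu_drivers (hypothetical helper): read `lspci -vk` output on stdin and
# print "<slot> <driver>" for each VGA controller, with "none" when no
# "Kernel driver in use" line is present for that device.
gpu_drivers() {
  awk '
    function emit() {
      if (slot != "") print slot, (driver != "" ? driver : "none")
    }
    # Start of a VGA device block: report the previous one, start a new one.
    /^[0-9a-f:.]+ VGA compatible controller:/ {
      emit(); slot = $1; driver = ""; next
    }
    # Any other unindented line starts a different device.
    /^[^ \t]/ { emit(); slot = ""; next }
    slot != "" && /Kernel driver in use:/ { driver = $NF }
    END { emit() }
  '
}

# Possible usage on the box:
#   lspci -vk | gpu_drivers
```

For the listing above, this reports nouveau for 05:00.0 and none for 06:00.0.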