Data Center Thermal Research Tray
Capstone Project - Google
Full Technical Report: https://digitalcommons.calpoly.edu/mesp/815/
With AI and cloud service demand driving data center usage to new heights, tech companies are relying more than ever on colocation centers to support this rapid increase in computing capacity. This growth introduces a new challenge: efficiently cooling increasingly powerful processing chips within the constraints of a colocation center’s existing thermal infrastructure. Our sponsor tasked our team with researching the cooling systems commonly offered by top-tier colocation centers and developing a test platform to evaluate and compare the most effective operating conditions for cooling next-generation chips in these environments.
Figure 1. Rack integration diagram
The mechanical design of the development rack tray was driven by the need to replicate key physical and thermal characteristics of colocation data center hardware while maintaining flexibility for experimental iteration. The system was designed to support high heat-flux components, integrate a liquid direct-to-chip (DTC) cooling loop, and allow controlled variation of flowrate, inlet temperature, and power dissipation.
The overall design was divided into three primary subassemblies: the computer component assembly, the cooling loop assembly, and the tray base assembly. This modular approach allowed each subsystem to be designed and validated independently before full system integration. The computer component assembly housed the CPU, GPU, motherboard, power supply, and storage required to generate representative data center heat loads. High-performance consumer hardware was selected to approximate the thermal output of modern data center processors while remaining compatible with commercially available liquid cooling components.
Figure 2. Final design diagram
Figure 3. Tray-level CAD, fluid lines not shown
The cooling loop assembly was the primary focus of the mechanical design. A liquid direct-to-chip configuration was selected for both the CPU and GPU to reflect cooling strategies currently being adopted in colocation environments. Commercial off-the-shelf water blocks were used for both components to ensure reliability and repeatability, as custom block manufacturing was outside the scope of the project. Tubing and fittings were selected to balance pressure limitations, flexibility, and ease of reconfiguration. Tubing with a nominal 3/8-inch inner diameter was used to match the standard G1/4 fittings commonly found in liquid cooling systems. Rotary elbow fittings were incorporated to reduce sharp bends, minimize pressure losses, and simplify tubing routing.
Figure 4. Fluid line diagram
The tray base assembly was designed to emulate an Open Compute Project (OCP) rack tray while remaining easy to modify. Constructed from plywood, the tray provided sufficient structural support and allowed clear access to components and tubing. The open layout facilitated airflow visualization and simplified mechanical adjustments between tests. The GPU was mounted flat using a PCIe extender cable to better represent typical chip orientations and reduce stress on the motherboard connection.
Figure 5. Hardware
Testing was conducted to evaluate the thermal performance of the liquid direct-to-chip cooling system under a range of operating conditions representative of colocation data centers. The primary objective was to quantify how changes in coolant inlet temperature, flowrate, and component power dissipation affected system performance while maintaining safe operating limits for all hardware.
All tests were performed using a fixed mechanical configuration with the CPU and GPU cooled in a series loop. Prior to data collection, the cooling loop was assembled and leak-checked independently, then integrated with the computer system. Each test was run long enough to reach steady-state conditions, verified by stable coolant and component temperature readings over multiple chiller cycles.
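The steady-state criterion described above can be sketched as a simple stability check on recent temperature readings. The window size and tolerance below are illustrative assumptions, not the values used in the actual test procedure:

```python
# Hypothetical steady-state check: steady state is declared when the most
# recent readings vary by less than a tolerance. Window and tolerance are
# illustrative assumptions, not the report's actual criteria.

def is_steady_state(temps_c, window=10, tol_c=0.5):
    """Return True when the last `window` readings span less than tol_c."""
    if len(temps_c) < window:
        return False
    recent = temps_c[-window:]
    return max(recent) - min(recent) < tol_c

# A warming-up trace is not steady; a settled trace with small ripple is.
warming = [30 + 0.5 * i for i in range(20)]
settled = warming + [40.0 + 0.1 * (i % 2) for i in range(10)]
print(is_steady_state(warming))   # still climbing
print(is_steady_state(settled))   # flat within tolerance
```

In practice the window would need to span multiple chiller cycles, as noted above, so the cyclic ripple in coolant temperature is averaged into the stability judgment rather than mistaken for drift.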
Figure 6. Cycling inlet temperature, 25 °C chiller setpoint
Three primary variables were systematically varied during testing. Coolant inlet temperature was controlled by adjusting the chiller setpoint, allowing evaluation across a range of supply temperatures typical of data center cooling loops. Flowrate was adjusted by varying pump speed, with corresponding flow values determined using a timed volume measurement method and scaled linearly with pump power. Total system wattage was varied using stress-testing software to impose controlled heat loads on the CPU and GPU, spanning low to high power operation.
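The timed-volume flowrate method and its linear scaling with pump power amount to two short conversions. This is a minimal sketch of that arithmetic; the function names and reference values are illustrative:

```python
def flowrate_lpm(volume_ml, elapsed_s):
    """Convert a timed volume catch (mL collected over seconds) to L/min."""
    return (volume_ml / 1000.0) / (elapsed_s / 60.0)

def scale_flowrate(ref_flow_lpm, ref_pump_w, pump_w):
    """Estimate flow at a new pump power, assuming the report's linear
    flow-vs-pump-power relationship measured at a reference point."""
    return ref_flow_lpm * (pump_w / ref_pump_w)

# Example: 500 mL collected in 30 s gives 1.0 L/min at full pump power;
# half pump power is then estimated at 0.5 L/min.
f_ref = flowrate_lpm(500, 30)
print(f_ref)                          # 1.0
print(scale_flowrate(f_ref, 10.0, 5.0))  # 0.5
```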
For each test condition, coolant temperatures, component temperatures, flowrate, and electrical power consumption were recorded using in-line temperature sensors and system monitoring software.
Increasing coolant flowrate improved heat removal, but with diminishing returns beyond a moderate flow threshold, indicating that additional pump power yielded minimal thermal benefit at higher flowrates. This suggests an optimal operating region where cooling effectiveness can be maintained while reducing pumping energy.
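The diminishing-returns behavior can be illustrated with a simple thermal resistance model, in which convective resistance falls roughly with flow^0.8 (a standard turbulent-flow scaling) while conduction and interface resistance stay fixed. The constants below are assumptions for illustration, not measured values from the test platform:

```python
# Illustrative diminishing-returns model. Convective resistance scales
# ~1/flow^0.8 (turbulent correlation); the fixed term represents
# conduction and interface resistance. Constants are assumed, not measured.

R_FIXED = 0.05   # K/W, fixed conduction + interface resistance (assumed)
C_CONV = 0.10    # K/W convective resistance at 1 L/min (assumed)

def thermal_resistance(flow_lpm):
    return R_FIXED + C_CONV / flow_lpm**0.8

def chip_delta_t(power_w, flow_lpm):
    """Chip-to-coolant temperature rise at a given power and flow."""
    return power_w * thermal_resistance(flow_lpm)

# Doubling flow from 1 to 2 L/min lowers chip temperature far more than
# doubling from 4 to 8 L/min, at four times the volumetric flow cost.
gain_low = chip_delta_t(300, 1) - chip_delta_t(300, 2)
gain_high = chip_delta_t(300, 4) - chip_delta_t(300, 8)
print(round(gain_low, 1), round(gain_high, 1))
```

Because the fixed resistance dominates at high flow, extra pump power eventually buys almost no temperature reduction, consistent with the optimal operating region described above.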
Figure 7. Efficiency vs. flowrate at different chiller input temperatures
Coolant inlet temperature had a measurable impact on component temperatures, though its influence decreased at higher system wattages. At elevated heat loads, system performance converged toward a consistent cooling effectiveness regardless of inlet temperature, indicating that thermal resistance within the system became the dominant limiting factor rather than coolant conditions alone.
As total system wattage increased, the cooling capacity ratio (CCR) approached an asymptotic value, demonstrating predictable behavior at high heat fluxes. This convergence suggests that, under efficient operating conditions, CCR behaves as a system-level characteristic rather than a tunable parameter.
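The report's exact CCR formulation is not reproduced here, but a common way to express cooling capacity is the ratio of heat carried away by the coolant to electrical input power, with coolant heat computed from the energy balance Q = ṁ·c_p·ΔT. The sketch below assumes that definition and water properties near 25 °C:

```python
# Assumed CCR definition: heat absorbed by the coolant divided by
# electrical input power. Coolant properties assume water near 25 C.

RHO_KG_PER_L = 0.997     # water density, kg/L (assumed)
CP_J_PER_KG_K = 4180.0   # water specific heat, J/(kg*K) (assumed)

def coolant_heat_w(flow_lpm, inlet_c, outlet_c):
    """Heat absorbed by the coolant: Q = m_dot * c_p * dT."""
    m_dot = flow_lpm / 60.0 * RHO_KG_PER_L   # kg/s
    return m_dot * CP_J_PER_KG_K * (outlet_c - inlet_c)

def cooling_capacity_ratio(flow_lpm, inlet_c, outlet_c, power_w):
    return coolant_heat_w(flow_lpm, inlet_c, outlet_c) / power_w

# Example: 1.5 L/min with a 5 C coolant rise against 550 W of system power.
print(round(cooling_capacity_ratio(1.5, 25.0, 30.0, 550.0), 2))
```

Under this definition, CCR approaching a constant at high wattage means the coolant-side ΔT grows in proportion to power, which matches the asymptotic, system-characteristic behavior reported above.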
Figure 8. Efficiency vs. wattage at different chiller input temperatures
Overall, the results highlight trade-offs between flowrate, inlet temperature, and energy consumption, and demonstrate that effective cooling of high-power processors can be achieved without excessive pumping or aggressive coolant temperature reduction. These findings align with trends observed in existing colocation cooling strategies and support the relevance of the test platform.