xCAT (Extreme Cloud Administration Toolkit) utilizes the MSN Hider (Master/Service Node Hider) mechanism to secure multi-tenant clusters. It prevents compute nodes from seeing the internal architecture, hostnames, or real IP addresses of Management Nodes (MN) and Service Nodes (SN).
If your nodes are leaking master information, or losing connectivity during network deployment, this guide covers the core setup and common fixes.
Core MSN Hider Setup
The MSN Hider works by intercepting cluster setup templates and utilizing customized Network Address Translation (NAT) rules and strict postscripts.
Enable Strict Node Aliasing: Force nodes to look for generic aliases (e.g., xcatmaster) instead of the master’s true hostname.
Define Generic Master Targets: In the xCAT noderes table, ensure the xcatmaster attribute points to the virtual loopback or the mapped NAT address rather than the real MN interface.
Deploy Security Zones: Group nodes into isolated zones using the chzone command. Each zone gets its own root SSH keys, so nodes in one zone cannot use passwordless SSH to reach nodes in another.
Common Issues & Troubleshooting Fixes
1. Compute Nodes Fail to Resolve “xcatmaster”
Symptom: Nodes get stuck during deployment or fail to run postscripts because they cannot reach the Management Node.
Cause: The generic alias is missing from the local /etc/hosts or the DNS zones served by the master.
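To confirm this cause on a node, check whether /etc/hosts maps the alias at all. The Python sketch below is a minimal illustration, assuming the standard /etc/hosts layout (an address followed by one or more names, with # starting a comment) and the lowercase alias xcatmaster. alias_addresses is a hypothetical helper, not part of xCAT, and it does not query the DNS zones served by the master.

```python
from typing import List


def alias_addresses(hosts_text: str, alias: str = "xcatmaster") -> List[str]:
    """Return every address that /etc/hosts-style text maps to `alias`.

    Name matching is case-insensitive, as host name lookups are.
    """
    matches = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if alias.lower() in (name.lower() for name in names):
            matches.append(address)
    return matches
```

For example, calling alias_addresses(open("/etc/hosts").read()) on a node and getting an empty list means the alias is absent locally, so the node depends entirely on the DNS maps regenerated in the fix.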
Fix: Regenerate the DNS maps using makedns -n, then verify that the short name is spelled in lowercase, exactly as it is defined in your xCAT tables.
2. Real Management IP Leaking in Kickstart Templates
Symptom: Compute nodes successfully deploy but log files display the true IP of your master node.
Cause: The installation templates are explicitly calling the #MASTER# macro instead of the hidden macro.
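The replacement described in the fix can be scripted. This Python sketch assumes, as this guide does, that #MASTER# and #XCATMASTER# are the macro names used in your templates; hide_master_macro and patch_template are hypothetical helpers. It writes a .bak copy before changing a file, so the original template can be restored.

```python
import re
from pathlib import Path
from typing import Tuple

# Matches only the literal #MASTER# macro. #XCATMASTER# is left alone because
# the character before "MASTER#" there is "T", not "#".
MASTER_MACRO = re.compile(r"#MASTER#")


def hide_master_macro(template_text: str) -> Tuple[str, int]:
    """Replace every #MASTER# with #XCATMASTER#; return (new_text, replacements)."""
    return MASTER_MACRO.subn("#XCATMASTER#", template_text)


def patch_template(path: str) -> int:
    """Rewrite a template file in place, saving the original as <name>.bak first."""
    p = Path(path)
    original = p.read_text()
    patched, count = hide_master_macro(original)
    if count:
        p.with_name(p.name + ".bak").write_text(original)
        p.write_text(patched)
    return count
```

patch_template returns the number of macros it replaced, so a result of 0 on a template you expected to change is a hint that the leak comes from somewhere else.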
Fix: Edit your deployment template (e.g., /opt/xcat/share/xcat/install/rh/compute.tmpl). Replace any explicit #MASTER# references with #XCATMASTER# to dynamically enforce the hidden mapping.
3. Postscripts Fail Post-Installation
Symptom: The operating system installs completely, but the final configuration scripts (postscripts) time out.
Cause: Strict firewall rules or improper NAT routing on the Service Nodes block the hidden port forwarding.
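A plain TCP connect from a compute node is a quick first test before turning to xcatdebug. This Python sketch assumes the default ports 3001 and 3002 discussed in this section and that the node can resolve the name you pass in; probe_ports is a hypothetical helper. It only reports reachable or not: a refused connection, a timeout, and a name-resolution failure all show up as False, so it narrows the problem down without telling you which firewall or NAT rule is responsible.

```python
import socket
from typing import Dict, Iterable

# Ports named in this guide: 3001 (xcatd client) and 3002 (install status).
XCATD_PORTS = (3001, 3002)


def probe_ports(host: str, ports: Iterable[int] = XCATD_PORTS,
                timeout: float = 3.0) -> Dict[int, bool]:
    """Return {port: reachable} based on a plain TCP connect to each port."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, or unresolvable name
            results[port] = False
    return results
```

Running probe_ports("xcatmaster") on a stuck node shows which of the two ports is blocked, which tells you where to look on the Service Node gateway.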
Fix: Use xcatdebug to track the exact network communication breakdown. Ensure ports 3001 (xcatd client) and 3002 (install status) are open and properly mapped on your gateway interfaces.
4. DHCP Leaks Real Hostnames
Symptom: Nodes grab IP addresses but accept the real master domain instead of the hidden domain alias.
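To see which domain the DHCP server actually hands out, inspect the generated ISC dhcpd configuration (commonly /etc/dhcp/dhcpd.conf on RHEL-family systems). This Python sketch is illustrative: leaked_domains is a hypothetical helper, the hidden domain is whatever alias you configured, and it only looks at option domain-name statements, ignoring comments.

```python
import re
from typing import List

# ISC dhcpd syntax: option domain-name "example.com";
# "domain-name-servers" is not matched because whitespace must follow "domain-name".
DOMAIN_OPTION = re.compile(r'option\s+domain-name\s+"([^"]*)"\s*;')


def leaked_domains(dhcpd_conf: str, hidden_domain: str) -> List[str]:
    """List domain-name values in dhcpd.conf text that differ from the hidden domain."""
    found = set()
    for raw in dhcpd_conf.splitlines():
        line = raw.split("#", 1)[0]  # ignore dhcpd.conf comments
        for domain in DOMAIN_OPTION.findall(line):
            if domain.lower() != hidden_domain.lower():
                found.add(domain)
    return sorted(found)
```

Any name this returns is a domain that nodes can still receive over DHCP.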
Fix: Run makedhcp -a -d to delete the stale node entries, then rebuild the configuration with makedhcp -n followed by makedhcp -a so it is regenerated cleanly, without leftover entries.
Verification Checklist
Run these quick checks on a provisioned compute node to confirm that MSN Hider is functioning correctly:
ping xcatmaster should resolve to the masked/NAT IP, not the private admin network IP.
cat /etc/xcatinfo must list the generic alias for the master server daemon.
iptables -t nat -L -n -v on your Service Node should explicitly show active forwarding targets masking the MN.
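The checks above can also be scripted. This Python sketch is a minimal illustration under stated assumptions: /etc/xcatinfo is read as KEY=VALUE lines (the XCATSERVER key used in the test values is an assumption about its contents), you know the CIDR of your private admin network (10.0.0.0/16 here is only an example), and the iptables output is parsed by column position, with the target in the third column. All helper names are hypothetical.

```python
import ipaddress
from typing import Dict, List

# NAT actions that indicate the Service Node is rewriting addresses.
NAT_TARGETS = {"DNAT", "SNAT", "MASQUERADE", "REDIRECT"}


def resolves_outside(resolved_ip: str, admin_cidr: str) -> bool:
    """True if the address xcatmaster resolved to lies outside the private admin network."""
    return ipaddress.ip_address(resolved_ip) not in ipaddress.ip_network(admin_cidr)


def parse_xcatinfo(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines into a dict (assumed /etc/xcatinfo layout)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            info[key] = value.strip().strip("'\"")  # drop optional quotes
    return info


def nat_rules(iptables_output: str) -> List[str]:
    """Return rule lines from `iptables -t nat -L -n -v` whose target is a NAT action.

    Columns are: pkts bytes target prot opt in out source destination [extras].
    Counters may carry K/M suffixes, so only the target column is checked;
    "Chain ..." and header lines never hold a NAT action in that position.
    """
    rules = []
    for line in iptables_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NAT_TARGETS:
            rules.append(line.strip())
    return rules
```

On a compute node, resolves_outside(socket.gethostbyname("xcatmaster"), "10.0.0.0/16") should be True for your own admin CIDR; on a Service Node, nat_rules applied to the captured iptables output should list at least one DNAT, SNAT, or MASQUERADE rule.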