
OpenShift Installation Baremetal on AWS

INPG - DO NOT FOLLOW BLINDLY!


Set up bastion

$ ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
# Set the version environment variable
export VERSION=4.20.0

# Download and extract the OpenShift CLI (oc)
curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$VERSION/openshift-client-linux.tar.gz | tar zxvf - oc

# Move the oc binary to a directory on your PATH
sudo mv oc /usr/local/bin/

# Download and extract the OpenShift Installer
curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$VERSION/openshift-install-linux.tar.gz | tar zxvf - openshift-install

# Move the installer binary to your PATH
sudo mv openshift-install /usr/local/bin/

# Verify that the tools are installed correctly by checking the version:
oc version
openshift-install version

Download your Pull Secret

Before you can generate the installation configuration:

  1. Log in to the Red Hat OpenShift Cluster Manager (console.redhat.com/openshift) - https://console.redhat.com/openshift/install/pull-secret
  2. Download or copy your pull secret to the bastion host (e.g., save it as pull-secret.txt).

Create Target Groups and NLB

Create the Empty Target Groups:

  • Go to the AWS console and create the Target Groups for the required ports (like 6443, 22623, 443, and your AAP port) using the TCP protocol
  • You must specify the VPC where the cluster will live, but when AWS prompts you to register targets, simply skip that step and save the empty target group.
    • For the Kubernetes API (port 6443): TCP target group; HTTPS health check on port 6443 using /readyz
    • For the Machine Config Server (port 22623): TCP target group; HTTPS health check on port 22623 using /healthz
    • For the Application Ingress (port 443): TCP target group; HTTP health check on port 1936 using /healthz/ready (the Ingress Controller publishes its health status on port 1936 specifically for external load balancers to use)
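The same target groups can also be created from the bastion with the AWS CLI. A sketch only: the target-group names are made up for illustration and `vpc-0EXAMPLE` is a placeholder for your VPC ID (the AAP target group would follow the same pattern on its port).

```shell
VPC_ID="vpc-0EXAMPLE"   # placeholder -- your cluster VPC

# Kubernetes API target group (TCP 6443, HTTPS health check on /readyz)
aws elbv2 create-target-group \
  --name ocp-api-6443 --protocol TCP --port 6443 \
  --vpc-id "$VPC_ID" --target-type instance \
  --health-check-protocol HTTPS --health-check-port 6443 \
  --health-check-path /readyz

# Machine Config Server target group (TCP 22623, HTTPS health check on /healthz)
aws elbv2 create-target-group \
  --name ocp-mcs-22623 --protocol TCP --port 22623 \
  --vpc-id "$VPC_ID" --target-type instance \
  --health-check-protocol HTTPS --health-check-port 22623 \
  --health-check-path /healthz

# Ingress target group (TCP 443, HTTP health check on port 1936)
aws elbv2 create-target-group \
  --name ocp-ingress-443 --protocol TCP --port 443 \
  --vpc-id "$VPC_ID" --target-type instance \
  --health-check-protocol HTTP --health-check-port 1936 \
  --health-check-path /healthz/ready
```

Creating the group with no `register-targets` call leaves it empty, which matches the "skip registering targets" step in the console flow.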

Create NLB

Note: For the OpenShift Network Load Balancers (NLBs), the protocol must be strictly TCP (Layer 4 load balancing, also referred to as Raw TCP or SSL Passthrough)

  • For the API NLB (targeting your 3 Master nodes + 1 temporary Bootstrap node):

    • Listener TCP/6443 -> Add target group for port 6443 (Kubernetes API)
    • Listener TCP/22623 -> Add target group for port 22623 (Machine Config Server)
  • For the Application Ingress NLB (targeting your 3 Worker nodes):

    • Create a TCP target group for port 443 (HTTPS traffic)
    • Create a TCP target group for port 80 (HTTP traffic)
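The API NLB and its listeners can likewise be sketched with the AWS CLI; the subnet ID and target-group ARNs below are placeholders, and the Ingress NLB follows the same pattern with the 443/80 target groups.

```shell
# Create the API NLB (Layer 4 / TCP only, as required by OpenShift)
API_NLB_ARN=$(aws elbv2 create-load-balancer \
  --name ocp-api-nlb --type network --scheme internet-facing \
  --subnets subnet-0EXAMPLE \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)

# TCP listener on 6443 -> Kubernetes API target group
aws elbv2 create-listener --load-balancer-arn "$API_NLB_ARN" \
  --protocol TCP --port 6443 \
  --default-actions Type=forward,TargetGroupArn=<api-tg-arn>

# TCP listener on 22623 -> Machine Config Server target group
aws elbv2 create-listener --load-balancer-arn "$API_NLB_ARN" \
  --protocol TCP --port 22623 \
  --default-actions Type=forward,TargetGroupArn=<mcs-tg-arn>
```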

Setup DNS

  1. api.<cluster_name>.gineesh.com: This provides name resolution for the Kubernetes API and should point to the API NLB

  2. api-int.<cluster_name>.gineesh.com: This is used for internal cluster communications and must also point to the API NLB

  3. *.apps.<cluster_name>.gineesh.com: This provides name resolution for the wildcard routes (like the OpenShift web console and default applications) and should point to the application ingress NLB

  4. aap.gineesh.com (or whatever specific URL you want for AAP): Point this directly to the AWS DNS name of the AAP ALB.

Create the following CNAME records in Cloudflare (e.g., under gineesh.com):

Name: api.ocp420 Target: <Your_API_NLB_URL> Purpose: Provides name resolution for the Kubernetes API for external clients

Name: api-int.ocp420 Target: <Your_API_NLB_URL> Purpose: Provides name resolution for the Kubernetes API for internal cluster communications

Note: There is one security consideration: the api-int record is used strictly for internal cluster communication between your nodes. If you create this record in Cloudflare, your internal cluster endpoint becomes publicly resolvable on the internet. Your AWS Security Groups will still block unauthorized external access to port 22623, but some organizations prefer to keep internal records completely hidden. If you want to keep internal DNS strictly private, create an AWS Route 53 Private Hosted Zone attached to your VPC just to host the api-int.ocp420.gineesh.com record, while leaving the rest in Cloudflare. If you do not mind the DNS record itself being public, you can put all of them in Cloudflare and bypass Route 53 entirely.

Name: *.apps.ocp420 Target: <Your_Ingress_NLB_URL> Purpose: Provides name resolution for the wildcard routes, pointing to the application ingress load balancer

Name: aap (or your chosen URL for AAP) Target: <Your_AAP_ALB_URL> Purpose: Points your Ansible Automation Platform URL to your dedicated AWS Application Load Balancer.
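Once the records exist, resolution can be spot-checked from the bastion; this assumes dig is available (bind-utils on RHEL) and uses the ocp420 names from above.

```shell
# Each name should resolve to its load balancer's address
dig +short api.ocp420.gineesh.com
dig +short api-int.ocp420.gineesh.com
dig +short test.apps.ocp420.gineesh.com   # any name under *.apps should resolve
```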

Prepare install-config.yaml

mkdir $HOME/clusterconfig

Create $HOME/clusterconfig/install-config.yaml

apiVersion: v1
baseDomain: gineesh.com # e.g., example.com
metadata:
  name: ocp420 # e.g., ocp-aap
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16 # Update to match the CIDR of your AWS VPC subnets
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '{"auths": ...}' # Paste your downloaded pull secret JSON here
sshKey: 'ssh-ed25519 AAAA...' # Paste your SSH public key here

Details to prepare:

  • machineNetwork (your AWS VPC CIDR): This value specifies the IP address pool for your cluster machines. It must match or encompass the CIDR blocks of the AWS VPC and subnets where your EC2 instances' network interfaces will reside
  • clusterNetwork (pod IPs): This defines the internal IP address blocks from which pod IP addresses are allocated. There is nothing to look up in AWS, as this is a virtual network managed entirely by OpenShift. The default (10.128.0.0/14) is fine as long as it does not overlap with your machineNetwork (AWS VPC CIDR) or your serviceNetwork
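A quick way to confirm the three ranges do not collide is an ipaddress check with python3; the values below are the ones from the install-config above, so substitute your own.

```shell
python3 -c "
import ipaddress
machine = ipaddress.ip_network('10.0.0.0/16')    # machineNetwork (VPC CIDR)
cluster = ipaddress.ip_network('10.128.0.0/14')  # clusterNetwork (pod IPs)
service = ipaddress.ip_network('172.30.0.0/16')  # serviceNetwork
for a, b in [(machine, cluster), (machine, service), (cluster, service)]:
    assert not a.overlaps(b), f'{a} overlaps {b}'
print('No CIDR overlaps')
"
```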

Prepare Deployment

Generate the manifests

Note: Make sure compute.replicas is set to 0 in your install-config.yaml before running this, so the installer does not wait for worker machines it shouldn’t provision

$ openshift-install create manifests --dir $HOME/clusterconfig

INFO Consuming Install Config from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings
INFO Successfully populated MCS CA cert information: root-ca 2036-03-21T16:25:10Z 2026-03-24T16:25:10Z
INFO Successfully populated MCS TLS cert information: root-ca 2036-03-21T16:25:10Z 2026-03-24T16:25:10Z
INFO Manifests created in: /home/ec2-user/clusterconfig, /home/ec2-user/clusterconfig/manifests and /home/ec2-user/clusterconfig/openshift

Generate the Ignition configuration files:

$ openshift-install create ignition-configs --dir $HOME/clusterconfig

This command consumes the manifests and generates several files in your directory, most notably bootstrap.ign, master.ign, worker.ign, and an auth/ directory containing your kubeadmin password and kubeconfig.

$ tree clusterconfig/
clusterconfig/
├── 000_capi-namespace.yaml
├── auth
│   ├── kubeadmin-password
│   └── kubeconfig
├── bootstrap.ign
├── master.ign
├── metadata.json
└── worker.ign

1 directory, 7 files

Host bootstrap.ign file

You can host it on a simple web server (using Python or Apache) or store it in S3 (more involved, since it requires IAM permissions and additional AWS setup)

Python way:

$ mkdir -p ~/ignition-files
$ cp clusterconfig/bootstrap.ign ~/ignition-files/

$ cd ~/ignition-files

# Serve on port 8080 (no root needed)
nohup python3 -m http.server 8080 &

# Verify it's running
curl http://localhost:8080/bootstrap.ign | head -5

Open Firewall on Bastion (RHEL 9) - if needed

sudo firewall-cmd --add-port=8080/tcp --permanent
sudo firewall-cmd --reload

# Verify
sudo firewall-cmd --list-ports

Ensure the port 8080 is allowed in the Security group from 10.0.0.0/16 (Restrict to VPC CIDR only — bootstrap.ign contains sensitive cluster material, never expose to 0.0.0.0/0)
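If you manage the security group from the CLI, the rule looks like this; `sg-0EXAMPLE` is a placeholder for the bastion's security group ID.

```shell
# Allow only the VPC CIDR to fetch bootstrap.ign from the bastion
aws ec2 authorize-security-group-ingress \
  --group-id sg-0EXAMPLE \
  --protocol tcp --port 8080 \
  --cidr 10.0.0.0/16
```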

Cleanup After Bootstrap Completes

# Kill the python server
kill $(lsof -t -i:8080)

Encode the Master and Worker Ignition files

Because you will pass the master.ign and worker.ign files into the EC2 User Data field during instance creation, it is recommended to convert them to base64 encoding

base64 -w0 $HOME/clusterconfig/master.ign > $HOME/clusterconfig/master.64
base64 -w0 $HOME/clusterconfig/worker.ign > $HOME/clusterconfig/worker.64
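Before pasting the encoded files into User Data, it can be worth confirming the encoding round-trips cleanly; the loop below skips any file that does not exist yet.

```shell
# Decoding each .64 file should reproduce the original byte-for-byte
for f in master worker; do
  if [ -f $HOME/clusterconfig/$f.ign ]; then
    base64 -d $HOME/clusterconfig/$f.64 | diff -q - $HOME/clusterconfig/$f.ign \
      && echo "$f.64 OK"
  fi
done
```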

Preparing EC2 instances

Prepare or update security group

  • For Master & Bootstrap Nodes:

    • TCP 6443: Kubernetes API. Allow inbound from your API NLB and your VPC CIDR
    • TCP 22623: Machine Config Server. Allow inbound from your API NLB and your VPC CIDR
    • TCP 22: SSH. Allow inbound from your Bastion host’s security group or IP
    • TCP 19531: (Bootstrap node only) Allow inbound from the master nodes
  • For Worker Nodes:

    • TCP 80 & 443: HTTP/HTTPS application traffic. Allow inbound from your Ingress NLB and AAP ALB
    • TCP 22: SSH. Allow inbound from your Bastion host

Finding CoreOS AMI

Ensure you are using the Red Hat Enterprise Linux CoreOS (RHCOS) AMI for your region.

Tip: Find the right AMI for your OCP version and region; example below:

$ openshift-install coreos print-stream-json | \
  python3 -c "
import json, sys
data = json.load(sys.stdin)
amis = data['architectures']['x86_64']['images']['aws']['regions']
print('ap-southeast-2 AMI:', amis['ap-southeast-2']['image'])
"
ap-southeast-2 AMI: ami-007439e088223214a

When launching your EC2 instance in the AWS Console, on the AMI selection screen:

  1. Click “Browse more AMIs”
  2. Select “Community AMIs” tab
  3. In the search box, paste: ami-007439e088223214a

Launch EC2 Instance for OCP

  1. Launch 1 Bootstrap Node: In the User Data field (under Advanced Details), paste a small JSON snippet that points to your web server or S3 bucket. It should look like this (update the URL to your bastion's address):
{
  "ignition": {
    "config": {
      "replace": {
        "source": "http://10.0.27.231:8080/bootstrap.ign"
      }
    },
    "version": "3.2.0"
  }
}
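Rather than hand-editing the JSON, you can generate the pointer file with python3 in the same style as the AMI lookup above; the URL is the example bastion address from the snippet, so substitute your own.

```shell
BOOTSTRAP_IGN_URL="http://10.0.27.231:8080/bootstrap.ign"

python3 -c "
import json, sys
pointer = {
    'ignition': {
        'config': {'replace': {'source': sys.argv[1]}},
        'version': '3.2.0',
    },
}
print(json.dumps(pointer, indent=2))
" "$BOOTSTRAP_IGN_URL" > bootstrap-pointer.ign

# Confirm the file parses as valid JSON before pasting it into User Data
python3 -m json.tool bootstrap-pointer.ign > /dev/null && echo "pointer OK"
```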

  2. Launch 3 Master Nodes: Paste the entire base64-encoded contents of your master.64 file into the User Data field for each master instance

  3. Launch 3 Worker Nodes: Paste the entire base64-encoded contents of your worker.64 file into the User Data field for each worker instance

Monitor using the OpenShift Installer (Recommended)

From your bastion host, run the following command to make the installer actively watch the cluster’s progress:

$ openshift-install --dir $HOME/clusterconfig wait-for bootstrap-complete --log-level=info

This command will pause and output status messages. When the Kubernetes API server is fully bootstrapped on your master nodes, the command will succeed and output a message saying:

INFO It is now safe to remove the bootstrap resources

Register the Instances to your Target Groups

Now that your EC2 instances are running and have IP addresses, go back to the AWS Load Balancer Target Groups you created earlier:

  1. API Target Group (Port 6443): Register your 1 Bootstrap node and 3 Master nodes.
  2. Machine Config Target Group (Port 22623): Register your 1 Bootstrap node and 3 Master nodes.
  3. Ingress Target Group (Port 443): Register your 3 Worker nodes.
  4. AAP NodePort Target Group: Register your 3 Worker nodes.
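Registration can also be done from the bastion with the AWS CLI; the target-group ARN and instance IDs below are placeholders (repeat per target group with the appropriate nodes).

```shell
# Example: API target group gets the bootstrap node plus the three masters
aws elbv2 register-targets \
  --target-group-arn arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/ocp-api-6443/EXAMPLE \
  --targets Id=i-0BOOTSTRAP Id=i-0MASTER1 Id=i-0MASTER2 Id=i-0MASTER3
```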

At this point, the instances will boot, grab their Ignition configurations, and automatically begin forming the OpenShift cluster! You can monitor the progress from your bastion host using:

$ openshift-install wait-for bootstrap-complete --dir $HOME/clusterconfig

TODO

Troubleshooting

ssh -i <key.pem> core@<bootstrap-private-ip>

# Check ignition completed
sudo journalctl -b -u 'ignition*' --no-pager

# Check bootkube (this starts the API)
sudo journalctl -b -u bootkube.service -f

# Check kubelet
sudo systemctl status kubelet

# Check if API port is listening locally
ss -tlnp | grep 6443