Worker pool management

Gardener Dashboard

[Image: Gardener Dashboard overview, worker pool management]

Click the Settings button, as shown in the picture below.
Click the + sign to add a worker group.

Continue in the pop-up window, as shown in the image below:

Group Name: enter the name of the new worker pool
Machine Type: select one of the currently supported machine types from the drop-down menu, according to your needs
Machine Image: select one of the currently supported machine images from the drop-down menu, according to your needs
Container Runtime: select one of the currently supported container runtimes from the drop-down menu, according to your needs
Volume Size: enter the volume size in GiB
Autoscaling configuration:
1. Autoscaling Min.: enter the minimum number of worker nodes
2. Autoscaling Max.: enter the maximum number of worker nodes
3. Max. Surge: enter the maximum number of additional worker nodes that may be created during a rolling update
Zone: select the Availability Zones

[Image: Shoot worker pool management]
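
For orientation, the dialog fields listed above appear to correspond to the fields of one worker pool entry in the Shoot manifest. The sketch below is an assumed mapping based on the field names of the Shoot API, not a description of what the dashboard actually writes. The pool name and all values are placeholders.

```yaml
# Sketch: assumed mapping of dashboard fields to one entry of spec.provider.workers.
# All values are placeholders for illustration.
- name: example-pool            # Group Name (hypothetical)
  machine:
    type: <machine-class>       # Machine Type
    image:
      name: gardenlinux         # Machine Image
      version: <image-version>
  cri:
    name: containerd            # Container Runtime
  volume:
    size: 50Gi                  # Volume Size (entered in GiB in the dialog)
  minimum: 1                    # Autoscaling Min.
  maximum: 3                    # Autoscaling Max.
  maxSurge: 1                   # Max. Surge
  zones:                        # Zone
  - <availability-zone>
```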

Garden Cluster

At the Garden cluster level, the Shoot object can be edited (using kubectl edit or kubectl patch) to update the worker pool configuration.
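
As a rough illustration of the kubectl patch route, the sketch below is a JSON Patch (RFC 6902) written as YAML that raises the upper autoscaling bound of one worker pool. It assumes that the pool is the first entry in spec.provider.workers and that the patch is saved under the hypothetical file name workers-patch.yaml. The Shoot name, the namespace, and the value 15 are placeholders.

```yaml
# workers-patch.yaml (hypothetical file name)
# Possible invocation, with placeholders for the Shoot name and namespace:
#   kubectl patch shoot <shoot-name> -n <project-namespace> \
#     --type json --patch-file workers-patch.yaml
- op: replace
  path: /spec/provider/workers/0/maximum   # index 0 = first worker pool (assumption)
  value: 15                                # placeholder value
```

Because a JSON Patch addresses list entries by index, it is worth checking the order of the worker pools first, for example with kubectl get shoot <shoot-name> -o yaml, before applying such a patch.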

Example worker group setup:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
spec:
  provider:
    type: onmetal
    # Please note that workers is a list of worker pool configurations.
    workers:
    - name: idp
      cri:
        name: containerd
      machine:
        architecture: amd64
        image:
          name: gardenlinux
          version: 1061.0.20
        type: <machine-class>
      maxSurge: 1
      maxUnavailable: 0
      maximum: 12
      minimum: 12
      systemComponents:
        allow: true
      volume:
        size: 100Gi
        type: fast
      zones:
      - mdb1-pool1
      - mdb2-pool1
      - mdb3-pool1
    # Another worker pool configuration can be added here.
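
To illustrate the comment at the end of the example, a second pool could be appended to the same workers list. The sketch below is an assumption for illustration only: the pool name batch, the single zone, and the autoscaling range are invented, and the machine type is left as a placeholder. Pool names need to be unique within a Shoot.

```yaml
spec:
  provider:
    type: onmetal
    workers:
    - name: idp
      # ... first pool as in the example above ...
    - name: batch               # hypothetical second pool
      cri:
        name: containerd
      machine:
        architecture: amd64
        image:
          name: gardenlinux
          version: 1061.0.20
        type: <machine-class>
      maxSurge: 1
      maxUnavailable: 0
      minimum: 1                # lower autoscaling bound (illustrative)
      maximum: 5                # upper autoscaling bound (illustrative)
      volume:
        size: 100Gi
        type: fast
      zones:
      - mdb1-pool1
```

Unlike the first pool, where minimum and maximum are both 12 and the size is therefore fixed, a pool with different bounds leaves room for autoscaling between them.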

Machine controller advanced settings

There is a new section in the Shoot manifest, machineControllerManager, where various timeouts and related settings can be configured for a worker pool:

machineCreationTimeout: the period after which the creation of a Machine is declared as failed.
machineDrainTimeout: the period after which a Machine that is still being drained is forcefully deleted.
machineHealthTimeout: the period after which an unhealthy Machine is declared as failed.
maxEvictRetries: the number of eviction retries on a Pod after which a drain operation is declared as failed and a forceful deletion is triggered.
nodeConditions: a set of node conditions that, if true for the period of machineHealthTimeout, cause the Machine to be declared as failed.

Example:

kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
metadata:
  name: dash
  namespace: garden-dash
spec:
  …
  provider:
    type: onmetal
    workers:
      - cri:
          name: containerd
          …
        machineControllerManager:
          machineDrainTimeout: 2h0m0s
          machineHealthTimeout: 10m0s
          machineCreationTimeout: 20m0s
          maxEvictRetries: 10
          nodeConditions:
            - ReadonlyFilesystem
            - KernelDeadlock
            - DiskPressure