OSC Node Feature Discovery Extension

Introduction

This document describes the basic usage of the OSC Node Feature Discovery Extension.

Enabling the Extension for a Shoot Cluster

To enable this extension for a Shoot cluster, add the extension service osc-nfd-shoot-service to the extensions in the Shoot custom resource manifest:

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
  
spec:
  
  extensions:
    - type: osc-nfd-shoot-service
    

You can issue this command to check the extensions on your Shoot cluster:

kubectl get cm -n kube-system shoot-info -o jsonpath='{.data.extensions}'

Disabling globally enabled extensions

To disable an extension that is enabled by default, add the following snippet to the shoot manifest:

kind: Shoot
...
spec:
  extensions:
  - type: osc-nfd-shoot-service
    disabled: true
...

Components of NFD extension

When the NFD extension is installed in the shoot, the osc-nfd-shoot-controller-manager automatically deploys the NodeFeatureRule custom resource in the shoot cluster. NodeFeatureRule makes it possible to advertise node-level resources, such as the available EPC memory, as extended resources.

However, be aware that node-level resources added to the node capacity in this way are opaque to Kubernetes, so there is no built-in mechanism for controlling their consumption. Managing these resources is therefore the responsibility of the users.
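
As an illustration of this mechanism, a NodeFeatureRule that labels SGX-capable nodes and advertises their EPC memory as an extended resource might look like the following sketch. It follows the upstream Node Feature Discovery NodeFeatureRule format; the object name, the rule name, the matched feature, and the "@cpu.security.sgx.epc" value are assumptions modelled on upstream examples, and the rule that the extension actually deploys may differ.

```yaml
# Illustrative sketch only; not necessarily the rule deployed by the extension.
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: sgx-epc-example            # hypothetical name
spec:
  rules:
    - name: "sgx epc example"      # hypothetical rule name
      # Label used elsewhere in this document to select SGX-enabled nodes
      labels:
        intel.feature.node.kubernetes.io/sgx: "true"
      # Assumption: NFD exposes the EPC size as the cpu.security attribute sgx.epc
      extendedResources:
        sgx.intel.com/epc: "@cpu.security.sgx.epc"
      # Apply the rule only where NFD reports SGX as enabled
      matchFeatures:
        - feature: cpu.security
          matchExpressions:
            sgx.enabled: {op: IsTrue}
```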

Shoot extension modifications

The NFD extension currently manages multiple Helm charts. Helm values from the shoot manifest can be passed down to these charts, and each service can be disabled individually.

The providerConfig in osc-nfd-shoot-service allows modifications to the following components:

  • cgroups-prometheus-exporter: This component exports the cgroups statistics to Prometheus.
  • node-feature-rule: This component defines the rules for node feature discovery.
  • node-feature-discovery: This component discovers the features of the node.
  • nri-sgx-epc: This plugin can be used to set limits on the SGX EPC memory using annotations.

Warning

While it's possible to disable node-feature-discovery or node-feature-rule, doing so is not recommended because it can lead to unexpected behavior. The node-feature-rule chart currently has no values to override except extensionName, which is a string.

Example of a Shoot YAML manifest:

kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
metadata:
  name: myshoot
  namespace: myproject
spec:
  extensions:
    - type: osc-nfd-shoot-service
      providerConfig:
        apiVersion: nfd.osc.extensions.config.gardener.cloud/v1alpha1
        kind: Configuration
        cgroups-prometheus-exporter:
          enabled: true
          values: |
            image:
              repository: mtr.devops.telekom.de/osc/common/monitoring/cgroups-prometheus-exporter
              tag: v0.1.0
              pullPolicy: Always
            prometheus:
              enablePrometheusRule: false
              enableServiceMonitor: false
        node-feature-rule:
          enabled: true
        node-feature-discovery:
          values: |
            image:
              repository: mtr.devops.telekom.de/osc/gardener/node-feature-discovery
              # This should be set to 'IfNotPresent' for released version
              pullPolicy: IfNotPresent
              tag: v0.13.4-minimal
            imagePullSecrets: []
            # name is immutable! can be set only once
            nameOverride: ""
            fullnameOverride: ""
            namespaceOverride: ""
          enabled: true
        nri-sgx-epc:
          enabled: true
          values: |
            nri:
              runtime:
                patchConfig: true
            image:
              name: ghcr.io/containers/nri-plugins/nri-sgx-epc
      disabled: false
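
Because each component can be switched off individually, a much shorter configuration is often enough. The following sketch uses the same providerConfig schema as the example above and disables only the cgroups-prometheus-exporter. It assumes that the values field may be omitted for every component, as it is for node-feature-rule in that example.

```yaml
spec:
  extensions:
    - type: osc-nfd-shoot-service
      providerConfig:
        apiVersion: nfd.osc.extensions.config.gardener.cloud/v1alpha1
        kind: Configuration
        # Switch off only the metrics exporter
        cgroups-prometheus-exporter:
          enabled: false
        # Keep the remaining components enabled (assumption: chart defaults apply
        # when no values are given)
        node-feature-rule:
          enabled: true
        node-feature-discovery:
          enabled: true
        nri-sgx-epc:
          enabled: true
```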

Cgroups Prometheus Exporter

The cgroups-prometheus-exporter publishes metrics based on the memory.current, misc.current, misc.events and misc.max files available on each shoot cluster. This data describes the usage of standard and EPC memory. For more information, see the cgroups-prometheus-exporter documentation.
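
The Shoot example earlier in this document sets two Prometheus-related flags for this component to false. A hedged sketch of the opposite setting is shown below. It assumes that these flags make the chart create the matching Prometheus Operator objects (a ServiceMonitor and a PrometheusRule) and that the corresponding CRDs are available in the cluster; neither is confirmed by this document.

```yaml
cgroups-prometheus-exporter:
  enabled: true
  values: |
    prometheus:
      # Assumption: creates a ServiceMonitor so that Prometheus scrapes the exporter
      enableServiceMonitor: true
      # Assumption: creates a PrometheusRule shipped with the chart
      enablePrometheusRule: true
```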

NRI-SGX-EPC Plugin

The NRI-SGX-EPC plugin allows users to define the EPC limit. This is achieved by configuring the declared limit through the container runtime, containerd.

Containerd does not currently support the misc cgroup controller. To address this, the containerd community introduced the concept of NRI (Node Resource Interface). NRI plugins function similarly to mutating webhooks in Kubernetes: they "mutate" the container specification before containerd hands it to the low-level container runtime (runc).

Implementation

The NFD extension deploys the NRI plugin by default if the node selector requirement is met. With the node selector set to intel.feature.node.kubernetes.io/sgx: "true", the EPC NRI plugin pods are scheduled only on SGX-enabled nodes. The NFD add-on adds this label automatically to SGX-enabled nodes on which the OSC Scone Service Operator is running. The containerd configuration is defined in /etc/containerd/config.toml.

The NRI EPC plugin includes an init container, which automatically patches the containerd configuration to enable NRI support.

For more information, please visit the official NRI documentation.

Requirements for running the NRI plugin:

  • containerd v1.7.x
  • shoot cluster with SGX memory support
  • The OSC Scone Service Operator must be deployed to provide labels and the sgxplugin
  • NRI must be enabled in containerd:
    • the configuration is stored in /etc/containerd/config.toml
    • the section [plugins."io.containerd.nri.v1.nri"] must contain disable = false
    • the configuration is automatically patched into the enabled state

It's possible to trigger the init container to patch the configuration by setting the value patchConfig in the shoot manifest:

nri:
  runtime:
    patchConfig: true

This plugin can be disabled by modifying the shoot manifest as shown in Shoot extension modifications. It is also possible to modify some values, for example:

values: |
  image:
    name: ghcr.io/containers/nri-plugins/nri-sgx-epc
    #tag: unstable
    pullPolicy: IfNotPresent

  resources:
    cpu: 25m
    memory: 100Mi

  nri:
    plugin:
      index: 90
    runtime:
      patchConfig: false

  initContainerImage:
    name: ghcr.io/containers/nri-plugins/nri-config-manager
    #tag: unstable
    pullPolicy: IfNotPresent

  tolerations: []
  affinity: []
  nodeSelector: []
  podPriorityClassNodeCritical: true

Deployment example

Annotations can be defined for the whole pod or its containers. The values are in bytes. For more information, please visit the NRI Plugin docs.

...
metadata:
  annotations:
    # for all containers in the pod
    epc-limit.nri.io/pod: "32768"
    # alternative notation for all containers in the pod
    epc-limit.nri.io: "8192"
    # for container c0 in the pod
    epc-limit.nri.io/container.c0: "16384"
...
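
To put the annotations in context, here is a minimal sketch of a complete Pod manifest that limits a single container. The pod name and the image are hypothetical placeholders. The suffix of the annotation key matches the container name (c0), and the node selector reuses the SGX label described in the Implementation section.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: epc-limit-demo                 # hypothetical name
  annotations:
    # EPC limit in bytes for the container named c0 (16384 bytes = 16 KiB)
    epc-limit.nri.io/container.c0: "16384"
spec:
  # Schedule only on nodes labelled as SGX-enabled
  nodeSelector:
    intel.feature.node.kubernetes.io/sgx: "true"
  containers:
    - name: c0                         # matches the suffix of the annotation key
      image: registry.example.com/sgx-app:latest   # hypothetical image
```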

Known issues

Be aware that any pod or container attempting to consume more memory than allowed by the epc-limit annotation will be restarted automatically.

If the misc.max values do not reflect the configured limits, containerd must be restarted.

SGX workloads must specify the same requests and limits for EPC memory. Otherwise, the manifest is rejected with the following error message:

The Deployment "sgx-epc-stress-test" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "1Gi": must be equal to sgx.intel.com/epc limit
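
A container specification that passes this validation sets the same quantity under both requests and limits. The fragment below is a sketch: the container name and image are hypothetical, and 1Gi simply mirrors the value from the error message.

```yaml
containers:
  - name: c0                                     # hypothetical name
    image: registry.example.com/sgx-app:latest   # hypothetical image
    resources:
      requests:
        sgx.intel.com/epc: 1Gi   # must be equal to the limit below
      limits:
        sgx.intel.com/epc: 1Gi
```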

If the epc-limit annotation is changed, containerd provides data for both the old and the new resources for a few seconds. During this time the two overlap in the cgroups directory and in the metrics provided by the osc-cgroups-prometheus-exporter.

NRI-SGX-EPC Support matrix

NRI-SGX-EPC was tested in the following configurations:

NRI-SGX-EPC plugin version   Garden Linux version   Kubernetes version   containerd version
v0.7.1                       1550.0                 1.28.11              1.7.15
v0.7.1                       1605.0                 1.26.14              1.7.20
v0.7.1                       1605.0                 1.28.14              1.7.20
v0.7.1                       1510.0                 1.26.8               1.7.11
v0.7.1                       1510.0                 1.29.9               1.7.11
v0.7.1                       1510.0                 1.30.8               1.7.11
v0.7.1                       1510.0                 1.31.4               1.7.11