# OSC Node Feature Discovery Extension
## Introduction

This document describes the basic usage of the OSC Node Feature Discovery (NFD) Extension.
Enabling the Extension for a Shoot Cluster
For enabling this extension for a Shoot cluster,
the extension service named osc-nfd-shoot-service
needs to be added
to the extensions in the Shoot
Custom Resource manifest:
```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
metadata:
…
spec:
…
  extensions:
    - type: osc-nfd-shoot-service
…
```
You can issue this command to check the extensions on your Shoot cluster:

```shell
kubectl get cm -n kube-system shoot-info -o jsonpath={.data.extensions}
```
### Disabling globally enabled extensions

To disable extensions which are enabled by default, add the following snippet to the Shoot manifest:

```yaml
kind: Shoot
...
spec:
  extensions:
    - type: osc-nfd-shoot-service
      disabled: true
...
```
## Components of the NFD extension

When the NFD extension is installed in the shoot, the `osc-nfd-shoot-controller-manager` also automatically deploys the `NodeFeatureRule` custom resource in the shoot cluster. Through the `NodeFeatureRule` configuration mechanism, node-level resources, such as available EPC memory, can be advertised as extended resources.

However, be aware that node-level resources added to node capacity in this manner are not transparent to Kubernetes, so there is no built-in mechanism for controlling their consumption. The responsibility for managing these resources therefore falls on the users.
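As an illustrative sketch of this mechanism, a rule advertising EPC capacity as an extended resource might look like the following. Note that the rule name, label key, and feature/attribute names here are assumptions, and `extendedResources` support in `NodeFeatureRule` depends on the deployed NFD version:

```yaml
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: sgx-epc-extended-resource   # hypothetical example name
spec:
  rules:
    - name: "advertise SGX EPC as an extended resource"
      # Advertise the discovered EPC size (bytes) on node capacity.
      extendedResources:
        sgx.intel.com/epc: "@cpu.security.sgx.epc"
      matchFeatures:
        - feature: cpu.security
          matchExpressions:
            sgx.enabled: {op: IsTrue}
```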
## Shoot extension modifications

The NFD extension currently manages multiple Helm charts. Helm values from the shoot manifest can be passed down to these charts, and each service can be disabled individually.

The `providerConfig` in `osc-nfd-shoot-service` allows modifications to the following components:

- `cgroups-prometheus-exporter`: exports the cgroups statistics to Prometheus.
- `node-feature-rule`: defines the rules for node feature discovery.
- `node-feature-discovery`: discovers the features of the node.
- `nri-sgx-epc`: a plugin that can be used to set limits on the SGX EPC memory using annotations.
> **Warning:** While it's possible to disable `node-feature-discovery` or `node-feature-rule`, it is not recommended due to the potential for unexpected behavior.
The `node-feature-rule` chart doesn't currently have any values to overwrite, except for `extensionName`, which is a string.
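For example, overriding `extensionName` in the shoot manifest could look like this sketch (the value shown is an assumed placeholder; set it to whatever name your setup expects):

```yaml
node-feature-rule:
  enabled: true
  values: |
    extensionName: my-extension-name  # assumed placeholder value
```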
Example of a Shoot YAML manifest:
```yaml
kind: Shoot
apiVersion: core.gardener.cloud/v1beta1
metadata:
  name: myshoot
  namespace: myproject
spec:
  extensions:
    - type: osc-nfd-shoot-service
      providerConfig:
        apiVersion: nfd.osc.extensions.config.gardener.cloud/v1alpha1
        kind: Configuration
        cgroups-prometheus-exporter:
          enabled: true
          values: |
            image:
              repository: mtr.devops.telekom.de/osc/common/monitoring/cgroups-prometheus-exporter
              tag: v0.1.0
              pullPolicy: Always
            prometheus:
              enablePrometheusRule: false
              enableServiceMonitor: false
        node-feature-rule:
          enabled: true
        node-feature-discovery:
          values: |
            image:
              repository: mtr.devops.telekom.de/osc/gardener/node-feature-discovery
              # This should be set to 'IfNotPresent' for released versions
              pullPolicy: IfNotPresent
              tag: v0.13.4-minimal
            imagePullSecrets: []
            # name is immutable! It can be set only once.
            nameOverride: ""
            fullnameOverride: ""
            namespaceOverride: ""
          enabled: true
        nri-sgx-epc:
          enabled: true
          values: |
            nri:
              runtime:
                patchConfig: true
            image:
              name: ghcr.io/containers/nri-plugins/nri-sgx-epc
      disabled: false
```
## Cgroups Prometheus Exporter

The cgroups-prometheus-exporter is responsible for publishing metrics based on the `memory.current`, `misc.current`, `misc.events`, and `misc.max` files available on each shoot cluster node. This data provides information about the usage of standard and EPC memory.

For more information, please visit the Cgroups-prometheus-exporter documentation.
## NRI-SGX-EPC Plugin

The NRI-SGX-EPC plugin allows users to define the EPC limit. This is achieved by configuring the declared limit through the container runtime, containerd.

Containerd does not currently support miscellaneous cgroups. To address this, the containerd community introduced the concept of NRI (Node Resource Interface). NRI plugins function similarly to mutating webhooks in Kubernetes, in that they "mutate" the container specification before containerd instructs the low-level container runtime (runc).
### Implementation

The NFD extension deploys the NRI plugin by default if the node selector requirement is met. By setting the node selector to `intel.feature.node.kubernetes.io/sgx: "true"`, the EPC NRI plugin pods will only be scheduled on SGX-enabled nodes. This label is added automatically by the NFD add-on to SGX-enabled nodes with a running OSC Scone Service Operator. The containerd config is defined in `/etc/containerd/config.toml`.

The NRI EPC plugin includes an init container, which automatically patches the containerd config to enable NRI support.

For more information, please visit the official NRI documentation.
The requirements for running the NRI plugin:

- containerd v1.7.x
- a shoot cluster with SGX memory support
- the OSC Scone Service Operator must be deployed to provide the labels and the `sgxplugin`
- NRI must be enabled in containerd:
  - the config is stored in `/etc/containerd/config.toml`
  - NRI must be enabled via `[plugins."io.containerd.nri.v1.nri"] disable = false`
  - the config is automatically patched into the enabled state
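For reference, the relevant containerd configuration section might look like the following sketch. The `disable` key is the one the init container patches; the paths shown are containerd's usual NRI defaults and are assumptions that may differ in your setup:

```toml
# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.nri.v1.nri"]
  # Enable NRI support.
  disable = false
  # Default NRI paths (assumed; adjust if your distribution differs).
  plugin_path = "/opt/nri/plugins"
  socket_path = "/var/run/nri/nri.sock"
```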
It's possible to trigger the init container to patch the config by setting the value `patchConfig` in the shoot manifest:

```yaml
nri:
  runtime:
    patchConfig: true
```
This plugin can be disabled by modifying the shoot manifest as shown in Shoot extension modifications. It is also possible to modify some values, for example:
```yaml
values: |
  image:
    name: ghcr.io/containers/nri-plugins/nri-sgx-epc
    #tag: unstable
    pullPolicy: IfNotPresent
  resources:
    cpu: 25m
    memory: 100Mi
  nri:
    plugin:
      index: 90
    runtime:
      patchConfig: false
  initContainerImage:
    name: ghcr.io/containers/nri-plugins/nri-config-manager
    #tag: unstable
    pullPolicy: IfNotPresent
  tolerations: []
  affinity: []
  nodeSelector: []
  podPriorityClassNodeCritical: true
```
### Deployment example

Annotations can be defined for the whole pod or for individual containers. The values are in bytes. For more information, please visit the NRI Plugin docs.

```yaml
...
metadata:
  annotations:
    # for all containers in the pod
    epc-limit.nri.io/pod: "32768"
    # alternative notation for all containers in the pod
    epc-limit.nri.io: "8192"
    # for container c0 in the pod
    epc-limit.nri.io/container.c0: "16384"
...
```
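To put this together, a minimal pod sketch combining the annotation with matching EPC requests and limits might look as follows. The pod name, container name, image, and sizes are illustrative, and the `sgx.intel.com/epc` resource assumes an SGX device plugin is deployed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sgx-epc-demo            # hypothetical example name
  annotations:
    # EPC limit in bytes for all containers in the pod
    epc-limit.nri.io/pod: "32768"
spec:
  containers:
    - name: c0
      image: busybox            # placeholder image
      resources:
        requests:
          sgx.intel.com/epc: "32768"
        limits:
          # must equal the request (see Known issues)
          sgx.intel.com/epc: "32768"
```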
## Known issues

Be aware that any pod or container attempting to consume more memory than allowed by the `epc-limit` annotation will be restarted automatically.

Containerd must be restarted to reflect the `misc.max` values for limits if they are not shown properly.

SGX EPC memory must have the same limits and requests specified. Otherwise, you will face the following error message:

```
The Deployment "sgx-epc-stress-test" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: "1Gi": must be equal to sgx.intel.com/epc limit
```

If the annotation with `epc-limit` is changed, containerd will provide data for both the older and newer resources for a few seconds, and these will overlap in the cgroups directory and in the metrics provided by the `osc-cgroups-prometheus-exporter`.
## NRI-SGX-EPC Support matrix

NRI-SGX-EPC was tested in the following configurations:

| NRI-SGX-EPC Plugin version | Garden Linux version | Kubernetes version | Containerd version |
|---|---|---|---|
| v0.7.1 | 1550.0 | 1.28.11 | 1.7.15 |
| v0.7.1 | 1605.0 | 1.26.14 | 1.7.20 |
| v0.7.1 | 1605.0 | 1.28.14 | 1.7.20 |
| v0.7.1 | 1510.0 | 1.26.8 | 1.7.11 |
| v0.7.1 | 1510.0 | 1.29.9 | 1.7.11 |
| v0.7.1 | 1510.0 | 1.30.8 | 1.7.11 |
| v0.7.1 | 1510.0 | 1.31.4 | 1.7.11 |