daniel.vanco@external.t-systems.com f789479270 Merge branch 'fix/dpservice_ha' into 'main'
Fix HA dpservice support, add checks

See merge request cas-devs/osc/onmetal/tools/metaldig!30
2025-03-20 15:51:46 +00:00
examples Add support for HA dpservice 2025-03-15 01:04:41 +01:00
src Fix HA dpservice support, add checks 2025-03-19 23:30:34 +01:00
.gitignore Reorganize sources 2025-02-23 21:27:30 +01:00
install.sh Provide an install script 2025-02-23 21:28:12 +01:00
README.md Provide an install script 2025-02-23 21:28:12 +01:00

Metaldig

Metaldig is a helper tool/utility to dig data from complex multi-cluster-layer Metal infrastructure.

It was designed with these main goals in mind:

  • single Python 3.8+ source file to make installation and maintenance (update) easier and faster
  • do not use any complex libraries with a lot of dependencies unless absolutely necessary
  • check consistency across layers (e.g. check if hostname of a VM is the same as node name, etc.)

Getting started

It is highly recommended to have your terminal support colors as metaldig's output can be overwhelming otherwise. You can test it by trying ls -la --color.

Kubectl

Metaldig needs a working kubectl setup that is reachable through your PATH environment variable. See the official kubectl installation guide.

To access a shell on cluster nodes, you also need the node-shell extension. See its GitHub repository for a guide.
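Before the first run, it can help to confirm that the required tools are actually visible on PATH. The following is a minimal sketch using only the Python standard library; the helper name missing_tools is hypothetical and not part of metaldig.

```python
import shutil


def missing_tools(names):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # "kubectl" is required per the text above; extend the list as needed.
    missing = missing_tools(["kubectl"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found")
```

An empty result means every listed tool resolved to an executable; it does not verify that kubectl is configured for any cluster.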

Installation

You can use OSC homebrew or install manually:

git clone https://gitlab.devops.telekom.de/cas-devs/osc/onmetal/metaldig.git
cd metaldig
sudo ./install.sh

You also need to be authorized to access the clusters. Authorization happens automatically when you use the right SSH key, which should be registered in the central user management.

Displaying help

$ metaldig -v
metaldig 0.107
$ metaldig -h
usage: metaldig [-h] [-v] [--debug] [--batch] [--init [KEY[,USER]]] [--color {always,never,auto}] [--tree] [-c COLUMNS] [-l LAYERS] [-s SORT_TABLES] [-t [{header,no-header}]] -r {mdb,ffm,lab} [--metalbond | --vmcluster [PATTERN] | --vmnode [CLUSTER-N]] [--machine [PATTERN] | --volume [PATTERN] | --bucket [PATTERN] | --vip [PATTERN] | --ngw [PATTERN] | --lb [PATTERN] | --nic [PATTERN] | --network [PATTERN] | --router [PATTERN] | --storage [{prod,stage,dev}] | --ip [IPV4[/LEN][@VNI]] | --ul [IPV6[/LEN]] | --metalnet] [--pod [PATTERN]] [--logs [DIR] | --delete | --shell | --virsh {shutdown,destroy} | --describe | --k3s [OUTFILE]] [--dpservice] [--check-routes]

metaldig 0.107 is a helper tool to help dig data from complex multi-cluster-layer Metal infrastructure

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --debug               Run in debug mode
  --batch               Run query on all results instead of listing choices. Use with caution!
  --init [KEY[,USER]]   Initialize metaldig's local/region storage
  --color {always,never,auto}
                        Output colorization (default: auto)
  --tree                Display output in a tree format where applicable
  -c COLUMNS, --columns COLUMNS
                        Limit output to given comma-separated columns
  -l LAYERS, --layers LAYERS
                        Limit output to given comma-separated layers (API, REG, VM, ...)
  -s SORT_TABLES, --sort-tables SORT_TABLES
                        Sort tables by given comma-separated columns
  -t [{header,no-header}], --table-format [{header,no-header}]
                        Display table on a different format then the rest of the output
  -r {mdb,ffm,lab}, --region {mdb,ffm,lab}
                        The region to run the query in

bottom-up mode:
  --metalbond           Query metalbond-server URLs instead of kubernetes API
  --vmcluster [PATTERN]
                        Query given VM-cluster instead of API-cluster (default: *)
  --vmnode [CLUSTER-N]  Query given physical nodes instead of API-cluster (default: *-*)

queries (default pattern: *):
  --machine [PATTERN]   Show Machine (logical node) info / list matching Machines
  --volume [PATTERN]    Show Volume info / list matching Volumes
  --bucket [PATTERN]    Show Bucket info / list matching Buckets
  --vip [PATTERN]       Show VirtualIP info / list matching VirtualIPs
  --ngw [PATTERN]       Show NATGateway info / list matching NATGateways
  --lb [PATTERN]        Show LoadBalancer info / list matching LoadBalancers
  --nic [PATTERN]       Show NetworkInterface info / list matching NetworkInterfaces
  --network [PATTERN]   Show Network info / list matching Networks
  --router [PATTERN]    Show router info / list matching routers
  --storage [{prod,stage,dev}]
                        Get appropriate storage cluster for given environment
  --ip [IPV4[/LEN][@VNI]]
                        Show/list whatever given ip/range@vni belongs to
  --ul [IPV6[/LEN]], --underlay [IPV6[/LEN]]
                        Show/list whatever given underlay ip/range belongs to
  --metalnet            Perform metalnet layer checks

query actions:
  --pod [PATTERN]       Show Pod info / list matching Pods (default: *)
  --logs [DIR]          Show logs for selected Pods (or save them to DIR)
  --delete              Delete (effectively restart) selected Pod
  --shell               Enter shell where applicable
  --virsh {shutdown,destroy}
                        Execute virsh command on running VM
  --describe            Describe selected Pod or Node/Router
  --k3s [OUTFILE]       Download k3s kubeconfig

extra options:
  --dpservice           Query dpservice where appropriate
  --check-routes        Check dpservice routing tables (expert mode only)
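The --ip argument format IPV4[/LEN][@VNI] shown in the help above can be illustrated with a small parser. This is only a sketch of how such a string could be split, not metaldig's actual code; the function name parse_ip_query is hypothetical, and treating a missing /LEN as a single host (/32) is an assumption.

```python
import ipaddress


def parse_ip_query(text):
    """Split 'IPV4[/LEN][@VNI]' into (network, vni); vni is None if absent."""
    addr, sep, vni = text.partition("@")
    # strict=False tolerates host bits, e.g. 10.0.0.5/24 becomes 10.0.0.0/24.
    # Assumption: an address without /LEN is treated as a /32 host.
    network = ipaddress.IPv4Network(addr, strict=False)
    return network, (int(vni) if sep else None)


if __name__ == "__main__":
    print(parse_ip_query("10.0.0.5/24@100"))
    print(parse_ip_query("10.0.0.5"))
```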

Initialization

Although manual setup of the ~/.metaldig directory is possible, it is quite complicated and out of the scope of this document.

However, Metaldig provides a way to automatically initialize local storage for each region:

$ metaldig -r lab --init <your ssh key>[,<your ssh user>]
Fetching data from AZ1 partition cluster
Local storage for lab initialized

You can now start using metaldig and it will fetch .kubeconfig files as needed.

You only need to provide an SSH key once. For other regions, feel free to omit it.

SSH-agent users still need to provide a key that metaldig will store locally, but metaldig will respect your agent environment variables and prefer the agent.

Common concepts

Pattern matching

Most queries are based on some form of id/name. Metaldig accepts simple (filename-like) patterns in these places, i.e. you can write prefix*, *part* or reg?-vm?-n? instead of a full identifier.

If such a pattern matches uniquely (there is only one match), Metaldig will proceed the same way it would have with a plain identifier. Otherwise it will list the matches so that you can choose one and re-run the query.
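The behavior described above can be sketched with Python's standard fnmatch module, which implements filename-like patterns. Whether metaldig uses fnmatch internally is an assumption; the function name resolve and the sample node names are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve(pattern, names):
    """Return (unique_match_or_None, all_matches) for a filename-like pattern."""
    matches = sorted(name for name in names if fnmatchcase(name, pattern))
    if len(matches) == 1:
        # A unique match is treated like a plain identifier.
        return matches[0], matches
    # Zero or several matches: the caller lists them and lets the user choose.
    return None, matches


if __name__ == "__main__":
    nodes = ["reg1-vm1-n1", "reg1-vm1-n2", "reg2-vm1-n1"]  # made-up names
    print(resolve("reg2*", nodes))
    print(resolve("reg?-vm?-n1", nodes))
```

Note that a plain identifier is itself a valid pattern that matches only itself, so both cases go through the same code path.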

Batch mode

If the --batch switch is used, Metaldig will run the query on all matches of the given pattern (instead of listing choices). This can be useful for mapping the environment or gathering logs, etc.

This can take a lot of time and generate a lot of cluster communication, so please use it responsibly.

Output filtering

Even though the output can be "grepped", doing so is clunky, can hide errors/warnings and drops colors by default. The better way is to filter via Metaldig itself. Use -c to show only the needed columns (like -c name,ip) or -l to show only the needed layers (like -l api,reg).
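Why colors disappear when the output is piped can be illustrated with the usual convention behind an "auto" color mode: colorize only when the output stream is an interactive terminal. That metaldig's --color auto follows this convention is an assumption; the function name use_color is hypothetical.

```python
import io


def use_color(mode, stream):
    """Decide whether to emit ANSI colors for `stream`."""
    if mode == "always":
        return True
    if mode == "never":
        return False
    # "auto": colorize only when writing to an interactive terminal.
    return stream.isatty()


if __name__ == "__main__":
    # A pipe to grep behaves like this in-memory buffer: it is not a terminal.
    piped = io.StringIO()
    print(use_color("auto", piped))
    print(use_color("always", piped))
```

Under this convention, passing --color always would keep the colors even in a pipeline.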

Command-line argument shortening

You do not have to write the whole command-line argument as long as the portion you give is unambiguous. For example, --vol instead of --volume is enough. Consequently, most of the plural-form arguments can also be written in the singular.
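Python's argparse accepts unambiguous prefixes of long options by default (allow_abbrev=True), which matches the behavior described here. The sketch below assumes metaldig relies on argparse, which the help output suggests but the text does not state; only a few options are reproduced.

```python
import argparse

# allow_abbrev defaults to True, so unambiguous prefixes are accepted.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--volume", nargs="?", const="*")
parser.add_argument("--vip", nargs="?", const="*")
parser.add_argument("--vmcluster", nargs="?", const="*")
parser.add_argument("--layers")

if __name__ == "__main__":
    # "--vol" can only mean "--volume".
    print(parser.parse_args(["--vol", "data*"]).volume)
    # The singular "--layer" is simply a prefix of "--layers".
    print(parser.parse_args(["--layer", "api,reg"]).layers)
```

A prefix such as --v would be rejected as ambiguous, because it could mean --volume, --vip or --vmcluster.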

Bottom-up mode

If a bottom-up option --vmcluster or --vmnode is provided, metaldig will first query a VM cluster and then proceed to match the result to an object in REG and API clusters.

Consistency checks

Metaldig will run checks in the context of each query. For example, when querying loadbalancer info on multiple layers, it will try to test if all data is consistent across API, REG and VM objects.
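As an illustration of what a cross-layer check means, the sketch below compares one field across layers and reports every layer that disagrees with the first one. The function name, the field name and the sample values are all hypothetical; metaldig's real checks are not documented here.

```python
def check_consistency(field, values_by_layer):
    """Return warnings for layers whose value differs from the first layer's."""
    layers = list(values_by_layer.items())
    if not layers:
        return []
    ref_layer, ref_value = layers[0]
    return [
        f"{field}: {layer}={value!r} differs from {ref_layer}={ref_value!r}"
        for layer, value in layers[1:]
        if value != ref_value
    ]


if __name__ == "__main__":
    # Made-up data: the VM layer reports a different address.
    observed = {"API": "10.0.0.1", "REG": "10.0.0.1", "VM": "10.0.0.2"}
    for warning in check_consistency("ip", observed):
        print(warning)
```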

Error reporting

Metaldig will print a warning and exit with a non-zero status when it encounters a problem. It will try to print as much information as possible before stopping. In batch mode it will simply move on to the next object to process. This makes it easy to find problems: redirect the normal output to /dev/null and look only at the standard error output.
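The same idea can be scripted: run a command, discard its standard output and keep only the exit status and the standard error lines. This is a minimal sketch with a hypothetical helper name; the demo uses a stand-in Python child process so that it runs without metaldig installed.

```python
import subprocess
import sys


def collect_problems(cmd):
    """Run `cmd`, discard stdout, and return (exit_code, stderr_lines)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,  # same effect as redirecting to /dev/null
        stderr=subprocess.PIPE,
        text=True,
    )
    return proc.returncode, proc.stderr.splitlines()


if __name__ == "__main__":
    # Stand-in for a real call such as ["metaldig", "-r", "lab", "--batch", "--lb"].
    child = "import sys; print('data'); print('warning: x', file=sys.stderr); sys.exit(1)"
    code, problems = collect_problems([sys.executable, "-c", child])
    print(code, problems)
```

A non-zero exit code together with a non-empty list of lines indicates that at least one object had a problem.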

Examples

Due to the large number of queries that this tool can execute, examples are separated by topic.

Page      Description
machines  Query machines (VMs) aka logical nodes for GoM
pods      Query pods based on current context
storage   Storage-related queries
network   Network-related queries
nodes     Query physical nodes
output    Manipulate metaldig's output
scripts   Use provided scripts