Design¶
Stressant is built on top of the Grml project, so it benefits from Grml's extensive list of utilities, which covers most of what other rescue systems (e.g. Debian Live and Debirf) provide; see below for a more thorough comparison.
Stressant is therefore distributed as an ISO image that can be burned to a CD/DVD or copied to a USB drive, or as net-bootable images. As part of Grml, there are already options to perform:
- memory tests with memtest86
- hardware detection and inventory with HDT
There are also many more options, for example loading the system to RAM or making it read-only; see the cheatcodes list for more details.
Note
Stressant does not currently ship with Grml, see below for details.
The stressant tool¶
Stressant itself is mainly a Python program that calls other UNIX utilities and collects their output on the screen, in a logfile, and/or over email.
The objective of this software is to automate a basic stress-testing suite that, once started, will run through a basic CPU/memory/disk/network test framework and report any errors or failures.
This is done through the stressant.py script, which performs the following tests:
- lshw and smartctl for hardware inventory
- dd, hdparm, fio and smartctl for disk testing - fio can also overwrite disk drives with the proper options (--overwrite and --size=100%)
- stress-ng for CPU testing
- iperf3 for network testing
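In essence, each of those tests follows the same wrapper pattern: run an external command, prefix each line of its output for the log, and record failures. Here is a minimal sketch of that pattern; the `run_test` helper and its log prefixes are illustrative approximations, not stressant's actual API:

```python
import subprocess


def run_test(description, command):
    """Run an external utility and echo its output with log-style prefixes.

    Returns True when the command exits successfully, False otherwise.
    """
    print("INFO: %s" % description)
    print("DEBUG: Calling %s" % " ".join(command))
    try:
        output = subprocess.check_output(
            command, stderr=subprocess.STDOUT, universal_newlines=True
        )
    except subprocess.CalledProcessError as error:
        print("ERROR: Command failed: %s" % error)
        return False
    for line in output.splitlines():
        print("OUTPUT: %s" % line)
    return True


# demonstrate with a harmless command instead of a real stress test
run_test("dummy inventory", ["echo", "hello world"])
```

In the real tool the same pattern wraps lshw, dd, fio, stress-ng and iperf3, with the output also duplicated to a logfile and, optionally, an email report.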
Here is an example test run:
$ sudo ./stressant --email anarcat@anarc.at --fileSize 1M --cpuBurnTime 1s --iperfTime 1
INFO: Starting tests
INFO: CPU cores: 4
INFO: Memory: 16 GiB (16715816960 bytes)
INFO: Hardware inventory
DEBUG: Calling lshw -short
OUTPUT: H/W path Device Class Description
OUTPUT: ==========================================================
OUTPUT: system Desktop Computer
OUTPUT: /0 bus NUC6i3SYB
OUTPUT: /0/0 memory 64KiB BIOS
OUTPUT: /0/22 memory 64KiB L1 cache
OUTPUT: /0/23 memory 64KiB L1 cache
OUTPUT: /0/24 memory 512KiB L2 cache
OUTPUT: /0/25 memory 3MiB L3 cache
OUTPUT: /0/26 processor Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
OUTPUT: /0/27 memory 16GiB System Memory
OUTPUT: /0/27/0 memory 16GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
OUTPUT: /0/27/1 memory [empty]
OUTPUT: /0/100 bridge Skylake Host Bridge/DRAM Registers
OUTPUT: /0/100/2 display HD Graphics 520
OUTPUT: /0/100/14 bus Sunrise Point-LP USB 3.0 xHCI Controller
OUTPUT: /0/100/14/0 usb1 bus xHCI Host Controller
OUTPUT: /0/100/14/0/1 scsi3 storage USB to ATA/ATAPI Bridge
OUTPUT: /0/100/14/0/1/0.0.0 /dev/sdb disk 500GB 00ABYS-01TNA0
OUTPUT: /0/100/14/0/1/0.0.1 /dev/sdc disk 500GB 00ABYS-01TNA0
OUTPUT: /0/100/14/0/3 input Dell USB Keyboard
OUTPUT: /0/100/14/0/4 input Kensington Expert Mouse
OUTPUT: /0/100/14/0/7 communication Bluetooth wireless interface
OUTPUT: /0/100/14/1 usb2 bus xHCI Host Controller
OUTPUT: /0/100/14.2 generic Sunrise Point-LP Thermal subsystem
OUTPUT: /0/100/16 communication Sunrise Point-LP CSME HECI #1
OUTPUT: /0/100/17 storage Sunrise Point-LP SATA Controller [AHCI mode]
OUTPUT: /0/100/1c bridge Sunrise Point-LP PCI Express Root Port #5
OUTPUT: /0/100/1c/0 network Wireless 8260
OUTPUT: /0/100/1e generic Sunrise Point-LP Serial IO UART Controller #0
OUTPUT: /0/100/1e.6 generic Sunrise Point-LP Secure Digital IO Controller
OUTPUT: /0/100/1f bridge Sunrise Point-LP LPC Controller
OUTPUT: /0/100/1f.2 memory Memory controller
OUTPUT: /0/100/1f.3 multimedia Sunrise Point-LP HD Audio
OUTPUT: /0/100/1f.4 bus Sunrise Point-LP SMBus
OUTPUT: /0/100/1f.6 eno1 network Ethernet Connection I219-V
OUTPUT: /0/1 scsi2 storage
OUTPUT: /0/1/0.0.0 /dev/sda disk 500GB WDC WDS500G1B0B-
OUTPUT: /0/1/0.0.0/1 /dev/sda1 volume 511MiB Windows FAT volume
OUTPUT: /0/1/0.0.0/2 /dev/sda2 volume 244MiB EFI partition
OUTPUT: /0/1/0.0.0/3 /dev/sda3 volume 465GiB EFI partition
INFO: SMART information for /dev/sda
DEBUG: Calling smartctl -i /dev/sda
OUTPUT: smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-1-amd64] (local build)
OUTPUT: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
OUTPUT:
OUTPUT: === START OF INFORMATION SECTION ===
OUTPUT: Device Model: WDC WDS500G1B0B-00AS40
OUTPUT: Serial Number: XXXXXXXXXXXX
OUTPUT: LU WWN Device Id: XXXXXXXXXXXX
OUTPUT: Firmware Version: XXXXXXXXXXXX
OUTPUT: User Capacity: 500,107,862,016 bytes [500 GB]
OUTPUT: Sector Size: 512 bytes logical/physical
OUTPUT: Rotation Rate: Solid State Device
OUTPUT: Form Factor: M.2
OUTPUT: Device is: Not in smartctl database [for details use: -P showall]
OUTPUT: ATA Version is: ACS-2 T13/2015-D revision 3
OUTPUT: SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
OUTPUT: Local Time is: Fri Mar 17 10:24:52 2017 EDT
OUTPUT: SMART support is: Available - device has SMART capability.
OUTPUT: SMART support is: Enabled
OUTPUT:
INFO: Basic disk bandwidth tests
INFO: Writing 1MB file
DEBUG: Calling dd bs=1M count=512 conv=fdatasync if=/dev/zero of=test
OUTPUT: 512+0 records in
OUTPUT: 512+0 records out
OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s
INFO: Reading 1MB file
DEBUG: Calling dd bs=1M count=512 of=/dev/null if=test
OUTPUT: 512+0 records in
OUTPUT: 512+0 records out
OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 0.0848588 s, 6.3 GB/s
INFO: Hdparm test
DEBUG: Calling hdparm -Tt /dev/sda
OUTPUT:
OUTPUT: /dev/sda:
OUTPUT: Timing cached reads: 12406 MB in 2.00 seconds = 6207.39 MB/sec
OUTPUT: Timing buffered disk reads: 1504 MB in 3.00 seconds = 501.13 MB/sec
INFO: Disk stress test
DEBUG: Calling fio --name=stressant --readwrite=randrw --numjob=4 --sync=1 --direct=1 --group_reporting --size=1M --output=/tmp/tmpo2QJnR
OUTPUT: stressant: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
OUTPUT: ...
OUTPUT: fio-2.16
OUTPUT: Starting 4 processes
OUTPUT:
OUTPUT: stressant: (groupid=0, jobs=4): err= 0: pid=978: Fri Mar 17 10:25:07 2017
OUTPUT: read : io=2160.0KB, bw=5669.3KB/s, iops=1417, runt= 381msec
OUTPUT: clat (usec): min=141, max=2197, avg=486.59, stdev=344.70
OUTPUT: lat (usec): min=141, max=2198, avg=486.96, stdev=344.71
OUTPUT: clat percentiles (usec):
OUTPUT: | 1.00th=[ 145], 5.00th=[ 153], 10.00th=[ 161], 20.00th=[ 175],
OUTPUT: | 30.00th=[ 189], 40.00th=[ 217], 50.00th=[ 278], 60.00th=[ 676],
OUTPUT: | 70.00th=[ 748], 80.00th=[ 828], 90.00th=[ 916], 95.00th=[ 980],
OUTPUT: | 99.00th=[ 1384], 99.50th=[ 1800], 99.90th=[ 2192], 99.95th=[ 2192],
OUTPUT: | 99.99th=[ 2192]
OUTPUT: write: io=1936.0KB, bw=5081.4KB/s, iops=1270, runt= 381msec
OUTPUT: clat (usec): min=618, max=6602, avg=2566.13, stdev=1029.39
OUTPUT: lat (usec): min=619, max=6602, avg=2566.71, stdev=1029.39
OUTPUT: clat percentiles (usec):
OUTPUT: | 1.00th=[ 732], 5.00th=[ 900], 10.00th=[ 964], 20.00th=[ 1672],
OUTPUT: | 30.00th=[ 1976], 40.00th=[ 2384], 50.00th=[ 2640], 60.00th=[ 3152],
OUTPUT: | 70.00th=[ 3312], 80.00th=[ 3440], 90.00th=[ 3568], 95.00th=[ 3856],
OUTPUT: | 99.00th=[ 4704], 99.50th=[ 4960], 99.90th=[ 6624], 99.95th=[ 6624],
OUTPUT: | 99.99th=[ 6624]
OUTPUT: lat (usec) : 250=24.41%, 500=4.88%, 750=7.91%, 1000=19.34%
OUTPUT: lat (msec) : 2=10.84%, 4=30.66%, 10=1.95%
OUTPUT: cpu : usr=0.80%, sys=2.39%, ctx=1945, majf=0, minf=35
OUTPUT: IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
OUTPUT: submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
OUTPUT: complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
OUTPUT: issued : total=r=540/w=484/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
OUTPUT: latency : target=0, window=0, percentile=100.00%, depth=1
OUTPUT:
OUTPUT: Run status group 0 (all jobs):
OUTPUT: READ: io=2160KB, aggrb=5669KB/s, minb=5669KB/s, maxb=5669KB/s, mint=381msec, maxt=381msec
OUTPUT: WRITE: io=1936KB, aggrb=5081KB/s, minb=5081KB/s, maxb=5081KB/s, mint=381msec, maxt=381msec
OUTPUT:
OUTPUT: Disk stats (read/write):
OUTPUT: dm-3: ios=207/493, merge=0/0, ticks=120/296, in_queue=416, util=58.78%, aggrios=540/1527, aggrmerge=0/0, aggrticks=288/752, aggrin_queue=1040, aggrutil=75.30%
OUTPUT: dm-0: ios=540/1527, merge=0/0, ticks=288/752, in_queue=1040, util=75.30%, aggrios=540/1326, aggrmerge=0/201, aggrticks=264/704, aggrin_queue=968, aggrutil=74.49%
OUTPUT: sda: ios=540/1326, merge=0/201, ticks=264/704, in_queue=968, util=74.49%
INFO: CPU stress test for 1s
DEBUG: Calling stress-ng --timeout 1s --cpu 0 --ignite-cpu --metrics-brief --log-brief --tz --times --aggressive
OUTPUT: dispatching hogs: 4 cpu
OUTPUT: cache allocate: default cache size: 3072K
OUTPUT: successful run completed in 1.05s
OUTPUT: stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
OUTPUT: (secs) (secs) (secs) (real time) (usr+sys time)
OUTPUT: cpu 453 1.04 2.80 0.00 437.62 161.79
OUTPUT: cpu:
OUTPUT: acpitz 27.80 °C
OUTPUT: pch_skylake 32.77 °C
OUTPUT: acpitz 31.78 °C
OUTPUT: x86_pkg_temp 34.40 °C
OUTPUT: for a 1.05s run time:
OUTPUT: 4.22s available CPU time
OUTPUT: 2.81s user time ( 66.64%)
OUTPUT: 0.01s system time ( 0.24%)
OUTPUT: 2.82s total time ( 66.87%)
OUTPUT: load average: 0.34 0.58 2.52
INFO: Running network benchmark
DEBUG: Calling iperf3 -c iperf.he.net -t 1
OUTPUT: iperf3: error - the server is busy running a test. try again later
ERROR: Command failed: Command 'iperf3 -c iperf.he.net -t 1' returned non-zero exit status 1
INFO: all done
INFO: sent email to ['anarcat@anarc.at'] using anarc.at
Note that the output has nice colors in an actual console; the above is just a dump of the logfile.
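The disk throughput figures in the run above come from dd's final status line. To extract those numbers programmatically rather than eyeballing the log, a small parser is enough; this `parse_dd_rate` helper is a hypothetical sketch, not part of stressant:

```python
import re


def parse_dd_rate(status_line):
    """Extract the throughput from dd's final status line, e.g.
    '536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s'.

    Returns a (value, unit) tuple, or None if the line does not match.
    """
    match = re.search(r"copied,\s+[\d.]+\s+s,\s+([\d.]+)\s+(\S+/s)", status_line)
    if not match:
        return None
    return float(match.group(1)), match.group(2)


line = "536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s"
assert parse_dd_rate(line) == (385.0, "MB/s")
```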
We currently use the iperf.he.net server from Hurricane Electric as the default server for our tests, but users are encouraged to point to a local server with the --iperfServer argument to get more accurate results. Notice how the performance test failed above because the HE server wasn't available: this is just another hint that you should use your own server.
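When running against your own server, iperf3's -J flag is handy for machine-readable results: it emits a JSON document whose end.sum_received.bits_per_second field carries the measured throughput for a TCP client run. The helper below is an illustrative sketch of extracting that figure; the function name is ours, not stressant's:

```python
import json


def iperf3_throughput(json_output):
    """Return the received throughput in Mbit/s from `iperf3 -J` output.

    The end.sum_received.bits_per_second field is part of iperf3's JSON
    result for a TCP client run; a failed run carries an "error" key instead.
    """
    result = json.loads(json_output)
    if "error" in result:
        raise RuntimeError(result["error"])
    return result["end"]["sum_received"]["bits_per_second"] / 1e6


# canned output, heavily trimmed from what a real run would produce
sample = '{"end": {"sum_received": {"bits_per_second": 941000000.0}}}'
assert iperf3_throughput(sample) == 941.0
```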
Background¶
This project emanates from an effort to package a custom Linux distribution called 'breakin'. It turned into a simple Python program that reuses existing stress-testing programs packaged in Debian.
Stressant used to be built as a standalone Debian derivative, a Pure Blend based on Debirf. But in 2017, the project was rearchitected to be based on Grml and to focus on developing a standalone stress-testing tool. While Stressant could still become its own Debian derivative, it is the hope of the author that it will be factored into Grml and become part of the project without needing to create the heavy infrastructure of another Linux distribution.
The Is your Computer Stable? post from Jeff Atwood was a motivation to get back into the project. It outlines a few basic tools to use to check that your computer is stable:
- memtest86 - shipped with Grml
- install Ubuntu - we assume you'll do that anyways
- MPrime to stress the CPU - not free software; stress-ng was chosen instead and gives similar results here
- badblocks test (with -sv) - this is covered by the fio test
- smartctl -i/-a/-t to identify and test hard drives
- dd and hdparm to get quick stats - done
- bonnie++ for more extensive benchmarks - the Grml people suggested we use fio instead
- iperf for network testing - this assumes a local server; we instead use iperf3 and public servers
- furmark for testing the GPU - Windows-only with no Linux equivalent; the Phoronix test suite uses ffmpeg tests for that purpose
The idea was to regroup all of this in a single tool that would perform those tests, without reinventing the wheel, of course.
Stressant was also tightly coupled with Koumbit's infrastructure, as this is where the Debirf recipes were originally developed. It needed a CI system to build the images, which was originally done with Jenkins and later with GitLab CI, but the latter failed to build images because of issues with Debirf and, ultimately, Docker itself. This is why Grml was used as a basis for future development.
Remaining work¶
Stressant could run in a tmux or screen session that would show the current task on one pane and syslog (or journalctl) in another. This would allow for more information to be crammed in a single display while at the same time making remote access (e.g. through SSH) easier to switch to.
Finally, we need clear and better documentation on the various testing tools out there, a bit like TAILS is doing. For example, we used to ship with diskscan, but I didn't even remember that and I am not sure what to use it for or when to use it. A summary description of the available tools, maybe through a menu system or at least a set of HTML files, would be useful. I would probably use Sphinx and RST for this because of the simplicity and availability of tools like readthedocs.org and the ease of creating offline documentation (PDF and ePUB).
Building images by hand¶
There is a handy build-iso.sh script that will set up APT repositories and run all the right commands to build a Grml + Stressant ISO image on a recent Debian release (tested on Jessie and later). Note that you can pass extra flags to the grml-live command with the $GRML_LIVE_FLAGS environment variable. What follows is basically a description of what that script does.
To build an image by hand, you will first need to install the grml-live package, which is responsible for building Grml images. For this, you will need to add the Grml Debian repository to your sources.list file. Instructions for doing so are available in the files section of the Grml site.
Note: stressant is not yet part of Grml. We have requested its addition to the official image in pull request #34. Until then you can manually add the package to the build, using:
sudo tee /etc/grml/fai/config/package_config/STRESSANT << EOF
PACKAGES install
stressant
EOF
Once this is done, you should be able to build an image using:
sudo grml-live -c DEBORPHAN,GRMLBASE,GRML_FULL,RELEASE,AMD64,IGNORE,STRESSANT \
-s unstable -a amd64 \
-o $PWD/grml -U $USER \
-v $(date +%Y.%m) -r gossage -g grml64-full-stressant
This will build a "full" Grml release (-c) based on Debian unstable on a 64-bit architecture (-a) in the ./grml subdirectory (-o). The files will be owned (-U) by the current user ($USER). The version number (-v), the release name (-r) and the flavor (-g) are just cargo-culted from the upstream official release.
See the grml-live documentation for further options, but do note the -u option, which can be used to rerun the builds if you only want to update the image to the latest release, for example.
The resulting ISO will be in ./grml/grml_isos/grml64-full_$(date +%Y.%m%d).iso. To make a multi-arch ISO, you should use the grml2iso command. For example, this is how upstream builds the grml96 ISO, which features both the 32-bit and 64-bit architectures:
grml2iso -o grml96-small_2014.11.iso grml64-small_2014.11.iso grml32-small_2014.11.iso
Build system review¶
The following is a summary evaluation of the different options considered by the Stressant project to build live images.
Here’s an outline of some of the numerous possibilities. A detailed discussion of each alternative follows below.
- Debirf: Debian InitRamFs - builds the live image into the initrd file; simple design, but issues with builds in newer Debian releases
- FAI's fai-diskimage: requires root and is designed for cloud images, not ISOs
- vmdebootstrap: minimal, written in Python, used for Debian live images; requires root (for loop filesystem creation); doesn't support shipping the Debian installer anymore?
- live-build: the tool used by Debian Live and other blends (e.g. the PGP cleanroom; see how it uses live-build); now uses live-wrapper, which uses vmdebootstrap
- Grml and grml-debootstrap: similar to vmdebootstrap; the currently used alternative
The Debian cloud team also considered a few tools to generate their cloud images, and some are relevant here (FAI and vmdebootstrap); see this post for details. There's also this more exhaustive list of tools to build Debian systems.
DebIRF¶
Debirf was initially chosen as a build system because it was simple and easy to modify. In the end, Debirf proved too limited for our needs: it doesn't provide a way to embed arbitrary boot-level binaries like memtest86 because it is too tightly coupled with the Linux kernel. Furthermore, we were having serious issues building Debirf images on newer releases, either Debian 8 (bug #806377) or 9 (bug #848834).
FAI¶
I have tried to use FAI to follow the lead of the Debian cloud team. Unfortunately, I stumbled upon a few bugs. First, fai-diskimage would fail to build an image if the host uses LVM. This was fixed in FAI 5.3.3 (or maybe 5.3.4?). Also, FAI seems to fetch base files from a cleartext URL, which seems like a dubious security choice.
After finding this tutorial, I figured I would give it another try. Unfortunately, after asking on IRC (#debian-cloud on OFTC), I was told (by Noah!) that "fai-diskimage is probably not what you want for an iso image" and they suggested I use fai-cd instead. Unfortunately, fai-cd works completely differently: it doesn't support the --class system that fai-diskimage was built with, so we can't reuse those already mysterious recipes. fai-cd seems to be geared towards creating installation media rather than live images.
All this seems to make FAI mostly unusable for the task at hand, although it should be noted that grml-live uses FAI to build their images…
vmdebootstrap¶
So far we have had good results with vmdebootstrap, but the fact that it requires a loop device has made it difficult to use GitLab's CI system. Docker has a bug that makes it impossible to use loop devices and kpartx commands inside containers. So building images through GitLab's CI would require full virtualization instead of just Docker, something that's not provided by GitLab.com right now. Worse: VirtualBox may not make it to stretch at all, which makes it difficult to deploy new builders for it. This problem is probably shared by all image-building tools, however, so it's unclear whether there would be a benefit to switching to another tool right now.
live-build and live-wrapper¶
It is unclear if I am better off using live-build or vmdebootstrap directly. The PGP cleanroom build uses live-build, so maybe I should do that as well…
Note: live-wrapper is where the idea of using HDT comes from. Unfortunately, it looks like the boot menus don't actually work yet (bug #813527). Furthermore, live-wrapper doesn't support initrd-style netboot, so we would need documentation on how to boot from ISO files over PXE.
Grml¶
Grml is a project quite similar to the original goal of stressant:
- based on Debian
- provides rescue tools
- live CD/USB image
- also provides support for netboot through grml-terminalserver, which can use remote squashfs
The project is really interesting and we have therefore switched the focus of Stressant towards creating an integrated stress-testing tool on top of Grml instead of trying to fix all the issues those guys are already struggling with…
It has most of the packages we had in our dependencies, except these:
blktool
bonnie++
chntpw
diskscan
e2tools
fatresize
foremost
hfsplus
i7z
lm-sensors
mtd-utils
scrub
smp-utils
stress-ng
tofrodos
u-boot-tools
wodim
Of those, only stress-ng is actually required by stressant.
The remaining issues with Grml integration are:
- add stressant to the Grml build (pull request #34)
- review the above packages we collected from various rescue modes and see if they are relevant
- hook stressant into the magic Grml menu so it starts directly from the boot menu - we can use the scripts=path-name argument for this; it looks in the "DCS dir", which is / or whatever myconfig= points at
- figure out how to chain into memtest86 to complete the test suite
Because stressant has been accepted into Debian, we should not need to set up our own build system, unless Grml refuses to integrate the package directly. In any case, we may want to set up our own Continuous Integration (CI) system to build feature branches and similar. The proper way to do this seems to be to add the .deb file directly at the root (/) of the live filesystem with the install local files technique and the debs boot-time argument.