ai3 / float
Commit 703b5ab0 authored Jun 14, 2019 by ale
Add a NeedsReboot alert when there are pending kernel upgrades
parent a2e7e7c4
Pipeline #3440 failed in 5 minutes and 52 seconds
1 changed file with 20 additions and 11 deletions
roles/prometheus/files/rules/alerts_base.conf.yml
@@ -13,14 +13,23 @@ groups:
       summary: Host {{ $labels.host }} is down
       description: 'Host {{ $labels.host }} is unreachable (icmp ping).'
+  - alert: NeedsReboot
+    expr: node_reboot_required > 0
+    for: 30m
+    labels:
+      severity: warn
+    annotations:
+      summary: 'Host {{ $labels.host }} needs to reboot'
+      description: 'Host {{ $labels.host }} needs to reboot, there are pending kernel upgrades.'
   - alert: Reboot
-    expr: os_uptime < 600
-    for: 1m
+    expr: os_uptime < 900
+    for: 5m
     labels:
       severity: warn
     annotations:
-      description: reboot on {{ $labels.host }}
-      summary: reboot on {{ $labels.host }}
+      summary: 'Reboot on {{ $labels.host }}'
+      description: 'The host {{ $labels.host }} has just rebooted. Hopefully this was expected.'
   - alert: JobDown
     expr: up < 1
@@ -29,7 +38,7 @@ groups:
       severity: warn
       scope: host
     annotations:
-      summary: Job {{ $labels.job }}@{{ $labels.host }} is down
+      summary: 'Job {{ $labels.job }}@{{ $labels.host }} is down'
       description: 'Job {{ $labels.job }} on {{ $labels.host }} has been down for more than 5 minutes. If this is a prober job, then the alert refers to the prometheus-blackbox-exporter service itself.'
@@ -41,9 +50,9 @@ groups:
       severity: warn
       scope: global
     annotations:
-      summary: Job {{ $labels.job }} has degraded redundancy
+      summary: 'Job {{ $labels.job }} has degraded redundancy'
       description: 'Job {{ $labels.job }} is running with slightly degraded
-        redundancy ({{$value}}) and may eventually be at risk.'
+        redundancy ({{ $value }}) and may eventually be at risk.'
   - alert: JobDown
     expr: job:up:ratio < 0.51
@@ -52,8 +61,8 @@ groups:
       severity: page
       scope: global
     annotations:
-      summary: Job {{ $labels.job }} is down globally
-      description: 'Job {{ $labels.job }} is down globally (availability {{$value}}).'
+      summary: 'Job {{ $labels.job }} is down globally'
+      description: 'Job {{ $labels.job }} is down globally (availability {{ $value }}).'
   - alert: ProbeFailure
     expr: target:probe_success:ratio{probe!="ping"} < 0.5
@@ -62,7 +71,7 @@ groups:
       severity: page
       scope: host
     annotations:
-      summary: Probe {{ $labels.probe }}@{{ $labels.target }} is failing
+      summary: 'Probe {{ $labels.probe }}@{{ $labels.target }} is failing'
       description: 'Probe {{ $labels.probe }} ({{ $labels.zone }}) is failing for target {{ $labels.target }} (success ratio {{ $value }}).'
@@ -73,6 +82,6 @@ groups:
       severity: page
       scope: global
     annotations:
-      summary: Probe {{ $labels.probe }} is failing globally
+      summary: 'Probe {{ $labels.probe }} is failing globally'
       description: 'Probe {{ $labels.probe }} ({{ $labels.zone }}) is failing globally (success ratio {{ $value }}).'
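The retuned Reboot rule (now `os_uptime < 900` held for `5m`, previously `os_uptime < 600` for `1m`) can be covered by a `promtool test rules` unit test as well. This sketch assumes that `os_uptime` reports seconds since boot with a `host` label, that the test file (hypothetical name) sits next to `alerts_base.conf.yml`, and that the hypothetical host `host1` boots at t=0, so its uptime grows by 60 s per minute. Under these assumptions the condition holds from 0m to 14m, the alert fires from 5m onwards, and it resolves at 15m when the uptime reaches 900 s.

```yaml
# reboot_test.yml (hypothetical file name), run with:
#   promtool test rules reboot_test.yml
rule_files:
  # Assumes this test file sits next to the rules file from the diff.
  - alerts_base.conf.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Hypothetical host that booted at t=0: os_uptime (assumed to be
      # in seconds) is 0, 60, 120, ... and reaches 900 at t=15m.
      - series: 'os_uptime{host="host1"}'
        values: '0+60x20'
    alert_rule_test:
      # Condition true since 0m, but the 5m "for" has not elapsed yet.
      - eval_time: 3m
        alertname: Reboot
        exp_alerts: []
      # Condition has held for 10m (uptime 600 < 900): firing.
      - eval_time: 10m
        alertname: Reboot
        exp_alerts:
          - exp_labels:
              severity: warn
              host: host1
            exp_annotations:
              summary: 'Reboot on host1'
              description: 'The host host1 has just rebooted. Hopefully this was expected.'
      # Uptime is now 960 s, no longer < 900: the alert has resolved.
      - eval_time: 16m
        alertname: Reboot
        exp_alerts: []
```

For comparison, with the pre-commit values (`< 600` for `1m`) the same series would keep the alert firing only from 1m to 9m.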