Wednesday, May 22nd 2013, 2:29am UTC+2

Sigi

Beginner

Posts: 19

Gender: male

Location: Erlangen

Number of monitoring servers: 1

Nagios Version: 3.2.3

Icinga Version: 1.6.1

Distributed monitoring: No

Redundant monitoring: No

Number of hosts: 170

Number of services: 550

OS: Ubuntu 11.10

Plugin Version: Icinga 1.6

1

Wednesday, June 27th 2012, 3:09pm

Problem with passive host checks

I have a large number of servers here that are already actively monitored by Icinga. I have now set up my first passive host checks, because these servers sit behind a firewall and must not be accessed from outside.

In principle this works. The checks are delivered to the monitoring server via NSCA at regular intervals, which I can also confirm in the syslog.

My goal now is to run an active dummy check that sets the host status to DOWN as soon as results stop arriving from the monitored server, but only once no message has come in for a longer period.
Although the messages arrive regularly, the active check is still executed again and again and sets the host to DOWN. A few seconds later the external result is processed and the host is UP again. The host is therefore constantly flapping, and I don't know why.
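
The intended behaviour can be sketched in a few lines of Python. This only illustrates the rule described above; it is not Icinga's actual freshness logic, and `is_stale` and its parameters are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta


def is_stale(last_passive_result: datetime, now: datetime,
             freshness_threshold_s: int = 1800) -> bool:
    """Return True only if the last passive result is older than the threshold."""
    return now - last_passive_result > timedelta(seconds=freshness_threshold_s)


now = datetime(2012, 6, 27, 10, 25, 0)
# A result that is 30 seconds old should never trigger the dummy check.
print(is_stale(now - timedelta(seconds=30), now))
# A result older than 30 minutes should.
print(is_stale(now - timedelta(seconds=1801), now))
```

With results arriving every 30 seconds and a threshold of 1800 seconds, this rule would never mark the host as stale, which is why the DOWN alerts are unexpected.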

Here is my current host configuration:

Source code

define host {
    	use 	generic-host
    	host_name   	pxx.division.company.com
    	active_checks_enabled   0
    	notifications_enabled   1
    	check_interval  300
    	max_check_attempts  	3
    	parents localhost
    	freshness_threshold 	1800
    	contacts    	fischersFritz@company.com
    	hostgroups  	EF_ES
    	process_perf_data   	1
    	retry_interval  2.000000
    	flap_detection_options  o,d,u
    	check_period	24x7
    	passive_checks_enabled  1
    	notification_options	d,u,r
    	notification_interval   30.000000
    	address 111.111.111.111
    	contact_groups  PPT
    	flap_detection_enabled  1
    	check_command   check_dummy!2!"Kein Status in der letzten halben Stunde"
    	check_freshness 1
    	_environment	prod
    	_serviceno  	111111
    	_org	f_es
    	alias   PXX
}


Can anyone give me the crucial hint as to where my mistake is?
Thanks in advance for your support!

pitchfork

Administrator

Posts: 18,436

Location: Kassel

Occupation: Sysadmin SAP / Linux / AIX

Number of monitoring servers: 2

Hobbies: Riding motorcycles, when time permits :-)

Nagios Version: 3.2.3 ( OMD )

Distributed monitoring: No

Redundant monitoring: No

Number of hosts: 360

Number of services: 6700

OS: Debian 6.0

Plugin Version: 1.4.x

Other Addons: SNMPTT, NagTrap, check_mk, PNP-0.6.x. Thruk

2

Wednesday, June 27th 2012, 3:26pm

Hello,
could you please post the excerpt from the objects.cache file for this host?
Then we can be 100% sure how Icinga sees your config.

Jörg
+++ PNP Developer +++ PNP 0.6.21 is online! +++
Found helpful information? Then quickly flattr a few cents
OMD - Open Monitoring Distribution

Sigi

3

Wednesday, June 27th 2012, 4:00pm

Hello Jörg,
thanks for the quick response.
The corresponding entry in objects.cache looks like this:

Source code

define host {
    	host_name   	pxx.division.company.com
    	alias   PXX
    	address 111.111.111.111
    	parents localhost
    	check_period	24x7
    	check_command   check_dummy!2!"Kein Status in der letzten halben Stunde"
    	contacts    	fischersFritz@company.com
    	contact_groups  PXX
    	notification_period 	24x7
    	initial_state   o
    	check_interval  300.000000
    	retry_interval  2.000000
    	max_check_attempts  	3
    	active_checks_enabled   0
    	passive_checks_enabled  1
    	obsess_over_host    	1
    	event_handler_enabled   1
    	low_flap_threshold  	0.000000
    	high_flap_threshold 	0.000000
    	flap_detection_enabled  1
    	flap_detection_options  o,d,u
    	freshness_threshold 	1800
    	check_freshness 1
    	notification_options	d,u,r
    	notifications_enabled   1
    	notification_interval   30.000000
    	first_notification_delay    	0.000000
    	stalking_options    	n
    	process_perf_data   	1
    	failure_prediction_enabled  	1
    	retain_status_information   	1
    	retain_nonstatus_information	1
    	_ORG	f_es
    	_SERVICENO  	111111
    	_ENVIRONMENT	prod
    	}
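
To compare what Icinga actually loaded with the hand-written definition, a block like the one above can be read into a dictionary. The following is a minimal Python sketch under stated assumptions: `parse_host_block` is a hypothetical helper, it handles a single `define host { ... }` block only, and the sample is a shortened copy of the excerpt.

```python
def parse_host_block(text: str) -> dict:
    """Parse one 'define host { ... }' block into a directive -> value dict."""
    directives = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and the block delimiters.
        if not line or line.startswith("#") or line.startswith("define ") or line == "}":
            continue
        # The first token is the directive name, the rest of the line is its value.
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1] if len(parts) > 1 else ""
    return directives


SAMPLE = """define host {
    host_name   pxx.division.company.com
    active_checks_enabled   0
    passive_checks_enabled  1
    check_freshness 1
    freshness_threshold     1800
    }"""

host = parse_host_block(SAMPLE)
for key in ("active_checks_enabled", "passive_checks_enabled",
            "check_freshness", "freshness_threshold"):
    print(f"{key} = {host[key]}")
```

For the directives relevant to this problem, the cached object matches the configuration posted earlier in the thread.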

dnsmichi

Super Moderator

Posts: 5,981

Birthday: May 30th 1983 (29)

Gender: male

Location: Nürnberg

Occupation: Consultant / Developer at the best employer in the world @netways

Number of monitoring servers: Icinga: 4x dev, 10++ prod, Icinga2: 2x dev

Nagios Version: s/nagios/icinga/

Icinga Version: 1.9.0 / GIT

Distributed monitoring: Yes

Redundant monitoring: Yes

Number of hosts: 1000+

Number of services: 15000+

OS: RHEL, Debian, SUSE

Plugin Version: 1.4.16

IDO-Version: 1.9.0 / GIT MySQL/Postgresql/Oracle

Other Addons: Icinga Web, PNP, check_multi, inGraph, EventDB, LConf

4

Wednesday, June 27th 2012, 4:03pm

A log history for this host would be helpful.
+++ Icinga / LConf Developer +++ Senior Consultant at []NETWAYS> +++
+++ Icinga 1.9 || Icinga 2 +++ Icinga Support || IRC +++

Sigi

5

Wednesday, June 27th 2012, 4:25pm

I have now gone through the syslog and filtered out all host-related entries for this machine over a few minutes. Messages from other servers and the associated services are therefore not included here.
Even though the period covers only a few minutes, you can already see the constant messages from the active check, which in my opinion should not be executed at all.

Source code

Jun 27 10:19:51 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:19:51 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:19:51 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:19:51 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;UP;SOFT;2;OK: Lets pretend everything is going to be ok.
Jun 27 10:19:52 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:20:14 E711C-304013 nsca[29889]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:20:15 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:20:44 E711C-304013 nsca[30115]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:20:50 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:21:07 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:21:14 E711C-304013 nsca[30324]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:21:39 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:21:40 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:21:40 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:21:40 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:21:45 E711C-304013 nsca[30566]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:21:56 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:21:56 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:22:06 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;DOWN;SOFT;1;CRITICAL: No status received within last 30 minutes
Jun 27 10:22:15 E711C-304013 nsca[31620]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:22:45 E711C-304013 nsca[31754]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:22:45 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:22:45 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:23:15 E711C-304013 nsca[31981]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:23:16 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:23:45 E711C-304013 nsca[32214]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:23:45 E711C-304013 nsca[32214]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:23:46 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:23:46 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;UP;SOFT;3;OK: Lets pretend everything is going to be ok.
Jun 27 10:23:46 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:24:03 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:24:03 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:24:15 E711C-304013 nsca[32426]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:24:27 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:24:27 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;DOWN;SOFT;1;CRITICAL: No status received within last 30 minutes
Jun 27 10:24:27 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:24:45 E711C-304013 nsca[32534]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:24:49 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:24:49 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;UP;SOFT;3;OK: Lets pretend everything is going to be ok.
Jun 27 10:24:49 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:24:50 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:24:50 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;DOWN;SOFT;2;CRITICAL: No status received within last 30 minutes
Jun 27 10:25:15 E711C-304013 nsca[645]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
Jun 27 10:25:15 E711C-304013 icinga: EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
Jun 27 10:25:22 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;DOWN;HARD;3;CRITICAL: No status received within last 30 minutes
Jun 27 10:25:22 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.
Jun 27 10:25:22 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;UP;HARD;1;OK: Lets pretend everything is going to be ok.
Jun 27 10:25:45 E711C-304013 nsca[868]: HOST CHECK -> Host Name: 'pxx.division.company.com', Return Code: '0', Output: 'OK: Lets pretend everything is going to be ok.|'
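
The state changes are easier to follow when only the HOST ALERT lines are pulled out of an excerpt like this. The following Python sketch assumes the syslog line format shown above; `host_alerts` and `ALERT_RE` are hypothetical names, and the sample contains three lines copied from the excerpt.

```python
import re

# Matches the payload of a HOST ALERT line: host;state;state type;attempt;output
ALERT_RE = re.compile(
    r"HOST ALERT: (?P<host>[^;]+);(?P<state>[^;]+);(?P<type>[^;]+);"
    r"(?P<attempt>\d+);(?P<output>.*)"
)


def host_alerts(lines, host):
    """Yield (timestamp, state, state_type, attempt) for HOST ALERT lines of one host."""
    for line in lines:
        match = ALERT_RE.search(line)
        if match and match.group("host") == host:
            timestamp = line[:15]  # syslog prefix, e.g. "Jun 27 10:22:06"
            yield (timestamp, match.group("state"),
                   match.group("type"), int(match.group("attempt")))


SAMPLE = [
    "Jun 27 10:21:56 E711C-304013 icinga: PASSIVE HOST CHECK: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.",
    "Jun 27 10:22:06 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;DOWN;SOFT;1;CRITICAL: No status received within last 30 minutes",
    "Jun 27 10:23:46 E711C-304013 icinga: HOST ALERT: pxx.division.company.com;UP;SOFT;3;OK: Lets pretend everything is going to be ok.",
]

for alert in host_alerts(SAMPLE, "pxx.division.company.com"):
    print(alert)
```

Run over the full excerpt, this would list each DOWN/UP transition with its attempt counter, which makes the flapping pattern visible at a glance.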

dnsmichi

6

Wednesday, June 27th 2012, 4:32pm

Am I reading this correctly that NSCA delivers a new host update every 30 seconds?

How is your system doing, and above all your external command pipe: is the core keeping up with processing the data?

What does the config look like...

Source code

# egrep -v "^#|^$" icinga.cfg


as well as a run and the output of icingastats

pitchfork

7

Wednesday, June 27th 2012, 4:33pm

I would first like to check whether really only one Icinga instance is running.

Sigi

8

Wednesday, June 27th 2012, 4:42pm

There is definitely only one Icinga instance running on the monitoring server.

You are right about the host updates; at the moment one arrives every 30 seconds.
Here is the requested excerpt from the config:

Source code

log_file=/usr/local/icinga/var/icinga.log
cfg_dir=/usr/local/icinga/etc/lconf
cfg_dir=/usr/local/icinga/etc/modules
object_cache_file=/usr/local/icinga/var/objects.cache
precached_object_file=/usr/local/icinga/var/objects.precache
resource_file=/usr/local/icinga/etc/resource.cfg
status_file=/usr/local/icinga/var/status.dat
status_update_interval=10
icinga_user=icinga
icinga_group=icinga
check_external_commands=1
command_check_interval=-1
command_file=/usr/local/icinga/var/rw/icinga.cmd
external_command_buffer_slots=4096
lock_file=/usr/local/icinga/var/icinga.lock
temp_file=/usr/local/icinga/var/icinga.tmp
temp_path=/tmp
event_broker_options=-1
broker_module=/usr/local/icinga/bin/idomod.o config_file=/usr/local/icinga/etc/idomod.cfg
log_rotation_method=d
log_archive_path=/usr/local/icinga/var/archives
use_daemon_log=1
use_syslog=1
use_syslog_local_facility=0
syslog_local_facility=5
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_current_states=1
log_external_commands=1
log_passive_checks=1
log_external_commands_user=0
log_long_plugin_output=0
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/usr/local/icinga/var/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=60
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=120
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/icinga/var/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=1
check_for_orphaned_services=1
check_for_orphaned_hosts=1
service_check_timeout_state=u
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
p1_file=/usr/local/icinga/bin/p1.pl
enable_embedded_perl=1
use_embedded_perl_implicitly=1
stalking_event_handlers_for_hosts=0
stalking_event_handlers_for_services=0
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=icinga@localhost
admin_pager=pageicinga@localhost
daemon_dumps_core=0
use_large_installation_tweaks=0
enable_environment_macros=1
debug_level=-1
debug_verbosity=1
debug_file=/usr/local/icinga/var/icinga.debug
max_debug_file_size=100000000
event_profiling_enabled=0


and the output of icingastats:

Source code

CURRENT STATUS DATA
------------------------------------------------------
Status File:                        	/usr/local/icinga/var/status.dat
Status File Age:                    	0d 0h 0m 3s
Status File Version:                	1.6.1

Program Running Time:               	0d 2h 32m 7s
Icinga PID:                         	1976
Used/High/Total Command Buffers:    	12 / 174 / 4096

Total Services:                     	540
Services Checked:                   	540
Services Scheduled:                 	518
Services Actively Checked:          	525
Services Passively Checked:         	15
Total Service State Change:         	0.000 / 18.160 / 0.131 %
Active Service Latency:             	0.139 / 222.843 / 126.660 sec
Active Service Execution Time:      	0.003 / 10.032 / 1.223 sec
Active Service State Change:        	0.000 / 18.160 / 0.135 %
Active Services Last 1/5/15/60 min: 	0 / 64 / 479 / 516
Passive Service Latency:            	4.177 / 21.176 / 8.209 sec
Passive Service State Change:       	0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:	0 / 15 / 15 / 15
Services Ok/Warn/Unk/Crit:          	459 / 7 / 6 / 68
Services Flapping:                  	0
Services In Downtime:               	0

Total Hosts:                        	169
Hosts Checked:                      	169
Hosts Scheduled:                    	167
Hosts Actively Checked:             	166
Host Passively Checked:             	3
Total Host State Change:            	0.000 / 60.790 / 0.718 %
Active Host Latency:                	0.507 / 246.883 / 142.159 sec
Active Host Execution Time:         	0.003 / 4.444 / 3.876 sec
Active Host State Change:           	0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:    	0 / 73 / 165 / 165
Passive Host Latency:               	4.177 / 10.177 / 7.300 sec
Passive Host State Change:          	0.000 / 60.790 / 40.440 %
Passive Hosts Last 1/5/15/60 min:   	0 / 3 / 3 / 3
Hosts Up/Down/Unreach:              	169 / 0 / 0
Hosts Flapping:                     	2
Hosts In Downtime:                  	0

Active Host Checks Last 1/5/15 min: 	3 / 100 / 388
   Scheduled:                       	0 / 76 / 281
   On-demand:                       	3 / 24 / 107
   Parallel:                        	1 / 83 / 300
   Serial:                          	0 / 0 / 0
   Cached:                          	1 / 17 / 87
Passive Host Checks Last 1/5/15 min:	0 / 4 / 4
Active Service Checks Last 1/5/15 min:  0 / 64 / 580
   Scheduled:                       	0 / 64 / 580
   On-demand:                       	0 / 0 / 0
   Cached:                          	0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 19 / 19

External Commands Last 1/5/15 min:  	33 / 178 / 554
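
The question whether the core keeps up with the external command pipe can be checked roughly against the `Used/High/Total Command Buffers` line of the output above. The Python sketch below only does the arithmetic on that line; `command_buffer_usage` is a hypothetical helper and the sample string is copied from the output.

```python
import re


def command_buffer_usage(stats_text: str):
    """Return (used, high, total) from the 'Command Buffers' line of icingastats output."""
    match = re.search(
        r"Used/High/Total Command Buffers:\s*(\d+)\s*/\s*(\d+)\s*/\s*(\d+)",
        stats_text,
    )
    if match is None:
        raise ValueError("no command buffer line found")
    return tuple(int(group) for group in match.groups())


SAMPLE = "Used/High/Total Command Buffers:    \t12 / 174 / 4096"
used, high, total = command_buffer_usage(SAMPLE)
peak_percent = 100.0 * high / total
print(f"peak usage: {high}/{total} = {peak_percent:.1f}%")
```

A high-water mark of 174 out of 4096 slots suggests the buffer has not come close to filling up since the last start. It says nothing about why the active check fires.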

dnsmichi

9

Wednesday, June 27th 2012, 4:54pm

I don't understand it. What does this host look like in the debug log when you trace checks + notifications + external commands?
https://wiki.icinga.org/display/Dev/Icin…nfig-DebugLevel

Sigi

10

Wednesday, June 27th 2012, 5:04pm

What I posted in my last post was the logging on the monitoring server side. That is the only place where the problems occur, because the active checks somehow fire in between.
On the side of the server being checked everything looks fine: all service and host checks run successfully and are transmitted regularly.
Here is that log file anyway (or rather a short excerpt from it):

Source code

2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): checkdisk_c
2012-06-26 15:50:17: debug:NSClient++.cpp:1144: Injecting: CheckDriveSize: ShowAll, MinWarnFree=20%, MinCritFree=10%, Drive=c:\
2012-06-26 15:50:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: c:\: 34.3G'
2012-06-26 15:50:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''c:\ %'=32%;20;10 'c:\'=34.33GB;10;5;0;50'
2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): checkdisk_d
2012-06-26 15:50:17: debug:NSClient++.cpp:1144: Injecting: CheckDriveSize: ShowAll, MinWarnFree=20%, MinCritFree=10%, Drive=d:\
2012-06-26 15:50:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: d:\: 14.9G'
2012-06-26 15:50:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''d:\ %'=89%;20;10 'd:\'=14.87GB;26;13;0;130'
2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): 
2012-06-26 15:50:17: debug:NSClient++.cpp:1144: Injecting: check_ok: 
2012-06-26 15:50:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: Lets pretend everything is going to be ok.'
2012-06-26 15:50:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''
2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_cpu_check
2012-06-26 15:50:17: debug:NSClient++.cpp:1144: Injecting: checkCPU: warn=80, crit=90, time=20m, time=10s, time=4
2012-06-26 15:50:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK CPU Load ok.'
2012-06-26 15:50:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''20m'=0%;80;90 '10s'=0%;80;90 '4'=0%;80;90'
2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_mem_check
2012-06-26 15:50:17: debug:NSClient++.cpp:1144: Injecting: checkMem: MaxWarn=80%, MaxCrit=90%, ShowAll, type=page
2012-06-26 15:50:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: page file: 4.12G'
2012-06-26 15:50:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''page file %'=10%;80;90 'page file'=4.12GB;32;36;0;40'
2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_svc_check
2012-06-26 15:50:17: debug:NSClient++.cpp:1144: Injecting: checkServiceState: CheckAll, exclude=wampmysqld, exclude=MpfService, exclude=ShellHWDetection
2012-06-26 15:50:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: All services are in their appropriate state.'
2012-06-26 15:50:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''
2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:272: Sending to server...
2012-06-26 15:50:17: debug:modules\NSCAAgent\NSCAThread.cpp:279: Looked up 146.254.74.32 to 146.254.74.32
2012-06-26 15:50:18: debug:modules\NSCAAgent\NSCAThread.cpp:356: Finnished sending to server...
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): checkdisk_c
2012-06-26 15:50:47: debug:NSClient++.cpp:1144: Injecting: CheckDriveSize: ShowAll, MinWarnFree=20%, MinCritFree=10%, Drive=c:\
2012-06-26 15:50:47: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: c:\: 34.3G'
2012-06-26 15:50:47: debug:NSClient++.cpp:1181: Injected Performance Result: ''c:\ %'=32%;20;10 'c:\'=34.33GB;10;5;0;50'
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): checkdisk_d
2012-06-26 15:50:47: debug:NSClient++.cpp:1144: Injecting: CheckDriveSize: ShowAll, MinWarnFree=20%, MinCritFree=10%, Drive=d:\
2012-06-26 15:50:47: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: d:\: 14.9G'
2012-06-26 15:50:47: debug:NSClient++.cpp:1181: Injected Performance Result: ''d:\ %'=89%;20;10 'd:\'=14.87GB;26;13;0;130'
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): 
2012-06-26 15:50:47: debug:NSClient++.cpp:1144: Injecting: check_ok: 
2012-06-26 15:50:47: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: Lets pretend everything is going to be ok.'
2012-06-26 15:50:47: debug:NSClient++.cpp:1181: Injected Performance Result: ''
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_cpu_check
2012-06-26 15:50:47: debug:NSClient++.cpp:1144: Injecting: checkCPU: warn=80, crit=90, time=20m, time=10s, time=4
2012-06-26 15:50:47: debug:NSClient++.cpp:1180: Injected Result: OK 'OK CPU Load ok.'
2012-06-26 15:50:47: debug:NSClient++.cpp:1181: Injected Performance Result: ''20m'=0%;80;90 '10s'=0%;80;90 '4'=0%;80;90'
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_mem_check
2012-06-26 15:50:47: debug:NSClient++.cpp:1144: Injecting: checkMem: MaxWarn=80%, MaxCrit=90%, ShowAll, type=page
2012-06-26 15:50:47: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: page file: 4.11G'
2012-06-26 15:50:47: debug:NSClient++.cpp:1181: Injected Performance Result: ''page file %'=10%;80;90 'page file'=4.11GB;32;36;0;40'
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_svc_check
2012-06-26 15:50:47: debug:NSClient++.cpp:1144: Injecting: checkServiceState: CheckAll, exclude=wampmysqld, exclude=MpfService, exclude=ShellHWDetection
2012-06-26 15:50:47: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: All services are in their appropriate state.'
2012-06-26 15:50:47: debug:NSClient++.cpp:1181: Injected Performance Result: ''
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:272: Sending to server...
2012-06-26 15:50:47: debug:modules\NSCAAgent\NSCAThread.cpp:279: Looked up 146.254.74.32 to 146.254.74.32
2012-06-26 15:50:48: debug:modules\NSCAAgent\NSCAThread.cpp:356: Finnished sending to server...
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): checkdisk_c
2012-06-26 15:51:17: debug:NSClient++.cpp:1144: Injecting: CheckDriveSize: ShowAll, MinWarnFree=20%, MinCritFree=10%, Drive=c:\
2012-06-26 15:51:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: c:\: 34.3G'
2012-06-26 15:51:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''c:\ %'=32%;20;10 'c:\'=34.33GB;10;5;0;50'
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): checkdisk_d
2012-06-26 15:51:17: debug:NSClient++.cpp:1144: Injecting: CheckDriveSize: ShowAll, MinWarnFree=20%, MinCritFree=10%, Drive=d:\
2012-06-26 15:51:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: d:\: 14.9G'
2012-06-26 15:51:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''d:\ %'=89%;20;10 'd:\'=14.87GB;26;13;0;130'
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): 
2012-06-26 15:51:17: debug:NSClient++.cpp:1144: Injecting: check_ok: 
2012-06-26 15:51:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: Lets pretend everything is going to be ok.'
2012-06-26 15:51:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_cpu_check
2012-06-26 15:51:17: debug:NSClient++.cpp:1144: Injecting: checkCPU: warn=80, crit=90, time=20m, time=10s, time=4
2012-06-26 15:51:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK CPU Load ok.'
2012-06-26 15:51:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''20m'=0%;80;90 '10s'=0%;80;90 '4'=0%;80;90'
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_mem_check
2012-06-26 15:51:17: debug:NSClient++.cpp:1144: Injecting: checkMem: MaxWarn=80%, MaxCrit=90%, ShowAll, type=page
2012-06-26 15:51:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: page file: 4.12G'
2012-06-26 15:51:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''page file %'=10%;80;90 'page file'=4.12GB;32;36;0;40'
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:206: Executing (from NSCA): my_svc_check
2012-06-26 15:51:17: debug:NSClient++.cpp:1144: Injecting: checkServiceState: CheckAll, exclude=wampmysqld, exclude=MpfService, exclude=ShellHWDetection
2012-06-26 15:51:17: debug:NSClient++.cpp:1180: Injected Result: OK 'OK: All services are in their appropriate state.'
2012-06-26 15:51:17: debug:NSClient++.cpp:1181: Injected Performance Result: ''
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:272: Sending to server...
2012-06-26 15:51:17: debug:modules\NSCAAgent\NSCAThread.cpp:279: Looked up 146.254.74.32 to 146.254.74.32
2012-06-26 15:51:18: debug:modules\NSCAAgent\NSCAThread.cpp:356: Finnished sending to server...

dnsmichi

Super Moderator

Posts: 5,981

Birthday: May 30th 1983 (29)

Gender: male

Location: Nürnberg

Occupation: Consultant / Developer at the best employer in the world @netways

Number of monitoring servers: Icinga: 4x dev, 10++ prod, Icinga2: 2x dev

Nagios Version: s/nagios/icinga/

Icinga Version: 1.9.0 / GIT

Distributed monitoring: Yes

Redundant monitoring: Yes

Number of hosts: 1000+

Number of services: 15000+

OS: RHEL, Debian, SUSE

Plugin Version: 1.4.16

IDO-Version: 1.9.0 / GIT MySQL/Postgresql/Oracle

Other Addons: Icinga Web, PNP, check_multi, inGraph, EventDB, LConf

11

Wednesday, June 27th 2012, 5:15pm

While I'm at it: please also post the usual performance graphs from the Icinga host, plus a detailed description of

- distribution + version
- Icinga version
- installed addons
- hardware or virtual
- whether an RDBMS is installed, and if so which one
+++ Icinga / LConf Developer +++ Senior Consultant at []NETWAYS> +++
+++ Icinga 1.9 || Icinga 2 +++ Icinga Support || IRC +++

Sigi

12

Wednesday, June 27th 2012, 5:29pm

The monitoring runs on a physical server with Ubuntu 11.10.
Icinga is running in version 1.6.1, and a MySQL database is set up alongside it.

The server to be monitored is a Win2k8 server with NSClient++ 0.3.9 (64-bit) installed.

...even though it's not quite clear to me how much this still has to do with the original problem.
In any case, many thanks already for the support.

pitchfork

Administrator

Posts: 18,436

Location: Kassel

Occupation: Sysadmin SAP / Linux / AIX

Number of monitoring servers: 2

Hobbies: Riding motorcycles, when time allows :-)

Nagios Version: 3.2.3 ( OMD )

Distributed monitoring: No

Redundant monitoring: No

Number of hosts: 360

Number of services: 6700

OS: Debian 6.0

Plugin Version: 1.4.x

Other Addons: SNMPTT, NagTrap, check_mk, PNP-0.6.x. Thruk

13

Wednesday, June 27th 2012, 5:43pm

When you say you are using a MySQL DB, do you mean that you have coupled your Icinga with IDO?

I'm surprised by the very poor latency values for host and service checks. That is very unusual for such a small system.

Are the clocks of your Nagios server and the Windows client in sync?
+++ PNP Developer +++ PNP 0.6.21 is online! +++
Found helpful information? Then quickly flattr a few cents
OMD - Open Monitoring Distribution

dnsmichi

14

Wednesday, June 27th 2012, 5:43pm

Well, I suspect that your server has performance problems. To get to the bottom of that, it makes sense both to monitor the server itself and to post the results here, e.g. following this collection: https://wiki.icinga.org/display/howtos/M…the+Icinga+Host

Beyond that, two questions are still open:

- How many Icinga instances are running? To test, run the following and post the output (and then start the Icinga service again):

Source code

service icinga stop ; ps aux | grep icinga

- What does an excerpt of the debug log look like, as already asked above?

Sigi

15

Thursday, June 28th 2012, 7:53am

Thanks again for the many tips and pointers I'm getting here. It certainly makes sense to go through all of this again and optimize the system.
For the moment, though, I think this misses the original question.
Even if the server clocks were out of sync by a minute, and a latency of another minute came on top of that, no active check should be executed, since according to the configuration it should only kick in when no message has arrived for half an hour???
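
The arithmetic behind this argument can be sketched in a few lines. This is a deliberately simplified model and not Icinga's actual freshness logic; the function name `is_stale` and the constant are made up for illustration. The 30-second send interval is taken from the NSClient++ log earlier in the thread, the two one-minute values from the paragraph above.

```python
# Simplified model of a freshness comparison; NOT Icinga's implementation.
# All names here are hypothetical and chosen only for this illustration.
FRESHNESS_THRESHOLD_S = 30 * 60  # "half an hour", as described in the post


def is_stale(age_s: float, threshold_s: float = FRESHNESS_THRESHOLD_S) -> bool:
    """Return True if the last passive result is older than the threshold."""
    return age_s > threshold_s


# Worst case discussed in the thread: one 30 s send interval,
# plus one minute of clock skew, plus one minute of check latency.
worst_case_age_s = 30 + 60 + 60

print(is_stale(worst_case_age_s))
```

Under these assumptions the worst-case age is 150 seconds, far below the 1800-second threshold, so staleness alone would not account for the active check being run.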

dnsmichi

16

Thursday, June 28th 2012, 10:11am

I have never seen a system that sends passive host updates every 30 seconds. And as long as you don't show me a debug log in which I can at least roughly see from the flow what the core is doing, I'll simply ask about all the other possible parameters.

Sigi

17

Thursday, June 28th 2012, 10:29am

The interval for the passive checks is simply taken from the default config; I'm happy to change it. What I'd also like to know in this context is whether different intervals for service and host checks are possible, i.e. sending the passive service checks via NSCA every 5 minutes, but the host check only once per hour. Unfortunately I found nothing on this in the docs and forums.

...and you'll find a debug log in my post from yesterday at 5:04pm, so no offense :)

dnsmichi

18

Thursday, June 28th 2012, 10:50am

"...and you'll find a debug log in my post from yesterday at 5:04pm, so no offense :)"

I'm talking about the Icinga / Nagios core, not NSClient++. After all, we want to find out why your core is running active checks that it shouldn't be running at all!
http://www.monitoring-portal.org/wbb/ind…3456#post173456

pitchfork

19

Thursday, June 28th 2012, 11:16am

@sigi

We really are trying to help, and first need to get a picture of your system.
We simply can't know who we are dealing with. We have no idea what skills you have.

It's quite possible that the questions sound pointless to you, but the answers help us get a picture of your system.
We're simply not sitting in front of your server.

And Michael wants to see the Icinga debug log, not the NSClient++ debug log.

Also worth noting: as an Icinga core developer, Michael has a certain interest in analyzing a possible bug.

Jörg

Sigi

20

Thursday, June 28th 2012, 12:29pm

@pitchfork:
I'm glad to find people here who know considerably more about this than I do. And I'm learning quite a bit along the way. So once again: no offense!

I have now also combed through the Icinga log and picked out the entries concerning the problem host (if this excerpt isn't enough, just let me know):

Source code

[1340873446.205062] [128.1] [pid=1976] Command Arguments: pxx.division.company.com;0;OK: Lets pretend everything is going to be ok.|
[1340873446.205069] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205076] [001.0] [pid=1976] process_external_command1()
[1340873446.205187] [064.1] [pid=1976] Making callbacks (type 9)...
[1340873446.205200] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205215] [001.0] [pid=1976] process_external_command2()
[1340873446.205219] [128.1] [pid=1976] External Command Type: 30
[1340873446.205221] [128.1] [pid=1976] Command Entry Time: 1340873423
[1340873446.205223] [128.1] [pid=1976] Command Arguments: pxx.division.company.com;my_cpu_check;0;OK CPU Load ok.|'20m'=1%;80;90 '10s'=0%;80;90 '4'=1%;80;90
[1340873446.205232] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205241] [001.0] [pid=1976] process_external_command1()
[1340873446.205349] [064.1] [pid=1976] Making callbacks (type 9)...
[1340873446.205361] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205377] [001.0] [pid=1976] process_external_command2()
[1340873446.205380] [128.1] [pid=1976] External Command Type: 30
[1340873446.205382] [128.1] [pid=1976] Command Entry Time: 1340873423
[1340873446.205384] [128.1] [pid=1976] Command Arguments: pxx.division.company.com;my_mem_check;0;OK: page file: 4.71G|'page file %'=9%;80;90 'page file'=4.71GB;38.4;43.2;0;48
[1340873446.205392] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205400] [001.0] [pid=1976] process_external_command1()
[1340873446.205508] [064.1] [pid=1976] Making callbacks (type 9)...
[1340873446.205521] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205538] [001.0] [pid=1976] process_external_command2()
[1340873446.205542] [128.1] [pid=1976] External Command Type: 30
[1340873446.205544] [128.1] [pid=1976] Command Entry Time: 1340873423
[1340873446.205546] [128.1] [pid=1976] Command Arguments: pxx.division.company.com;my_svc_check;0;OK: All services are in their appropriate state.|
[1340873446.205559] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205568] [001.0] [pid=1976] process_external_command1()
[1340873446.205648] [064.1] [pid=1976] Making callbacks (type 9)...
[1340873446.205660] [064.1] [pid=1976] Making callbacks (type 24)...
[1340873446.205674] [001.0] [pid=1976] process_external_command2()
[1340873446.205677] [128.1] [pid=1976] External Command Type: 30
[1340873446.205679] [128.1] [pid=1976] Command Entry Time: 1340873440
[1340873446.470941] [016.1] [pid=1976] Handling check result for service 'checkdisk_c' on host 'pxx.division.company.com'...
[1340873446.470944] [001.0] [pid=1976] handle_async_service_check_result()
[1340873446.470947] [016.0] [pid=1976] ** Handling check result for service 'checkdisk_c' on host 'pxx.division.company.com'...
[1340873446.470949] [016.1] [pid=1976] HOST: pxx.division.company.com, SERVICE: checkdisk_c, CHECK TYPE: Passive, OPTIONS: 0, SCHEDULED: No, RESCHEDULE: No, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: OK: c:\: 34.9G|'c:\ %'=32%;20;10 'c:\'=34.87GB;10.2;5.1;0;50.98
[1340873446.471046] [064.1] [pid=1976] Making callbacks (type 9)...
[1340873446.471057] [016.1] [pid=1976] Service is OK.
[1340873446.471060] [016.1] [pid=1976] Host is NOT UP, so we'll check it to see if it recovered...
[1340873446.471063] [001.0] [pid=1976] run_async_host_check_3x()
[1340873446.471065] [016.0] [pid=1976] ** Running async check of host 'pxx.division.company.com'...
[1340873446.471068] [001.0] [pid=1976] check_host_check_viability_3x()
[1340873446.471080] [001.0] [pid=1976] check_time_against_period()
[1340873446.471094] [001.0] [pid=1976] check_host_dependencies()
[1340873446.471099] [064.1] [pid=1976] Making callbacks (type 14)...
[1340873446.471102] [016.0] [pid=1976] Checking host 'pxx.division.company.com'...
[1340873446.471105] [001.0] [pid=1976] adjust_host_check_attempt_3x()
[1340873446.471111] [001.0] [pid=1976] get_raw_command_line_r()
[1340873446.471116] [001.0] [pid=1976] process_macros_r()
[1340873446.471121] [2048.1] [pid=1976] **** BEGIN MACRO PROCESSING ***********
[1340873446.471124] [2048.1] [pid=1976] Processing: '2'
[1340873446.471127] [2048.1] [pid=1976]   Done.  Final output: '2'
[1340873446.471130] [2048.1] [pid=1976] **** END MACRO PROCESSING *************
[1340873446.471132] [001.0] [pid=1976] process_macros_r()
[1340873446.471134] [2048.1] [pid=1976] **** BEGIN MACRO PROCESSING ***********
[1340873446.471137] [2048.1] [pid=1976] Processing: '"No status received within last 30 minutes"'
[1340873446.471140] [2048.1] [pid=1976]   Done.  Final output: '"No status received within last 30 minutes"'
[1340873446.471143] [2048.1] [pid=1976] **** END MACRO PROCESSING *************
[1340873446.471145] [001.0] [pid=1976] process_macros_r()
[1340873446.471147] [2048.1] [pid=1976] **** BEGIN MACRO PROCESSING ***********
[1340873446.471149] [2048.1] [pid=1976] Processing: '$USER1$/check_dummy $ARG1$ $ARG2$'
[1340873446.471161] [2048.1] [pid=1976]   Done.  Final output: '/usr/local/icinga/libexec/check_dummy 2 "No status received within last 30 minutes"'
[1340873446.471163] [2048.1] [pid=1976] **** END MACRO PROCESSING *************
[1340873446.471200] [016.1] [pid=1976] Check result output will be written to '/tmp/checkvVZL5r' (fd=8)
[1340873446.471222] [064.1] [pid=1976] Making callbacks (type 14)...
[1340873446.472193] [001.0] [pid=31755] process_macros_r()
[1340873446.472693] [001.0] [pid=31755] process_macros_r()
[1340873446.472725] [001.0] [pid=31755] process_macros_r()
[1340873446.473573] [001.0] [pid=31755] process_macros_r()
[1340873446.473601] [001.0] [pid=31755] process_macros_r()
[1340873446.473624] [001.0] [pid=31755] process_macros_r()
[1340873446.474625] [016.0] [pid=31756] running command /usr/local/icinga/libexec/check_dummy 2 "No status received within last 30 minutes" via popen
[1340873446.477769] [016.1] [pid=1976] Service did not change state.
[1340873446.477802] [064.1] [pid=1976] Making callbacks (type 13)...
[1340873446.477850] [064.1] [pid=1976] Making callbacks (type 20)...
[1340873446.477871] [001.0] [pid=1976] check_for_service_flapping()
[1340873446.477874] [016.1] [pid=1976] Checking service 'checkdisk_c' on host 'pxx.division.company.com' for flapping...
[1340873446.477877] [016.1] [pid=1976] Service is not flapping (0.00% state change).
[1340873446.477880] [001.0] [pid=1976] check_for_host_flapping()
[1340873446.477882] [016.1] [pid=1976] Checking host 'pxx.division.company.com' for flapping...
[1340873446.477886] [016.1] [pid=1976] Host is flapping (45.72% state change).
[1340873446.477896] [001.0] [pid=1976] run_service_performance_data_command()
[1340873446.477902] [001.0] [pid=1976] get_raw_command_line_r()
[1340873446.477905] [001.0] [pid=1976] process_macros_r()
[1340873446.477908] [2048.1] [pid=1976] **** BEGIN MACRO PROCESSING ***********
[1340873446.477910] [2048.1] [pid=1976] Processing: '/usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl'
[1340873446.477916] [2048.1] [pid=1976]   Done.  Final output: '/usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl'
[1340873446.477918] [2048.1] [pid=1976] **** END MACRO PROCESSING *************
[1340873446.477921] [001.0] [pid=1976] my_system_r()
[1340873446.477923] [256.1] [pid=1976] Running command '/usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl'...
[1340873446.477932] [064.1] [pid=1976] Making callbacks (type 10)...
[1340873446.479491] [001.0] [pid=31759] process_macros_r()
[1340873446.479539] [001.0] [pid=31759] process_macros_r()
[1340873446.479565] [001.0] [pid=31759] process_macros_r()
[1340873446.479600] [001.0] [pid=31759] process_macros_r()
[1340873446.479608] [2048.1] [pid=31759] **** BEGIN MACRO PROCESSING ***********
[1340873446.479614] [2048.1] [pid=31759] Processing: '/pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$'
[1340873446.479643] [2048.1] [pid=31759]   Done.  Final output: '/pnp4nagios/graph?host=pxx.division.company.com&srv=checkdisk_c'
[1340873446.479652] [2048.1] [pid=31759] **** END MACRO PROCESSING *************
[1340873446.479669] [001.0] [pid=31759] process_macros_r()
[1340873446.479691] [001.0] [pid=31759] process_macros_r()
[1340873446.480433] [001.0] [pid=31759] process_macros_r()
[1340873446.480453] [001.0] [pid=31759] process_macros_r()
[1340873446.480473] [001.0] [pid=31759] process_macros_r()
[1340873446.529859] [256.1] [pid=1976] Execution time=0.051 sec, early timeout=0, result=0, output=(null)
[1340873446.529915] [064.1] [pid=1976] Making callbacks (type 10)...
[1340873446.529994] [001.0] [pid=1976] update_service_performance_data_file()
[1340873446.530013] [016.1] [pid=1976] Deleted check result file '/usr/local/icinga/var/spool/checkresults/ciHaI6j'
[1340873446.530034] [016.1] [pid=1976] Handling check result for service 'checkdisk_d' on host 'pxx.division.company.com'...
[1340873446.530042] [001.0] [pid=1976] handle_async_service_check_result()
[1340873446.530049] [016.0] [pid=1976] ** Handling check result for service 'checkdisk_d' on host 'pxx.division.company.com'...
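
In the excerpt above, the line "Host is NOT UP, so we'll check it to see if it recovered..." appears directly before the core runs check_dummy for the host. To pull such lines out of a longer debug file, a small filter along the following lines may help. This is only a sketch: the line layout is inferred from the excerpt above, and the names `DebugEntry`, `parse_line` and `host_check_triggers` are made up for this example and are not part of Icinga.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Layout as seen in the excerpt: [epoch.micros] [level.verbosity] [pid=N] message
LINE_RE = re.compile(r"^\[(\d+)\.(\d+)\] \[(\d+)\.(\d+)\] \[pid=(\d+)\] (.*)$")


class DebugEntry(NamedTuple):
    epoch: int      # whole seconds since the Unix epoch
    micros: int     # fractional part of the timestamp
    level: int      # e.g. 016 on the check-handling lines in the excerpt
    verbosity: int
    pid: int
    message: str


def parse_line(line: str) -> Optional[DebugEntry]:
    """Parse one debug log line; return None if it does not match the layout."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    epoch, micros, level, verbosity, pid, message = match.groups()
    return DebugEntry(int(epoch), int(micros), int(level),
                      int(verbosity), int(pid), message)


def host_check_triggers(lines: Iterable[str]) -> Iterator[DebugEntry]:
    """Yield entries where the core decides to re-check a host that is not UP."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and "Host is NOT UP" in entry.message:
            yield entry
```

Fed with the lines of the debug file, `host_check_triggers` returns one entry per occurrence, and the `epoch` field can then be compared against the "Command Entry Time" values of the passive results that arrived just before it.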