heartbeat has stopped for some reason

hnode2 was the active node and the services are still running fine, but heartbeat has stopped somehow.

Here is the last log I see of heartbeat:

[quote:23c84415f5]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 9/1762471 ms age 0 [pid16738/MST_CONTROL]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 716/51784021 152624/74519 [pid16738/MST_CONTROL]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 200276 total malloc bytes. pid [16738/MST_CONTROL]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/14 ms age 405180540 [pid16741/HBFIFO]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 321/581 30772/13815 [pid16741/HBFIFO]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 32600 total malloc bytes. pid [16741/HBFIFO]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373810 [pid16742/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 340/657021 33264/15511 [pid16742/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 42008 total malloc bytes. pid [16742/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16743/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 340/394 25136/11458 [pid16743/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 25220 total malloc bytes. pid [16743/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16744/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 352/657052 34784/16543 [pid16744/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 43528 total malloc bytes. pid [16744/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16745/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 353/1244439 34868/16587 [pid16745/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 35812 total malloc bytes. pid [16745/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16746/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 364/657082 36304/17575 [pid16746/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 44840 total malloc bytes. pid [16746/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373830 [pid16747/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 364/454 28176/13522 [pid16747/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 36472 total malloc bytes. pid [16747/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373850 [pid16748/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 376/657112 37824/18607 [pid16748/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 46360 total malloc bytes. pid [16748/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373850 [pid16749/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 376/484 29696/14554 [pid16749/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 37992 total malloc bytes. pid [16749/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/1140417 ms age 40 [pid16750/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 388/30411871 39344/19639 [pid16750/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 51588 total malloc bytes. pid [16750/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/518348 ms age 30 [pid16751/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 389/10885817 39428/19683 [pid16751/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 40928 total malloc bytes. pid [16751/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: These are nothing to worry about.
[/quote:23c84415f5]
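To narrow down when heartbeat actually stopped, one option is to scan syslog for the last line written by the heartbeat daemon. This is a minimal sketch that assumes the classic syslog line format shown in the quote above (`Sep 9 17:15:32 hnode2 heartbeat: ...`). Classic syslog omits the year, so the caller has to supply it. The log path (for example `/var/log/messages`) is an assumption and depends on the distribution.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches classic syslog lines such as:
# "Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: ..."
# The program name stops at ':' or '[' (e.g. "sshd[123]:").
SYSLOG_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<prog>[^:\[\s]+)"
)


def last_entry(lines: Iterable[str], program: str, year: int) -> Optional[datetime]:
    """Return the timestamp of the last syslog line written by `program`, or None."""
    last = None
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m and m.group("prog") == program:
            # syslog does not record the year, so it is supplied by the caller
            stamp = f"{year} {m.group('mon')} {m.group('day')} {m.group('time')}"
            last = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
    return last


sample = [
    "Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: These are nothing to worry about.",
    'Sep 9 17:18:05 hnode2 python: gethostby*.getanswer: asked for "apt.sw.be IN AAAA", got type "SOA"',
]
print(last_entry(sample, "heartbeat", 2008))  # 2008-09-09 17:15:32
```

On a real node you would pass an open file handle (for example `open("/var/log/messages", errors="replace")`) instead of the sample list.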


Now when I start it again I get these messages (I've seen them before). What alternative do I have? The services are still running on hnode2, so I can't just unmount the DRBD volume and stop OpenVZ in a production environment.

I also never stopped heartbeat myself. It was evidently fine until about 5:12 PM this evening, and it died sometime between then and now (11:48 PM).
[quote:23c84415f5]Starting High-Availability services:
2008/09/09_23:45:21 CRITICAL: Resource drbddisk::r0 is active, and should not be!
2008/09/09_23:45:21 CRITICAL: Non-idle resources can affect data integrity!
2008/09/09_23:45:21 info: If you don't know what this means, then get help!
2008/09/09_23:45:21 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
CRITICAL: Resource drbddisk::r0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
2008/09/09_23:45:21 CRITICAL: Non-idle resources will affect resource takeback!
2008/09/09_23:45:21 CRITICAL: Non-idle resources may affect data integrity!
[ OK ]
[/quote:23c84415f5]


 

After I restarted heartbeat on hnode2, hnode1 logged these errors:

[quote:d10093f29f]Sep 9 23:44:50 hnode1 heartbeat: [31055]: info: Heartbeat restart on node hnode2.ca
Sep 9 23:44:50 hnode1 heartbeat: [31055]: info: Status update for node hnode2.ca: status init
Sep 9 23:44:50 hnode1 heartbeat: [31055]: info: Status update for node hnode2.ca: status up
Sep 9 23:44:51 hnode1 heartbeat: [31055]: info: Status update for node hnode2.ca: status active
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: should_drop_message: attempted replay attack [hnode2.ca]? [gen = 1220406280, curgen = 1220406281]
Sep 9 23:44:51 hnode1 heartbeat: [31055]: info: remote resource transition completed.
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: No one owns our local resources!
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: No one owns our local resources!
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: should_drop_message: attempted replay attack [hnode2.ca]? [gen = 1220406280, curgen = 1220406281]
Sep 9 23:44:55 hnode1 heartbeat: [31055]: ERROR: should_drop_message: attempted replay attack [hnode2.ca]? [gen = 1220406280, curgen = 1220406281][/quote:d10093f29f]
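The `should_drop_message` errors compare two generation numbers: `gen`, carried by the incoming message from hnode2.ca, and `curgen`, the generation hnode1 has already recorded for that node. The following is a simplified model of the rule the log message implies: a message whose generation is older than the last one seen from that node is treated as a possible replay. This is not heartbeat's actual source, and `PeerTracker` is a hypothetical name.

```python
from typing import Dict


class PeerTracker:
    """Hypothetical, simplified model of a per-node generation check."""

    def __init__(self) -> None:
        self.curgen: Dict[str, int] = {}  # last generation seen per node

    def should_drop(self, node: str, gen: int) -> bool:
        known = self.curgen.get(node)
        if known is None or gen > known:
            # first contact, or the node restarted with a newer generation
            self.curgen[node] = gen
            return False
        # an older generation than the one already recorded looks like a replay
        return gen < known


tracker = PeerTracker()
tracker.should_drop("hnode2.ca", 1220406281)
print(tracker.should_drop("hnode2.ca", 1220406280))  # True
```

Under this model, once hnode1 has recorded 1220406281 for hnode2.ca, every message stamped 1220406280 is discarded, which is consistent with the repeated errors above.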


 

On hnode2, shortly after 5:17 PM, I installed and ran tiobench. Could that have done it?

Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: These are nothing to worry about.
Sep 9 17:18:05 hnode2 python: gethostby*.getanswer: asked for "apt.sw.be IN AAAA", got type "SOA"
[b:04bcff7251]Sep 9 20:18:17 hnode2 yum: Installed: tiobench - 0.3.3-1.2.el5.rf.i386
[/b:04bcff7251]


 

CRITICAL: Resource drbddisk::r0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
2008/09/09_23:45:21 CRITICAL: Non-idle resources will affect resource takeback!
2008/09/09_23:45:21 CRITICAL: Non-idle resources may affect data integrity!

[quote:a0972a9e65] # What this means is that if you have a shared disk and it's already mounted
# before you start heartbeat, then you could have it mounted simultaneously
# on both sides. If this happens then your disk data is toast!
# So, this is sometimes VERY BAD INDEED!
#
[/quote:a0972a9e65]
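The double-mount risk described in that comment can be checked by hand before starting heartbeat: see whether the replicated device is already mounted on this node. This is a minimal, Linux-specific sketch that reads `/proc/mounts`. The device name `/dev/drbd0` and the mount point `/vz` in the example are assumptions, not taken from the post.

```python
from typing import Optional


def mount_point_of(device: str, mounts_text: str) -> Optional[str]:
    """Return where `device` is mounted according to /proc/mounts text, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 2 and fields[0] == device:
            return fields[1]
    return None


def read_mounts(path: str = "/proc/mounts") -> str:
    """Read the kernel's mount table (Linux only)."""
    with open(path) as f:
        return f.read()


# Example with an assumed device name and mount point:
sample = "/dev/sda1 / ext3 rw 0 0\n/dev/drbd0 /vz ext3 rw 0 0\n"
print(mount_point_of("/dev/drbd0", sample))  # /vz
```

On a live node you would call `mount_point_of("/dev/drbd0", read_mounts())`. A non-`None` result means the filesystem is already in use here, which is exactly the situation the comment warns about.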


 

I ran tiobench again, and heartbeat did not die.


 

Worst of all, I checked the logs on hnode1, and it never seemed to notice that heartbeat on hnode2 was down.
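One way to catch a silently dead heartbeat locally, without relying on the peer to notice, is a small cron-style check that the process recorded in the pidfile still exists. This sketch assumes the pidfile is at `/var/run/heartbeat.pid`; verify the path for your heartbeat build. `os.kill(pid, 0)` sends no signal and only tests whether the process exists.

```python
import os


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 only checks, never kills)."""
    if pid <= 0:
        return False  # 0 or negative would address a process group, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # the process exists but belongs to another user
        return True
    return True


def heartbeat_running(pidfile: str = "/var/run/heartbeat.pid") -> bool:
    """Check the daemon recorded in `pidfile`; the default path is an assumption."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # a missing or unreadable pidfile counts as "not running"
    return pid_alive(pid)
```

Run from cron, a `False` result could trigger a mail or log entry, so a stopped daemon is noticed within minutes rather than hours.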
