heartbeat is stopped for some reason


hnode2 was the active node and its services are still running fine, but heartbeat on it has somehow stopped.

Here is the last heartbeat log output I see on hnode2:

[quote:23c84415f5]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 9/1762471 ms age 0 [pid16738/MST_CONTROL]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 716/51784021 152624/74519 [pid16738/MST_CONTROL]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 200276 total malloc bytes. pid [16738/MST_CONTROL]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/14 ms age 405180540 [pid16741/HBFIFO]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 321/581 30772/13815 [pid16741/HBFIFO]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 32600 total malloc bytes. pid [16741/HBFIFO]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373810 [pid16742/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 340/657021 33264/15511 [pid16742/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 42008 total malloc bytes. pid [16742/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16743/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 340/394 25136/11458 [pid16743/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 25220 total malloc bytes. pid [16743/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16744/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 352/657052 34784/16543 [pid16744/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 43528 total malloc bytes. pid [16744/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16745/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 353/1244439 34868/16587 [pid16745/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 35812 total malloc bytes. pid [16745/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373820 [pid16746/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 364/657082 36304/17575 [pid16746/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 44840 total malloc bytes. pid [16746/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373830 [pid16747/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 364/454 28176/13522 [pid16747/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 36472 total malloc bytes. pid [16747/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373850 [pid16748/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 376/657112 37824/18607 [pid16748/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 46360 total malloc bytes. pid [16748/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/0 ms age 603373850 [pid16749/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 376/484 29696/14554 [pid16749/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 37992 total malloc bytes. pid [16749/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/1140417 ms age 40 [pid16750/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 388/30411871 39344/19639 [pid16750/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 51588 total malloc bytes. pid [16750/HBWRITE]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: MSG stats: 0/518348 ms age 30 [pid16751/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: cl_malloc stats: 389/10885817 39428/19683 [pid16751/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: RealMalloc stats: 40928 total malloc bytes. pid [16751/HBREAD]
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: Current arena value: 0
Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: These are nothing to worry about.
[/quote:23c84415f5]


Now when I start heartbeat again I get the messages below (I have seen them before). What alternative do I have? The services are running on hnode2, and I can't just unmount the disk and stop OpenVZ in a production environment.

I also never stopped heartbeat myself. It was evidently fine until early this evening (around 5:12 PM) and stopped at some point between then and now (11:48 PM).
[quote:23c84415f5]Starting High-Availability services:
2008/09/09_23:45:21 CRITICAL: Resource drbddisk::r0 is active, and should not be!
2008/09/09_23:45:21 CRITICAL: Non-idle resources can affect data integrity!
2008/09/09_23:45:21 info: If you don't know what this means, then get help!
2008/09/09_23:45:21 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
CRITICAL: Resource drbddisk::r0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
2008/09/09_23:45:21 CRITICAL: Non-idle resources will affect resource takeback!
2008/09/09_23:45:21 CRITICAL: Non-idle resources may affect data integrity!
[ OK ]
[/quote:23c84415f5]
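Since the same warning is printed both with and without a timestamp, it can help to boil the startup output down to the list of resources heartbeat is unhappy about. This is only a rough sketch: the function name `active_resources` is my own, and the pattern is based purely on the message text quoted above, not on anything documented by heartbeat.

```python
import re

# Matches the "Resource ... is active" warning exactly as it appears in the output above.
ACTIVE_RE = re.compile(r"CRITICAL: Resource (?P<resource>\S+) is active, and should not be!")

def active_resources(output):
    """Return the distinct resources flagged as active, in first-seen order."""
    seen = []
    for m in ACTIVE_RE.finditer(output):
        if m.group("resource") not in seen:
            seen.append(m.group("resource"))
    return seen

# Lines copied from the startup output quoted above.
startup_output = """\
2008/09/09_23:45:21 CRITICAL: Resource drbddisk::r0 is active, and should not be!
2008/09/09_23:45:21 CRITICAL: Non-idle resources can affect data integrity!
CRITICAL: Resource drbddisk::r0 is active, and should not be!
"""
print(active_resources(startup_output))
```

In my case the only resource it reports is `drbddisk::r0`.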


 

hnode1 complains after restarting heartbeat:

[quote:d10093f29f]Sep 9 23:44:50 hnode1 heartbeat: [31055]: info: Heartbeat restart on node hnode2.ca
Sep 9 23:44:50 hnode1 heartbeat: [31055]: info: Status update for node hnode2.ca: status init
Sep 9 23:44:50 hnode1 heartbeat: [31055]: info: Status update for node hnode2.ca: status up
Sep 9 23:44:51 hnode1 heartbeat: [31055]: info: Status update for node hnode2.ca: status active
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: should_drop_message: attempted replay attack [hnode2.ca]? [gen = 1220406280, curgen = 1220406281]
Sep 9 23:44:51 hnode1 heartbeat: [31055]: info: remote resource transition completed.
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: No one owns our local resources!
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: No one owns our local resources!
Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: should_drop_message: attempted replay attack [hnode2.ca]? [gen = 1220406280, curgen = 1220406281]
Sep 9 23:44:55 hnode1 heartbeat: [31055]: ERROR: should_drop_message: attempted replay attack [hnode2.ca]? [gen = 1220406280, curgen = 1220406281][/quote:d10093f29f]
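To see how far apart the two generation numbers are, the replay-attack lines can be pulled out of the hnode1 log with a small script. This is a minimal sketch: `find_replay_warnings` is a name I made up, and the regular expression is based only on the log lines quoted above. My reading is that `gen` is what hnode2 sent and `curgen` is what hnode1 has recorded, but that is an assumption on my part.

```python
import re

# Pattern for the "attempted replay attack" lines shown in the hnode1 log above.
REPLAY_RE = re.compile(
    r"attempted replay attack \[(?P<node>[^\]]+)\]\? "
    r"\[gen = (?P<gen>\d+), curgen = (?P<curgen>\d+)\]"
)

def find_replay_warnings(lines):
    """Return (node, gen, curgen) tuples for every replay-attack line."""
    hits = []
    for line in lines:
        m = REPLAY_RE.search(line)
        if m:
            hits.append((m.group("node"), int(m.group("gen")), int(m.group("curgen"))))
    return hits

# Two lines copied from the hnode1 log above.
sample = [
    "Sep 9 23:44:51 hnode1 heartbeat: [31055]: ERROR: should_drop_message: "
    "attempted replay attack [hnode2.ca]? [gen = 1220406280, curgen = 1220406281]",
    "Sep 9 23:44:51 hnode1 heartbeat: [31055]: info: remote resource transition completed.",
]

for node, gen, curgen in find_replay_warnings(sample):
    print(f"{node}: gen {gen} vs curgen {curgen} (difference {curgen - gen})")
```

For the lines above the two values differ by exactly 1.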


 

On hnode2, shortly after 5:17 PM, I installed and ran tiobench. I wonder if that did it?

Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: These are nothing to worry about.
Sep 9 17:18:05 hnode2 python: gethostby*.getanswer: asked for "apt.sw.be IN AAAA", got type "SOA"
[b:04bcff7251]Sep 9 20:18:17 hnode2 yum: Installed: tiobench - 0.3.3-1.2.el5.rf.i386
[/b:04bcff7251]


 

CRITICAL: Resource drbddisk::r0 is active, and should not be!
CRITICAL: Non-idle resources can affect data integrity!
info: If you don't know what this means, then get help!
info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
2008/09/09_23:45:21 CRITICAL: Non-idle resources will affect resource takeback!
2008/09/09_23:45:21 CRITICAL: Non-idle resources may affect data integrity!

[quote:a0972a9e65] # What this means is that if you have a shared disk and it's already mounted
# before you start heartbeat, then you could have it mounted simultaneously
# on both sides. If this happens then your disk data is toast!
# So, this is sometimes VERY BAD INDEED!
#
[/quote:a0972a9e65]


 

I ran tiobench again and heartbeat did not die this time.


 

Worst of all, I checked the logs on hnode1 and it never seemed to realize that heartbeat on hnode2 was down.
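Since neither node noticed, one thing I could do is measure the silence myself by finding the last heartbeat line each host wrote to syslog. The sketch below assumes the syslog format shown in the excerpts above. The helper name `last_heartbeat_entry` is hypothetical, and the year has to be passed in because syslog timestamps do not include one.

```python
import re
from datetime import datetime

# Matches syslog lines written by heartbeat, e.g. "Sep 9 17:15:32 hnode2 heartbeat: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) heartbeat: "
)

def last_heartbeat_entry(lines, year=2008):
    """Map each host to the timestamp of its most recent heartbeat log line."""
    last = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a heartbeat line (e.g. python or yum entries)
        ts = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
        host = m.group("host")
        if host not in last or ts > last[host]:
            last[host] = ts
    return last

# Lines copied from the logs quoted earlier in this post.
sample = [
    "Sep 9 17:15:32 hnode2 heartbeat: [16738]: info: These are nothing to worry about.",
    'Sep 9 17:18:05 hnode2 python: gethostby*.getanswer: asked for "apt.sw.be IN AAAA", got type "SOA"',
    "Sep 9 23:44:50 hnode1 heartbeat: [31055]: info: Heartbeat restart on node hnode2.ca",
]

last = last_heartbeat_entry(sample)
# Time between hnode2's last heartbeat line and hnode1 seeing the restart.
print(last["hnode1"] - last["hnode2"])
```

Run against the real log files, a check like this could be put in cron to warn when a node's heartbeat has been quiet for longer than expected.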


