Server seems to restart several times a day

Tony Shelver6 months ago

Getting the below many times a day in the server log since the upgrade to 5.9. Can't find any evidence of this in the logs on the previous 5.6 installation. Any ideas?

2023-11-14 06:43:03  INFO: Operating system name: Linux version: 5.15.0-88-generic architecture: amd64
2023-11-14 06:43:03  INFO: Java runtime name: OpenJDK 64-Bit Server VM vendor: Eclipse Adoptium version: 17.0.6+10
2023-11-14 06:43:03  INFO: Memory limit heap: 3940mb non-heap: 0mb
2023-11-14 06:43:03  INFO: Character encoding: UTF-8 charset: UTF-8
2023-11-14 06:43:03  INFO: Version: 5.9
2023-11-14 06:43:03  INFO: Starting server...
2023-11-14 06:43:03  INFO: HikariPool-1 - Starting...
2023-11-14 06:43:04  INFO: HikariPool-1 - Added connection org.postgresql.jdbc.PgConnection@77bb0ab5
2023-11-14 06:43:04  INFO: HikariPool-1 - Start completed.
2023-11-14 06:43:04  INFO: Set default schema name to public
2023-11-14 06:43:04  INFO: Clearing database change log checksums
2023-11-14 06:43:04  INFO: Successfully acquired change log lock
2023-11-14 06:43:04  INFO: Successfully released change log lock
2023-11-14 06:43:06  INFO: Reading from public.databasechangelog
2023-11-14 06:43:06  INFO: Reading from public.databasechangelog
2023-11-14 06:43:07  INFO: Reading from public.databasechangelog
2023-11-14 06:43:07  INFO: UPDATE SUMMARY
2023-11-14 06:43:07  INFO: Run:                          0
2023-11-14 06:43:07  INFO: Previously run:              44
2023-11-14 06:43:07  INFO: Filtered out:                 0
2023-11-14 06:43:07  INFO: -------------------------------
2023-11-14 06:43:07  INFO: Total change sets:           44

2023-11-14 06:43:07  INFO: Update summary generated
2023-11-14 06:43:07  INFO: Successfully released change log lock
2023-11-14 06:43:07  INFO: Command execution complete
2023-11-14 06:43:07  INFO: jetty-11.0.15; built: 2023-04-11T18:37:53.775Z; git: 5bc5e562c8d05c5862505aebe5cf83a61bdbcb96; jvm 17.0.6+10
2023-11-14 06:43:07  INFO: Started o.t.w.@728f868{/,null,AVAILABLE}
2023-11-14 06:43:08  INFO: Session workerName=node0
2023-11-14 06:43:08  WARN: JAXBContext implementation could not be found. WADL feature is disabled
Anton Tananaev6 months ago

I don't see anything suspicious in this log. Have you checked systemd logs?

Tony Shelver6 months ago

Getting this in systemd (just a 'small' snippet') only started after the upgrade:

Nov 01 07:20:31 surerity systemd[1]: traccar.service: Watchdog timeout (limit 10min)!
Nov 01 07:20:31 surerity systemd[1]: traccar.service: Killing process 397925 (java) with signal SIGABRT.
Nov 01 07:20:31 surerity systemd[1]: traccar.service: Main process exited, code=dumped, status=6/ABRT
Nov 01 07:20:31 surerity systemd[1]: traccar.service: Failed with result 'watchdog'.
Nov 01 07:20:41 surerity systemd[1]: traccar.service: Scheduled restart job, restart counter is at 217.
Nov 01 07:20:41 surerity systemd[1]: Stopped traccar.
Nov 01 07:20:41 surerity systemd[1]: Started traccar.
Nov 01 07:20:48 surerity traccar[398705]: Database is up to date, no changesets to execute
Nov 01 07:20:48 surerity traccar[398705]: UPDATE SUMMARY
Nov 01 07:20:48 surerity traccar[398705]: Run:                          0
Nov 01 07:20:48 surerity traccar[398705]: Previously run:              44
Nov 01 07:20:48 surerity traccar[398705]: Filtered out:                 0
Nov 01 07:20:48 surerity traccar[398705]: -------------------------------
Nov 01 07:20:48 surerity traccar[398705]: Total change sets:           44
Nov 01 07:38:51 surerity systemd[1]: traccar.service: Watchdog timeout (limit 10min)!
Nov 01 07:38:51 surerity systemd[1]: traccar.service: Killing process 398705 (java) with signal SIGABRT.
Nov 01 07:38:52 surerity systemd[1]: traccar.service: Main process exited, code=dumped, status=6/ABRT
Nov 01 07:38:52 surerity systemd[1]: traccar.service: Failed with result 'watchdog'.
Nov 01 07:39:02 surerity systemd[1]: traccar.service: Scheduled restart job, restart counter is at 218.
Nov 01 07:39:02 surerity systemd[1]: Stopped traccar.
Nov 01 07:39:02 surerity systemd[1]: Started traccar.
Nov 01 07:39:08 surerity traccar[399492]: Database is up to date, no changesets to execute
Nov 01 07:39:08 surerity traccar[399492]: UPDATE SUMMARY
Nov 01 07:39:08 surerity traccar[399492]: Run:                          0
Nov 01 07:39:08 surerity traccar[399492]: Previously run:              44
Nov 01 07:39:08 surerity traccar[399492]: Filtered out:                 0
Nov 01 07:39:08 surerity traccar[399492]: -------------------------------
Nov 01 07:39:08 surerity traccar[399492]: Total change sets:           44
Nov 01 07:57:12 surerity systemd[1]: traccar.service: Watchdog timeout (limit 10min)!
Nov 01 07:57:12 surerity systemd[1]: traccar.service: Killing process 399492 (java) with signal SIGABRT.
Nov 01 07:57:12 surerity systemd[1]: traccar.service: Main process exited, code=dumped, status=6/ABRT
Nov 01 07:57:12 surerity systemd[1]: traccar.service: Failed with result 'watchdog'.
Nov 01 07:57:22 surerity systemd[1]: traccar.service: Scheduled restart job, restart counter is at 219.
Nov 01 07:57:22 surerity systemd[1]: Stopped traccar.
Nov 01 07:57:22 surerity systemd[1]: Started traccar.
Nov 01 07:57:28 surerity traccar[400249]: Database is up to date, no changesets to execute
Nov 01 07:57:28 surerity traccar[400249]: UPDATE SUMMARY
Nov 01 07:57:28 surerity traccar[400249]: Run:                          0
Nov 01 07:57:28 surerity traccar[400249]: Previously run:              44
Nov 01 07:57:28 surerity traccar[400249]: Filtered out:                 0
Nov 01 07:57:28 surerity traccar[400249]: -------------------------------
Anton Tananaev6 months ago

This is the reason:

Watchdog timeout (limit 10min)!

You can try disabling the watchdog.

Tony Shelver6 months ago

Per another (old) post that had the same issue, my systemd traccar.service contains:

[Unit]
Description=traccar
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/traccar
ExecStart=/opt/traccar/jre/bin/java -jar tracker-server.jar conf/traccar.xml
SyslogIdentifier=traccar
SuccessExitStatus=143
WatchdogSec=600
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target

Note watchdogsec is already at 600, 10 minutes, which seems to be the limit it's running into (10 minutes). Has this changed under 5.9 compared to earlier?

Note this is a fairly powerful machine with 32gb data on an SSD and with 6 CPUs.

Any suggestions?

Anton Tananaev6 months ago

I already gave a suggestion. Disable the watchdog.

Tony Shelver6 months ago

Thanks Anton. Made a change and will monitor