Traccar Docker on Kubernetes (AKS) – Repeated Liquibase lock & pod restarts after production rollout (previously stable setup)

Yash Ahirrao 2 months ago

Hello Traccar Team,
We are running Traccar using the official Docker image (traccar/traccar:latest) on Kubernetes (AKS) and are looking for guidance on a production issue that appeared suddenly, even though the same setup was running smoothly earlier.
Environment
Traccar version: latest Docker image
Deployment platform: Kubernetes (AKS)
Database: PostgreSQL (Azure Database for PostgreSQL)
TimescaleDB extension: enabled
Traccar replicas: 1
Database type: External DB (not H2)
Current Kubernetes Setup
Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: traccar
  namespace: cargopro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traccar
  template:
    metadata:
      labels:
        app: traccar
    spec:
      containers:
        - name: traccar
          image: traccar/traccar:latest
          ports:
            - containerPort: 8082
            - containerPort: 5055
          envFrom:
            - secretRef:
                name: traccar-secret
          volumeMounts:
            - name: traccar-config-volume
              mountPath: /opt/traccar/conf/traccar.xml
              subPath: traccar.xml
      volumes:
        - name: traccar-config-volume
          secret:
            secretName: traccar-secret

Secret (includes traccar.xml)

apiVersion: v1
kind: Secret
metadata:
  name: traccar-secret
  namespace: cargopro
type: Opaque
stringData:
  TRACCAR_URL: http://traccar:8082
  TRACCAR_USER: info@cargopro.ai
  TRACCAR_PASSWORD: ********

  traccar.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE properties SYSTEM 'http://java.sun.com/dtd/properties.dtd'>
    <properties>
      <entry key="database.driver">org.postgresql.Driver</entry>
      <entry key="database.url">
        jdbc:postgresql://cargopro-db.postgres.database.azure.com:5432/traccar_cargopro
      </entry>
      <entry key="database.user">cargopro_db_admin</entry>
      <entry key="database.password">********</entry>
      <entry key="database.timescale">false</entry>
    </properties>
Service
apiVersion: v1
kind: Service
metadata:
  name: traccar
  namespace: cargopro
spec:
  type: ClusterIP
  ports:
    - name: web
      port: 8082
      targetPort: 8082
    - name: osmand
      port: 5055
      targetPort: 5055
  selector:
    app: traccar

Problem Description
This exact setup was running smoothly earlier, connected to the same PostgreSQL database.
However, during a recent production rollout, the Traccar pod suddenly started restarting repeatedly with the following logs:

Waiting for changelog lock....
Waiting for changelog lock....
ERROR: LockException
Could not acquire change log lock.
Currently locked by <pod-name> (<pod-ip>)

The pod enters CrashLoopBackOff.

Anton Tananaev 2 months ago

Database lock means that you restarted the service before the database migration finished. There are many threads about it.

Yash Ahirrao 2 months ago

Hi Anton, thanks for the response.
Just to clarify, this was not an intentional or manual restart during migration.
This was a normal production rollout using the same Kubernetes Deployment, same PostgreSQL database, and same configuration that had been running stably before.
We did not change replicas (still 1), did not scale up/down, and did not manually restart the pod during startup.
What surprised us is that:
The same database had already been initialized and working earlier
The same deployment manifest had been used previously without issues
The issue appeared suddenly on a new rollout, even though no DB reset or parallel instance was introduced
From Kubernetes’ perspective, this rollout only recreated the pod once (standard rolling replace), but it seems the Liquibase lock persisted and caused subsequent restarts to fail.
We wanted to understand:
Whether Kubernetes pod recreation (even with replicas=1) can still trigger this lock scenario
If there are best practices for Traccar on Kubernetes to avoid Liquibase lock persistence (e.g., startup probes, init containers, or migration disabling after first run)
And what the recommended recovery is in production without risking data integrity
Appreciate any guidance on how to make this setup more resilient for Kubernetes-based deployments.
Thanks again.

Anton Tananaev 2 months ago

Any restart could potentially cause this, but most common case is the actual Traccar version upgrade.

Yash Ahirrao 2 months ago

Hi Anton, thanks for the clarification.
Understood. This was likely caused by a Traccar version upgrade pulled in via the latest image, combined with a restart while the database migration was still running.
At this point, we would like to reset and start clean in a safe way. Could you please advise the recommended process to start fresh in this situation for a Kubernetes-based deployment?
Specifically, we want to know the correct steps to:
Cleanly recover from the Liquibase lock state
Ensure database migrations complete successfully
We want to follow the Traccar-recommended approach rather than applying ad-hoc fixes.
Thanks for your guidance.
Best regards,
Yash

Anton Tananaev 2 months ago

Have you found other threads about this issue?

Yash Ahirrao 2 months ago

Yes, I searched, but I did not find any relevant solution.

gertenbol 7 days ago

We had the exact same issue deploying Traccar 6.11.1 on Kubernetes with managed PostgreSQL.

First off: the database.timescale config key doesn't actually exist in Traccar's source code. Check Keys.java in the repo; it's not there. The TimescaleDB changesets use Liquibase preconditions that check pg_available_extensions at the database level, so that entry in your XML isn't doing anything.

The real problem is Kubernetes killing the pod before migrations finish. Traccar runs Liquibase synchronously at boot, before the web server starts. Without a startup probe, Kubernetes may kill the pod too early. The lock stays in the database, the next pod can't acquire it, and you end up in CrashLoopBackOff.
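You can confirm this state by inspecting Liquibase's lock table directly. A stuck lock looks like a row with locked=true whose lockedby references a pod that no longer exists:

```sql
-- Check the current Liquibase lock state
SELECT id, locked, lockgranted, lockedby FROM databasechangeloglock;
```

If lockedby names a dead pod, no running process holds the lock and it is safe to clear it as described below.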

What fixed it for us

Startup probe. Add one on /api/health port 8082. This endpoint only becomes available after migrations complete and Jetty starts, so it's a natural fit. We use periodSeconds: 10 and failureThreshold: 30 giving the pod up to 5 minutes for migrations.
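In the Deployment above, that probe would be a sketch like this on the traccar container (the timing values are what we use; tune failureThreshold for how long your migrations take):

```yaml
# Startup probe for the traccar container: Kubernetes won't run liveness
# checks or kill the pod until this succeeds, which only happens after
# Liquibase migrations finish and Jetty starts serving /api/health.
startupProbe:
  httpGet:
    path: /api/health
    port: 8082
  periodSeconds: 10      # probe every 10 seconds
  failureThreshold: 30   # 30 * 10s = up to 5 minutes for migrations
```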

Recovering a stuck lock. Scale the deployment to 0 replicas, then run this against your database:

UPDATE databasechangeloglock SET locked=false, lockgranted=null, lockedby=null WHERE id=1;

Then scale back to 1.
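Put together, the recovery sequence looks roughly like this, assuming the Deployment and namespace from the manifests above and psql access to the Azure PostgreSQL instance:

```shell
# 1. Make sure no Traccar instance is running (so nothing legitimately holds the lock)
kubectl -n cargopro scale deployment traccar --replicas=0

# 2. Clear the stale Liquibase lock
psql "host=cargopro-db.postgres.database.azure.com dbname=traccar_cargopro user=cargopro_db_admin" \
  -c "UPDATE databasechangeloglock SET locked=false, lockgranted=null, lockedby=null WHERE id=1;"

# 3. Bring Traccar back; it will re-acquire the lock and resume migrations
kubectl -n cargopro scale deployment traccar --replicas=1
```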

Extra gotcha with managed PostgreSQL

Some providers list timescaledb in pg_available_extensions even though it can't actually be loaded (it needs shared_preload_libraries, which you typically can't configure on managed instances). The Liquibase precondition passes, CREATE EXTENSION fails fatally, and the lock is stuck forever.

If that's your case, you need to pre-insert the timescale changeset IDs as MARK_RAN in the databasechangelog table before Traccar starts. Important: the filename column must match the logicalFilePath from the changelog XML, so use changelog-6.8.0 and changelog-6.11.0, not the full file paths like schema/changelog-6.8.0.xml.
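As a sketch only, the insert looks like this. The id and author values below are placeholders; take the real ones from the timescale changesets in Traccar's changelog XML files, and note the filename column uses the logicalFilePath as described above:

```sql
-- Placeholder values: replace 'timescale-changeset-id' and 'author' with the
-- actual changeset id/author from Traccar's changelog-6.8.0 XML.
INSERT INTO databasechangelog
  (id, author, filename, dateexecuted, orderexecuted, exectype, description)
VALUES
  ('timescale-changeset-id', 'author', 'changelog-6.8.0',
   NOW(), 9999, 'MARK_RAN', 'manually marked as run');
```

With EXECTYPE set to MARK_RAN, Liquibase treats the changeset as already applied and skips it on the next startup.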