GT06 problem

arshvein, 8 years ago

Hello,

I have been running the Traccar server (version 3.7) as a listener on Ubuntu 14.04 with Java 1.7/1.8, with custom database handling and a custom front end, without problems. The devices are GT06 (I think GT06N, still waiting for confirmation from the factory), with 6,000 to 8,000 of them connected at any time. Lately, however, I have run into a problem.

First, the log:

2016-10-04 08:01:10  INFO: [7CC2D9B3] connected
2016-10-04 08:01:38 DEBUG: [7CC2D9B3: 5023 < 223.255.229.76] HEX: 78780d010351608080127173002423f10d0a78780d010351608080127173002532780d0a78780d010351608080127173002600e30d0a
2016-10-04 08:01:38 DEBUG: [7CC2D9B3: 5023 > 223.255.229.76] HEX: 787805010024af730d0a
2016-10-04 08:01:38 DEBUG: [7CC2D9B3: 5023 > 223.255.229.76] HEX: 787805010025befa0d0a
2016-10-04 08:01:38 DEBUG: [7CC2D9B3: 5023 > 223.255.229.76] HEX: 7878050100268c610d0a
2016-10-04 08:02:38 DEBUG: [7CC2D9B3: 5023 < 223.255.229.76] HEX: 7878058a0027b8a20d0a78780a134406040002002801170d0a78780a1344060400020029109e0d0a797900709404414c4d313d37353b414c4d323d44353b414c4d333d35463b535441313d34303b4459443d30313b534f533d2c2c3b43454e5445523d3b46454e43453d46656e63652c4f46462c302c302e3030303030302c302e3030303030302c3330302c494e206f72204f55542c313b002a42bc0d0a
2016-10-04 08:02:38 DEBUG: [7CC2D9B3: 5023 > 223.255.229.76] HEX: 7878058a0027b8a20d0a
2016-10-04 08:02:38 DEBUG: [7CC2D9B3: 5023 > 223.255.229.76] HEX: 78780513002855320d0a
2016-10-04 08:02:38 DEBUG: [7CC2D9B3: 5023 > 223.255.229.76] HEX: 78780513002944bb0d0a
2016-10-04 08:04:41  INFO: [7CC2D9B3] timed out
2016-10-04 08:04:41  INFO: [7CC2D9B3] disconnected
2016-10-04 08:04:44  INFO: [7CC2D9B3] timed out

It contains a message with protocol number 0x8a (7878058a0027b8a20d0a) that is not listed anywhere in the protocol manual or in the decoder source of the latest Traccar. Any idea what it is? I see it on almost all of my devices.
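To see what the 0x8a message actually carries, it helps to pull the fixed fields out of a single short frame. The sketch below is a hypothetical helper (Gt06ShortFrame is not Traccar's decoder); it assumes the usual 0x7878 layout seen in the log (start bytes, one length byte counting protocol, content, serial and CRC, then a 0x0D0A trailer) and assumes that login content is the IMEI in BCD.

```java
import java.util.Arrays;

// Hypothetical helper, not Traccar's Gt06ProtocolDecoder. Assumed frame layout:
// 0x7878 | length | protocol | content | serial(2) | CRC(2) | 0x0D0A
final class Gt06ShortFrame {
    final int protocol;
    final int serial;
    final byte[] content;

    private Gt06ShortFrame(int protocol, int serial, byte[] content) {
        this.protocol = protocol;
        this.serial = serial;
        this.content = content;
    }

    static Gt06ShortFrame parse(byte[] f) {
        if (f.length < 10 || f[0] != 0x78 || f[1] != 0x78) {
            throw new IllegalArgumentException("not a 0x7878 frame");
        }
        int length = f[2] & 0xFF; // counts protocol + content + serial + CRC
        if (length < 5 || f.length != length + 5
                || f[f.length - 2] != 0x0D || f[f.length - 1] != 0x0A) {
            throw new IllegalArgumentException("bad length or trailer");
        }
        int contentLength = length - 5; // minus protocol (1), serial (2) and CRC (2)
        int protocol = f[3] & 0xFF;
        byte[] content = Arrays.copyOfRange(f, 4, 4 + contentLength);
        int serial = ((f[4 + contentLength] & 0xFF) << 8) | (f[5 + contentLength] & 0xFF);
        return new Gt06ShortFrame(protocol, serial, content);
    }

    // Assumption: for a login (protocol 0x01) the content is the IMEI as 8 BCD bytes
    // (16 digits with a leading 0), so printing the bytes as hex gives the digits.
    String imeiDigits() {
        StringBuilder sb = new StringBuilder();
        for (byte b : content) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Parsed this way, the 0x8a frame has no content at all, only protocol 0x8a and serial 0x0027, and the server's reply in the log is the same 10 bytes sent back.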

Next, the '7979' start bytes: they are not listed anywhere in the device protocol manual (unless the manufacturer made a customization, which they haven't confirmed), but they are handled in your code (gt06protocoldecoder.java) for MSG_INFO. However, the subtype is not the expected one: it is 04 (797900709404), which is not handled at the moment.
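The log shows several frames arriving in a single TCP read, and the last one in the second read uses the 0x7979 header, so a splitter has to handle both header types. The following sketch assumes (consistently with the frames in the log) that 0x7878 frames carry a 1-byte length and 0x7979 frames a 2-byte length, both ending in 0x0D0A; Gt06FrameSplitter is a hypothetical name, not part of Traccar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Traccar code. Assumes 0x7878 frames have a 1-byte length and
// 0x7979 frames a 2-byte length; in both cases the length counts protocol, content,
// serial and CRC, and the frame ends with 0x0D0A.
final class Gt06FrameSplitter {
    private byte[] pending = new byte[0]; // bytes of an incomplete frame from the previous read

    List<byte[]> feed(byte[] chunk) {
        byte[] buf = new byte[pending.length + chunk.length];
        System.arraycopy(pending, 0, buf, 0, pending.length);
        System.arraycopy(chunk, 0, buf, pending.length, chunk.length);

        List<byte[]> frames = new ArrayList<>();
        int pos = 0;
        while (buf.length - pos >= 5) { // enough to read either header and its length field
            int total;
            if (buf[pos] == 0x78 && buf[pos + 1] == 0x78) {
                total = 2 + 1 + (buf[pos + 2] & 0xFF) + 2;
            } else if (buf[pos] == 0x79 && buf[pos + 1] == 0x79) {
                total = 2 + 2 + (((buf[pos + 2] & 0xFF) << 8) | (buf[pos + 3] & 0xFF)) + 2;
            } else {
                pos++; // not a frame start: skip one byte and try to resynchronise
                continue;
            }
            if (buf.length - pos < total) {
                break; // frame not complete yet, keep it for the next read
            }
            frames.add(Arrays.copyOfRange(buf, pos, pos + total));
            pos += total;
        }
        pending = Arrays.copyOfRange(buf, pos, buf.length);
        return frames;
    }

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Fed the second read from the log, this yields four frames; the last is a 118-byte 0x7979 frame (length 0x0070) with protocol 0x94, subtype 0x04 and ASCII content starting with ALM1=75;ALM2=D5;.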

Both of these happen a lot, and the devices stop responding after we send the response. They should send position data every 30 seconds, but they don't, and the connection times out quickly, after 60 seconds (I set the timeout/resetDelay to 60 seconds for this device).

This fills up CLOSE_WAIT sockets very quickly (up to 18k CLOSE_WAIT sockets in about 10 minutes), so I am forced to restart Traccar every 5-10 minutes to make sure packets aren't dropped.

Another problem: sometimes a device sends login data and the server responds, but no position data is ever sent, even though the device is still connected from the same IP. At other times the device sends a new login from a DIFFERENT IP, so no wonder it thinks we never responded and doesn't send position packets.

Any idea how to handle these problems? Thanks a lot in advance for your response.

-Arshvein-

Anton Tananaev, 8 years ago

Most likely type 0x8a is a command response or something similar.

Messages with the 7979 header are documented in some newer protocol documents.

I think you need to get the latest protocol documentation from the device vendor and check whether we need to modify anything on the server side. It sounds like the device doesn't receive the response it expects.
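If the device is not getting the response it expects, it helps to be able to build acknowledgements by hand and compare them byte for byte with a capture. This sketch assumes the ack layout seen in the log (0x7878, length 0x05, protocol, 2-byte serial, 2-byte CRC, 0x0D0A) and a CRC-16/X-25 (often called CRC-ITU in GT06 documentation) computed over the length, protocol and serial bytes. Under those assumptions it reproduces the server reply 787805010024af730d0a to the login with serial 0x0024. Gt06Ack is a hypothetical class name.

```java
// Hypothetical helper, not Traccar code.
final class Gt06Ack {

    // CRC-16/X-25: reflected polynomial 0x8408, initial value 0xFFFF, final XOR 0xFFFF.
    static int crcItu(byte[] data, int offset, int length) {
        int crc = 0xFFFF;
        for (int i = offset; i < offset + length; i++) {
            crc ^= data[i] & 0xFF;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ 0x8408 : crc >>> 1;
            }
        }
        return ~crc & 0xFFFF;
    }

    // Builds 0x7878 | 0x05 | protocol | serial(2) | CRC(2) | 0x0D0A.
    static byte[] build(int protocol, int serial) {
        byte[] f = new byte[10];
        f[0] = 0x78;
        f[1] = 0x78;
        f[2] = 0x05; // protocol (1) + serial (2) + CRC (2)
        f[3] = (byte) protocol;
        f[4] = (byte) (serial >> 8);
        f[5] = (byte) serial;
        int crc = crcItu(f, 2, 4); // covers length, protocol and serial
        f[6] = (byte) (crc >> 8);
        f[7] = (byte) crc;
        f[8] = 0x0D;
        f[9] = 0x0A;
        return f;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

Once the vendor's newer document is available, the same approach can be used to check whether it asks for a different reply layout for 0x8a or 0x94 messages.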

As for CLOSE_WAIT sockets, you can set a timeout.

arshvein, 8 years ago

Hello, Anton,

Thanks for the swift response. We are waiting for the latest protocol documentation from them to see what has changed. I hope it resolves things, because this is a production system.

I wonder whether there have been many changes to the protocol decoding, because we used the Traccar core from a year ago (around version 2.4) without complications.

For the timeout, I already set a 60-second timeout for port 5023 (gt06 protocol), and it works sometimes. But most of the time I see devices going from ESTABLISHED to CLOSE_WAIT, and the connection isn't terminated for a long time (more than 10-15 minutes), which clogs the server's sockets unless Traccar is restarted. Any idea why? Thanks.
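CLOSE_WAIT means the remote side has already closed its end (sent FIN) and the local application has not yet called close() on the socket; such a socket can stay there indefinitely until the application closes it. A timeout therefore only helps if its handler actually closes the connection. Traccar's own pipeline is built on Netty; the plain java.net sketch below (IdleConnectionHandler is hypothetical) only illustrates the principle: close on end-of-stream and on an idle timeout, with try-with-resources making the close unconditional.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not Traccar code. Whatever ends the read loop
// (EOF or idle timeout), the socket is closed.
final class IdleConnectionHandler {

    /** Reads until the peer closes or no data arrives for idleTimeoutMillis; returns why it stopped. */
    static String serve(Socket socket, int idleTimeoutMillis) throws IOException {
        try (Socket s = socket) { // try-with-resources: close() always runs, so no CLOSE_WAIT leak
            s.setSoTimeout(idleTimeoutMillis);
            InputStream in = s.getInputStream();
            byte[] buf = new byte[1024];
            while (true) {
                int n;
                try {
                    n = in.read(buf);
                } catch (SocketTimeoutException e) {
                    return "timeout"; // device went silent: drop the connection
                }
                if (n < 0) {
                    return "eof"; // peer sent FIN; closing our side moves the socket out of CLOSE_WAIT
                }
                // A real server would pass buf[0..n) to the frame decoder here.
            }
        }
    }
}
```

In a Netty pipeline, io.netty.handler.timeout.IdleStateHandler is the usual building block for the idle part; the important point is the same either way: the handler that sees the timeout or the end of stream must close the channel.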

Anton Tananaev, 8 years ago

Please share the document if you manage to get it.

The problem might be that response format from the server has changed. I can check that as soon as I get up to date documentation.

Timeout should solve the problem with waiting connections. You can also try to increase connection limit on the server.

arshvein, 8 years ago

OK, I will. As far as I can tell, I don't have problems with memory or the connection limit. I have raised and tuned the TCP/IP settings, file descriptors and conntrack, and it can reach about 20,000-30,000 connections in netstat before packets start to drop. But I'll look into it; tuning for more than 5,000 concurrent connections is really quite hard... Any suggestions for tuning? Thanks a lot.

Anton Tananaev, 8 years ago

If you have 5000 devices, I would recommend setting the OS connection limit to at least 50k.

Usually a connection limit plus a timeout is enough. I'm not sure what else can be done. I guess we need to fix the protocol issue if it causes devices to reconnect all the time. That would definitely help.
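A rough way to see why the limit needs to be far above the device count: every stale socket that has not been closed yet still holds a descriptor. By Little's law, with N devices each reconnecting on average every R seconds and stale sockets lingering for W seconds, the number of open sockets S is roughly as below. The values plugged in (N = 5000, R = 120 s, W = 900 s, i.e. the 15 minutes mentioned earlier in the thread) are illustrative assumptions, not measurements.

```latex
S \approx N + \frac{N}{R}\,W
  = 5000 + \frac{5000}{120\ \text{s}} \cdot 900\ \text{s}
  = 5000 + 37\,500
  = 42\,500
```

Shortening W (the timeout) shrinks the second term proportionally, which is why the connection limit and the timeout work together.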

arshvein, 8 years ago

This is my sysctl.conf. I hope someone can tell me whether I'm tuning it correctly.

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.ip_local_port_range = 10024    65535
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 1
net.nf_conntrack_max = 196608
net.netfilter.nf_conntrack_tcp_timeout_established=600
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 4096
net.ipv4.tcp_keepalive_time = 600
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 10
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
net.core.somaxconn=1024
net.ipv4.tcp_max_syn_backlog=4096

It should be able to handle 40,000-50,000 connections without problems, since it is only limited by the local port range. Traccar also always runs with ulimit -n 1000000, which should be enough. Oddly, some devices seem to be 'queueing', with their packets reaching the server late. It might be a problem with the GSM network changing IPs too fast, but I'm still looking for clues.

Any suggestions regarding this? Thanks.
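One thing worth checking in a sysctl.conf like the one above is keys that are set more than once: net.netfilter.nf_conntrack_tcp_timeout_time_wait appears with both 1 and 10. `sysctl -p` applies entries in file order, so the last value wins. A small sketch that lists such conflicts (SysctlDuplicates is a hypothetical helper; it treats lines starting with # or ; as comments, as sysctl.conf does):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports keys set more than once with different values.
final class SysctlDuplicates {

    static List<String> conflicts(List<String> lines) {
        Map<String, String> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // blank line or comment
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a key = value line
            }
            String key = line.substring(0, eq).trim();
            // Normalise inner whitespace so "10024    65535" equals "10024 65535".
            String value = line.substring(eq + 1).trim().replaceAll("\\s+", " ");
            String previous = seen.put(key, value);
            if (previous != null && !previous.equals(value)) {
                out.add(key + ": " + previous + " -> " + value); // later value overrides
            }
        }
        return out;
    }
}
```

Identical repeats are harmless and are not reported; only repeats with a different value change the effective setting.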

Anton Tananaev, 8 years ago

Looks good to me. Let me know if you find the cause of your problem.

arshvein, 8 years ago

Hello,

It seems the local GSM mobile internet provider has some... bad infrastructure design. Its IP lifetime is under 2 minutes at best; from what I observed, a device's IP can change as often as every 30 seconds. This may disrupt the handshake between device and server, since it also breaks persistent connections, and the GT06 series seems to rely on persistent connections. It might be one of the reasons packets are delayed and sockets pile up in CLOSE_WAIT without being terminated: the device on the other side has changed IP, but the old connection is still there.

Any suggestions regarding this?
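If a device's IP changes and it logs in again over a new connection, the server may still hold the old socket. One possible mitigation, sketched here as a hypothetical helper (SessionRegistry is not a Traccar API), is to keep only the latest connection per IMEI and close the previous one when the same IMEI logs in again.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: one live connection per IMEI; a new login evicts the old one.
final class SessionRegistry {
    private final ConcurrentMap<String, Closeable> byImei = new ConcurrentHashMap<>();

    /** Registers the connection; returns true if an older connection for this IMEI was closed. */
    boolean onLogin(String imei, Closeable connection) {
        Closeable previous = byImei.put(imei, connection);
        if (previous == null || previous == connection) {
            return false;
        }
        try {
            previous.close(); // release the stale socket instead of leaving it half-open
        } catch (IOException ignored) {
            // the old connection is already broken; nothing else to do
        }
        return true;
    }

    /** Removes the mapping only if it still points at this connection. */
    void onDisconnect(String imei, Closeable connection) {
        byImei.remove(imei, connection);
    }
}
```

onDisconnect uses ConcurrentMap.remove(key, value), so a late disconnect event from the old socket cannot evict the newer session.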

Anton Tananaev, 8 years ago

Usually IP changes when device reconnects, so it might be the result of the problem, not the cause.

jomart, 8 years ago

Hi Anton.
I've run into a problem reading the track archive from a gt06 tracker:
http://52.34.254.76/1/786320161014_084024.jpg
The track looks like a grid. What could cause this, and how can I fix it?

Anton Tananaev, 8 years ago

Please create a separate topic.

jomart, 8 years ago

How do I create a topic? My English isn't very good. I can't figure out how to start a new thread.

Anton Tananaev, 8 years ago

Here, at the bottom of this page:

https://www.traccar.org/forums/forum/devices/

Max4Tech, 7 years ago

Hello everyone, thank you for all the information you share here on the forum. Right now I have 5,000 assets running on a virtual server with 4 CPUs and 15 GB of RAM. This is my sysctl.conf configuration:

# Increase number of incoming connections that can queue up
# before dropping
net.core.somaxconn=1024
# Handle SYN floods and large numbers of valid HTTPS connections
net.ipv4.tcp_max_syn_backlog=4096
# Increase the length of the network device input queue
net.core.netdev_max_backlog = 4096
# Widen the port range used for outgoing connections
net.ipv4.ip_local_port_range = 10000 65000
# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0
# Keep conntrack entries in the TIME_WAIT state for 1 second after a FIN
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 1
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
# not sure about this one
net.nf_conntrack_max = 196608
net.ipv4.tcp_no_metrics_save = 1
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 10
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 10
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
# How long to keep ESTABLISHED connections in conntrack table
# Should be higher than tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
net.netfilter.nf_conntrack_tcp_timeout_established = 300
# Timeout broken connections faster (amount of time to wait for FIN)
net.ipv4.tcp_fin_timeout = 1
# Wait between keepalive probes (reduced from the 75-second default to 60)
net.ipv4.tcp_keepalive_intvl = 60
# Start keepalive probing after 1 minute of idle time; a dead peer is dropped
# after about 60 + 5 * 60 = 360 seconds
net.ipv4.tcp_keepalive_time = 60
# Number of unanswered probes before the connection is dropped (reduced from the default of 9 to 5)
net.ipv4.tcp_keepalive_probes = 5

I would like to know whether this is the correct configuration for 10,000 assets, or what I need to change. Thanks for your help.
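The config comment states a rule: the conntrack ESTABLISHED timeout should be higher than tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl. A tiny check makes it easy to test a configuration against that rule (KeepaliveCheck is a hypothetical helper). Note that kernel keepalive only applies to sockets that enable SO_KEEPALIVE.

```java
// Hypothetical helper implementing the rule quoted in the config comment.
final class KeepaliveCheck {

    /** Seconds until the kernel gives up on a silent peer: idle time plus all unanswered probes. */
    static int deadPeerSeconds(int keepaliveTime, int probes, int interval) {
        return keepaliveTime + probes * interval;
    }

    /** True if conntrack keeps an idle ESTABLISHED entry longer than keepalive needs to detect a dead peer. */
    static boolean conntrackOutlivesKeepalive(int establishedTimeout, int keepaliveTime,
                                              int probes, int interval) {
        return establishedTimeout > deadPeerSeconds(keepaliveTime, probes, interval);
    }
}
```

With the values in this config, 60 + 5 * 60 = 360 s, which is more than the 300 s set for nf_conntrack_tcp_timeout_established; the earlier config in this thread gives 600 + 20 * 60 = 1800 s against 600 s. If that rule is the intended one, both configs would need a larger ESTABLISHED timeout or shorter keepalive settings.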