Ideas to improve reverse geocoding

Manu8 years ago

Hello,

I've seen that when reverse geocoding is enabled, you are querying and storing the address returned by the geocoding service for each position. In our firsts tests we've seen that, after a few days using a tracker, more than a 90% address in Position table are repeated. Because of that, we think it would be good to cache the repeated addresses.

I have two suggestions of improvement:

  1. Normalize table Position

To reduce the queries number to external geocoding services (and avoid the 2500 queries/day of Google Geocoding Services) I think that it would be good to create a new table "location", and separate the address related information (actually in Position) into it.

In environtments with hundreds of trackers this could mean a significant reduction of storage space, number of queries to location services, Internet network traffic and even better times inserting position records.

This would behave as a Geolocalization cache.

  1. Async Reverse Geocoding

A second proposal for improvement could be to define an asynchronous process to periodically check "Location" table and make queries to location services. This would allow to fullfill later the NULLs addresses in the Location table.

Probably most Traccar users move in restricted geographical areas, and after a few days or weeks, most addresses should be cached.

What do yo think?

Regards,

Manuel Martínez
Anton Tananaev8 years ago
  1. There is already in-memory cache for addresses. At the moment I don't plan to change database structure to store addresses separately.

  2. Why would there be NULL in addresses?

It seems like the main reason for your suggestions is to reduce number of requests to the reverse geocoder provider. If you are trying to reduce cost, I would recommend to use local Nominatim or Gisgraphy server.

Manu8 years ago

Ok, I agree that installing a local GIS server is a way to reduce network traffic and possibly improve performance externally.

However, according to the release notes for version 3.2, it seems better to try to optimize code for better long-term performance.

I understand that my first proposal (normalize the table position) should not imply very big changes in the code, but for large installations can have a big impact on the number of queries to a geolocation server and perhaps performance.

I'll explain a little more (with some SQL code) the changes I'm thinking in to see if it is feasible to put them in the TODO:

SQL code

-- New table location
create table location ( longitude double not null, latitude double not null, address varchar (255), primary key (longitude, latitude));

-- Load old data into new table
insert ignore into location (longitude, latitude, address) select distinct longitude, latitude, address from position;

-- now drop the address column in Position
alter table position ....

-- Sample trigger to load info into location without modifying traccar code. This is just an example, and it's not portable as requires mysql, but it pretends to show that the changes to implement into Traccar code should be small.

DELIMITER //
CREATE TRIGGER address_to_location
BEFORE INSERT
ON position FOR EACH ROW
BEGIN
-- Insert address into location table
INSERT INTO location (longitude, latitude, address) VALUES ( NEW.longitude, NEW.latitude, NEW.address) ON DUPLICATE KEY UPDATE address=NEW.address;
-- Reset the field address to avoid loading it into positions table
SET NEW.address=’’;
END; //
DELIMITER ;


With the changes above there's only a little benefit as for each position we are still querying to external GIS servers. For me, the important change would be is not to query the external GIS if we've already got the coordenates and the address in our location cache new table.

I think also that the changes in the webapp should be minimum, as far as the sql code is in the traccar.xml file:

<entry key='database.selectLatestPositions'>
    SELECT * FROM position WHERE id IN (SELECT positionId FROM device);
</entry>

could be replaced by:

<entry key='database.selectLatestPositions'>
    SELECT * FROM position p LEFT JOIN location l ON p.longitude=l.longitude and p.latitude=l.latitude WHERE id IN (SELECT positionId FROM device);
</entry>

...

Sorry to insist, I'm just giving more details of my point of view.

Kind regards,

Manuel Martínez

Anton Tananaev8 years ago

Have you actually read my reply? There is already a cache.

With database cache how would you get address for live data in the server?

Do you have any numbers? The only valid use case I can think of is when device is not moving and sends same location. In-memory cache caters for that perfectly.

marivan48 years ago

boa noite hoje tenho traccar instalado com opengts , porem quando mudo o traccar para buscar no meu servidor nominatim local ele so aparece SP-BR poderia me dar uma sugestao de como corrigir, para as portas opengts o nominatim esta rodando normal mas quando coloco nas portas traccar os rastreadores ele nao aparece a localizaçao apenas SP_BR desculpe se coloquei no local errad qualquer coisa me mova para o o local certo do forum desde ja muito obrigado

Manu8 years ago

Yes, I've read your reply.

I'm not talking about removing in-memory cache, just saying that could be good to separate address related info into a new table.

Database cache could be a second level cache to consult before calling external Location Providers.

I don't have real numbers as far as I've just tested Traccar only last month with Traccar mobile app and last week with a GPS tracker device.

The only numbers I can give for now is database related:

mysql> select count() From position;
+----------+
| count(
) |
+----------+
| 121960 |
+----------+
1 row in set (0.01 sec)

mysql> select count() from (select distinct longitude, latitude, address From position) address;
+----------+
| count(
) |
+----------+
| 12908 |
+----------+
1 row in set (0.12 sec)

I don't know why but, there's also NULL addresses into database...

mysql> select count() from position where address is null;
+----------+
| count(
) |
+----------+
| 75801 |
+----------+

To have more realistic data maybe would be good to register some statistic info inside traccar (cache hits and miss), and to dump this info periodically to log...

Regards,

Manuel Martínez

Anton Tananaev8 years ago

So you suggesting to have two layers of caching. Can you also answer the question about getting data back in the server from the database cache. It means we need to query database for every position as far as I understand, correct?

Manu8 years ago

Yes, the procedure could be:

  1. Check if the position is already in L1 in-memory cache.
  2. if not, check if it's in L2 DB cache.
  3. else call external location provider.

But I'm just starting to browse and understand the java code, so I may be wrong.

I'm just making a suggestion, you're the expert! ;-)

Kind regards,

Manuel Martínez

Manu8 years ago

In fact, having a second level cache would probably help to populate in-memory cache without calling external location providers after a server restart (and flushing memory).

Regards,

Manuel Martínez

Anton Tananaev8 years ago

I don't see any good reason for having the second layer of cache except possibly saving some space in the database. On the other side it would definitely increase complexity and reduce performance of the system.

Manu8 years ago

I agree with you that it will increase complexity but not in the posible performance reduction. If it's well done, a second level cache should not reduce performance, just the oposite. If the position is already in memory you don't need to get it from the database (same performance), otherwise you look into it and you should get a performance improve rather than reducing it...

In fact, we're considering two things:

  1. Normalize the position table (separate the address info into a location new table) -> That would reduce the space needed and complicate a little bit the sql queries, but not much.

  2. Have a second level cache. In some cases it would reduce the number of queries to external location providers.

Well, that's just my opinion. Thanks for considering the idea.

Regards,

Manuel Martínez

Anton Tananaev8 years ago

I'm not convinced for now. I need numbers to see if it's really worth it.