Hello,

  (google translate)

I have a query that takes a long time :
select count(distinct idvisiteur) from src_inbox__at__lm__visites;

After two hours of running, it is not over, I'll kill her.

if I look at the heapsize column storage ()
sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites';
+--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+
| schema | table                      | column              | type    | location    | count     | typewidth | columnsize | heapsize    | indices | sorted |
+========+============================+=====================+=========+=============+===========+===========+============+=============+=========+========+
| sys    | src_inbox__at__lm__visites | idvisite            | varchar | 01/00/10025 | 315819936 |         5 | 2526559488 | 27755937792 |       0 | false  |
| sys    | src_inbox__at__lm__visites | idvisiteur          | varchar | 76/7667     | 315819936 |        19 | 2526559488 | 80449110016 |       0 | false  |
| sys    | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 62/6201     | 315819936 |         2 | 1263279744 |  2222391296 |       0 | false  |
| sys    | src_inbox__at__lm__visites | siteniveau1         | varchar | 77/7704     | 315819936 |         5 |  315819936 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | debutvisite         | varchar | 30/3064     | 315819936 |        19 | 2526559488 | 10741678080 |       0 | false  |
| sys    | src_inbox__at__lm__visites | catvisiteur         | varchar | 31/3171     | 315819936 |         0 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | region              | varchar | 67/6706     | 315819936 |        12 | 1263279744 |  2077556736 |       0 | false  |
| sys    | src_inbox__at__lm__visites | ville               | varchar | 40/4071     | 315819936 |        10 | 2526559488 | 22635741184 |       0 | false  |
| sys    | src_inbox__at__lm__visites | os                  | varchar | 01/26/12673 | 315819936 |         8 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | browser             | varchar | 01/44/14410 | 315819936 |        17 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | sourcevisite        | varchar | 42/4234     | 315819936 |         1 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | idcampagne          | varchar | 01/41/14100 | 315819936 |         0 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | prefixe             | varchar | 01/22/12277 | 315819936 |         0 |  315819936 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | libcampagne         | varchar | 65/6573     | 315819936 |         0 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | emailing            | varchar | 01/04/10434 | 315819936 |         0 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | dateenvois          | varchar | 01/22/12264 | 315819936 |         0 |  631639872 |       65536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | nbpages             | int     | 56/5615     | 315819936 |         4 | 1263279744 |           0 |       0 | false  |
| sys    | src_inbox__at__lm__visites | dureevisite         | time    | 01/02/10211 | 315819936 |         4 | 1263279744 |           0 |       0 | false  |
| sys    | src_inbox__at__lm__visites | id_unique           | int     | 01/23/12370 | 315819936 |         4 | 1263279744 |           0 |       0 | false  |
+--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+-------------+---------+--------+

Is the heapsize column represents the amount of data to be placed in RAM ?
I have 64GB of RAM on the server, so the idvisiteur column can not be held in RAM. Is this true?  ( 64 000 000 000 <  80 449 110 016 )


I execute a vacuum (experimental command ?) :   https://www.monetdb.org/Documentation/UserGuide/AutoloadingScript  (Table vacuum commands (experimental!))
call vacuum('sys','src_inbox__at__lm__visites');

sql>select * from storage() WHERE "table"='src_inbox__at__lm__visites';
+--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+
| schema | table                      | column              | type    | location    | count     | typewidth | columnsize | heapsize   | indices | sorted |
+========+============================+=====================+=========+=============+===========+===========+============+============+=========+========+
| sys    | src_inbox__at__lm__visites | idvisite            | varchar | 36          | 315819936 |         5 | 2526559488 | 4906811392 |       0 | false  |
| sys    | src_inbox__at__lm__visites | idvisiteur          | varchar | 64/6435     | 315819936 |        19 | 1263279744 | 3667394560 |       0 | false  |
| sys    | src_inbox__at__lm__visites | idvisiteuridentifie | varchar | 66/6640     | 315819936 |         2 | 1263279744 |   40042496 |       0 | false  |
| sys    | src_inbox__at__lm__visites | siteniveau1         | varchar | 67/6743     | 315819936 |         5 |  315819936 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | debutvisite         | varchar | 70/7045     | 315819936 |        19 | 2526559488 | 7576551424 |       0 | false  |
| sys    | src_inbox__at__lm__visites | catvisiteur         | varchar | 71/7152     | 315819936 |         0 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | region              | varchar | 72/7252     | 315819936 |        12 | 1263279744 |  106561536 |       0 | false  |
| sys    | src_inbox__at__lm__visites | ville               | varchar | 73/7353     | 315819936 |        10 | 1263279744 | 1362165760 |       0 | false  |
| sys    | src_inbox__at__lm__visites | os                  | varchar | 74/7453     | 315819936 |         8 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | browser             | varchar | 75/7554     | 315819936 |        17 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | sourcevisite        | varchar | 76/7655     | 315819936 |         1 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | idcampagne          | varchar | 77/7761     | 315819936 |         0 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | prefixe             | varchar | 01/00/10064 | 315819936 |         0 |  315819936 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | libcampagne         | varchar | 01/01/10165 | 315819936 |         0 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | emailing            | varchar | 01/02/10270 | 315819936 |         0 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | dateenvois          | varchar | 01/03/10372 | 315819936 |         0 |  631639872 |      73728 |       0 | false  |
| sys    | src_inbox__at__lm__visites | nbpages             | int     | 01/04/10476 | 315819936 |         4 | 1263279744 |          0 |       0 | false  |
| sys    | src_inbox__at__lm__visites | dureevisite         | time    | 01/06/10600 | 315819936 |         4 | 1263279744 |          0 |       0 | false  |
| sys    | src_inbox__at__lm__visites | id_unique           | int     | 01/07/10703 | 315819936 |         4 | 1263279744 |          0 |       0 | false  |
+--------+----------------------------+---------------------+---------+-------------+-----------+-----------+------------+------------+---------+--------+

heapsize of idvisteur =  3 667 394 560 (3,6GB)  instead of  80 449 110 016  (80 GB)


my query is present MUCH faster
sql>select count(distinct idvisiteur) from src_inbox__at__lm__visites;
+----------+
| L1       |
+==========+
| 75528081 |
+----------+
1 tuple (4m 22s)


HOURA !!!


but WHY WHY WHY ?  I want to understand

Thank

Pierre


--
1G6
52 route de bischwiller
67300 Schiltigheim

Société de Services et de Formations en Logiciels Libres
http://1g6.biz
Tél : 06 64 63 70 35