[Monetdb-developers] BAT sizes

Henning Rode h.rode at cs.utwente.nl
Mon Oct 15 16:42:12 CEST 2007


hej stefan,

it was indeed more a general question. no direct problem.
but thanks for the detailed answer.

we can indeed not avoid overallocation at index building. however, once
such index tables are filled, nothing will be added anymore. and there
would be enough time here, for any kind of memory optimizations.
however, that seems not necessary according to what you just explained.

best -henning


Stefan Manegold wrote:
> [I once again felt free to share this with the community ...]
> 
> Henning,
> 
> in case BAT capacities are significantly larger than their actual content
> (count), this might indeed have a negative influence on performance.
> (1) in case the BAT is memory-mapped when loaded, it "only" blocks some more
> address space than strictly necessary (no problem on 64-bit systems,
> potentially a problem on 32-bit systems);
> (2) in case the BAT is malloced when loaded, it also occupies some more
> memory than strictly necessary (potential problem on both 64- & 32-bit
> systems).
> 
> However, unless there is some accurate estimation, it is often hard (or
> virtually impossible) to "guess" a BAT's size before filling it; hence, a
> "generous" initial size allocation is good to avoid expensive BAT extents.
> 
> In your case, I'm lost concerning which BATs you're talking about. The
> shredder-generated pre_* (actually rid_*) BATs need to be allocated before
> reading the document; hence, there is no knowledge about the number of
> nodes in the document, and as far as I can tell no trivial way to estimate
> this accurately. Hence, the shredder needs to guess something --- JanF can
> tell more, I guess...
> 
> In case of the TIJAH indices, I have no clue at all, how/where/when they are
> built and whether there might be better information available to not
> overallocate but allocate only just enough space. You or some of your
> colleagues in Twente should know all the details.
> 
> Finally, is there any concrete case where you actually experienced any
> problems due to "over-allocation", or are you just wondering?
> 
> Stefan
> 
> ps: in the case given below, the batsize just fits the BAT's capacity; only
>     the count is smaller than the capacity (obviously, it cannot be larger)
>     --- if you want/need to know why, you better ask him/her who
>     allocated/created/filled the "tj_DFLT_FT_INDEX_size1" BAT ...
> 
> 
> On Mon, Oct 15, 2007 at 02:36:46PM +0200, Henning Rode wrote:
>> hej stefan,
>>
>> sorry, that i did not answer earlier. i just wanted to report the
>> actual sizes of pf/tijah indices in a paper. so that is done now.
>>
>> still, i was asking myself, whether it might have any kind of
>> performance influences, that BAT capacities are so much higher than the
>> actual BAT counts. This is of course handy, when we still want to add
>> new entries, but once we indexed a collection, we usually only query it.
>>
>> in case of our "pre_size" BAT this difference between BATsize and
>> BATdsksize can easily be 250MB or more.
>>
>> best -henning
>>
>> mil>var t := bat("tj_DFLT_FT_INDEX_size1");
>> mil>t.count().print();
>> +------------+
>> | 203091470  |
>> +------------+
>> mil>t.capacity().print();
>> +------------+
>> | 260898816  |
>> +------------+
>> mil>t.batsize().print();
>> +------------+
>> | 1043599360 |
>> +------------+
>> mil>t.batdsksize().print();
>> +------------+
>> | 812366848  |
>> +------------+
>>
>> mil>var x := t.copy();
>> mil>x.count().print();
>> +------------+
>> | 203091470  |
>> +------------+
>> mil>x.capacity().print();
>> +------------+
>> | 260898816  |
>> +------------+
>> mil>x.batsize().print();
>> +------------+
>> | 1043599360 |
>> +------------+
>> mil>x.batdsksize().print();
>> +------------+
>> | 812366848  |
>> +------------+
>> mil>t.access(BAT_READ);
>>
>>
>> Stefan Manegold wrote:
>>> Henning,
>>>
>>> you should also check & report b.capacity(), i.e.,
>>>
>>> b.count();
>>> b.capacity();
>>> b.info().reverse().like("batBuns").like("size").print();
>>> b.batsize();   
>>> b.batdsksize();
>>>
>>> var c := b.copy();
>>>
>>> c.count();
>>> c.capacity();
>>> c.info().reverse().like("batBuns").like("size").print();
>>> c.batsize();   
>>> c.batdsksize();
>>>
>>> Stefan
>>>
>>>
>>> On Sat, Oct 06, 2007 at 07:43:28PM +0200, Stefan Manegold wrote:
>>>> [felt free to cc the monetdb-developers list as more people might be
>>>>  interested or want to contribute]
>>>>
>>>> Henning,
>>>>
>>>> are you just "concerned" or are you having concrete problems with the bat
>>>> sizes?
>>>>
>>>> In any case, to give any reasonable answer we'd need to know more about the
>>>> details. In particular, how large is the BAT you're talking about?
>>>>
>>>> I.e., with "b" being your BAT and "c := b.copy()", please check & report
>>>>
>>>> b.count();
>>>> b.info().reverse().like("batBuns").like("size").print();
>>>> b.batsize();
>>>> b.batdsksize();
>>>>
>>>> c.count();
>>>> c.info().reverse().like("batBuns").like("size").print();
>>>> c.batsize();
>>>> c.batdsksize();
>>>>
>>>> Stefan
>>>>
>>>>
>>>> On Fri, Oct 05, 2007 at 01:47:01PM +0200, Henning Rode wrote:
>>>>> hej stefan,
>>>>>
>>>>> thanks for the answer. so in conclusion, the over-allocation of memory
>>>>> is quite normal, and nothing to worry about.
>>>>>
>>>>> i was more surprised that the copied BAT still has this considerable
>>>>> over-allocation of memory, though it exactly knows how many entries it
>>>>> needs to hold.
>>>>>
>>>>> groeten -henning
>>>> -- 
>>>> | Dr. Stefan Manegold | mailto:Stefan.Manegold at cwi.nl |
>>>> | CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
>>>> | 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
>>>> | The Netherlands     | Fax : +31 (20) 592-4312       |
> 
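
For readers who want to see how large the over-allocation discussed in this thread actually is, the following is a minimal Python sketch (not MIL) that only does arithmetic on the four figures reported for the "tj_DFLT_FT_INDEX_size1" BAT above. The variable names are made up for illustration, and the bytes-per-entry value is merely inferred by dividing batsize by capacity; the thread itself does not state the entry width or what the few leftover bytes are used for.

```python
# Figures copied from the MIL session quoted in this thread.
count = 203_091_470        # t.count(): entries actually stored
capacity = 260_898_816     # t.capacity(): entries the BAT has room for
batsize = 1_043_599_360    # t.batsize(), assumed to be bytes
batdsksize = 812_366_848   # t.batdsksize(), assumed to be bytes

MIB = 1024 * 1024

# Entries that are allocated but not used.
unused_slots = capacity - count

# Fraction of the allocated entries that hold data.
fill_ratio = count / capacity

# Assumption: entry width inferred from the numbers, not stated in the thread.
bytes_per_slot = batsize // capacity
leftover_bytes = batsize - bytes_per_slot * capacity

# Difference between the two reported sizes.
size_gap = batsize - batdsksize

print(f"unused slots:   {unused_slots}")
print(f"fill ratio:     {fill_ratio:.1%}")
print(f"bytes per slot: {bytes_per_slot} (leftover {leftover_bytes} bytes)")
print(f"size gap:       {size_gap} bytes (~{size_gap / MIB:.1f} MiB)")
```

With these figures the gap between batsize and batdsksize is roughly 220 MiB, which is in the same range as the "250MB or more" mentioned for the "pre_size" BAT.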
