Re: Pathological behaviour when building large hash index

6 Dec 2013

Sorry for the quick intrusion, I just felt the urge to say this is one of the
most interesting threads of the year ;-)
On Fri, Dec 6, 2013 at 10:06 AM, Roberto Cornacchia roberto@spinque.com wrote:
On Thu, Dec 5, 2013 at 8:57 PM, Will Muldrew will.muldrew@gmail.com wrote:
No index!
On 5 Dec 2013 17:05, "Stefan Manegold" Stefan.Manegold@cwi.nl wrote:
Hm, that's with the index in place?
If so:
- good for our optimizer,
- but unfortunately not so good for you ...
If not:
- good for you!
- impressive for our "execution engine" (scan)
- ("embarrassing" for our optimizer ... well, it's a minor detail ...)
;-)
Stefan
----- Original Message -----
Sorry!
sql>select count(*) from t_order2 where orderId between 'AAAAAA' and 'AAAAAA';
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (596.202ms)
On 5 December 2013 16:49, Stefan Manegold < Stefan.Manegold@cwi.nl > wrote:
Hi Will,
thanks for reminding me: the optimizer still sees your query as a point query
and thus prevents parallelization in order to build/use the hash index.
Maybe you can "fool" the optimizer (and I hope no one else is reading what I'm
writing now ... ;-)) by "disguising" the point predicate as a range predicate,
say, "where orderId between 'FOO' and 'FOO'" --- in fact, I rather hope (for
us) that it will not work ...
Stefan
----- Original Message -----
Hi Stefan
I've added that hash = FALSE patch, and now my simple query returns reliably
in 6 seconds. It's still a lot slower than the indexed one (obviously), and
12x slower than the one with the time > '1970-01-01' optimiser nudge (which
causes parallelism), but it'll do for now.
Thanks
-Will
On 5 December 2013 16:32, Will Muldrew < will.muldrew@gmail.com > wrote:
Hi Stefan
Here are my version details:
MonetDB-11.15.17
CentOS release 6.3
kernel 3.6.2
I think I might proceed with a patched version with the hash disabled. That's
totally fine for me right now, though obviously it'd be nice to be on the
trunk!
W.r.t. a less random-access hash generation: this is all very hand-wavy, but
how about:
1) generate a sequence of (hash,oid) pairs to some file or mmapped area.
2) use some well-known "external sort" algo to sort these pairs by hash
   (e.g. split, qsort, merge).
3) build your final hash by iterating through this sorted data structure.
Your memory writes would then be sequential and you'd get good page hit
characteristics. You just need to choose a decent sort algo, ideally one which
works transparently over memory-mapped data.
What do you think? I might even have a crack at it myself :)
-Will
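A minimal sketch of the three steps Will outlines above, in Python rather than MonetDB's C. Everything here is an assumption made for illustration: the names build_sorted_hash and lookup are hypothetical, an in-memory list.sort() stands in for a real external sort, and the result uses an offsets-plus-sorted-oids layout rather than MonetDB's actual thash format. That layout is chosen because both arrays are then written strictly front to back; a chained layout indexed by oid would still scatter its link writes.

```python
from typing import List, Sequence, Tuple


def build_sorted_hash(values: Sequence[str], nbuckets: int) -> Tuple[List[int], List[int]]:
    """Build a bucket index by sorting (bucket, oid) pairs (illustrative only)."""
    # Step 1: one sequential pass over the column, emitting (bucket, oid) pairs.
    pairs = [(hash(v) % nbuckets, oid) for oid, v in enumerate(values)]

    # Step 2: sort the pairs by bucket. A real implementation would use an
    # external (split / sort / merge) sort over a file or mmapped area.
    pairs.sort()

    # Step 3: one sequential pass over the sorted pairs. offsets[b] is the
    # position in `oids` where bucket b starts; both arrays are append-only.
    offsets = [0] * (nbuckets + 1)
    oids: List[int] = []
    bucket = 0
    for b, oid in pairs:
        while bucket < b:
            bucket += 1
            offsets[bucket] = len(oids)
        oids.append(oid)
    while bucket < nbuckets:
        bucket += 1
        offsets[bucket] = len(oids)
    return offsets, oids


def lookup(values: Sequence[str], offsets: List[int], oids: List[int],
           nbuckets: int, key: str) -> List[int]:
    """Return the oids whose value equals key, scanning one bucket only."""
    b = hash(key) % nbuckets
    return [oid for oid in oids[offsets[b]:offsets[b + 1]] if values[oid] == key]


column = ["FOO", "BAR", "FOO", "BAZ"]
offsets, oids = build_sorted_hash(column, nbuckets=8)
matches = lookup(column, offsets, oids, 8, "FOO")
```

The random access does not disappear; it moves into the sort, where split/merge passes can keep it within chunks that fit in memory.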
On 5 December 2013 15:54, Stefan Manegold < Stefan.Manegold@cwi.nl> wrote:
Hi Will,
could you please also share with us which version of MonetDB you are using,
and which OS you are running?
Please find more comments & questions inlined below.
----- Original Message -----
Hi
I've been investigating very slow queries on my 350M row dataset when
selecting for equality on a non-unique string id. e.g.
select count(*) from t_order where orderId = 'FOO';
Is your orderId column unique, or are there duplicate values?
This query takes ~4.5 hours to build a thash index. Subsequent repeats hit
the cache and are sub-millisecond.
select count(*) from t_order where time > '1970-01-01' and orderId = 'FOO'
This second one consistently finishes in a few seconds or less, and doesn't
create any indexes.
The first query is a bit of a show-stopper. It spends a huge amount of time
in BAThash (gdk_search.c), and incurs a lot of page faults while reading and
writing at random across a multi-GB mmapped thash file. Additionally these
page faults are hampered by the OS trying to write dirty pages out in the
background. I've got plenty of RAM overall (130GB), but the 'active' page
proportion for the thash file seems to be stuck at about 25% - giving me a
0.75 probability of a miss.

Is there some way to make my OS less keen to evict pages? (swappiness = 0
already) I should have plenty of room to have the whole thash file resident.
Additionally, allowing it to sit on dirty pages longer would reduce the
total write IO.
In fact, if there is sufficient memory in total, there would be no need to
flush dirty (non-persistent) pages at all, but we have not yet found a good
way to hint this to the OS --- madvise() did not have much/any useful effect
for us on Linux.
Maybe some of the issues discussed here play a role:
http://engineering.linkedin.com/performance/optimizing-linux-memory-manageme...
?
While we have not yet encountered the NUMA problem reported there,
we have encountered the transparent huge page problems, and usually
disable them as also suggested in the above posting.

Is there some way to simply prevent large thash creation? For many
applications I'd rather have slower consistent queries than incur a
massively slow query following a restart (not great in a prod environment!)
Not easily right now.
However, you can either add the "fake" time predicate as you did above,
which (in this case) triggers MonetDB's optimizer to push the time predicate
(either because it's a timestamp (int) rather than a varchar (string), or
because the time column happens to be ordered) below the orderId predicate
and thus avoids the hash build,
or change the following line(s) in gdk/gdk_select.c (in case you built
MonetDB yourself from source):
    hash = b->batPersistence == PERSISTENT &&
           (size_t) ATOMsize(b->ttype) > sizeof(BUN) / 4 &&
           BATcount(b) * (ATOMsize(b->ttype) + 2 * sizeof(BUN)) <
           GDK_mem_maxsize / 2;
->
    hash = FALSE;
We'll consider whether we can provide a better solution in a future
release.
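To see why a 350M-row column can pass the size check in the condition quoted above, the expression can be mirrored as plain arithmetic. This is only a sketch: would_build_hash is a hypothetical name, and the concrete numbers (8 bytes for both ATOMsize and sizeof(BUN), and a GDK_mem_maxsize equal to the full 130GB) are assumptions, not values reported in this thread.

```python
GIB = 1024 ** 3


def would_build_hash(persistent: bool, atom_size: int, bun_size: int,
                     count: int, mem_maxsize: int) -> bool:
    """Mirror of the quoted condition; all arguments are illustrative stand-ins."""
    return (
        persistent                                  # b->batPersistence == PERSISTENT
        and atom_size > bun_size // 4               # ATOMsize(b->ttype) > sizeof(BUN) / 4
        and count * (atom_size + 2 * bun_size) < mem_maxsize // 2
    )


# Assumed inputs: 350M rows, 8-byte atoms and BUNs, 130 GiB memory budget.
estimated_bytes = 350_000_000 * (8 + 2 * 8)
decision = would_build_hash(True, 8, 8, 350_000_000, 130 * GIB)
```

Under these assumed values the estimate is 8.4 billion bytes, well below half of the memory budget, so the condition holds and the hash build is attempted. The check bounds the estimated size only; it says nothing about how long a random-access build over that much memory takes.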

Is there some way to generate the hash index with a more sympathetic algo
that doesn't degrade so steeply? e.g. some hash + sort. (I have literally no
experience with this!)
We'll also look into this. However, building a hash index inherently incurs
random access, so chances are slim ...
Best,
Stefan
-Will Muldrew

users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |


Roberto Cornacchia
