tail heap hash found to be invalid on repeated DB loads

Sjoerd Mullender sjoerd at monetdb.org
Thu Mar 29 09:47:30 CEST 2018

On 29/03/18 00:37, Eyal Rozenberg wrote:
> As part of, shall we say, helping with the QA of the
> March-feature-branch-based releases, I'm looking at some of the GDK code
> involving loading a BBP (and the action thereof).
> Specifically, I've been looking at the function
> gdk/gdk_atoms.c:strCleanHash(), which recomputes the opportunistic
> partial hash table for a tail heap; checks if it's equal to what (seems
> to be) the hash loaded from disk; and if they don't match - copies the
> new hash over the old one.
> I've added an indication in case the hashes don't match. And - I've
> noticed this happens repeatedly when trying to load the same database,
> even after previous plain vanilla unmodified source code runs, with no
> crashes and orderly shutdown. At least - I'm seeing this happen with
> 11.29.1, and on a TPC-H SF 1 database I created with the script at
> https://github.com/eyalroz/tpch-tools
> My questions:
> 1. Is a discrepancy between the old and recomputed hash, when there had
> not been any problems on the previous run, expected behavior - or is it
> a bug?

It is expected behavior, so that's why the code is there.

> 2. If it is expected, why isn't a correct hash written back to the disk?
> Doesn't the memcmp(newhash, h->base, sizeof(newhash)) command
> eventually/immediately get translated into a write?

It doesn't happen all the time, and you don't want to be writing to
files when there is no need.  We've had complaints about that.

> 3. If this is not expected behavior - has this been verified not to
> happen on other machines? Other versions?
> 4. Could this be a 11.29.3 vs 11.29.1 issue somehow? I haven't updated
> to the latest RC yet.


The problem is that during normal database operations, string heaps can
be appended to, but if the change doesn't actually get committed, the
appends need to be undone.  This can happen e.g. when a transaction is
aborted due to server shutdown (crash or regular).  The committed data
indicates how large the string heap is (BBP.dir), but it may be that
there is more data, so that gets truncated.  That is fine.  But the hash
may have been changed as well.  If in the aborted run the file had been
loaded through GDKmalloc, that change wouldn't have gone to disk, but if
it had been memory mapped, the change may well have gone to disk.  That
is what needs to be repaired.

If a string heap is larger than GDK_ELIMLIMIT (64KiB), we can't be sure
where strings start, and anyway, it becomes expensive to run through the
whole heap, so we only rebuild the hash for the first 64 KiB.  This
means that it is likely the hash entry will be different if the heap is
larger since the newer hash values will be different.  Note though, that
the hash table is completely valid, even if different.  The entries just
point at different strings, but since it is opportunistic, that's fine.

> Eyal
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list

Sjoerd Mullender

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20180329/83dcbec5/attachment.sig>

More information about the developers-list mailing list