Have you tried running a debug version of mserver5 in gdb? That would give you the complete stack when the server throws the error/exception.
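
A minimal sketch of how such a session could be started, assuming a build with debug symbols, a database that is not currently being served by monetdbd, and a database directory at the hypothetical path /path/to/dbfarm/testdatabase (the `gdb_command` helper is only for illustration):

```shell
#!/bin/sh
# Assumption: hypothetical path; point it at the database directory inside your dbfarm.
DBPATH="${DBPATH:-/path/to/dbfarm/testdatabase}"

# Print the gdb invocation so it can be checked before it is run by hand.
gdb_command() {
    printf 'gdb --args mserver5 --dbpath=%s\n' "$1"
}

gdb_command "$DBPATH"

# After starting the printed command, at the (gdb) prompt:
#   run                    start the server, then re-issue the failing query from mclient
#   bt full                after the crash: full stack of the faulting thread
#   thread apply all bt    stacks of all server threads
```

The output of `bt full` (and, if possible, `thread apply all bt`) is the part worth posting to the list.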



-----Original Message-----
From: Stefano Fioravanzo [fioravanzos@gmail.com]
Sent: Sunday, August 14, 2016 03:46 AM Eastern Standard Time
To: Communication channel for MonetDB users
Subject: Re: Program contains errors

Hello,

After another trial with the queries that produced the error mentioned in the last mail, I got another error line which is:
*** Error in `mserver5': corrupted double-linked list: 0x00007feb1c20aec0 ***

As this issue seems to be a dead end, I’m just hoping it rings a bell for someone who can suggest a solution.

Thanks,
Stefano

On 10 Aug 2016, at 18:34, Stefano Fioravanzo <fioravanzos@gmail.com> wrote:

Hello Stefan,

Each time, I loaded the raw data from my CSV files, so there are no issues from using different versions.

Sadly the merovingian log does not give any information at all when this error occurs, and neither does the server console.
Is there a way (in the stable version) to enable some verbose output, or to find out which actual error produced the message ‘program contains error’?

To check if it is a space problem, I have created 80 GB of data using github.com/niocs/tqgen
Command used: ./tqgen  --OutFilePat  /home/ubuntu/fake_data/tq.YYYYMMDD.csv  --NumStk 1000 --Seed 234532 --DateBeg 20200101 --DateEnd 20211131
And then to import: for i in /home/ubuntu/fake_data/*; do mclient -d testdatabase -s "copy offset 2 into fake_data from '$i' using delimiters ',' NULL AS '';"; done
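
For reference, the same import loop can be written so that it stops at the first file that fails, which makes it easier to tell which file a later error belongs to. This is only a sketch: it assumes mclient exits with a non-zero status when the statement fails (worth verifying on your version), and the `load_all` function and the `MCLIENT` override are names introduced here for illustration.

```shell
#!/bin/sh
# Client binary; overridable so the loop can be dry-run with a stub (assumption).
MCLIENT="${MCLIENT:-mclient}"

# load_all DIR: import every file in DIR into fake_data, stopping at the first failure.
load_all() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue   # skip an unmatched glob or non-regular entries
        if ! "$MCLIENT" -d testdatabase \
            -s "copy offset 2 into fake_data from '$f' using delimiters ',' NULL AS '';"
        then
            echo "import failed for $f" >&2
            return 1
        fi
        echo "imported $f"
    done
}

# Usage: load_all /home/ubuntu/fake_data
```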

Well, even with this data some queries returned the same error, ‘Program contains error’.

After a few more tests, and after recreating the database and reimporting my data, the server console gave me this error on a query. I hope it is more informative:

!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
!WARNING: gdk_bat.c:2083: assertion `b->batCount <= b->batCapacity' failed
!WARNING: gdk_bat.c:2084: assertion `b->T->heap.size >= b->T->heap.free' failed
Segmentation fault

So there does indeed seem to be a problem with space.
The table used in the query that produced the error came from an 80 GB CSV file (confidential data that I cannot share). I have 30 GB of RAM (nothing else is running on the machine), 60 GB of swap, and 450 GB of disk space.
Any suggestion?

Stefano
On 05 Aug 2016, at 14:07, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:

Hi Stefano,

did/do you re-create/re-load your database from scratch for each version of MonetDB you tested,
or do you access that same database (same dbfarm) using the different versions?

Please be aware that we only support database upgrades between two consecutive releases of MonetDB.
All other cases are not supported and their behavior is undefined, and might thus include
anything from the server refusing to start to "silent" corruption of the database.

Having said that, it's generally hard for us to analyze problems, let alone solve them,
without being provided detailed instructions (if necessary including data) how to reproduce them.

Moreover, the assertion you get with the default branch has nothing to do with insufficient space;
it indicates that some internal administrative information is inconsistent. Why that happens is hard,
if not impossible, to understand without being able to reproduce the problem.

Also, for the 'Program contains error' message of the client, is there perhaps more detailed
information in the merovingian log or on the server console?

Best,
Stefan


----- On Aug 4, 2016, at 10:09 PM, Stefano Fioravanzo fioravanzos@gmail.com wrote:

Hello,

I will briefly recap the problem I have already reported a few weeks ago: I am
using monetdb default branch (I did not compile the stable version because I
need embedded python) and I have a large table (~80 GB). A simple query like
SELECT * FROM large_table WHERE field=x, fails.
mserver5 reports this error:
mserver5: gdk_select.c:867: fullscan_int: Assertion `cnt < (bn)->S.capacity'
failed.

It seems that monet thinks it doesn’t have enough space to do a full scan of the
table, maybe? (The machine has 800GB of free disk space and 30GB of RAM -
Ubuntu 14.04).
I was using a default branch from December 2015. I have tried the latest
default branch (4/08/16), recompiled and installed it, but nothing changed: always
the same error.

So, as suggested, I have compiled the latest stable release (Jun2016-SP1). Now
even a query which does not require monet to read the whole table (like SELECT
* FROM large_table LIMIT 10 - This query worked fine in the default branch
version) fails with a ‘Program contains error’ message from the client.
Selecting a single column of the table works just fine.
Moreover, with the stable release, the same issue arises with the second biggest
table of the database (~35 GB).
[Here you can find the configuration of MonetDB:
http://pastie.org/private/r8gtssg944aqwnigrysa]

All queries on all the other tables in the database (they are all much smaller
than these two) work fine. I do not understand whether the issue is the size of
the table or something else. How can I know what ‘error’ the message ‘program
contains errors’ is referring to?

NOTE: At the beginning, using the compiled default branch, everything worked just
fine! Every query on my large_table worked as expected. Then I had to recreate
the database from scratch so I deleted the dbfarm and created a new one. That’s
when this headache started…

At this point I don’t know what to do, any help would be MUCH appreciated.

Thanks,

Stefano

_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list

--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |