From: Martin Kersten <Martin.Kersten@cwi.nl>
To: Communication channel for MonetDB users <users-list@monetdb.org>
Sent: Monday, February 11, 2013 11:38 PM
Subject: Re: monetdb status health
Dear Tapomay,
The error message is quite specific and certainly calls for the
SQL schema/query to analyse. You might send me the MAL plan of that
query (using the EXPLAIN command at SQL level).
regards, Martin
On 2/11/13 3:53 PM, Tapomay Dey wrote:
> After analysing the logs I see the following single ERR statement,
> which differs from the rest of the repeating ERR logs (those state
> 'crashed, manual intervention needed'):
> 2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5:
> mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.
>
> I also see "ERR merovingian[16831]: client error: unknown or impossible
> state: 4" in the later stages.
>
> Thanks and Regards,
> Tapomay.
>
> ------------------------------------------------------------------------
> *From:* Tapomay Dey <tapomay@yahoo.com>
> *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> *Sent:* Monday, February 11, 2013 4:46 PM
> *Subject:* Re: monetdb status health
>
> Great info. Thanks a lot.
> I am going to keep my db in this state. I will be able to perform your
> suggestions tomorrow.
> Until then this is what I see in the logs when I do a fresh monetdb start:
> 2013-02-11 10:49:46 MSG merovingian[31990]: database 'msearch_stats_db'
> (32018) was killed by signal SIGSEGV
> 2013-02-11 10:49:46 ERR control[31990]: (local): failed to fork mserver:
> database 'msearch_stats_db' has crashed after starting, manual
> intervention ...
>
> FYI I have not included my UDF. I have done a basic configure-make-make
> install with no modifications/extra options on Ubuntu 12.10 64-bit
> (mercurial changeset: 46861:45c89b2e2ac2 Wed Feb 06 11:42:37).
> Usage profile: There is a constant transactional insert/update load of
> the order of 100 update attempts per second. There is a "select 1;"
> fired every 10 seconds to check if DB is alive. There has been no
> significant select load yet (we are still loading historical data into
> the db).
>
> Thanks and Regards,
> Tapomay.
>
>
> ------------------------------------------------------------------------
> *From:* Stefan Manegold <Stefan.Manegold@cwi.nl>
> *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> *Sent:* Monday, February 11, 2013 12:17 PM
> *Subject:* Re: monetdb status health
>
> Tapomay,
>
> in addition to what Martin suggests, please consider also checking the
> merovingian log for all details, in particular any server error messages.
> To understand the cause of the problem, it is crucial to know the exact
> error messages, in fact the exact sequence of events that led to the
> current situation.
> So, what did you do when (or just before) the server crashed first on
> your database? What did you do then? Each step (and its outcome
> including both error messages on the client and server errors in the
> merovingian log) is important.
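
To reconstruct that sequence of events, the ERR lines can be pulled out of the log in order. The following is a minimal Python sketch, not part of MonetDB. It assumes each log entry sits on one line in the layout seen in the excerpts quoted in this thread (date, time, level, source[pid], message). The names LOG_LINE, error_events and rare_errors are hypothetical, and the repeated line in SAMPLE is duplicated verbatim purely for illustration.

```python
import re
from collections import Counter

# Layout assumed from the merovingian.log excerpts quoted in this thread:
#   "<date> <time> <LEVEL> <source>[<pid>]: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<msg>.*)$"
)


def error_events(lines):
    """Return (timestamp, source, message) for each ERR line, in log order."""
    events = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERR":
            events.append((m.group("ts"), m.group("source"), m.group("msg")))
    return events


def rare_errors(events, max_count=1):
    """Return ERR messages that occur at most max_count times."""
    counts = Counter(msg for _, _, msg in events)
    return [msg for msg, n in counts.items() if n <= max_count]


CRASHED = (
    "2013-02-11 04:12:40 ERR merovingian[15380]: client error: database "
    "'msearch_stats_db' has crashed after starting, manual intervention "
    "needed, check monetdbd's logfile for details"
)

# Lines taken from this thread; CRASHED is repeated verbatim for illustration.
SAMPLE = [
    "2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5: "
    "mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.",
    CRASHED,
    CRASHED,
    "2013-02-11 10:49:46 MSG merovingian[31990]: database 'msearch_stats_db' "
    "(32018) was killed by signal SIGSEGV",
]

if __name__ == "__main__":
    events = error_events(SAMPLE)
    for ts, source, msg in events:
        print(ts, source, msg)
    print("rare:", rare_errors(events))
```

For a real log, pass the open file object instead of SAMPLE. Messages that appear only once, such as the assertion failure, are the ones worth reporting first.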
>
> If you use a genuine (i.e., non-modified) Feb2013 code base, the problem
> (obviously) exists in that code. If you modified the code locally (e.g.,
> by adding UDFs), the problem might also be in your code.
>
> You might also want to consider building a debug version (i.e.,
> configured with --disable-optimize --enable-debug --enable-assert),
> start mserver5 by hand on your database using the exact command line as
> given in the merovingian log (possibly also in a debugger), and see
> where (and why?) the crash occurs.
>
> Best,
> Stefan
>
> ----- Original Message -----
> > Dear Tapomay,
> >
> > Taking the non-released revision is indeed "living on the edge".
> >
> > On 2/11/13 6:44 AM, Tapomay Dey wrote:
> > > > 1. BTW I am running a non-released revision of Feb13 branch. Could
> > > > this be the reason for such a crash?
> > > > I am doing so coz I need a fix that Niels had made for fixing a
> > > > concurrency issue that caused duplicate keys.
> > > > Also planning to implement group_concat UDF as per the changed
> > > > semantics of Feb13. I already have a partially running one for Oct12.
> > > There are a few known cases where it may crash; they are being worked on.
> > > You can find these cases in the testweb. In general, you would be
> > > the unlucky guy if you hit on them immediately. They seem rare.
> > >
> > > > 2. As the DB crashes each time I try to start it I think it's a
> > > > perfect state to gather more diagnostics. How do I do so?
> > > > I really need the DB to never reach a non-recoverable state.
> > > If it never passes the initialization phase after restart, it is most
> > > likely a corrupted database. This could happen as a result of a
> > > hardware failure, or an unknown software error that caused a crash.
> > > It may be your UDF that went haywire and caused the system to fail.
> > > If it crashes without your UDF, then a run of the mserver using gdb
> > > may provide a hint on the whereabouts
> > > (see calling sequence in merovingian.log to start mserver directly).
> >
> > My approach would now be:
> > 1) restore database from backup (or a small testdb)
> > 2) ensure it is working correctly without your UDF
> > 3) prepare test cases for your UDF
> > 4) add your UDF
> > 5) start/stop after the first few calls of code with UDF
> > to observe behavior.
> >
> > Success, Martin
> >
> > >
> > > > My setup is such that there would be non-stop inserts/updates into
> > > > the DB 24/7.
> > >
> > > Thanks and Regards,
> > > Tapomay.
> > >
> > >
> ------------------------------------------------------------------------
> > > > *From:* Tapomay Dey <tapomay@yahoo.com>
> > > > *To:* Communication channel for MonetDB users
> > > > <users-list@monetdb.org>
> > > > *Sent:* Monday, February 11, 2013 10:47 AM
> > > > *Subject:* Re: monetdb status health
> > >
> > > Thanks a lot.
> > > But since the time I asked the question the DB has gone into a
> > > state
> > > where it keeps logging
> > > 2013-02-11 04:12:40 ERR merovingian[15380]: client error: database
> > > 'msearch_stats_db' has crashed after starting, manual intervention
> > > needed, check monetdbd's logfile for details
> > >
> > > in merovingian.log.
> > >
> > > Health is 1%.
> > >
> > > What can I do at this stage?
> > >
> > > Thanks and Regards,
> > > Tapomay.
> > >
> > >
> > > > ------------------------------------------------------------------------
> > > > *From:* Fabian Groffen <fabian@monetdb.org>
> > > > *To:* users-list@monetdb.org
> > > > *Sent:* Sunday, February 10, 2013 11:33 PM
> > > > *Subject:* Re: monetdb status health
> > >
> > > On 10-02-2013 09:44:09 -0800, Tapomay Dey wrote:
> > > > My questions are simple:
> > > >
> > > > > what causes crashes?
> > >
> > > > The mserver5 process (the MonetDB database server) terminates in a
> > > > way that cannot be considered a clean shutdown. This is usually the
> > > > case when the program gets terminated due to a condition that makes
> > > > further execution impossible, e.g. memory faults. These are almost
> > > > always program errors.
> > >
> > > > what is health?
> > >
> > > Health is the percentage of start-stop sequences compared to the
> > > number
> > > of times the database was actually started. E.g. how many times a
> > > start
> > > was followed by a clean shutdown (hence no crash).
> > > >
> > > > how do we stop health from degrading?
> > >
> > > > You can't. A database that crashes, and keeps on doing so, will
> > > > cause the health of the database to degrade.
> > >
> > > > Following is the status of my db-
> > > > start count: 140
> > > > stop count: 1
> > > > crash count: 138
> > >
> > > So, essentially, every time you start your database, it never
> > > reaches a
> > > point where you stop it cleanly, but instead your database crashes
> > > all
> > > the time.
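
The health figure can be reproduced from the counters above. This is a minimal Python sketch of the definition given in this message (clean stops as a percentage of starts). The function name health_percent is hypothetical, and how monetdb itself rounds the value is an assumption.

```python
def health_percent(start_count: int, stop_count: int) -> float:
    """Share of starts that were followed by a clean stop, as a percentage."""
    if start_count <= 0:
        raise ValueError("start_count must be positive")
    return 100.0 * stop_count / start_count


if __name__ == "__main__":
    # Counters from the status output quoted above.
    health = health_percent(start_count=140, stop_count=1)
    print(f"{health:.2f}%")  # 0.71%
```

Rounded to a whole number this gives 1, which matches the "Health is 1%" reported elsewhere in the thread.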
> > >
> > >
> > > --
> > > Fabian Groffen
fabian@monetdb.org <mailto:
fabian@monetdb.org>
> <mailto:
fabian@monetdb.org <mailto:
fabian@monetdb.org>>
> > > column-store pioneer
http://www.monetdb.org/Home> > > _______________________________________________
> > > users-list mailing list
> > >
users-list@monetdb.org <mailto:
users-list@monetdb.org>
> <mailto:
users-list@monetdb.org <mailto:
users-list@monetdb.org>>
> > >
http://mail.monetdb.org/mailman/listinfo/users-list> > >
> > >
> > >
>
> --
> | Stefan.Manegold@CWI.nl | DB Architectures (DA) |
> | www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
>
>