From: Stefan Manegold <Stefan.Manegold@cwi.nl>
To: Communication channel for MonetDB users <users-list@monetdb.org>
Sent: Monday, February 11, 2013 12:17 PM
Subject: Re: monetdb status health
Tapomay,
in addition to what Martin suggests, please consider also checking the merovingian log for all details, in particular any server error messages.
To understand the cause of the problem, it is crucial to know the exact error messages, in fact the exact sequence of events that led to the current situation.
So, what did you do when (or just before) the server first crashed on your database? What did you do next? Each step, and its outcome (including both the error messages on the client and the server errors in the merovingian log), is important.
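As a rough aid for collecting those messages, here is a minimal Python sketch that picks the error entries out of merovingian log lines. The line layout (timestamp, level, source, message) is an assumption inferred from the single log line quoted further down in this thread; the names LogEntry and error_entries and the second sample line are hypothetical and only serve as illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    source: str
    message: str


# Assumed layout, modeled on the one log line quoted in this thread:
#   2013-02-11 04:12:40 ERR merovingian[15380]: client error: ...
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>[^:]+): "
    r"(?P<message>.*)$"
)


def error_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield only the entries whose level is ERR, in their original order."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == "ERR":
            yield LogEntry(**match.groupdict())


sample = [
    # Taken from the thread.
    "2013-02-11 04:12:40 ERR merovingian[15380]: client error: database "
    "'msearch_stats_db' has crashed after starting, manual intervention "
    "needed, check monetdbd's logfile for details",
    # Hypothetical non-error line, included only to show the filtering.
    "2013-02-11 04:12:41 MSG merovingian[15380]: hypothetical informational line",
]

for entry in error_entries(sample):
    print(entry.timestamp, entry.source, entry.message)
```

To run it on a real log, pass an open file object to error_entries instead of the sample list. Lines that do not match the assumed layout (for instance continuation lines) are skipped silently, so it is still worth reading the raw log around each reported timestamp.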
If you use a genuine (i.e., non-modified) Feb2013 code base, the problem (obviously) exists in that code. If you modified the code locally (e.g., by adding UDFs), the problem might also be in your code.
You might also want to consider building a debug version (i.e., configured with --disable-optimize --enable-debug --enable-assert), starting mserver5 by hand on your database using the exact command line as given in the merovingian log (possibly also in a debugger), and seeing where (and why?) the crash occurs.
Best,
Stefan
----- Original Message -----
> Dear Tapomay,
>
> Taking the non-released revision is indeed "living on the edge".
>
> On 2/11/13 6:44 AM, Tapomay Dey wrote:
> > 1. BTW I am running a non-released revision of the Feb13 branch. Could this
> > be the reason for such a crash?
> > I am doing so because I need a fix that Niels made for a concurrency issue
> > that caused duplicate keys.
> > I am also planning to implement a group_concat UDF as per the changed
> > semantics of Feb13. I already have a partially running one for Oct12.
> There are a few known cases where it may crash; they are being worked on.
> In the testweb you can find these few cases. In general, you would be the
> unlucky guy if you hit them immediately. They seem rare.
> >
> > 2. As the DB crashes each time I try to start it, I think it's a perfect
> > state to gather more diagnostics. How do I do so?
> > I really need the DB to never reach a non-recoverable state.
> If it never passes the initialization phase after restart, it is most
> likely a corrupted database. This could happen as a result of a hardware
> failure, or an unknown software error that caused a crash.
> It may be your UDF that went haywire and caused the system to fail.
> If it crashes without your UDF, then a run of the mserver using gdb
> may provide a hint on the whereabouts
> (see the calling sequence in merovingian.log to start mserver directly).
>
> My approach would now be:
> 1) restore database from backup (or a small testdb)
> 2) ensure it is working correctly without your UDF
> 3) prepare test cases for your UDF
> 4) add your UDF
> 5) start/stop after the first few calls of code with UDF
> to observe behavior.
>
> Success, Martin
>
> >
> > My setup is such that there would be non-stop inserts/updates into the
> > DB 24/7.
> >
> > Thanks and Regards,
> > Tapomay.
> >
> > ------------------------------------------------------------------------
> > *From:* Tapomay Dey <tapomay@yahoo.com>
> > *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> > *Sent:* Monday, February 11, 2013 10:47 AM
> > *Subject:* Re: monetdb status health
>
>
> > Thanks a lot.
> > But since the time I asked the question the DB has gone into a state
> > where it keeps logging
> > 2013-02-11 04:12:40 ERR merovingian[15380]: client error: database 'msearch_stats_db' has crashed after starting, manual intervention needed, check monetdbd's logfile for details
> >
> > in merovingian.log.
> >
> > Health is 1%.
> >
> > What can I do at this stage?
> >
> > Thanks and Regards,
> > Tapomay.
> >
> > ------------------------------------------------------------------------
> > *From:* Fabian Groffen <fabian@monetdb.org>
> > *To:* users-list@monetdb.org
> > *Sent:* Sunday, February 10, 2013 11:33 PM
> > *Subject:* Re: monetdb status health
> >
> > On 10-02-2013 09:44:09 -0800, Tapomay Dey wrote:
> > > My questions are simple:
> > >
> > > what causes crashes?
> >
> > The mserver5 (monetdb database) terminates in such a way that it can
> > not be considered a clean shutdown, this is usually the case when the
> > program gets terminated due to a condition that makes further execution
> > impossible, e.g. memory faults. These are almost always program errors.
> >
> > > what is health?
> >
> > Health is the percentage of start-stop sequences compared to the number
> > of times the database was actually started, i.e. how many times a start
> > was followed by a clean shutdown (hence no crash).
> >
> > > how do we stop health from degrading?
> >
> > You can't; a database that crashes, and keeps on doing so, will cause
> > the health of the database to degrade.
> >
> > > Following is the status of my db-
> > > start count: 140
> > > stop count: 1
> > > crash count: 138
> >
> > So, essentially, every time you start your database, it never reaches a
> > point where you stop it cleanly, but instead your database crashes all
> > the time.
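
The health definition quoted above can be worked through with the reported counts. This is only a sketch in Python: the function name health_percent is hypothetical, and rounding to the nearest whole percent is an assumption (the thread only says that the reported health is 1%); how monetdb itself rounds or handles a database that was never started is not stated here.

```python
def health_percent(start_count: int, stop_count: int) -> int:
    """Clean start-stop sequences as a percentage of all starts.

    Rounding to the nearest integer is an assumption, not taken from
    the monetdb sources.
    """
    if start_count <= 0:
        # Behavior for a never-started database is not covered in the thread.
        raise ValueError("start_count must be positive")
    return round(100 * stop_count / start_count)


# Counts reported in the thread: 140 starts, 1 clean stop.
print(health_percent(140, 1))  # 100 * 1 / 140 is about 0.71, which rounds to 1
```

Under these assumptions the result agrees with the "Health is 1%" reported for this database.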
> >
> >
> > --
> > Fabian Groffen
> > fabian@monetdb.org <mailto:fabian@monetdb.org>
> > column-store pioneer
> > http://www.monetdb.org/Home
> > _______________________________________________
> > users-list mailing list
> > users-list@monetdb.org <mailto:users-list@monetdb.org>
> > http://mail.monetdb.org/mailman/listinfo/users-list
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA) |
| www.CWI.nl/~manegold/ | Science Park 123 (L321) |
| +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list