Dear Tapomay,
The error message is quite specific and certainly calls for the
SQL schema/query to analyse. You might send me the MAL plan of that
query (using the EXPLAIN command at SQL level).
regards, Martin
On 2/11/13 3:53 PM, Tapomay Dey wrote:
> After analysing the logs I see the following single ERR statement,
> which differs from the rest of the repeating ERR logs stating
> (crashed, manual intervention needed):
> 2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5:
> mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.
>
> I also see "ERR merovingian[16831]: client error: unknown or impossible
> state: 4" in the later stages.
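
Picking out a one-off ERR line among many repeats, as described above, can be sketched roughly as follows. This is only an illustration: it assumes each log entry sits on one line in the form "date time LEVEL source[pid]: message" (as the lines quoted in this thread suggest), and the names count_err_messages, rare_errors and SAMPLE are made up for the example. The repeated line in SAMPLE is duplicated by hand purely to show the grouping.

```python
import re
from collections import Counter

# Assumed entry layout: "YYYY-MM-DD HH:MM:SS LEVEL source[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<msg>.*)$"
)


def count_err_messages(lines):
    """Count ERR entries per (source, message), ignoring timestamp and pid."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") == "ERR":
            counts[(m.group("source"), m.group("msg"))] += 1
    return counts


def rare_errors(counts, max_count=1):
    """Return the (source, message) pairs seen at most max_count times."""
    return [key for key, n in counts.items() if n <= max_count]


# Illustrative input built from messages quoted in this thread.
SAMPLE = [
    "2013-02-11 04:12:40 ERR merovingian[15380]: client error: database "
    "'msearch_stats_db' has crashed after starting, manual intervention "
    "needed, check monetdbd's logfile for details",
    "2013-02-11 04:12:40 ERR merovingian[15380]: client error: database "
    "'msearch_stats_db' has crashed after starting, manual intervention "
    "needed, check monetdbd's logfile for details",
    "2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5: "
    "mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.",
    "2013-02-11 10:49:46 MSG merovingian[31990]: database "
    "'msearch_stats_db' (32018) was killed by signal SIGSEGV",
]

counts = count_err_messages(SAMPLE)
for source, msg in rare_errors(counts):
    print(source, "|", msg)
```

To run it on a real log, the lines of merovingian.log could be passed in place of SAMPLE; entries that span several lines would need to be joined first.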
>
> Thanks and Regards,
> Tapomay.
>
> ------------------------------------------------------------------------
> *From:* Tapomay Dey <tapomay@yahoo.com>
> *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> *Sent:* Monday, February 11, 2013 4:46 PM
> *Subject:* Re: monetdb status health
>
> Great info. Thanks a lot.
> I am going to keep my db in this state. I will be able to perform your
> suggestions tomorrow.
> Until then this is what I see in the logs when I do a fresh monetdb start:
> 2013-02-11 10:49:46 MSG merovingian[31990]: database 'msearch_stats_db'
> (32018) was killed by signal SIGSEGV
> 2013-02-11 10:49:46 ERR control[31990]: (local): failed to fork mserver:
> database 'msearch_stats_db' has crashed after starting, manual
> intervention ...
>
> FYI I have not included my UDF. I have done a basic configure-make-make
> install with no modifications/extra options on Ubuntu 12.10 64
> bit (mercurial changeset: 46861:45c89b2e2ac2 Wed Feb 06 11:42:37).
> Usage profile: There is a constant transactional insert/update load of
> the order of 100 update attempts per second. There is a "select 1;"
> fired every 10 seconds to check if DB is alive. There has been no
> significant select load yet (we are still loading historical data into
> the db).
>
> Thanks and Regards,
> Tapomay.
>
>
> ------------------------------------------------------------------------
> *From:* Stefan Manegold <Stefan.Manegold@cwi.nl>
> *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> *Sent:* Monday, February 11, 2013 12:17 PM
> *Subject:* Re: monetdb status health
>
> Tapomay,
>
> in addition to what Martin suggests, please also consider checking the
> merovingian log for all details, in particular any server error messages.
> To understand the cause of the problem, it is crucial to know the exact
> error messages, in fact the exact sequence of events that led to the
> current situation.
> So, what did you do when (or just before) the server crashed first on
> your database? What did you do then? Each step (and its outcome
> including both error messages on the client and server errors in the
> merovingian log) is important.
>
> If you use a genuine (i.e., non-modified) Feb2013 code base, the problem
> (obviously) exists in that code. If you modified the code locally (e.g.,
> by adding UDFs), the problem might also be in your code.
>
> You might also want to consider building a debug version (i.e.,
> configured with --disable-optimize --enable-debug --enable-assert),
> start mserver5 by hand on your database using the exact command line as
> given in the merovingian log (possibly also in a debugger), and see
> where (and why?) the crash occurs.
>
> Best,
> Stefan
>
> ----- Original Message -----
> > Dear Tapomay,
> >
> > Taking the non-released revision is indeed "living on the edge".
> >
> > On 2/11/13 6:44 AM, Tapomay Dey wrote:
> > > 1. BTW I am running a non-released revision of the Feb13 branch.
> > > Could this be the reason for such a crash?
> > > I am doing so because I need a fix that Niels had made for a
> > > concurrency issue that caused duplicate keys.
> > > I am also planning to implement a group_concat UDF as per the changed
> > > semantics of Feb13. I already have a partially running one for Oct12.
> > There are a few known cases where it may crash; they are being worked on.
> > In the testweb you can find these few cases. In general, you would be
> > the unlucky guy if you hit on them immediately. They seem rare.
> > >
> > > 2. As the DB crashes each time I try to start it, I think it's a
> > > perfect state to gather more diagnostics. How do I do so?
> > > I really need the DB to never reach a non-recoverable state.
> > If it never passes the initialization phase after restart, it is most
> > likely a corrupted database. This could happen as a result of a
> > hardware failure, or an unknown software error that caused a crash.
> > It may be your UDF that went haywire and caused the system to fail.
> > If it crashes without your UDF, then a run of the mserver using gdb
> > may provide a hint on the whereabouts
> > (see calling sequence in merovingian.log to start mserver directly)
> >
> > My approach would now be:
> > 1) restore database from backup (or a small testdb)
> > 2) ensure it is working correctly without your UDF
> > 3) prepare test cases for your UDF
> > 4) add your UDF
> > 5) start/stop after the first few calls of code with UDF
> > to observe behavior.
> >
> > Success, Martin
> >
> > >
> > > My setup is such that there would be non-stop Inserts/updates into
> > > the DB 24/7.
> > >
> > > Thanks and Regards,
> > > Tapomay.
> > >
> > >
> > > ------------------------------------------------------------------------
> > > *From:* Tapomay Dey <tapomay@yahoo.com>
> > > *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> > > *Sent:* Monday, February 11, 2013 10:47 AM
> > > *Subject:* Re: monetdb status health
> > >
> > > Thanks a lot.
> > > But since the time I asked the question the DB has gone into a
> > > state where it keeps logging
> > > 2013-02-11 04:12:40 ERR merovingian[15380]: client error: database
> > > 'msearch_stats_db' has crashed after starting, manual intervention
> > > needed, check monetdbd's logfile for details
> > >
> > > in merovingian.log.
> > >
> > > Health is 1%.
> > >
> > > What can I do at this stage?
> > >
> > > Thanks and Regards,
> > > Tapomay.
> > >
> > >
> > > ------------------------------------------------------------------------
> > > *From:* Fabian Groffen <fabian@monetdb.org>
> > > *To:* users-list@monetdb.org
> > > *Sent:* Sunday, February 10, 2013 11:33 PM
> > > *Subject:* Re: monetdb status health
> > >
> > > On 10-02-2013 09:44:09 -0800, Tapomay Dey wrote:
> > > > My questions are simple:
> > > >
> > > > what causes crashes?
> > >
> > > The mserver5 (monetdb database) terminates in such a way that it
> > > cannot be considered a clean shutdown. This is usually the case when
> > > the program gets terminated due to a condition that makes further
> > > execution impossible, e.g. memory faults. These are almost always
> > > program errors.
> > >
> > > > what is health?
> > >
> > > Health is the percentage of start-stop sequences compared to the
> > > number of times the database was actually started, i.e. how many
> > > times a start was followed by a clean shutdown (hence no crash).
> > >
> > > > how do we stop health from degrading?
> > >
> > > You can't. A database that crashes, and keeps on doing so, will
> > > cause the health of the database to degrade.
> > >
> > > > Following is the status of my db-
> > > > start count: 140
> > > > stop count: 1
> > > > crash count: 138
> > >
> > > So, essentially, every time you start your database, it never
> > > reaches a point where you stop it cleanly, but instead your database
> > > crashes all the time.
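
The arithmetic behind the health figure, as defined above, can be sketched like this. It is a rough illustration only: the function name health_percent is made up, and the exact formula and rounding that monetdbd applies are assumptions, not taken from its source. With the counts quoted above (140 starts, 1 clean stop) it lands on the 1% reported earlier in this thread.

```python
from typing import Optional


def health_percent(start_count: int, stop_count: int) -> Optional[float]:
    """Percentage of starts that were followed by a clean stop.

    Returns None when the database was never started, since the ratio
    is undefined in that case (how monetdbd reports this is not covered here).
    """
    if start_count == 0:
        return None
    return 100.0 * stop_count / start_count


# Counts taken from the status output quoted above.
health = health_percent(start_count=140, stop_count=1)
print(f"health: {round(health)}%")
```

Since every crash adds a start without a matching stop, the ratio can only recover through further clean start-stop cycles.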
> > >
> > >
> > > --
> > > Fabian Groffen <fabian@monetdb.org>
> > > column-store pioneer http://www.monetdb.org/Home
> > > _______________________________________________
> > > users-list mailing list
> > > users-list@monetdb.org
> > > http://mail.monetdb.org/mailman/listinfo/users-list
>
> --
> | Stefan.Manegold@CWI.nl | DB Architectures (DA) |
> | www.CWI.nl/~manegold/ | Science Park 123 (L321) |
> | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |