I am using "monetdb start {DBname}" to start the DB.
The following command line is logged in merovingian.log.
It uses --set gdk_nr_threads=8.

/usr/local/bin/mserver5 --dbpath=/usr/local/monetdb_home/stats_db_farm/msearch_stats_db --set merovingian_uri=mapi:monetdb://monet-db-1.hzdc15.sokrati.com:50000/msearch_stats_db --set mapi_open=false --set mapi_port=0 --set mapi_usock=/usr/local/monetdb_home/stats_db_farm/msearch_stats_db/.mapi.sock --set monet_vault_key=/usr/local/monetdb_home/stats_db_farm/msearch_stats_db/.vaultkey --set gdk_nr_threads=8 --set max_clients=128 --set sql_optimizer=default_pipe --set monet_daemon=yes
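
For anyone who wants to check this setting without reading the command line by eye, here is a minimal sketch that pulls the --set options out of such a logged command line. It is not part of MonetDB; the helper name parse_set_options is made up for illustration, and the command line in the example is a shortened copy of the one above.

```python
import shlex


def parse_set_options(cmdline):
    """Return the '--set key=value' pairs of an mserver5 command line as a dict."""
    tokens = shlex.split(cmdline)
    options = {}
    i = 0
    while i < len(tokens):
        if tokens[i] == "--set" and i + 1 < len(tokens):
            key, sep, value = tokens[i + 1].partition("=")
            if sep:  # ignore malformed entries without '='
                options[key] = value
            i += 2
        else:
            i += 1
    return options


# Shortened copy of the command line logged in merovingian.log.
cmdline = (
    "/usr/local/bin/mserver5 "
    "--dbpath=/usr/local/monetdb_home/stats_db_farm/msearch_stats_db "
    "--set mapi_open=false --set mapi_port=0 "
    "--set gdk_nr_threads=8 --set max_clients=128 "
    "--set sql_optimizer=default_pipe --set monet_daemon=yes"
)

options = parse_set_options(cmdline)
threads = int(options["gdk_nr_threads"])
# Second value answers the question "is N > 1024?"
print(threads, threads > 1024)
```

With the values from my log this reports 8 threads, well below 1024.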

Thanks and Regards,
Tapomay.

From: Martin Kersten <Martin.Kersten@cwi.nl>
To: Communication channel for MonetDB users <users-list@monetdb.org>
Sent: Tuesday, February 12, 2013 12:00 AM
Subject: Re: monetdb status health

Dear Tapomay,

Looking a little further into the error context, I was wondering
if you spawned the mserver with --set gdk_nr_threads=N
where N > 1024?

regards, Martin

On 2/11/13 3:53 PM, Tapomay Dey wrote:
> After analysing the logs I see the following single ERR statement, which
> differs from the repeating ERR entries that state
> (crashed, manual intervention needed):
> 2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5:
> mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.
>
> I also see "ERR merovingian[16831]: client error: unknown or impossible
> state: 4" in the later stages.
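Picking out such one-off ERR entries by hand is tedious in a long log. The following is a minimal sketch of how it could be automated. The line layout in LINE_RE is an assumption inferred from the entries quoted in this thread (each entry on a single line), the function name rare_errors is hypothetical, and the sample list reuses entries from the thread, with the repeating one duplicated on purpose.

```python
import re
from collections import Counter

# Assumed layout, inferred from the entries quoted in this thread:
# "<date> <time> <LEVEL> <source>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) ([^\[\s]+)\[(\d+)\]: (.*)$"
)


def rare_errors(lines, max_count=1):
    """Return ERR messages seen at most max_count times, in first-seen order."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == "ERR":
            counts[match.group(5)] += 1
    return [message for message, n in counts.items() if n <= max_count]


crashed = (
    "2013-02-11 04:12:40 ERR merovingian[15380]: client error: database "
    "'msearch_stats_db' has crashed after starting, manual intervention "
    "needed, check monetdbd's logfile for details"
)
sample = [
    crashed,
    crashed,  # duplicated on purpose to mimic the repeating entries
    "2013-02-10 21:09:57 ERR msearch_stats_db[3073]: mserver5: "
    "mal_dataflow.c:587: runMALdataflow: Assertion `workers[0]' failed.",
    "2013-02-11 10:49:46 MSG merovingian[31990]: database 'msearch_stats_db' "
    "(32018) was killed by signal SIGSEGV",
]

result = rare_errors(sample)
for message in result:
    print(message)
```

On this sample only the assertion failure is reported, since the "crashed" entry repeats and the SIGSEGV entry is logged at MSG level.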
>
> Thanks and Regards,
> Tapomay.
>
> ------------------------------------------------------------------------
> *From:* Tapomay Dey <tapomay@yahoo.com>
> *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> *Sent:* Monday, February 11, 2013 4:46 PM
> *Subject:* Re: monetdb status health
>
> Great info. Thanks a lot.
> I am going to keep my db in this state. I will be able to perform your
> suggestions tomorrow.
> Until then this is what I see in the logs when I do a fresh monetdb start:
> 2013-02-11 10:49:46 MSG merovingian[31990]: database 'msearch_stats_db'
> (32018) was killed by signal SIGSEGV
> 2013-02-11 10:49:46 ERR control[31990]: (local): failed to fork mserver:
> database 'msearch_stats_db' has crashed after starting, manual
> intervention ...
>
> FYI I have not included my UDF. I have done a basic configure-make-make
> install with no modifications/extra options on Ubuntu 12.10 64-bit
> (mercurial changeset: 46861:45c89b2e2ac2 Wed Feb 06 11:42:37).
> Usage profile: There is a constant transactional insert/update load of
> the order of 100 update attempts per second. A "select 1;" is
> fired every 10 seconds to check whether the DB is alive. There has been no
> significant select load yet (we are still loading historical data into
> the db).
>
> Thanks and Regards,
> Tapomay.
>
>
> ------------------------------------------------------------------------
> *From:* Stefan Manegold <Stefan.Manegold@cwi.nl>
> *To:* Communication channel for MonetDB users <users-list@monetdb.org>
> *Sent:* Monday, February 11, 2013 12:17 PM
> *Subject:* Re: monetdb status health
>
> Tapomay,
>
> in addition to what Martin suggests, please consider also checking the
> merovingian log for all details, in particular any server error messages.
> To understand the cause of the problem, it is crucial to know the exact
> error messages, in fact the exact sequence of events that led to the
> current situation.
> So, what did you do when (or just before) the server crashed first on
> your database? What did you do then? Each step (and its outcome
> including both error messages on the client and server errors in the
> merovingian log) is important.
>
> If you use a genuine (i.e., non-modified) Feb2013 code base, the problem
> (obviously) exists in that code. If you modified the code locally (e.g.,
> by adding UDFs), the problem might also be in your code.
>
> You might also want to consider building a debug version (i.e.,
> configured with --disable-optimize --enable-debug --enable-assert),
> start mserver5 by hand on your database using the exact command line as
> given in the merovingian log (possibly also in a debugger), and see
> where (and why?) the crash occurs.
>
> Best,
> Stefan
>
> ----- Original Message -----
>  > Dear Tapomay,
>  >
>  > Taking the non-released revision is indeed "living on the edge".
>  >
>  > On 2/11/13 6:44 AM, Tapomay Dey wrote:
>  > > 1. BTW I am running a non-released revision of the Feb13 branch.
>  > > Could this be the reason for such a crash?
>  > > I am doing so because I need a fix that Niels made for a
>  > > concurrency issue that caused duplicate keys.
>  > > I am also planning to implement a group_concat UDF as per the changed
>  > > semantics of Feb13. I already have a partially running one for Oct12.
>  > There are a few known cases where it may crash; they are being worked on.
>  > You can find these cases in the testweb. In general, you would be
>  > the unlucky guy if you hit one of them immediately. They seem rare.
>  > >
>  > > 2. As the DB crashes each time I try to start it, I think it's a
>  > > perfect state to gather more diagnostics. How do I do so?
>  > > I really need the DB never to reach a non-recoverable state.
>  > If it never passes the initialization phase after restart, it is most
>  > likely a corrupted database. This could happen as a result of a
>  > hardware failure, or an unknown software error that caused a crash.
>  > It may be your UDF that went haywire and caused the system to fail.
>  > If it crashes without your UDF, then a run of the mserver using gdb
>  > may provide a hint on the whereabouts
>  > (see the calling sequence in merovingian.log to start mserver directly).
>  >
>  > My approach would now be:
>  > 1) restore database from backup (or a small testdb)
>  > 2) ensure it is working correctly without your UDF
>  > 3) prepare test cases for your UDF
>  > 4) add your UDF
>  > 5) start/stop after the first few calls of code with UDF
>  > to observe behavior.
>  >
>  > Success, Martin
>  >
>  > >
>  > > My setup is such that there would be non-stop inserts/updates into
>  > > the DB 24/7.
>  > >
>  > > Thanks and Regards,
>  > > Tapomay.
>  > >
>  > >
>  > > ------------------------------------------------------------------------
>  > > *From:* Tapomay Dey <tapomay@yahoo.com>
>  > > *To:* Communication channel for MonetDB users <users-list@monetdb.org>
>  > > *Sent:* Monday, February 11, 2013 10:47 AM
>  > > *Subject:* Re: monetdb status health
>  > >
>  > > Thanks a lot.
>  > > But since the time I asked the question the DB has gone into a
>  > > state where it keeps logging
>  > > 2013-02-11 04:12:40 ERR merovingian[15380]: client error: database
>  > > 'msearch_stats_db' has crashed after starting, manual intervention
>  > > needed, check monetdbd's logfile for details
>  > >
>  > > in merovingian.log.
>  > >
>  > > Health is 1%.
>  > >
>  > > What can I do at this stage?
>  > >
>  > > Thanks and Regards,
>  > > Tapomay.
>  > >
>  > >
>  > > ------------------------------------------------------------------------
>  > > *From:* Fabian Groffen <fabian@monetdb.org>
>  > > *To:* users-list@monetdb.org
>  > > *Sent:* Sunday, February 10, 2013 11:33 PM
>  > > *Subject:* Re: monetdb status health
>  > >
>  > > On 10-02-2013 09:44:09 -0800, Tapomay Dey wrote:
>  > >  > My questions are simple:
>  > >  >
>  > >  > what causes crashes?
>  > >
>  > > The mserver5 (monetdb database) terminates in such a way that it
>  > > cannot be considered a clean shutdown. This is usually the case
>  > > when the program gets terminated due to a condition that makes
>  > > further execution impossible, e.g. memory faults. These are almost
>  > > always program errors.
>  > >
>  > >  > what is health?
>  > >
>  > > Health is the percentage of start-stop sequences compared to the
>  > > number of times the database was actually started, i.e. how many
>  > > times a start was followed by a clean shutdown (hence no crash).
>  > >
>  > >  > how do we stop health from degrading?
>  > >
>  > > You can't. A database that crashes, and keeps on doing so, will
>  > > cause the health of the database to degrade.
>  > >
>  > >  > Following is the status of my db-
>  > >  >  start count: 140
>  > >  >  stop count: 1
>  > >  >  crash count: 138
>  > >
>  > > So, essentially, every time you start your database, it never
>  > > reaches a point where you stop it cleanly, but instead your
>  > > database crashes all the time.
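To make the health figure concrete, here is a small sketch that applies the definition given above (clean start-stop sequences as a percentage of starts) to the counts from the status output. The function name health_percent is hypothetical, and how monetdb rounds the value it displays is an assumption; this is not taken from the MonetDB source.

```python
def health_percent(start_count, stop_count):
    """Clean stops as a percentage of starts; None if the database never started."""
    if start_count == 0:
        return None  # the ratio is undefined without any start
    return 100.0 * stop_count / start_count


# Counts from the status output quoted in this thread.
start_count = 140
stop_count = 1

health = health_percent(start_count, stop_count)
print(f"{health:.2f}%")  # exact ratio with two decimals
print(round(health))     # whole percent, assuming round-to-nearest
```

One clean stop out of 140 starts is about 0.71%, which is consistent with a reported health of 1% if the tool rounds to the nearest whole percent.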
>  > >
>  > >
>  > > --
>  > > Fabian Groffen <fabian@monetdb.org>
>  > > column-store pioneer http://www.monetdb.org/Home
>  > > _______________________________________________
>  > > users-list mailing list
>  > > users-list@monetdb.org
>  > > http://mail.monetdb.org/mailman/listinfo/users-list
>
> --
> | Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
> | www.CWI.nl/~manegold/  | Science Park 123 (L321) |
> | +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |
>
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list