Resent, with some of the thread details removed due to limits on message length.

On Thu, May 27, 2010 at 2:49 PM, Hering Cheng <hering.cheng@computer.org> wrote:
Hi,

MonetDB dumped core after I had 10 Java threads retrieve a total of 23 million records from a single table via JDBC.  The threads all completed successfully.  The crash seems to occur afterwards, when another process tries to open a connection to the same database.  I can reproduce the crash consistently.
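
For anyone who wants to try this, the access pattern is roughly as sketched below. This is a minimal illustration, not the actual test program: the table name, the port (50000, the usual Merovingian default), the monetdb/monetdb credentials, and the LIMIT/OFFSET slicing are all assumptions, and the final connection is opened from the same JVM as a stand-in for the separate process that triggered the crash. Only the database name 'taq', the thread count, and the row total come from the report. It needs the MonetDB JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentFetchRepro {
    // Assumed connection details; only the database name 'taq' is from the log.
    static final String URL = "jdbc:monetdb://localhost:50000/taq";
    static final String USER = "monetdb";
    static final String PASSWORD = "monetdb";
    static final String TABLE = "trades"; // hypothetical table name

    /** Splits totalRows into `parts` contiguous {offset, count} slices. */
    static long[][] slices(long totalRows, int parts) {
        long[][] out = new long[parts][2];
        long base = totalRows / parts;
        long extra = totalRows % parts;
        long offset = 0;
        for (int i = 0; i < parts; i++) {
            long count = base + (i < extra ? 1 : 0);
            out[i][0] = offset;
            out[i][1] = count;
            offset += count;
        }
        return out;
    }

    /** Reads one slice over its own connection and returns the row count. */
    static long fetchSlice(long offset, long count) throws SQLException {
        String sql = "SELECT * FROM " + TABLE
                + " LIMIT " + count + " OFFSET " + offset;
        try (Connection con = DriverManager.getConnection(URL, USER, PASSWORD);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            long n = 0;
            while (rs.next()) {
                n++;
            }
            return n;
        }
    }

    public static void main(String[] args) throws Exception {
        // 10 worker threads, each fetching its share of 23 million rows.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Future<Long>> results = new ArrayList<>();
        for (long[] s : slices(23_000_000L, 10)) {
            results.add(pool.submit(() -> fetchSlice(s[0], s[1])));
        }
        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("fetched " + total + " rows");

        // Stand-in for the second process: a fresh connection opened
        // after all workers have finished and disconnected.
        try (Connection con = DriverManager.getConnection(URL, USER, PASSWORD)) {
            System.out.println("new connection open: " + !con.isClosed());
        }
    }
}
```

Each worker uses its own connection, which matches the separate client ports shown disconnecting in the Merovingian log below.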

After the core file is produced, processes can connect to MonetDB successfully again, because Merovingian restarts the dead mserver5 automatically.

$ tail ~/gaal/rdcuxsrv220-local-disk/chenher/monetdb/feb2010/var/log/MonetDB/merovingian.log
2010-05-27 14:27:39 MSG merovingian[19478]: client rdcuxsrv220:61292 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61296 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61294 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61298 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61288 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61304 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61302 has disconnected from proxy
2010-05-27 14:32:33 MSG merovingian[19478]: proxying client rdcuxsrv220:61784 for database 'taq' to mapi:monetdb://127.0.0.1:50001/taq
2010-05-27 14:33:01 MSG merovingian[19478]: client rdcuxsrv220:61784 has disconnected from proxy
2010-05-27 14:34:06 MSG merovingian[19478]: database 'taq' (17289) has crashed (dumped core)

This is a Solaris SPARC server and MonetDB was built from the Feb2010 source code using Sun Studio 12.1:

$ ( ulimit -d $[32*1024*1024] && ulimit -n $[10*1024]; export LD_PRELOAD_64=/usr/lib/64/libumem.so:${LD_PRELOAD_64}; export LD_PRELOAD=/usr/lib/libumem.so:${LD_PRELOAD}; export MONETDB5CONF=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010/etc/monetdb5.conf; /GAAL/chenher/share/monetdb/distro-sparc-feb2010-64bit-debug/bin/mserver5 --version; )
MonetDB server v5.18.1 (64-bit), based on kernel v1.36.1 (64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Found 32.0GiB available memory, 16 available cpu cores
Configured for prefix: /GAAL/chenher/share/monetdb/distro-sparc-feb2010-64bit-debug
Libraries:
  libpcre: 8.01 2010-01-19 (compiled with 8.01)
  openssl: OpenSSL 0.9.8k 25 Mar 2009 (compiled with )
  libxml2: 2.6.23 (compiled with 2.6.23)
Compiled by: chenher@rdcuxsrv220 (sparc-sun-solaris2.10)
Compilation: cc -m64 -xcode=pic32 -I/GAAL/chenher/share/hans_boehm_gc/distro-sparc-6.8-64bit/include/ -g
Linking    : /usr/ccs/bin/ld -m64 -L/GAAL/chenher/share/openssl-64bit/lib -L/GAAL/chenher/share/pcre-64bit/lib -L/GAAL/chenher/share/hans_boehm_gc/distro-sparc-6.8-64bit/lib/

$ dbx ~/gaal/share/monetdb/distro-sparc-feb2010-64bit-debug/bin/mserver5 ~/gaal/rdcuxsrv220-local-disk/chenher/monetdb/feb2010/var/MonetDB5/dbfarm/taq/core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.7' in your .dbxrc
Reading mserver5
...
Reading lib_logger.so.5.18.1
Reading libuuid.so.1
t@4 (l@4) terminated by signal SEGV (no mapping at the fault address)
Current function is putName
  174           for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]){
(dbx) where
current thread: t@4
=>[1] putName(nme = 0xffffffff6aec3928 "exportValue", len = 11U), line 174 in "mal_namespace.c"
  [2] initSQLreferences(), line 49 in "sql_gencode.c"
  [3] SQLinitClient(c = 0xffffffff7f352628), line 379 in "sql_scenario.c"
  [4] runPhase(c = 0xffffffff7f352628, phase = 5), line 363 in "mal_scenario.c"
  [5] runScenarioBody(c = 0xffffffff7f352628), line 392 in "mal_scenario.c"
  [6] runScenario(c = 0xffffffff7f352628), line 438 in "mal_scenario.c"
  [7] MSserveClient(dummy = 0xffffffff7f352628), line 368 in "mal_session.c"
(dbx) threads
      t@1  a  l@1   ?()   LWP suspended in  __pollsys()
      t@2  a  l@2   SERVERlistenThread()   LWP suspended in  __pollsys()
      t@3  a  l@3   mvc_logmanager()   LWP suspended in  __pollsys()
o>    t@4  a  l@4   ?()   signal SIGSEGV in  putName()
      t@5  a  l@5   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
      t@6  a  l@6   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
      t@7  a  l@7   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
      t@8  a  l@8   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
      t@9  a  l@9   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@10  a l@10   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@11  a l@11   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@12  a l@12   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@13  a l@13   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@14  a l@14   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@15  a l@15   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@16  a l@16   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@17  a l@17   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@18  a l@18   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@19  a l@19   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@20  a l@20   runDFLOWworker()   sleep on 0x100c14b08  in  __lwp_park()
     t@21  a l@21   ?()   sleep on 0xffffffff7f352a98  in  __lwp_park()
     t@22  a l@22   ?()   sleep on 0xffffffff7f352d58  in  __lwp_park()
     t@23  a l@23   ?()   sleep on 0xffffffff7f353018  in  __lwp_park()
     t@24  a l@24   ?()   sleep on 0xffffffff7f3532d8  in  __lwp_park()
     t@25  a l@25   ?()   sleep on 0xffffffff7f353598  in  __lwp_park()
     t@26  a l@26   ?()   sleep on 0xffffffff7f353858  in  __lwp_park()
     t@27  a l@27   ?()   sleep on 0xffffffff7f353b18  in  __lwp_park()
     t@28  a l@28   ?()   sleep on 0xffffffff7f353dd8  in  __lwp_park()
     t@29  a l@29   ?()   sleep on 0xffffffff7f354098  in  __lwp_park()
     t@30  a l@30   ?()   sleep on 0xffffffff7f354358  in  __lwp_park()
     t@31  a l@31   runDFLOWworker()   sleep on 0x1012f8988  in  __lwp_park()
     t@32  a l@32   runDFLOWworker()   sleep on 0x1012f8988  in  __lwp_park()
     t@33  a l@33   runDFLOWworker()   sleep on 0x1012f8988  in  __lwp_park()
...
    t@174  a l@174   runDFLOWworker()   sleep on 0x102c9f908  in  __lwp_park()
    t@175  a l@175   runDFLOWworker()   sleep on 0x102c9f908  in  __lwp_park()

Thanks.
Hering