Hi all,

As gdb isn't providing any stack trace when the segfault occurs, we've launched mserver5 under valgrind.
Here is the valgrind output when the server crashes:

==1995== Thread 8:
==1995== Syscall param write(buf) points to uninitialised byte(s)
==1995==    at 0x71D877D: ??? (in /lib64/libpthread-2.12.so)
==1995==    by 0x562BD6A: GDKsave (gdk_storage.c:369)
==1995==    by 0x552BF93: HEAPsave_intern (gdk_heap.c:708)
==1995==    by 0x552BFD0: HEAPsave (gdk_heap.c:714)
==1995==    by 0x57B561E: BATimprints (gdk_imprints.c:847)
==1995==    by 0x54FD095: BAT_scanselect (gdk_select.c:910)
==1995==    by 0x5504180: BATsubselect (gdk_select.c:1719)
==1995==    by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
==1995==    by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
==1995==    by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
==1995==    by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
==1995==    by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
==1995==  Address 0x14724820 is 64 bytes inside a block of size 1,760 alloc'd
==1995==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==1995==    by 0x559286F: GDKmalloc_prefixsize (gdk_utils.c:641)
==1995==    by 0x55928D8: GDKmallocmax (gdk_utils.c:667)
==1995==    by 0x552A212: HEAPalloc (gdk_heap.c:105)
==1995==    by 0x57B3F72: BATimprints (gdk_imprints.c:770)
==1995==    by 0x54FD095: BAT_scanselect (gdk_select.c:910)
==1995==    by 0x5504180: BATsubselect (gdk_select.c:1719)
==1995==    by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
==1995==    by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
==1995==    by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
==1995==    by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
==1995==    by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
==1995==
==1995== Thread 5:
==1995== Invalid read of size 8
==1995==    at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
==1995==    by 0x11714127: bind_col (bat_storage.c:185)
==1995==    by 0x115E98AB: sql_storage (sql.c:4742)
==1995==    by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
==1995==    by 0x4E78596: callMAL (mal_interpreter.c:447)
==1995==    by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
==1995==    by 0x115F2143: SQLengineIntern (sql_execute.c:390)
==1995==    by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
==1995==    by 0x4E9691A: runPhase (mal_scenario.c:515)
==1995==    by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
==1995==    by 0x4E96BF3: runScenario (mal_scenario.c:579)
==1995==    by 0x4E97B97: MSserveClient (mal_session.c:439)
==1995==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==1995==
==1995==
==1995== Process terminating with default action of signal 11 (SIGSEGV)
==1995==  Access not within mapped region at address 0x18
==1995==    at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
==1995==    by 0x11714127: bind_col (bat_storage.c:185)
==1995==    by 0x115E98AB: sql_storage (sql.c:4742)
==1995==    by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
==1995==    by 0x4E78596: callMAL (mal_interpreter.c:447)
==1995==    by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
==1995==    by 0x115F2143: SQLengineIntern (sql_execute.c:390)
==1995==    by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
==1995==    by 0x4E9691A: runPhase (mal_scenario.c:515)
==1995==    by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
==1995==    by 0x4E96BF3: runScenario (mal_scenario.c:579)
==1995==    by 0x4E97B97: MSserveClient (mal_session.c:439)
==1995==  If you believe this happened as a result of a stack
==1995==  overflow in your program's main thread (unlikely but
==1995==  possible), you can try to increase the size of the
==1995==  main thread stack using the --main-stacksize= flag.
==1995==  The main thread stack size used in this run was 10485760.
==1995==
==1995== HEAP SUMMARY:
==1995==     in use at exit: 41,189,795 bytes in 200,474 blocks
==1995==   total heap usage: 279,871 allocs, 79,397 frees, 66,479,732 bytes allocated
==1995==
==1995== LEAK SUMMARY:
==1995==    definitely lost: 2,432 bytes in 74 blocks
==1995==    indirectly lost: 0 bytes in 0 blocks
==1995==      possibly lost: 40,090,357 bytes in 200,294 blocks
==1995==    still reachable: 1,097,006 bytes in 106 blocks
==1995==         suppressed: 0 bytes in 0 blocks
==1995== Rerun with --leak-check=full to see details of leaked memory
==1995==
==1995== For counts of detected and suppressed errors, rerun with: -v
==1995== Use --track-origins=yes to see where uninitialised values come from
==1995== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 9)
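
One observation on the crash itself: both the kernel log and valgrind report the faulting address as 0x18, which usually points to a member being read through a NULL struct pointer, with 0x18 being that member's offset. We have not confirmed this against the MonetDB sources, so the sketch below only illustrates the arithmetic. The struct example_delta and both functions are hypothetical names made up for this example; they are not MonetDB's actual layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout, for illustration only. This is NOT the real
 * MonetDB structure. Fixed-width members keep the offsets identical
 * on every platform. */
typedef struct example_delta {
    uint64_t name;  /* offset 0x00 */
    uint64_t bid;   /* offset 0x08 */
    uint64_t ibid;  /* offset 0x10 */
    uint64_t cnt;   /* offset 0x18 */
} example_delta;

/* Compile-time check that the member really sits at offset 0x18. */
static_assert(offsetof(example_delta, cnt) == 0x18,
              "cnt is expected at offset 0x18");

/* Address the CPU would touch when reading `cnt` through `base`.
 * Pure integer arithmetic, so nothing is dereferenced here. */
uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(example_delta, cnt);
}

/* Defensive variant: report an error instead of faulting when the
 * pointer is NULL. Returns 0 on success, -1 on a NULL pointer. */
int read_cnt(const example_delta *d, uint64_t *out)
{
    if (d == NULL)
        return -1;
    *out = d->cnt;
    return 0;
}
```

Here member_address(0) evaluates to 0x18, the same address valgrind reports, so a NULL pointer reaching delta_bind_bat looks like a plausible cause, although we cannot be sure of it from this output alone.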

Regards

On Wed, Oct 21, 2015 at 10:18 AM, Niels Nes <Niels.Nes@cwi.nl> wrote:
On Wed, Oct 21, 2015 at 09:53:31AM +0200, Mathieu Raillard wrote:
> Hi all,
>
> We have managed to recompile MonetDB using the source code from
> MonetDB-11.21.5.tar.xz and with the option " --enable-debug"
> We also have a backup of the database (600MB) that can be provided for
> debugging purpose as the crash is perfectly reproducible.
> When running "select * from sys.storage();", here is the information
> gathered in gdb:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f3c24aad700 (LWP 46812)]
> 0x00007f3c25834fd9 in delta_bind_bat (bat=0x234b480, access=0, temp=0) at bat_storage.c:166
> 166                     bat_set_access(b, BAT_READ);
We need the backtrace, and probably from the calling function (i.e. the
frame above delta_bind_bat) you could print the column structure (print *c).

Niels
> (gdb)
> Continuing.
> [Thread 0x7f3c25ce5700 (LWP 46809) exited]
> [Thread 0x7f3c24eaf700 (LWP 46810) exited]
> [Thread 0x7f3c24cae700 (LWP 46811) exited]
> [Thread 0x7f3c245ab700 (LWP 46817) exited]
> [Thread 0x7f3c247ac700 (LWP 46822) exited]
> [Thread 0x7f3c17fff700 (LWP 46823) exited]
> [Thread 0x7f3c24aad700 (LWP 46812) exited]
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> The program no longer exists.
> (gdb)
> The program is not being run.
> (gdb)
> The program is not being run.
> (gdb) backtrace
> No stack.
> (gdb)
>
> Regards 
>
>
> Mathieu
>
> On Sun, Oct 18, 2015 at 6:03 PM, Ying Zhang <Y.Zhang@cwi.nl> wrote:
>
>     Hai Mathieu,
>
>     Thanks for using MonetDB, and sorry for the crash.
>
>     If possible, can you please give us the necessary data to reproduce
>     the crash?  They include:
>     - the schema
>     - a small set of (anonymised) data
>     - the queries
>
>     You can also compile MonetDB from source with the --enable-debug
>     option, so that GDB can give you the exact line where the crash has
>     happened, and the value of the variable/statement/function/etc. that
>     has caused the crash.
>
>     Regards,
>
>     Jennie
>
>     > On Oct 15, 2015, at 17:50, Mathieu Raillard <mraillard@data-mat.fr> wrote:
>     >
>     > Hi all,
>     >
>     > We have been using MonetDB for 6 months with quite good results
>     > (thanks to the dev team for their good work).
>     >
>     > Unfortunately, yesterday we came across an issue for which we
>     > can't find a solution.
>     > Our database crashed for an unknown reason (RAM was fine, disk
>     > space was fine), and now each time we query the same table with a
>     > query we used during the crash, the database crashes with a segfault.
>     >
>     > Query : select * from storage();
>     > OS Version : CentOS release 6.7 (Final)
>     > MonetDB Version : MonetDB Database Server Toolkit v1.1 (Jul2015)
>     >
>     > tail /var/log/messages :
>     > kernel: mserver5[9945]: segfault at 18 ip 00007f9e86fd534b sp 00007f9e862912e0 error 4 in lib_sql.so[7f9e86eb1000+182000]
>     >
>     > merovingian.log :
>     > database 'BI-DEV' (9972) was killed by signal SIGSEGV
>     >
>     > Log from gdb :
>     >
>     > Program received signal SIGSEGV, Segmentation fault.
>     > [Switching to Thread 0x7ffa24edc700 (LWP 4683)]
>     > 0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/monetdb5/lib_sql.so
>     > (gdb) bt
>     > #0  0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/monetdb5/lib_sql.so
>     > #1  0x00007ffa25b20fee in sql_storage () from /usr/lib64/monetdb5/lib_sql.so
>     > #2  0x00007ffa321aea45 in runMALsequence () from /usr/lib64/libmonetdb5.so.19
>     > #3  0x00007ffa321afe29 in callMAL () from /usr/lib64/libmonetdb5.so.19
>     > #4  0x00007ffa25b369f7 in SQLexecutePrepared () from /usr/lib64/monetdb5/lib_sql.so
>     > #5  0x00007ffa25b36efc in SQLengineIntern () from /usr/lib64/monetdb5/lib_sql.so
>     > #6  0x00007ffa321c87dd in runScenarioBody () from /usr/lib64/libmonetdb5.so.19
>     > #7  0x00007ffa321c886f in runScenario () from /usr/lib64/libmonetdb5.so.19
>     > #8  0x00007ffa321c9738 in MSserveClient () from /usr/lib64/libmonetdb5.so.19
>     > #9  0x00007ffa321ca816 in MSscheduleClient () from /usr/lib64/libmonetdb5.so.19
>     > #10 0x00007ffa32266e0f in doChallenge () from /usr/lib64/libmonetdb5.so.19
>     > #11 0x00007ffa31d2adcf in thread_starter () from /usr/lib64/libbat.so.12
>     > #12 0x00007ffa2f97da51 in start_thread (arg=0x7ffa24edc700) at pthread_create.c:301
>     > #13 0x00007ffa2f6ca93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>     >
>     >
>     > Any help to understand and correct what is going on would be nice.
>     >
>     > Regards
>     >
>     > Mathieu
>     > _______________________________________________
>     > users-list mailing list
>     > users-list@monetdb.org
>     > https://www.monetdb.org/mailman/listinfo/users-list
>
>
>



--
Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14,  phone ++31 20 592-4098     sip:4098@sip.cwi.nl
url: https://www.cwi.nl/people/niels    e-mail: Niels.Nes@cwi.nl