Hi,

Yes, you were right.

We may have made a mistake when using gdb. This time we were able to get the backtrace and to execute the commands you asked for.
Here are the results:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f0357fa0700 (LWP 6892)]
0x00007f0358d27fd9 in delta_bind_bat (bat=0x31b4470, access=0, temp=0) at bat_storage.c:166
166                     bat_set_access(b, BAT_READ);
(gdb) bt
#0  0x00007f0358d27fd9 in delta_bind_bat (bat=0x31b4470, access=0, temp=0) at bat_storage.c:166
#1  0x00007f0358d28128 in bind_col (tr=0x7f03480016a0, c=0x7f034805cec0, access=0) at bat_storage.c:185
#2  0x00007f0358bfd8ac in sql_storage (cntxt=0x7f03591df328, mb=0x7f03482abab0, stk=0x7f03483e81a0, pci=0x7f034837aeb0) at sql.c:4742
#3  0x00007f03628970a6 in runMALsequence (cntxt=0x7f03591df328, mb=0x7f03482abab0, startpc=1, stoppc=0, stk=0x7f03483e81a0, env=0x0, pcicaller=0x0) at mal_interpreter.c:631
#4  0x00007f0362896597 in callMAL (cntxt=0x7f03591df328, mb=0x7f03482abab0, env=0x7f0357f9fa18, argv=0x7f0357f9faa0, debug=0 '\000') at mal_interpreter.c:447
#5  0x00007f0358c05d40 in SQLexecutePrepared (c=0x7f03591df328, be=0x7f03480caf10, q=0x7f034839e970) at sql_execute.c:328
#6  0x00007f0358c06144 in SQLengineIntern (c=0x7f03591df328, be=0x7f03480caf10) at sql_execute.c:390
#7  0x00007f0358c04dd2 in SQLengine (c=0x7f03591df328) at sql_scenario.c:1307
#8  0x00007f03628b491b in runPhase (c=0x7f03591df328, phase=4) at mal_scenario.c:515
#9  0x00007f03628b4ae5 in runScenarioBody (c=0x7f03591df328) at mal_scenario.c:560
#10 0x00007f03628b4bf4 in runScenario (c=0x7f03591df328) at mal_scenario.c:579
#11 0x00007f03628b5b98 in MSserveClient (dummy=0x7f03591df328) at mal_session.c:439
#12 0x00007f03628b5782 in MSscheduleClient (command=0x7f03480008d0 "\001", challenge=0x7f0357f9fd70 "Fte2MIUw", fin=0x7f0348006b40, fout=0x7f03480049c0) at mal_session.c:319
#13 0x00007f0362967c9f in doChallenge (data=0x7f03500008d0) at mal_mapi.c:184
#14 0x00007f0362365007 in thread_starter (arg=0x7f0350000a50) at gdk_system.c:458
#15 0x00007f036073aa51 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f036048793d in clone () from /lib64/libc.so.6
(gdb) print *bat
$1 = {name = 0x31b44e0 "wifipass_nas_nasidentifier", bid = 2248, ibase = 97505, ibid = 2284, uibid = 1238, uvbid = 2284, cnt = 97505, ucnt = 0, cached = 0x0, wtime = 1, next = 0x0}
(gdb) up
#1  0x00007f0358d28128 in bind_col (tr=0x7f03480016a0, c=0x7f034805cec0, access=0) at bat_storage.c:185
185             return delta_bind_bat( c->data, access, isTemp(c));
(gdb) print *c
$2 = {base = {wtime = 0, rtime = 0, allocated = 0, flag = 0, id = 7334, name = 0x7f034805cf40 "nasidentifier"}, type = {type = 0x2f23d10, digits = 45, scale = 0}, colnr = 19, null = 0 '\000', def = 0x0, unique = 0 '\000',
  drop_action = 0, storage_type = 0x0, sorted = 0, dcount = 0, min = 0x0, max = 0x0, t = 0x7f034805be20, data = 0x31b4470}

Regards

Mathieu

On Thu, Oct 22, 2015 at 6:26 PM, Ying Zhang <Y.Zhang@cwi.nl> wrote:

> On Oct 22, 2015, at 18:04, Mathieu Raillard <mraillard@data-mat.fr> wrote:
>
> Hi all,
>
> As gdb isn't providing any stack trace when the segfault occurs,

Hai Mathieu,

Is this because you (probably accidentally) pressed “c” (continue) at some point during the execution? In the GDB output you sent yesterday morning, I see that GDB immediately continues after receiving the SIGSEGV.

Could you please run MonetDB in GDB again and check whether GDB stops at the SIGSEGV? Then execute the following commands in GDB:

print *bat
up
print *c

Thanks!

Jennie


> we've launched mserver5 with valgrind.
> Here is the valgrind output when the server crashes:
>
> ==1995== Thread 8:
> ==1995== Syscall param write(buf) points to uninitialised byte(s)
> ==1995==    at 0x71D877D: ??? (in /lib64/libpthread-2.12.so)
> ==1995==    by 0x562BD6A: GDKsave (gdk_storage.c:369)
> ==1995==    by 0x552BF93: HEAPsave_intern (gdk_heap.c:708)
> ==1995==    by 0x552BFD0: HEAPsave (gdk_heap.c:714)
> ==1995==    by 0x57B561E: BATimprints (gdk_imprints.c:847)
> ==1995==    by 0x54FD095: BAT_scanselect (gdk_select.c:910)
> ==1995==    by 0x5504180: BATsubselect (gdk_select.c:1719)
> ==1995==    by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
> ==1995==    by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
> ==1995==    by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
> ==1995==    by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
> ==1995==    by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
> ==1995==  Address 0x14724820 is 64 bytes inside a block of size 1,760 alloc'd
> ==1995==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
> ==1995==    by 0x559286F: GDKmalloc_prefixsize (gdk_utils.c:641)
> ==1995==    by 0x55928D8: GDKmallocmax (gdk_utils.c:667)
> ==1995==    by 0x552A212: HEAPalloc (gdk_heap.c:105)
> ==1995==    by 0x57B3F72: BATimprints (gdk_imprints.c:770)
> ==1995==    by 0x54FD095: BAT_scanselect (gdk_select.c:910)
> ==1995==    by 0x5504180: BATsubselect (gdk_select.c:1719)
> ==1995==    by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
> ==1995==    by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
> ==1995==    by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
> ==1995==    by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
> ==1995==    by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
> ==1995==
> ==1995== Thread 5:
> ==1995== Invalid read of size 8
> ==1995==    at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
> ==1995==    by 0x11714127: bind_col (bat_storage.c:185)
> ==1995==    by 0x115E98AB: sql_storage (sql.c:4742)
> ==1995==    by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
> ==1995==    by 0x4E78596: callMAL (mal_interpreter.c:447)
> ==1995==    by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
> ==1995==    by 0x115F2143: SQLengineIntern (sql_execute.c:390)
> ==1995==    by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
> ==1995==    by 0x4E9691A: runPhase (mal_scenario.c:515)
> ==1995==    by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
> ==1995==    by 0x4E96BF3: runScenario (mal_scenario.c:579)
> ==1995==    by 0x4E97B97: MSserveClient (mal_session.c:439)
> ==1995==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
> ==1995==
> ==1995==
> ==1995== Process terminating with default action of signal 11 (SIGSEGV)
> ==1995==  Access not within mapped region at address 0x18
> ==1995==    at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
> ==1995==    by 0x11714127: bind_col (bat_storage.c:185)
> ==1995==    by 0x115E98AB: sql_storage (sql.c:4742)
> ==1995==    by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
> ==1995==    by 0x4E78596: callMAL (mal_interpreter.c:447)
> ==1995==    by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
> ==1995==    by 0x115F2143: SQLengineIntern (sql_execute.c:390)
> ==1995==    by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
> ==1995==    by 0x4E9691A: runPhase (mal_scenario.c:515)
> ==1995==    by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
> ==1995==    by 0x4E96BF3: runScenario (mal_scenario.c:579)
> ==1995==    by 0x4E97B97: MSserveClient (mal_session.c:439)
> ==1995==  If you believe this happened as a result of a stack
> ==1995==  overflow in your program's main thread (unlikely but
> ==1995==  possible), you can try to increase the size of the
> ==1995==  main thread stack using the --main-stacksize= flag.
> ==1995==  The main thread stack size used in this run was 10485760.
> ==1995==
> ==1995== HEAP SUMMARY:
> ==1995==     in use at exit: 41,189,795 bytes in 200,474 blocks
> ==1995==   total heap usage: 279,871 allocs, 79,397 frees, 66,479,732 bytes allocated
> ==1995==
> ==1995== LEAK SUMMARY:
> ==1995==    definitely lost: 2,432 bytes in 74 blocks
> ==1995==    indirectly lost: 0 bytes in 0 blocks
> ==1995==      possibly lost: 40,090,357 bytes in 200,294 blocks
> ==1995==    still reachable: 1,097,006 bytes in 106 blocks
> ==1995==         suppressed: 0 bytes in 0 blocks
> ==1995== Rerun with --leak-check=full to see details of leaked memory
> ==1995==
> ==1995== For counts of detected and suppressed errors, rerun with: -v
> ==1995== Use --track-origins=yes to see where uninitialised values come from
> ==1995== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 9)
>
> Regards
>
> On Wed, Oct 21, 2015 at 10:18 AM, Niels Nes <Niels.Nes@cwi.nl> wrote:
> On Wed, Oct 21, 2015 at 09:53:31AM +0200, Mathieu Raillard wrote:
> > Hi all,
> >
> > We have managed to recompile MonetDB using the source code from
> > MonetDB-11.21.5.tar.xz with the option "--enable-debug".
> > We also have a backup of the database (600MB) that can be provided for
> > debugging purposes, as the crash is perfectly reproducible.
> > When running "select * from sys.storage();", here is the information
> > gathered in gdb:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7f3c24aad700 (LWP 46812)]
> > 0x00007f3c25834fd9 in delta_bind_bat (bat=0x234b480, access=0, temp=0) at bat_storage.c:166
> > 166                     bat_set_access(b, BAT_READ);
> We need the backtrace. From the calling function (i.e. the frame above
> delta_bind_bat) you could probably also print the column structure (print *c).
>
> Niels
> > (gdb)
> > Continuing.
> > [Thread 0x7f3c25ce5700 (LWP 46809) exited]
> > [Thread 0x7f3c24eaf700 (LWP 46810) exited]
> > [Thread 0x7f3c24cae700 (LWP 46811) exited]
> > [Thread 0x7f3c245ab700 (LWP 46817) exited]
> > [Thread 0x7f3c247ac700 (LWP 46822) exited]
> > [Thread 0x7f3c17fff700 (LWP 46823) exited]
> > [Thread 0x7f3c24aad700 (LWP 46812) exited]
> >
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > The program no longer exists.
> > (gdb)
> > The program is not being run.
> > (gdb)
> > The program is not being run.
> > (gdb) backtrace
> > No stack.
> > (gdb)
> >
> > Regards
> >
> >
> > Mathieu
> >
> > On Sun, Oct 18, 2015 at 6:03 PM, Ying Zhang <Y.Zhang@cwi.nl> wrote:
> >
> >     Hai Mathieu,
> >
> >     Thanks for using MonetDB, and sorry for the crash.
> >
> >     If possible, can you please give us the necessary data to reproduce
> >     the crash?  They include:
> >     - the schema
> >     - a small set of (anonymised) data
> >     - the queries
> >
> >     You can also compile MonetDB from source with the --enable-debug
> >     option, so that GDB can give you the exact line where the crash has
> >     happened, and the value of the variable/statement/function/etc. that
> >     has caused the crash.
> >
> >     Regards,
> >
> >     Jennie
> >
> >     > On Oct 15, 2015, at 17:50, Mathieu Raillard <mraillard@data-mat.fr> wrote:
> >     >
> >     > Hi all,
> >     >
> >     > We have been using MonetDB for 6 months with quite good results
> >     > (thanks to the dev team for their good work).
> >     >
> >     > Unfortunately, yesterday we came across an issue for which we
> >     > can't find a solution.
> >     > Our database crashed for an unknown reason (RAM was fine, disk
> >     > space was fine), and now each time we query the same table with a
> >     > query we ran during the crash, the database crashes with a segfault.
> >     >
> >     > Query : select * from storage();
> >     > OS Version : CentOS release 6.7 (Final)
> >     > MonetDB Version : MonetDB Database Server Toolkit v1.1 (Jul2015)
> >     >
> >     > tail /var/log/messages:
> >     > kernel: mserver5[9945]: segfault at 18 ip 00007f9e86fd534b sp 00007f9e862912e0 error 4 in lib_sql.so[7f9e86eb1000+182000]
> >     >
> >     > merovingian.log :
> >     > database 'BI-DEV' (9972) was killed by signal SIGSEGV
> >     >
> >     > Log from gdb:
> >     >
> >     > Program received signal SIGSEGV, Segmentation fault.
> >     > [Switching to Thread 0x7ffa24edc700 (LWP 4683)]
> >     > 0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/monetdb5/lib_sql.so
> >     > (gdb) bt
> >     > #0  0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/monetdb5/lib_sql.so
> >     > #1  0x00007ffa25b20fee in sql_storage () from /usr/lib64/monetdb5/lib_sql.so
> >     > #2  0x00007ffa321aea45 in runMALsequence () from /usr/lib64/libmonetdb5.so.19
> >     > #3  0x00007ffa321afe29 in callMAL () from /usr/lib64/libmonetdb5.so.19
> >     > #4  0x00007ffa25b369f7 in SQLexecutePrepared () from /usr/lib64/monetdb5/lib_sql.so
> >     > #5  0x00007ffa25b36efc in SQLengineIntern () from /usr/lib64/monetdb5/lib_sql.so
> >     > #6  0x00007ffa321c87dd in runScenarioBody () from /usr/lib64/libmonetdb5.so.19
> >     > #7  0x00007ffa321c886f in runScenario () from /usr/lib64/libmonetdb5.so.19
> >     > #8  0x00007ffa321c9738 in MSserveClient () from /usr/lib64/libmonetdb5.so.19
> >     > #9  0x00007ffa321ca816 in MSscheduleClient () from /usr/lib64/libmonetdb5.so.19
> >     > #10 0x00007ffa32266e0f in doChallenge () from /usr/lib64/libmonetdb5.so.19
> >     > #11 0x00007ffa31d2adcf in thread_starter () from /usr/lib64/libbat.so.12
> >     > #12 0x00007ffa2f97da51 in start_thread (arg=0x7ffa24edc700) at pthread_create.c:301
> >     > #13 0x00007ffa2f6ca93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> >     >
> >     >
> >     > Any help to understand and fix what is going on would be appreciated.
> >     >
> >     > Regards
> >     >
> >     > Mathieu
> >     > _______________________________________________
> >     > users-list mailing list
> >     > users-list@monetdb.org
> >     > https://www.monetdb.org/mailman/listinfo/users-list
> >
>
>
> --
> Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
> Science Park 123, 1098 XG Amsterdam, The Netherlands
> room L3.14,  phone ++31 20 592-4098     sip:4098@sip.cwi.nl
> url: https://www.cwi.nl/people/niels    e-mail: Niels.Nes@cwi.nl