Bug 6760 - Pulling huge data via ODBC is very slow (with standard rendering of output)
Summary: Pulling huge data via ODBC is very slow (with standard rendering of output)
Status: NEW
Alias: None
Product: Client interfaces
Classification: Unclassified
Component: C mapilib (show other bugs)
Version: 11.33.3 (Apr2019)
Hardware: x86_64 (amd64/em64t) Linux
: Normal enhancement
Assignee: clients devs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-17 12:47 CEST by Arshad
Modified: 2019-09-19 18:18 CEST (History)
1 user (show)



Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Arshad 2019-09-17 12:47:07 CEST
User-Agent:       Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Build Identifier: 

Hi,

This is not necessary a bug nor a functionally lost. I am detailing report for a request for enhancement or recommendation and best practices and settings when running mclient or custom build client that needs to pull huge amount of data via ODBC.

Given, that The tuples returned is upwards of 228078844 (228 Million approx) - and standard execution takes roughly 2 hours. Inspection shows that *print* function call could be serializing the output and slowing it down. I am exploring avenues where the improvements could be done. Any pointers and help in direction would be helpful.

Details given below.

The sql executions takes a very small amount of time, however the time taken for fetch() and render is taking upwards of 2 hours. This was figured out by running mclient with "--format=trash". I concluded here that any time upwards of 5 mins is what is spend on rendering.

$ time mclient --format=trash -h hostname  -p 5011 -u USER  -d DB < ./atest.sql
password:

real    5m39.670s
user    4m29.059s
sys     0m2.335s


Second, I looked into unixodbc client isql just to compare the results. Monet odbc client (mclient) was clearly faster.

$ time /data/local/arshad/unixodbc/install237/bin/isql  DSN05 USER Password -b < atest.sql > /dev/null

real    74m56.418s
user    26m35.183s
sys     22m12.731s


Noticed that *print* function call could be serializing the output and slowing down.


(gdb) bt
#0  0x00007fbeba3472de in strchrnul () from /lib64/libc.so.6
#1  0x00007fbeba2fe5e2 in vfprintf () from /lib64/libc.so.6
#2  0x00007fbeba32b589 in vsnprintf () from /lib64/libc.so.6
#3  0x00007fbebb7d1350 in mnstr_printf (s=0xe59f50, format=0x41834a "%*s") at stream.c:5230
#4  0x000000000040551a in SQLrow (len=0xe5e180, numeric=0xe5e1e0, rest=0xe5e1c0, fields=1, trim=2, wm=0 '\000') at mclient.c:733
#5  0x00000000004081fa in SQLrenderer (hdl=0xe5bde0) at mclient.c:1688
#6  0x0000000000408f00 in format_result (mid=0xe5c110, hdl=0xe5bde0, singleinstr=false) at mclient.c:2007
#7  0x0000000000409740 in doFileBulk (mid=0xe5c110, fp=0xe5bea0) at mclient.c:2190
#8  0x0000000000409f77 in doFile (mid=0xe5c110, fp=0xe5bea0, useinserts=false, interactive=false, save_history=0) at mclient.c:2363
#9  0x000000000040dcb0 in main (argc=9, argv=0x7ffc5a7b5b28) at mclient.c:3759
(gdb) bt full
#0  0x00007fbeba3472de in strchrnul () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fbeba2fe5e2 in vfprintf () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fbeba32b589 in vsnprintf () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fbebb7d1350 in mnstr_printf (s=0xe59f50, format=0x41834a "%*s") at stream.c:5230
        buf = "\000\000\000\066\062\060\061\065\062 \000\000\000\000\000\000\n\000\000\273\276\177\000\000\376\024@\000\000\000\000\000\060t|\273\276\177\000\000\211\223\277\273\276\177\000\000\000\000\000\000\001\000\000\000\310e\235\273\276\177\000\000\001\000\000\000\000\000\000\000\004\276|\273\276\177\000\000\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000<T{Z\374\177\000\000P\271\345\000\000\000\000\000\376\037\000\000\000\000\000\000\001\000\000\000\000\000\000\000g\316\346\000\000\000\000\000P\271\345\000\000\000\000\000\200\001\000\000\000\000\000\000\060\000\000\000\000\000\000\000\260\001\000\000\213\000\000\000\240S{Z\374\177\000\000\006", '\000' <repeats 15 times>, "!\254\204\245\003\200\377\377"...
        bf = 0x7ffc5a7b52a0 ""
        i = 0
        bfsz = 512
        ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7ffc5a7b5580, reg_save_area = 0x7ffc5a7b54c0}}
#4  0x000000000040551a in SQLrow (len=0xe5e180, numeric=0xe5e1e0, rest=0xe5e1c0, fields=1, trim=2, wm=0 '\000') at mclient.c:733
        i = 0
        more = false
        first = true
        t = 0x0
        rows = 0
        ulen = 8
        cutafter = 0xe6a880
#5  0x00000000004081fa in SQLrenderer (hdl=0xe5bde0) at mclient.c:1688
        i = 1
        total = 9
        lentotal = 16
        vartotal = 9
        minvartotal = 9
        fields = 1
        rfields = 1
---Type <return> to continue, or q <return> to quit---
        printfields = 1
        max = 1
        graphwaste = 4
        len = 0xe5e180
        hdr = 0xe5e1a0
        numeric = 0xe5e1e0
        rest = 0xe5e1c0
        buf = "\260\065@\000\000\000\000\000\020o\235\273\276\177\000\000\000\000\000\000\000\000\000\000pW{Z\374\177\000\000\260\065@\000\000\000\000\000`R\300\273\276\177\000\000\260V"
        ps = 0
        silent = false
        rows = 277698994
#6  0x0000000000408f00 in format_result (mid=0xe5c110, hdl=0xe5bde0, singleinstr=false) at mclient.c:2007
        rc = -1
        aff = 0
        lid = 15056352
        reply = 0xe5c030 ""
        sqloptimizer = 0
        maloptimizer = 0
        querytime = 0
        rows = 0
        saveFD = 0x0
#7  0x0000000000409740 in doFileBulk (mid=0xe5c110, fp=0xe5bea0) at mclient.c:2190
        buf = 0xe677d0 "..."
        length = 209
        hdl = 0xe5bde0
        rc = 0
        bufsize = 10240
#8  0x0000000000409f77 in doFile (mid=0xe5c110, fp=0xe5bea0, useinserts=false, interactive=false, save_history=0) at mclient.c:2363
        line = 0x0
        buf = 0x0
        length = 4303443
---Type <return> to continue, or q <return> to quit---
        bufsiz = 0
        hdl = 0x0
        rc = 0
        lineno = 1
        prompt = 0x0
        prepno = 0
        rl = {s = 0x7ffc5a7b5860, prompt = 0x4035b0 <_start> "1\355I\211\321^H\211\342H\203\344\360PTI\307\300`\202A",
          buf = 0x7fbeba672640 <_IO_2_1_stdin_> "\230 \255", <incomplete sequence \373>, read = 140721826519328, len = 4208048}
        fd = 0
#9  0x000000000040dcb0 in main (argc=9, argv=0x7ffc5a7b5b28) at mclient.c:3759
        s = 0xe5bea0
        port = 50119
        user = 0x0
        passwd = 0x0
        host = 0x7ffc5a7b749f "lnx1040.ch3.prod.i.com"
        command = 0x0
        dbname = 0x0
        output = 0x0
        fp = 0x0
        trace = false
        dump = false
        useinserts = false
        c = 0
        mid = 0xe5c110
        save_history = 0
        interactive = false
        has_fileargs = false
        option_index = 0
        settz = true
        autocommit = true
        user_set_as_flag = true
        passwd_set_as_flag = false
        long_options = {{name = 0x41ae70 "autocommit", has_arg = 0, flag = 0x0, val = 97}, {name = 0x41ae7b "database", has_arg = 1,
---Type <return> to continue, or q <return> to quit---
            flag = 0x0, val = 100}, {name = 0x41ae84 "dump", has_arg = 0, flag = 0x0, val = 68}, {name = 0x41ae89 "inserts", has_arg = 0,
            flag = 0x0, val = 78}, {name = 0x41ae91 "echo", has_arg = 0, flag = 0x0, val = 101}, {name = 0x41ae96 "encoding", has_arg = 1,
            flag = 0x0, val = 69}, {name = 0x41ae9f "format", has_arg = 1, flag = 0x0, val = 102}, {name = 0x41aea6 "help", has_arg = 0,
            flag = 0x0, val = 63}, {name = 0x41aeab "history", has_arg = 0, flag = 0x0, val = 72}, {name = 0x41aeb3 "host", has_arg = 1,
            flag = 0x0, val = 104}, {name = 0x41aeb8 "interactive", has_arg = 0, flag = 0x0, val = 105}, {name = 0x41aec4 "timer",
            has_arg = 1, flag = 0x0, val = 116}, {name = 0x41aeca "language", has_arg = 1, flag = 0x0, val = 108}, {name = 0x41aed3 "log",
            has_arg = 1, flag = 0x0, val = 76}, {name = 0x41aed7 "null", has_arg = 1, flag = 0x0, val = 110}, {name = 0x418b90 "pager",
            has_arg = 1, flag = 0x0, val = 124}, {name = 0x41aedc "port", has_arg = 1, flag = 0x0, val = 112}, {name = 0x41aee1 "rows",
            has_arg = 1, flag = 0x0, val = 114}, {name = 0x41aee6 "statement", has_arg = 1, flag = 0x0, val = 115}, {name = 0x41a8f9 "user",
            has_arg = 1, flag = 0x0, val = 117}, {name = 0x41aef0 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x41aef8 "width",
            has_arg = 1, flag = 0x0, val = 119}, {name = 0x41aefe "Xdebug", has_arg = 0, flag = 0x0, val = 88}, {name = 0x41887a "timezone",
            has_arg = 0, flag = 0x0, val = 122}, {name = 0x41af05 "allow-remote", has_arg = 0, flag = 0x0, val = 82}, {name = 0x0,
            has_arg = 0, flag = 0x0, val = 0}}
        priv = {f = 0x0, buf = 0x0}
(gdb)


(gdb) bt
#0  0x00007fbebb9e0dd4 in read_line (mid=0xe5c110) at mapi.c:3494
#1  0x00007fbebb9e3792 in read_into_cache (hdl=0xe5bde0, lookahead=1) at mapi.c:4152
#2  0x00007fbebb9e4e4b in mapi_fetch_line_internal (hdl=0xe5bde0) at mapi.c:4577
#3  0x00007fbebb9e4f8f in mapi_fetch_line (hdl=0xe5bde0) at mapi.c:4604
#4  0x0000000000403d5b in fetch_line (hdl=0xe5bde0) at mclient.c:334
#5  0x0000000000403df5 in fetch_row (hdl=0xe5bde0) at mclient.c:351
#6  0x000000000040820f in SQLrenderer (hdl=0xe5bde0) at mclient.c:1643
#7  0x0000000000408f00 in format_result (mid=0xe5c110, hdl=0xe5bde0, singleinstr=false) at mclient.c:2007
#8  0x0000000000409740 in doFileBulk (mid=0xe5c110, fp=0xe5bea0) at mclient.c:2190
#9  0x0000000000409f77 in doFile (mid=0xe5c110, fp=0xe5bea0, useinserts=false, interactive=false, save_history=0) at mclient.c:2363
#10 0x000000000040dcb0 in main (argc=9, argv=0x7ffc5a7b5b28) at mclient.c:3759


Breakpoint 1, mnstr_printf (s=0xe59f50, format=0x418347 "%c") at stream.c:5221
5221            char buf[512], *bf = buf;
(gdb) bt full
#0  mnstr_printf (s=0xe59f50, format=0x418347 "%c") at stream.c:5221
        buf = "|\n\000\067\063\066\063\065\071 \000\000\000\000\000\000\n\000\000\273\276\177\000\000\376\024@\000\000\000\000\000\060t|\273\276\177\000\000\211\223\277\273\276\177\000\000\000\000\000\000\001\000\000\000\310e\235\273\276\177\000\000\001\000\000\000\000\000\000\000\004\276|\273\276\177\000\000\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000<T{Z\374\177\000\000P\271\345\000\000\000\000\000\376\037\000\000\000\000\000\000\001\000\000\000\000\000\000\000g\316\346\000\000\000\000\000P\271\345\000\000\000\000\000\200\001\000\000\000\000\000\000\060\000\000\000\000\000\000\000\260\001\000\000\213\000\000\000\240S{Z\374\177\000\000\006", '\000' <repeats 15 times>, "!\254\204\245\003\200\377\377"...
        bf = 0x7fbeba33687c <malloc+76> "H\205\333H\211\302uTH\205\333t\034\203=d\a4"
        i = 0
        bfsz = 3819052484010180608
        ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7ffc5a7b5580, reg_save_area = 0x7ffc5a7b54c0}}
#1  0x00000000004054c0 in SQLrow (len=0xe5e180, numeric=0xe5e1e0, rest=0xe5e1c0, fields=1, trim=2, wm=0 '\000') at mclient.c:730
        i = 0
        more = false
        first = true
        t = 0x0
        rows = 0
        ulen = 8
        cutafter = 0xe6acc0
#2  0x00000000004081fa in SQLrenderer (hdl=0xe5bde0) at mclient.c:1688
        i = 1
        total = 9
        lentotal = 16
        vartotal = 9
        minvartotal = 9
        fields = 1
        rfields = 1
        printfields = 1
        max = 1
        graphwaste = 4
        len = 0xe5e180
        hdr = 0xe5e1a0
        numeric = 0xe5e1e0
---Type <return> to continue, or q <return> to quit---
        rest = 0xe5e1c0
        buf = "\260\065@\000\000\000\000\000\020o\235\273\276\177\000\000\000\000\000\000\000\000\000\000pW{Z\374\177\000\000\260\065@\000\000\000\000\000`R\300\273\276\177\000\000\260V"
        ps = 0
        silent = false
        rows = 280815201
#3  0x0000000000408f00 in format_result (mid=0xe5c110, hdl=0xe5bde0, singleinstr=false) at mclient.c:2007
        rc = -1
        aff = 0
        lid = 15056352
        reply = 0xe5c030 ""
        sqloptimizer = 0
        maloptimizer = 0
        querytime = 0
        rows = 0
        saveFD = 0x0
#4  0x0000000000409740 in doFileBulk (mid=0xe5c110, fp=0xe5bea0) at mclient.c:2190
        buf = 0xe677d0 "..."
        length = 209
        hdl = 0xe5bde0
        rc = 0
        bufsize = 10240
#5  0x0000000000409f77 in doFile (mid=0xe5c110, fp=0xe5bea0, useinserts=false, interactive=false, save_history=0) at mclient.c:2363
        line = 0x0
        buf = 0x0
        length = 4303443
        bufsiz = 0
        hdl = 0x0
        rc = 0
        lineno = 1
        prompt = 0x0
        prepno = 0
---Type <return> to continue, or q <return> to quit---
        rl = {s = 0x7ffc5a7b5860, prompt = 0x4035b0 <_start> "1\355I\211\321^H\211\342H\203\344\360PTI\307\300`\202A",
          buf = 0x7fbeba672640 <_IO_2_1_stdin_> "\230 \255", <incomplete sequence \373>, read = 140721826519328, len = 4208048}
        fd = 0
#6  0x000000000040dcb0 in main (argc=9, argv=0x7ffc5a7b5b28) at mclient.c:3759
        s = 0xe5bea0
        port = 50119
        user = 0x0
        passwd = 0x0
        host = 0x7ffc5a7b749f "lnx1040.ch3.prod.i.com"
        command = 0x0
        dbname = 0x0
        output = 0x0
        fp = 0x0
        trace = false
        dump = false
        useinserts = false
        c = 0
        mid = 0xe5c110
        save_history = 0
        interactive = false
        has_fileargs = false
        option_index = 0
        settz = true
        autocommit = true
        user_set_as_flag = true
        passwd_set_as_flag = false
        long_options = {{name = 0x41ae70 "autocommit", has_arg = 0, flag = 0x0, val = 97}, {name = 0x41ae7b "database", has_arg = 1,
            flag = 0x0, val = 100}, {name = 0x41ae84 "dump", has_arg = 0, flag = 0x0, val = 68}, {name = 0x41ae89 "inserts", has_arg = 0,
            flag = 0x0, val = 78}, {name = 0x41ae91 "echo", has_arg = 0, flag = 0x0, val = 101}, {name = 0x41ae96 "encoding", has_arg = 1,
            flag = 0x0, val = 69}, {name = 0x41ae9f "format", has_arg = 1, flag = 0x0, val = 102}, {name = 0x41aea6 "help", has_arg = 0,
            flag = 0x0, val = 63}, {name = 0x41aeab "history", has_arg = 0, flag = 0x0, val = 72}, {name = 0x41aeb3 "host", has_arg = 1,
            flag = 0x0, val = 104}, {name = 0x41aeb8 "interactive", has_arg = 0, flag = 0x0, val = 105}, {name = 0x41aec4 "timer",
            has_arg = 1, flag = 0x0, val = 116}, {name = 0x41aeca "language", has_arg = 1, flag = 0x0, val = 108}, {name = 0x41aed3 "log",
---Type <return> to continue, or q <return> to quit---
            has_arg = 1, flag = 0x0, val = 76}, {name = 0x41aed7 "null", has_arg = 1, flag = 0x0, val = 110}, {name = 0x418b90 "pager",
            has_arg = 1, flag = 0x0, val = 124}, {name = 0x41aedc "port", has_arg = 1, flag = 0x0, val = 112}, {name = 0x41aee1 "rows",
            has_arg = 1, flag = 0x0, val = 114}, {name = 0x41aee6 "statement", has_arg = 1, flag = 0x0, val = 115}, {name = 0x41a8f9 "user",
            has_arg = 1, flag = 0x0, val = 117}, {name = 0x41aef0 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x41aef8 "width",
            has_arg = 1, flag = 0x0, val = 119}, {name = 0x41aefe "Xdebug", has_arg = 0, flag = 0x0, val = 88}, {name = 0x41887a "timezone",
            has_arg = 0, flag = 0x0, val = 122}, {name = 0x41af05 "allow-remote", has_arg = 0, flag = 0x0, val = 82}, {name = 0x0,
            has_arg = 0, flag = 0x0, val = 0}}
        priv = {f = 0x0, buf = 0x0}
(gdb)
(gdb)


Maybe mnstr_printf()is called for every tuple rendered ? 
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007fbebb7d12ba in mnstr_printf at stream.c:5221
        breakpoint already hit 1549255 times
        continue
(gdb)

System Information
==================
$ mserver5 --version
MonetDB 5 server 11.33.3 (Apr2019) (64-bit, 128-bit integers)
Copyright (c) 1993 - July 2008 CWI
Copyright (c) August 2008 - 2019 MonetDB B.V., all rights reserved
Visit https://www.monetdb.org/ for further information
Found 2015.5GiB available memory, 128 available cpu cores
Libraries:
  libpcre: 8.32 2012-11-30
  openssl: OpenSSL 1.0.1e-fips 11 Feb 2013
  libxml2: 2.9.1
Compiled by: arshad (x86_64-pc-linux-gnu)
Compilation: gcc -std=gnu99  -g3
Linking    : /usr/bin/ld -m elf_x86_64  -Wl,-Bsymbolic-functions


$ uname -a
Linux hostname 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
You have mail in /var/spool/mail/monetadmin
s

$ ]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)





Reproducible: Always

Steps to Reproduce:
1. mclient -h hostname  -p 5011 -u USER  -d DB < ./atest.sql (SQL pulling upto 288 - 300 million tuples)



Expected Results:  
Should show improved performance.
Comment 1 Martin van Dinther cwiconfidential 2019-09-19 18:18:59 CEST
Hi Arshad,

The performance is limited by the way the data is serialized and transferred across the network. Our current implementation is using mapi (protocol version 9) which serializes all data (also numeric/date/binary data) to strings and combines field values into a formatted row string.
This is far from optimal for large amount of data set, as in your case.
We are aware of this limitation.

Some research and development of a new protocol (internally called PROTOCOL_10) has been done, but its not completed, nor planned.

As a workaround you could do:
- use mclient on the server to pull the data from the server (without going over the network) into a (optionally compressed) file and ftp the file.
See also: https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/ExportingBulkData
You can also do this if you want to load large amounts of data into the database. See: https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData

Sometimes people transfer large amounts of data as they want to do some client side data processing on it and then store the results back into the database. If that is the case, you should consider to use a User Defined Function (or Procedure or Aggregate), which can do the processing in the database server. We support UDFs written in PL/SQL, C, C++, Python and R.
See: https://www.monetdb.org/Documentation/Manuals/SQLreference/Functions


FYI: mclient is not using ODBC, but mapi (MonetDB client/server communication API). The MonetDB ODBC driver is also using mapi.