Any chance the Python client could be modified to
return strings as Unicode objects instead of UTF-8 encoded byte strings?
Seems like a better fit for Python; for example,
that's how Python's sqlite3 module behaves.
An alternative approach is to expose a set_encoding
method on the connection that lets you specify the
encoding of all text fields returned.
If backward compatibility is an issue, the latter idea
seems better even though it adds a knob.
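To make the second idea concrete, here is a minimal sketch of how such a knob could behave. The Connection class, the exact set_encoding signature, and convert_row are hypothetical stand-ins written for illustration; they are not the actual MonetDB Python client API. The default leaves text fields as raw UTF-8 bytes, so existing code would keep working:

```python
class Connection:
    """Hypothetical stand-in for a client connection (not the real MonetDB API)."""

    def __init__(self):
        # None keeps the current behaviour: text fields come back as raw bytes.
        self._encoding = None

    def set_encoding(self, encoding):
        """Decode all text fields with `encoding`; pass None to get raw bytes."""
        self._encoding = encoding

    def convert_row(self, raw_row):
        """Return the row with byte-string fields decoded if an encoding is set."""
        if self._encoding is None:
            return tuple(raw_row)
        return tuple(
            field.decode(self._encoding) if isinstance(field, bytes) else field
            for field in raw_row
        )


conn = Connection()
# Assumed wire format: the server sends text as UTF-8 encoded bytes.
raw_row = (1, "caf\u00e9".encode("utf-8"))

unchanged = conn.convert_row(raw_row)  # bytes stay bytes by default
conn.set_encoding("utf-8")
decoded = conn.convert_row(raw_row)    # text fields are now Unicode strings
```

Callers that never call set_encoding see no change, which is the backward-compatibility argument for this variant.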
Any comments?
(Is the maintainer of the Python adapter active even
on this list? I didn't see many Python-specific posts
in the archives.)
Thanks,
m
Darn, just when I decided to announce my port of the monetdb-ruby-sql
gem to XQuery, you release a fix of your own :P Anyway, it's here:
http://github.com/d-snp/ruby-monetdb-xquery
There are still some differences between the official one and mine. I
think mine handles errors a bit more gracefully, but I may have
missed something. I will look at the other changes as soon as possible
to see whether mine has any odd behaviour.
In the coming weeks I will be working on making XQuery and MonetDB
work with DataMapper (since DataMapper is a nicer framework than
ActiveRecord, and ActiveRecord also depends heavily on SQL). If anyone
has comments or questions about this, or would like to help out,
please let me know. I plan on putting the DataMapper
implementation on GitHub too sometime this week.
Kind regards,
Tinco Andringa
I'm seeing some behavior that makes me
wonder if mserver5 leaks.
I'm using Ubuntu 9.04 w/ 32-bit i386 and 500 MB
of RAM. This is 2009Nov CVS.
Does this sequence of steps and memory usage
look right?
1. Reboot the server and start merovingian
   0m -- mserver5 RSS (not running)
   417m -- free memory reported by top
   78m -- used
   5m -- buffers
2. Destroy then create database
   15m -- mserver5 RSS
   392m -- free memory reported by top
3. Add tables to database
   21m -- mserver5 RSS
   378m -- free memory reported by top
4. Import a small table (38,000 rows, autoincr int + varchar(1024))
   29m -- mserver5 RSS
   350m -- free memory reported by top
5. Repeat steps 2 through 4 three times.
   29m -- mserver5 RSS
   !!! 225m -- free memory reported by top
6. Shut down merovingian
   !!! 251m -- free memory reported by top
   242m -- used
   136m -- buffers
Why is top only showing 251m of free memory?
Yesterday, the system started to swap when free
memory, as reported by top, was exhausted.
And I bet I could get it to swap if I ran lots more
iterations in step 5 above.
I know that having free close to zero is a good thing,
as it means you are using your RAM, but
swapping is never good on a database server. :)
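On Linux, memory that top reports as buffers or page cache is not counted as free, but the kernel can normally reclaim it under pressure, so it may be more telling to track free plus buffers plus cached than free alone. A rough sketch, assuming a Linux-style /proc/meminfo; the helper names are made up for illustration, and the sample text reuses only the free and buffers figures from step 6 (the cached figure was not recorded, so it counts as zero here):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def reclaimable_kb(fields):
    """Free memory plus buffers and page cache; absent fields count as zero."""
    return sum(fields.get(key, 0) for key in ("MemFree", "Buffers", "Cached"))


# Step 6 figures converted to kB: 251m free, 136m buffers (cached not recorded).
sample = "MemFree: 257024 kB\nBuffers: 139264 kB\n"
fields = parse_meminfo(sample)
available_mb = reclaimable_kb(fields) // 1024

# On a live Linux box: parse_meminfo(open("/proc/meminfo").read())
```

Logging this number after each iteration of step 5 would show whether the drop in free memory is page cache being filled or memory that is really gone.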
I'm not using the server for anything else at the
moment, other than SSH and transferring data
from SQLite to MonetDB via mclient and files of
SQL commands.
As a related question, what is the minimum RAM
needed to run MonetDB? I'd like to get it running on
an old slow computer running OpenBSD to check
how well Monet handles memory pressure and to
turn on OpenBSD's paranoid malloc options. Are
there some knobs I can turn to make monetdb5
run when there is only 100MB of free memory after
the OS starts up?
That swapping yesterday got me a bit nervous about
stability, especially since stopping and restarting
merovingian didn't give the memory back.
Thanks,
m
Hi,
I've been using CVS HEAD to test against, but now I'm wondering
if I should switch to the Nov2009 branch.
I'm going to use MonetDB in production for an in-house project,
so I would prefer a reasonable level of stability.
Which source tree should I track?
Thanks!
m
Dear developers,
I would like to propose a change in GDK and hear opinions. It is about
the following issue:
In the BATjoin code, if there is no possibility to do a fetch or merge
join, a hash join is performed. A hash table is created for the smaller
BAT. The reasons (that I can think of) for choosing the smaller BAT for
the hash table are that less space is required for the hash table (which
in turn causes fewer cache misses when doing a lookup), and that
the hash function used is assumed to be very inexpensive (it needs to
be calculated for each item in the large BAT each time a join is
performed).
I can see that the hash function can be very efficient for data types
without indirection, but I feel that for data types like strings
this is sometimes different. If a string BAT, for example,
contains many different values (i.e. it is not a BAT that contains
enumeration values), the hash function is no longer inexpensive
(many cache misses), as each call needs to hash a whole
(arbitrarily long) string at an arbitrary place in the heap.
Is it perhaps possible to specify that, when a BAT of type 'str' has
many different values, the hash table may be built on the large BAT
instead of on the small BAT?
The reason I ask: I was analysing the cost of a query in which I
had a few short strings (26 tuples, 1-column table: varchar) that I
wanted to look up in a dictionary (9M tuples, 2-column table:
int, varchar):
"SELECT a.id FROM longlist AS a JOIN smalllist AS b ON
a.strvalue=b.strvalue;"
The result is a small list of integers (26 tuples or fewer). This
operation currently takes roughly 1.5 seconds for a hot run, mostly
due to 9M strHash operations. With the patch below applied, the
execution time for a hot run dropped to 0.01 seconds. The
performance gain comes from only having to perform strHash on the
items in the small BAT once the hash table for the large BAT has been
created.
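The effect can be reproduced in miniature. The sketch below is not GDK code: CountingHash, build_table and probe are invented names, Python's built-in hash() stands in for strHash, and the data is synthetic. It only counts how many hash computations one hot run needs under each strategy:

```python
class CountingHash:
    """Wraps hash() and counts calls, standing in for an expensive strHash."""

    def __init__(self):
        self.calls = 0

    def __call__(self, value):
        self.calls += 1
        return hash(value)


def build_table(rows, hash_fn):
    """Build a hash table (hash -> list of rows) over (row_id, string) pairs."""
    table = {}
    for row_id, value in rows:
        table.setdefault(hash_fn(value), []).append((row_id, value))
    return table


def probe(table, rows, hash_fn):
    """Probe the table with each row; return (build_id, probe_id) matches."""
    matches = []
    for probe_id, value in rows:
        for build_id, build_value in table.get(hash_fn(value), []):
            if build_value == value:
                matches.append((build_id, probe_id))
    return matches


large = [(i, "word%d" % i) for i in range(1000)]
small = [(0, "word7"), (1, "word42"), (2, "missing")]

# Strategy A: build on the small side, so every run hashes all of `large`.
hash_a = CountingHash()
result_a = probe(build_table(small, hash_a), large, hash_a)
calls_per_run_a = hash_a.calls

# Strategy B: the hash table on the large side already exists (built once),
# so a hot run only hashes the few probe strings.
hash_b = CountingHash()
large_table = build_table(large, hash_b)
hash_b.calls = 0  # count only the hot run
result_b = probe(large_table, small, hash_b)
calls_per_run_b = hash_b.calls
```

Both strategies find the same matches; the difference is that strategy A pays for hashing the whole large side on every run, while strategy B pays that cost once when the table is built.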
Any suggestions on whether such a change is useful? Which benchmarks
would be influenced?
I guess this change is probably not useful for large string BATs
with only a few different values, but perhaps a guess could be made at how
diverse the strings in a BAT are (by taking a sample, or perhaps simply
by looking at the ratio batsize/heapsize), and the hash table could then
be built on the large or the small BAT accordingly?
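The sampling idea could look roughly like this. It is a sketch only: the function names, the sample size of 100 and the 0.5 threshold are arbitrary assumptions for illustration, not values taken from GDK.

```python
import random


def estimate_distinct_ratio(values, sample_size=100, seed=0):
    """Fraction of distinct strings in a random sample (1.0 = all different)."""
    if not values:
        return 0.0
    k = min(sample_size, len(values))
    sample = random.Random(seed).sample(values, k)
    return len(set(sample)) / k


def choose_build_side(large_values, threshold=0.5):
    """Pick "large" when the strings look diverse, otherwise "small"."""
    if estimate_distinct_ratio(large_values) > threshold:
        return "large"
    return "small"


dictionary = ["word%d" % i for i in range(5000)]    # all values different
enumeration = ["red", "green", "blue"] * 2000       # few distinct values

side_for_dictionary = choose_build_side(dictionary)
side_for_enumeration = choose_build_side(enumeration)
```

A sample is cheap compared with hashing millions of strings, but it can misjudge skewed data, so the threshold would need tuning against real workloads.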
Greetings,
Wouter
Index: src/gdk/gdk_relop.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_relop.mx,v
retrieving revision 1.167.2.4
diff -u -r1.167.2.4 gdk_relop.mx
--- src/gdk/gdk_relop.mx 20 Nov 2009 13:04:06 -0000 1.167.2.4
+++ src/gdk/gdk_relop.mx 18 Dec 2009 14:59:13 -0000
@@ -1232,7 +1232,12 @@
 @-
 hash join: the bread&butter join of monet
 @c
-	/* Simple rule, always build hash on the smallest */
+	/* Simple rule, always build hash on the smallest,
+	   except when it is a string-join, then we do the opposite */
+	if (swap && rcount < lcount && l->ttype == TYPE_str) {
+		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
+		return BATmirror(BAThashjoin(BATmirror(r), BATmirror(l), estimate));
+	}
 	if (swap && rcount > lcount) {
 		ALGODEBUG THRprintf(GDKout, "#BATjoin: BATmirror(BAThashjoin(BATmirror(r), BATmirror(l)," BUNFMT "));\n", estimate);
When I try to fill a database with the COPY INTO command, the data will
(depending on the file used) either go:
- Fully into Memory.
- To HDD.
- A combination of Memory and HDD space.
Is there a way to configure MonetDB so that it always uses Memory only?
Thanks in advance.
The MonetDB team at CWI/MonetDB BV is pleased to announce the
Nov2009-SP1 bug fix release of the MonetDB suite of programs.
More information on this release will be available at
<http://monetdb.cwi.nl/Development/Releases/Nov2009/>.
Fixes include:
- The Windows installers for MonetDB/SQL in the Nov2009 release were
  incomplete. The missing files have been added.
- Fixed performance issue with loading complex schemas in readonly
  mode in MonetDB/SQL.
- Fixed performance issue with grouping in MonetDB/SQL when one of
  the columns is sorted.
- Fixed problems with LIKE joins in MonetDB/SQL.
- Load all files in a multi-file COPY INTO query.
- Fix error reporting of unexpected input in XQuery queries to be
  Unicode aware.
- Correctly handle circular dependencies in FD property check. (SF
  bug #2908615.)
- Fixes throughout the code base to better cope with unexpected
  situations (such as memory shortage).
--
Sjoerd Mullender
The lines
+# else
+# error "don't know how to get the amount of physical memory for your OS"
+# endif
will break on Windows.
Thanks,
Ren Zou
Fabian wrote:
> Update of /cvsroot/monetdb/MonetDB/src/gdk
> In directory sfp-cvsdas-1.v30.ch3.sourceforge.com:/tmp/cvs-serv17611
>
> Modified Files:
> gdk_system.mx
> Log Message:
> If there is a limit set on the memory usage, take it into account, such that the GDK kernel can make the proper decisions. Indented ifdefs to make nesting structure clear. Added comment on probably obsolete piece of code.
>
> Index: gdk_system.mx
> ===================================================================
> RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_system.mx,v
> retrieving revision 1.120
> retrieving revision 1.121
> diff -u -d -r1.120 -r1.121
> --- gdk_system.mx 4 Nov 2009 16:49:24 -0000 1.120
> +++ gdk_system.mx 15 Dec 2009 12:38:37 -0000 1.121
> @@ -339,9 +339,9 @@
> _MT_pagesize = sysInfo.dwPageSize;
> }
> #else
> -#if defined(HAVE_SYSCONF) && defined(_SC_PAGESIZE)
> +# if defined(HAVE_SYSCONF) && defined(_SC_PAGESIZE)
> _MT_pagesize = sysconf(_SC_PAGESIZE);
> -#endif
> +# endif
> #endif
> if (_MT_pagesize <= 0)
> _MT_pagesize = 4096; /* default */
> @@ -366,14 +366,40 @@
> #if defined(HAVE_SYSCONF) && defined(_SC_PHYS_PAGES)
> _MT_npages = sysconf(_SC_PHYS_PAGES);
> #else
> -#ifdef HAVE_GETRLIMIT
> +# if defined(HAVE_GETRLIMIT) && defined(RLIMIT_RSS)
> {
> struct rlimit rl;
>
> + /* Specifies the limit (in pages) of the process's resident set
> + * (the number of virtual pages resident in RAM). This limit
> + * only has effect in Linux 2.4.x, x < 30, and there only
> + * affects calls to madvise() specifying MADV_WILLNEED */
> + /* FIXME: this looks like a total wrong thing to check in any
> + * case to me */
> getrlimit(RLIMIT_RSS, &rl);
> _MT_npages = rl.rlim_cur / _MT_pagesize;
> }
> +# else
> +# error "don't know how to get the amount of physical memory for your OS"
> +# endif
> #endif
> +#ifdef HAVE_GETRLIMIT
> + {
> + struct rlimit rl;
> + size_t memlim;
> +
> + /* The environment can be limited memory wise. In such case the
> + * physically available memory, is not necessarily what we can
> + * also use. */
> + getrlimit(RLIMIT_DATA, &rl);
> + if (rl.rlim_cur != (rlim_t)RLIM_INFINITY) {
> + /* rlimit returns in bytes, recalculate */
> + memlim = rl.rlim_cur / _MT_pagesize;
> + /* if it's more restrictive, take that as value */
> + if (memlim < _MT_npages)
> + _MT_npages = memlim;
> + }
> + }
> #endif
> }
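The logic of the commit above (take the physical page count, then cap it by the data-segment limit if one is set) can be sketched in a few lines. This is an illustration in Python, not the GDK code: clamp_pages and usable_pages are invented names, and returning None on platforms without sysconf is an assumed alternative to the hard #error that the reply above objects to.

```python
import os

try:
    import resource  # Unix only; absent on Windows
except ImportError:
    resource = None

DEFAULT_PAGE_SIZE = 4096  # same fallback value as in the patch


def clamp_pages(phys_pages, page_size, data_limit):
    """Physical pages, capped by a byte limit when one is set (None = no limit)."""
    if data_limit is None:
        return phys_pages
    return min(phys_pages, data_limit // page_size)


def usable_pages():
    """Page budget for this process, or None if the platform gives no answer."""
    try:
        page_size = os.sysconf("SC_PAGESIZE")
        phys_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows: degrade gracefully instead of failing
    if page_size <= 0:
        page_size = DEFAULT_PAGE_SIZE
    if phys_pages <= 0:
        return None
    limit = None
    if resource is not None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_DATA)
        if soft != resource.RLIM_INFINITY:
            limit = soft  # getrlimit reports bytes
    return clamp_pages(phys_pages, page_size, limit)
```

Keeping the clamping step separate from the system queries makes the rule itself easy to check without depending on the machine it runs on.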
> _______________________________________________
> Monetdb-checkins mailing list
> Monetdb-checkins(a)lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/monetdb-checkins
On Thu, Nov 12, 2009 at 09:06:23AM +0000, Fabian wrote:
> Update of /cvsroot/monetdb/MonetDB
> In directory 23jxhf1.ch3.sourceforge.com:/tmp/cvs-serv25980
>
> Modified Files:
> Tag: Nov2009
> configure.ag
> Log Message:
> Yippy yay for messing with this at this stage of the development cycle... Revert my yesterday's commit here as it not only breaks compilation on OSX 10.6 (the original problem), but also Solaris 10. We need to facilitate people doing cross-compilation in some other way.
Might simply using AC_CHECK_FUNCS(fdatasync) be an option, here?
Stefan
> Index: configure.ag
> ===================================================================
> RCS file: /cvsroot/monetdb/MonetDB/configure.ag,v
> retrieving revision 1.208.2.2
> retrieving revision 1.208.2.3
> diff -u -d -r1.208.2.2 -r1.208.2.3
> --- configure.ag 11 Nov 2009 11:17:52 -0000 1.208.2.2
> +++ configure.ag 12 Nov 2009 09:06:21 -0000 1.208.2.3
> @@ -94,17 +94,15 @@
>
> # Checks for library functions.
>
> -# OSX 10.6 (Snow Leopard) somehow makes configure believe that fdatasync
> -# exists, in reality however, it does not on this platform.
> -AC_CACHE_CHECK([for fdatasync],[ac_cv_func_fdatasync],[
> - AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
> +# Fix for OSX 10.6 (Snow Leopard) stolen from:
> +# http://blogs.sun.com/elambert/entry/drizzle_in_the_snow_how
> +AC_RUN_IFELSE([AC_LANG_PROGRAM([[
> #include <unistd.h>
> ]],[[
> fdatasync(4);
> ]])],
> [ac_cv_func_fdatasync=yes],
> [ac_cv_func_fdatasync=no])
> -])
> AS_IF([test "x${ac_cv_func_fdatasync}" = "xyes"],
> [AC_DEFINE([HAVE_FDATASYNC],[1],[If the system has a working fdatasync])])
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
> Update of /cvsroot/monetdb/MonetDB5/src/optimizer
> In directory
> sfp-cvsdas-1.v30.ch3.sourceforge.com:/tmp/cvs-serv5440/MonetDB5/src/optimizer
>
> Modified Files:
> opt_recycler.mx
> Log Message:
>
> recycleSeq is stored in MalBlkRecord.recid;
> since the latter is of type int,
> it does not make much sense
> to have the former of type lng.
>
> Or should both be of type lng?
Thanks, the intention was for them all to be of type lng.
Milena
>
> And what about QryStat.recid?
> That is (currently) of type lng
> and used to store MalBlkRecord.recid
> ...
>
> (found by Microsoft's Visual Studio compiler on Windows)
>
>
> Index: opt_recycler.mx
> ===================================================================
> RCS file: /cvsroot/monetdb/MonetDB5/src/optimizer/opt_recycler.mx,v
> retrieving revision 1.59
> retrieving revision 1.60
> diff -u -d -r1.59 -r1.60
> --- opt_recycler.mx 2 Dec 2009 13:39:29 -0000 1.59
> +++ opt_recycler.mx 10 Dec 2009 22:05:27 -0000 1.60
> @@ -104,7 +104,7 @@
> #include "opt_recycler.h"
> #include "mal_instruction.h"
>
> -static lng recycleSeq = 0;
> +static int recycleSeq = 0;
>
> static int
> OPTrecycleImplementation(Client cntxt, MalBlkPtr mb, MalStkPtr stk,
> InstrPtr p)
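The width mismatch matters because a 64-bit counter stored into a 32-bit field silently loses its high bits once it grows past 2^31 - 1. A small illustration, assuming lng is a 64-bit signed integer and int is 32 bits wide (as on the usual platforms); the helper names are invented for this sketch, and ctypes is used only to mimic C's fixed-width storage:

```python
import ctypes


def store_as_int(value):
    """Value as it would read back from a 32-bit signed field (wraps silently)."""
    return ctypes.c_int32(value).value


def store_as_lng(value):
    """Value as it would read back from a 64-bit signed field."""
    return ctypes.c_int64(value).value


small_seq = 12345
big_seq = 2**31 + 5  # one step beyond what a 32-bit signed int can hold

same_when_small = store_as_int(small_seq) == store_as_lng(small_seq)
wrapped = store_as_int(big_seq)  # high bits lost, the value turns negative
kept = store_as_lng(big_seq)     # preserved in the wider type
```

As long as the sequence stays small both widths agree, which is why such a mismatch can go unnoticed until a compiler warning points it out.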