The MonetDB team at CWI/MonetDB BV is pleased to announce the
Aug2011-SP1 bugfix release of the MonetDB suite of programs.
More information about MonetDB can be found on our website at
http://www.monetdb.org/.
For details on this release, please see the release notes at
<http://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <http://dev.monetdb.org/downloads/>.
Aug 2011-SP1 bugfix release
SQL * Fixed a bug when using default values for interval
columns. Bug 2877.
* A bug was fixed where deleted rows weren't properly
accounted for in all operations. This was bug 2882.
* A bug was fixed which caused an update to an internal
table to happen too soon. The bug could be observed on
a multicore system with a query INSERT INTO t (SELECT
* FROM t) when the table t is "large enough". This was
bug 2883.
Client Package * Fixed Perl DBD rowcount for larger results, bug #2889
* Perl: We now distinguish properly between TABLE and
GLOBAL TEMPORARY (the latter are recognized by being
in the "tmp" schema).
* Perl: fixed a bunch of syntax errors. This fixes bug
2884. With thanks to Rémy Chibois.
* Perl: Fixed DBD::monetdb table_info and
tabletype_info. This fixes bug 2885. With thanks to
Rémy Chibois.
* mclient: fix display of varchar columns with only NULL
values.
* Fixed a bug in mclient/msqldump where an internal
error occurred during dump when there are GLOBAL
TEMPORARY tables.
MonetDB5 Server * Fixed a problem where MAL variables weren't properly
cleared before reuse of the data strucutre. This
problem could cause the data flow scheduler to
generate dependencies between instructions that didn't
actually exist, which in turn could cause circular
dependencies among instructions with deadlock as a
result. Bugs 2865 and 2888.
Bug Fixes * 2865: mserver5 --set gdk_nr_threads=2 --forcemito:
deadlock during first SQL client connect on virgin
(empty) DB
* 2877: assertion failed when creating table with
default interval
* 2879: A SELECT query fails to produce some results
* 2882: Delete of some records don't work on Monetdb
* 2883: insert into t (select * from t) bogus in
multithreaded execution
* 2884: Mapi.pm and GetInfo.pm syntax fixes for array
usage
* 2885: DBD::monetdb table_info error
* 2887: Division by zero error, when should not divide
at 0
* 2888: test/null/null_in_case.sql hangs server
* 2889: Mapi::getBlock is erasing query count on second
call
Roberto,
we were also puzzled with which syntax to adopt. The first issue is that the
syntax is also dictating in a way implementation (although that sounds not
correct:)). A syntax that has a order by rand() implies that you are
attaching random numbers to the rows and then sort on that. This sorting is
very coslty and it is the worst case since you have random order to begin
with. Something like that would cost NlogN where N is the size of rows.
Moreover, that syntax implies that you are sampling the base table (or some
intermediate result anyway) and then "continue" the query evaluation.
however sampling is not a very kind operator and cannot by easily pushed
down: a sample of a join is not equal with the join of 2 samples. These are
in my view the implications (or benefits!) of such a syntax. So MonetDB
could as well support the syntax you propose and should!
On the other hand, the syntax SAMPLE we just introduce, is applied solely on
the final result set of the query, after or query evaluation is done. The
algorithm has a complexity of S, where S is the size of the sample (as
opposed to NlogN of the previous). Also , the sample operator has the same
semantics of limits, it is an operator that shows a limited (random) portion
of the entire (usually computed) result. that means that all joins,
predicated etc. have been evaluated before a sample is taken. Of course we
would love to have that functionality in sub-queries too, hence the
workaround. What is really missing now is a function where the table
definition is a parameter, indeed!
With this 2 views in mind, I would suggest that both syntax are necessary to
exist! Each of them serving a different purpose (and different underline
implementation eventually).
the other solution would be to allow SAMPLE is subqueries and "depart" from
the SQL spirit.
What do you think?
lefteris
On Mon, Sep 26, 2011 at 10:41 AM, Roberto Cornacchia <
roberto.cornacchia(a)gmail.com> wrote:
> Lefteris, I would like to add something about the syntax (and semantics) of
> this feature.
> See below
>
>
>> +/*
>> + * @- Uniform Sampling.
>> + *
>> + * A new SQL operator has been added to support sampling the result of a
>> query.
>> + * The syntax for sampling is:
>> + * SELECT ... FROM ... WHERE ... SAMPLE s
>> + *
>> + * where s if is an integer greater than 1, it defines the number of rows
>> to be
>> + * in the sample. If s is a double between [0.0,1.0] the it refers to the
>> + * percentage of the result to be sampled. That is if s=0.3 then the
>> sample
>> + * will be 30% the size of the query result.
>> + *
>>
>
> It has been discussed already whether this choice or an explicit PERCENT is
> more intuitive.
> Personally, I don't care much about that.
>
>
>> + * SAMPLE is been treated as LIMIT, ORDER BY, etc., that means that it
>> can only
>> + * be in the outer most SELECT clause, i.e., SAMPLE cannot appear in a
>> + * subquery.
>
>
> This is much more interesting to me.
> I see that SAMPLE fits in the same semantics as LIMIT. In fact, you can
> think of it as a special LIMIT.
> What I don't like about it, is that it then shares the same limitations
> (which are absurd, as the whole SQL often is, from my point of view), which
> is what you mention above about the subquery.
> You suggest a workaround here:
>
> However, if this is needed, then one may define a function, for
>> + * example
>> + *
>> + * CREATE FUNCTION mysample ()
>> + * RETURNS TABLE(col a,...)
>> + * BEGIN
>> + * RETURN
>> + * SELECT a,...
>> + * FROM name_table
>> + * SAMPLE 100;
>> + * end;
>> + *
>> + * and then use function mysample() for example to populate a new table
>> with
>> + * the sample. E.g.,
>> + *
>> + * INSERT INTO sample_table (SELECT * FROM mysample());
>> + *
>> + *
>
>
> To me, writing a function with an hard-coded table name in it is... weird!
> Like writing a function that can only do 2+2.
> (This could be much better if table identifiers were allowed as function
> parameters, a feature that I'd love to see implemented!).
>
> To this, I much prefer a solution based on the window functions RANK()
> OVER() and ROW_NUMBER() OVER() which luckily are implemented in MonetDB,
> and the function rand().
>
> Your example above becomes a regular query that has no nesting-issue:
>
> SELECT a,b,c
> FROM (
> SELECT a,b,c, ROW_NUMBER() OVER(ORDER BY rand()) as my_rank
> FROM name_table
> ) as x
> where x.my_rank <= 100;
>
> Replace ROW_NUMBER() with RANK() to allow duplicates in the random sample.
>
> I'm not comparing the efficiency of the two solutions here, your solution
> could well be more efficient.
> I'm only comparing them from a usability point of view.
>
> I noticed that my own usage of the LIMIT clause is becoming less and less
> frequent (in favour of the syntax I report here) because of the nesting
> issue.
> What I am wondering is, how much useful is to introduce a new syntax for
> the sampling that has this (important, in my view) limitation? How often
> would you need a sample in the outermost select clause only?
>
> Cheers,
> Roberto
>
>
> _______________________________________________
> Checkin-list mailing list
> Checkin-list(a)monetdb.org
> http://mail.monetdb.org/mailman/listinfo/checkin-list
>
>
Hi all,
You'll find attached an archive containing two portfiles for compiling and installing MonetDB using the MacPorts system.
Can you please have a look at it ?
In order to use these portfiles, please refer to:
http://guide.macports.org/#development.local-repositories
If this is of interest for someone, do you want me to submit this port to the MacPorts Project ?
Regards,
--
Rémy Chibois
The MonetDB team at CWI/MonetDB BV is pleased to announce the Aug2011
feature release of the MonetDB suite of programs.
The default width for OID columns in the database is now 64 bits for all
64-bit architecture. In releases up to this release, the default width
on 64-bit Windows was still 32 bits. This means that with the current
release it is possible to have tables with more than about 2 million
rows even on Windows (as long as you use a 64-bit build).
More information (including release notes) on this release is available
at <http://www.monetdb.org/Downloads/ReleaseNotes>.
As usual, the download location is <http://dev.monetdb.org/downloads/>.
--
Sjoerd Mullender