Possible bug in pcre.c

Roberto Cornacchia roberto.cornacchia at gmail.com
Mon Apr 8 17:15:28 CEST 2019


My bad, I rushed into my conclusions.


On Mon, 8 Apr 2019 at 16:47, Sjoerd Mullender <sjoerd at monetdb.org> wrote:

> It's complicated stuff, but I think the code is right.
>
> The implementation uses two nested for loops.  The outer loop iterates
> over the right input (since the right input is column of regular
> expressions, the expressions only get compiled once and matched multiple
> times).  The inner loop iterates over the left input.  (This is
> different from most other implementation of join that iterate over the
> left input first.)  Because of this, we know the right output is sorted.
>
> lastl is de last inserted vale in the left output.  It is used to
> maintain the dense and sorted flags of the left output (tseqbase,
> tsorted, and trevsorted).
>
> BATcount(r1) > 0, so this means we've already seen a match, which in
> turn means that we've been past the assignment "lastl = lo;" just past
> the if statement.  Note that there is no other assignment to lastl (just
> the initialization to 0 at the top of the function).
>
> nl == 0 means that for the current value of the right input (outermost
> for loop) we haven't yet inserted any matches, but since we're executing
> where we are, there are matches for that current value and one or more
> values in the left input.  Note that nl is set to 0 inside the outer for
> loop, and gets incremented inside the inner for loop for each match that
> was inserted.
>
> So, we are now looking at a new value in the right column that has
> matched a value in the left column.  There are three possibilities: the
> left input is smaller than, equal to, or larger than the last value for
> the left input that was inserted into the left output.
>
> If smaller, the left output is not sorted (smaller value follows
> larger).  Also, we don't know about uniqueness anymore (we only maintain
> that if the left output is strictly ascending).
> If equal, the left output is definitely not unique.
> If larger, the left output is not reverse sorted (larger value follows
> smaller).
>
> Since we're inserting an output for a new iteration of the right input,
> we know that the right output is not reverse sorted.  If nl were greater
> than 0, we're inserting a subsequent match of the right input, so the
> right input could still be reverse sorted, though not unique anymore.
>
> I hope this clears things up.
>
> On 08/04/2019 15.27, Roberto Cornacchia wrote:
> > Hi,
> >
> > I was looking at this piece of code (end of pcrejoin loop), and I have
> > the feeling something isn't right.
> > Inside the body of
> >   if (nl == 0)
> > a condition over lastlis checked.
> >
> > However, it seems to me that lastl gets a useful value only when nl > 0.
> > If (nl == 0) then lastl must be 0.
> >
> >
> > <loop> {
> >                         /* actual body */
> >                         ....
> >
> >
> >                         /* track result properties */
> > if (BATcount(r1) > 0) {
> > if (lastl + 1 != lo)
> > r1->tseqbase = oid_nil;
> > if (nl == 0) {
> > r2->trevsorted = 0;
> > if (lastl > lo) {
> > r1->tsorted = 0;
> > r1->tkey = 0;
> > } else if (lastl < lo) {
> > r1->trevsorted = 0;
> > } else {
> > r1->tkey = 0;
> > }
> > }
> > }
> > APPEND(r1, lo);
> > APPEND(r2, ro);
> > lastl = lo;
> > nl++;
> > }
> >
> >
> > _______________________________________________
> > developers-list mailing list
> > developers-list at monetdb.org
> > https://www.monetdb.org/mailman/listinfo/developers-list
> >
>
>
> --
> Sjoerd Mullender
>
> _______________________________________________
> developers-list mailing list
> developers-list at monetdb.org
> https://www.monetdb.org/mailman/listinfo/developers-list
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.monetdb.org/pipermail/developers-list/attachments/20190408/e6d3000d/attachment.html>


More information about the developers-list mailing list