Can someone please provide more concrete information on this? Is it known to work?

Say I have a master and two worker nodes.

I have a usr table on master and remote versions of this table on each worker:

sql>select * from usr;

+------+-------+

| id | name |

+======+=======+

| 1 | kelly |

| 2 | fred |

| 3 | bob |

| 4 | igor |

+------+-------+

4 tuples (3.239ms)

I have an activity table that is sharded across the two workers, each remoted to master and a merge table (called activity) combining them. On worker 0:

sql>select * from activity_w0;

+--------+------+

| usr_id | what |

+========+======+

| 1 | act1 |

| 3 | act3 |

| 1 | act5 |

| 3 | act8 |

+--------+------+

4 tuples (2.331ms)

On worker 1:

sql>select * from activity_w1;

+--------+------+

| usr_id | what |

+========+======+

| 2 | act2 |

| 4 | act4 |

| 2 | act6 |

| 4 | act7 |

+--------+------+

4 tuples (3.899ms)

So worker 0 has only activity for odd usr ids, worker 1 has only activity for even worker ids. If I perform this query on master:

sql>select

more> what

more>from

more> activity a

more> join usr u on (a.usr_id = u.id)

more>where

more> u.name = 'kelly';

+------+

| what |

+======+

| act1 |

| act5 |

+------+

2 tuples (10.822ms)

I assume that master is pulling back the entire contents of both activity_w0 and activity_w1 because it doesn't know that there is a remote usr table on each worker. It is not clear to me how to set up a replica table on the workers to allow the filtering to be done there rather than at the master.

Is this possible or am I chasing a red herring?

-K

On Mon, May 8, 2017 at 6:31 AM, Ying Zhang <Y.Zhang@cwi.nl> wrote:

> On 5 May 2017, at 23:07, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
>
> Can someone describe how exactly replica tables are used to push joins to worker nodes?
>
> Assume a master node and two worker nodes. I assume my reference table would go on my master node, and on my worker nodes I would have my large data table and replicas of the reference tables from the master node.

Hai Kelly,

This would be a strategy you can start with. I’d expect that some joins can be pushed to be executed (solely) locally on the worker nodes.

But how exactly replica tables are used, and joins are handled largely depends on your query.

Best,
Jennie

>
> Is that the general strategy?
>
> Thanks,
>
> -K
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> https://www.monetdb.org/mailman/listinfo/users-list

_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list