Client PHP too slow with many results
Hello,
I'm finding the PHP client too slow at returning big result sets. I'm using the PHP client I downloaded from http://dev.monetdb.org/downloads/sources/Latest/ (MonetDB-11.13.3.tar.bz2).
Some benchmarks below; is there any way to get the results faster?
LIMIT 50K:
[root@monetdb examples]# php simple_query.php
9
[root@monetdb examples]# time mclient -u adctest -d adctest --statement='SELECT * FROM summary_sent LIMIT 50000' > res.txt
password:
real 0m4.566s   <- already includes some time for the password input
user 0m1.272s
sys  0m0.132s
----
LIMIT 100K:
[root@monetdb examples]# time mclient -u adctest -d adctest --statement='SELECT * FROM summary_sent LIMIT 100000' > res.txt
password:
real 0m7.184s
user 0m2.545s
sys  0m0.266s
[root@monetdb examples]# php simple_query.php
18
----------
Code:
[root@monetdb examples]# cat simple_query.php
<?php
require '../lib/php_monetdb.php';

$db = monetdb_connect("sql", '127.0.0.1', 50000, 'adctest', 'adctest', 'adctest')
    or trigger_error(monetdb_last_error());

$start_query_monetdb = time();
$res = monetdb_query($db, monetdb_escape_string('SELECT * FROM summary_sent LIMIT 100000'))
    or trigger_error(monetdb_last_error());
while ( $row = monetdb_fetch_object($res) ) {
}
$time = time() - $start_query_monetdb;
echo $time . "\n";

/* Free the result set */
monetdb_free_result($res);

/* Disconnect from the database */
if (monetdb_connected($db)) { monetdb_disconnect($db); }
Hello Eduardo
The core problem might be your query itself. It is a row-store query ;)
SELECT * is the extreme case for a column store, where the benefits come from accessing a limited number of columns or from performing aggregations.
Consider queries such as "SELECT grp, sum(attr) FROM summary_sent WHERE predicate GROUP BY grp" or "SELECT attr1, attr2 FROM summary_sent WHERE predicate LIMIT 50000;"
Your observation about the PHP client's performance most likely has little to do with the PHP interface to MonetDB. To check it, split the timing into the actual query execution in the server and the retrieval loop (also be aware of cold vs. hot query processing). That said, I am not sure about the buffering scheme deployed in our PHP code, which may require some updates. In general, sending twice the amount of data over the wire will take twice the amount of time to process.
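A minimal sketch of that timing split, reusing the connection parameters from the original script. This is untested against a live server, and depending on the driver's buffering scheme part of the transfer cost may already land in the query timing rather than in the fetch loop:

```php
<?php
require '../lib/php_monetdb.php';

$db = monetdb_connect("sql", '127.0.0.1', 50000, 'adctest', 'adctest', 'adctest')
    or trigger_error(monetdb_last_error());

// Time the query execution. By the time the first row arrives over the
// wire, the server already holds the complete result set.
$t0 = microtime(true);
$res = monetdb_query($db, 'SELECT * FROM summary_sent LIMIT 100000')
    or trigger_error(monetdb_last_error());
$t_query = microtime(true) - $t0;

// Time the retrieval loop separately: this is where the PHP-side
// protocol handling and row conversion cost shows up.
$t0 = microtime(true);
while ($row = monetdb_fetch_object($res)) {
}
$t_fetch = microtime(true) - $t0;

printf("query: %.3fs  fetch: %.3fs\n", $t_query, $t_fetch);

monetdb_free_result($res);
if (monetdb_connected($db)) { monetdb_disconnect($db); }
```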
regards, Martin
On 11/15/12 5:53 PM, Eduardo Oliveira wrote:
--
Eduardo Oliveira
IT
Email: eduardo.oliveira@adclick.pt
Web: www.adclickint.com
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list
Hello Martin, thank you for the quick answer.
I will try to explain my situation better. I have a big table that will grow by around 20-30M rows a day (inserts only; I will create a new CSV and load it every 20-30 minutes). Then I have a small table that is currently 100K rows and will probably be 200K in 1-2 years; this one gets a lot of updates. The big table is very easy to shard by date, since it serves reports where the user always has to choose a start date and an end date.
The final reports are made by joining the big table with the small table, and have around 100-500 rows (sometimes presented as a table, sometimes as a graph).
But because I will shard the big table across different servers, it is not easy to keep the small table on all servers, since it gets a lot of updates, and I read that updates in MonetDB are not really in-place updates, just another structure recording what was updated.
So the idea is to have a PHP HTTP API that receives a query and queries whichever MonetDB shards are needed. But it can bring back much more than 100-500 rows, up to 10K rows, because the result is not yet joined with the small table, and one GROUP BY column exists only in the small table; it is difficult to put that column in the big table to avoid the join, because it needs a lot of updates.
The join will then be done in the application, or using a MySQL memory table, where writes will be very fast.
The times I gave below are all local, so some networking time will be added on top, but being this slow to fetch 10K results is really a show stopper.
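The application-side join described above can be sketched as a simple in-memory hash join in PHP. All names below (`small_id`, `clicks`, `group_name`, `campaign_*`) are hypothetical, made up purely for illustration:

```php
<?php
// Rows coming back from the big-table shards (up to ~10K of them),
// pre-aggregated but still keyed by the small table's id.
$shard_rows = [
    ['small_id' => 1, 'clicks' => 100],
    ['small_id' => 2, 'clicks' => 50],
    ['small_id' => 1, 'clicks' => 25],
];

// The small, frequently updated table, loaded once into memory
// and indexed by id.
$small = [
    1 => ['group_name' => 'campaign_a'],
    2 => ['group_name' => 'campaign_b'],
];

// Hash join plus the final GROUP BY on the column that only the
// small table has.
$report = [];
foreach ($shard_rows as $row) {
    $grp = $small[$row['small_id']]['group_name'];
    if (!isset($report[$grp])) {
        $report[$grp] = 0;
    }
    $report[$grp] += $row['clicks'];
}

print_r($report); // campaign_a => 125, campaign_b => 50
```

With ~10K shard rows and a ~100-200K-row small table this stays well within PHP's comfort zone, since the lookup per row is O(1).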
2012/11/15 Martin Kersten martin@monetdb.org
Dear Eduardo,
Based on our own experience in the past, and as also reported independently in a German computer-science magazine a few years back, the PHP protocol is only slightly worse than the heavily optimized version in MySQL.
Your application is an append-mostly database, which does not hurt performance at all. For highly volatile updates, the optimistic concurrency scheme in combination with the delta storage is indeed not optimal. But given the small table's size, a regular rebuild that creates a fresh copy of it would deal with that.
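A hedged sketch of such a periodic rebuild, issued through the same PHP client. `small` and `small_new` are placeholder table names, and whether `DROP TABLE IF EXISTS` and `ALTER TABLE ... RENAME TO` are available depends on the MonetDB version; if the rename is not supported, recreate the table under its original name instead:

```php
<?php
require '../lib/php_monetdb.php';

$db = monetdb_connect("sql", '127.0.0.1', 50000, 'adctest', 'adctest', 'adctest')
    or trigger_error(monetdb_last_error());

// Build a fresh, delta-free copy of the small table, then swap it in.
// Table names are placeholders; syntax availability is version-dependent.
foreach ([
    'DROP TABLE IF EXISTS small_new',
    'CREATE TABLE small_new AS SELECT * FROM small WITH DATA',
    'DROP TABLE small',
    'ALTER TABLE small_new RENAME TO small',
] as $sql) {
    monetdb_query($db, $sql) or trigger_error(monetdb_last_error());
}

monetdb_disconnect($db);
```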
To attack your problem, its cause must be identified first: move the timing in your script so as to separate the cost of receiving and processing the result set in the PHP API from the server's query evaluation cost. Note that by the time the first record is sent over the wire, the complete result is already available within the server.
You might also look at the "funnel" feature, which lets you combine the results of several servers in an easy way. Of course, solving a join in the application is the last resort.
You could help us by sharing parts of your application, e.g. schema and data, and by identifying where MonetDB could improve over its competitors in quantitative terms, e.g. how fast is MySQL at tackling your problem?
regards, Martin
On 11/15/12 11:04 PM, Eduardo Oliveira wrote:
On 15-11-2012 23:28:01 +0100, Martin Kersten wrote:
Based on our own experience in the past, and as also reported independently in a German computer-science magazine a few years back, the PHP protocol is only slightly worse than the heavily optimized version in MySQL.
That was with the version that, back then, relied on the C implementation (mapi) of the protocol. Since that proved unmaintainable, we switched to a native PHP implementation. This is obviously much less suited to achieving high performance with large data volumes shipped over the wire, as you observed in your initial post.
Fabian
Fabian, I understand that; thank you for the help. I will have to work around it by bringing less data over the wire.
2012/11/16 Fabian Groffen fabian@monetdb.org
-- Fabian Groffen fabian@monetdb.org column-store pioneer http://www.monetdb.org/Home
participants (4)
- Eduardo Oliveira
- Fabian Groffen
- Martin Kersten
- Martin Kersten