Is it possible to look at the X100 source code or more detailed
information right now?
What I'm trying to accomplish is basically the same, in Common Lisp.
I read through Peter Boncz's Ph.D. thesis as well as the X100 paper.
One thing that I don't understand is whether the X100 scan retrieves
vectors from _all_ participating BATs. I'm also wondering how the
size of the cache (and thus the size of the vector) is determined.
Thanks, Joel
--
http://wagerlabs.com/uptick
Hi Joel,
About X100: in my opinion this is a very cool project that actually
constitutes an entirely new query engine. It is more scalable than both the
current Monet4 engine and the Monet5 engine; there is no main-memory
limitation anymore. Think of it as Monet6 (though that is not certain).
The system is quite functional, but a lot remains to be done, especially
regarding APIs, database updates, and binary/source distribution of the code.
Currently, we are concentrating on obtaining scientific results. Some
functionalities (like database updates) still have to be developed. I think we
are more than a year away from a releasable system. Also, it has not been
decided yet whether that will be an open-source release. Maybe X100 has some
commercial future.
Now, more generally: I just read your posts on this mailing list, and I do
not have a full grasp of your plans, so I cannot advise you properly.
Some general tips:
- Try to keep all bulk operations in MIL, with interfaces to the Lisp engine
that move only a little data in and out.
- Preferably use unary BATs, i.e., those that have a void head type.
- Try to separate data and metadata. The schema you use, and also the
results, are best kept outside MonetDB; maybe store them in flat files, XML,
or something similar. That makes the architecture easier to maintain, and your
application does not depend so strongly on MonetDB repositories.
As for special-purpose indexing, it is often possible to represent an index as
a set of BATs. You will have to do some coding in a new Monet extension module
(an .mx file as found in the src/modules/plain/ directory) to write some
primitives that access those BATs (i.e., your index) efficiently.
just my 2 cts.
Peter
Martin,
Thanks for your help and my apologies. I thought there was support
for enums in the GDK but I'll take a look at the enum module.
I'm trying to implement the pieces of MIL that I need
for my trading platform, in Common Lisp.
Thanks, Joel
On Jul 29, 2005, at 11:52 PM, Martin Kersten wrote:
> Joel,
>
> We apologize that the user-documentation is behind the code.
> Nevertheless, the following areas are a good starting point for
> gaining additional information.
--
http://wagerlabs.com/uptick
Folks,
Is there a property I can set on a BAT to mark it as an enum?
How are enum BATs implemented in the kernel? I could not find this in
the docs.
Thanks, Joel
--
http://wagerlabs.com/uptick
On Jul 28, 2005, at 1:19 PM, Martin Kersten wrote:
>>> MIL is likely
>>> to be moved into a corner next year when MonetDB Version 5 is
>>> released.
>>>
>> What is the expected replacement?
>>
> An assembler-like version, intended as a target language for front-end
> compilers and optimizers, not for end-user programming.
Then I could actually translate things into this new MIL. Sounds good.
>> Can I screw it [BAT properties] up by using GDK functions?
>>
> Mostly not, but you are really linking into a kernel, and this requires
> a lot more administration to take care of. Not to mention proper
> garbage collection, etc., i.e. all the stuff hidden by the language
> interpreters.
Fortunately for me garbage collection is taken care of by Lisp.
It's also a mixed interpreted _and_ compiled environment
which gives me an additional leg up.
> The structure of the table is known and the integrity constraints
> allow direct updates on the BATs.
Yes, my updates are fixed indeed. Where can I find the batch-loading
code? Is that the ascii_io module?
> Then a simple IP-channel can be used
> to quickly pump data into the system.
It does not need to be an IP channel, though, right? I could just as well
pump data into the system directly.
> Depending on the batching size
> implied in a separate data gatherer, you should be able to handle
> anywhere between 5K-200K inserts/sec on a normal PC.
> It also bypasses the SQL logger, which may not be what you want.
I will have to run some experiments here: on the one hand, losing incoming
data would be very bad for me, and on the other hand I need to store data
as quickly as possible. I have a Mac OS X laptop.
> A time-series module over BATs would be the preferred way to go.
What should this module be written in?
> Such a module would have primitives for moving-statistics,
> window-based selections, window-based signal analysis, and
> schemes for efficient temporal joins (including interpolation).
I have more interest in rolling out a trading platform but once
the first version is out I could definitely look into optimizations
like building a time series module.
I reckon from your "Efficient k-NN Search on Vertically Decomposed Data"
paper that I could calculate the Euclidean distance between
time series on the fly, since vector ops are very fast. At least
I could try. I'm also talking to the http://www.cs.purdue.edu/spgist/
folks and other scientists to see if I could use one of their indexing
schemes as a MonetDB search accelerator (hope I have the terminology
right).
I did not see an answer to my other post, so I still don't know whether a
range search can be done efficiently on a lng BAT. If it can, then I could
just make do with that, as my queries are mostly "run this strategy on this
range of data, using a sliding window of X BUNs".
The range of data would be a subset of the price BAT, limited by a range
of dates in a different BAT. But then I also need to consider a symbol
BAT to take only MSFT or IBM, etc. into account. It seems a lot like
the problem that you tackled in the k-NN paper.
I don't fully understand how indexes work in MonetDB, for example whether
I need to create them manually on my columns. I suppose that does not make
sense with MonetDB: it seems that only one index is used, the one that
links all BATs in a table so that particular BUNs can be accessed.
Please let me know if I got that right.
> Unfortunately, there is no free lunch and I guess a more top-down
> architectural design is now what you should be looking for.
I'm reading through Peter Alexander Boncz's dissertation at the moment.
> Before you jump into the MonetDB code.
> Compare it with driving a Ferrari for the first time: you can get
> good speed, but also easily get killed.
Yes, I'm aware of the dangers but think that I'm on to something
exciting.
Your help and advice is greatly appreciated!
Thanks, Joel
--
http://wagerlabs.com/uptick
Martin,
On Jul 28, 2005, at 12:49 PM, Martin Kersten wrote:
> 3) is a good option. Separate the database functionality from the
> application and only extend the kernel if you can not escape it.
> [...]
> indeed, accessing points one-by-one nullifies all positive effects of
> a database system, you should think set-(list-) at a time
I could not see any way of retrieving data in sets or lists using MAPI.
Am I missing something?
> MIL is likely
> to be moved into a corner next year when MonetDB Version 5 is
> released.
What is the expected replacement?
>> or something similar. I'm wondering if MIL would be up to the task.
>> #2 is the option that I would prefer as I could hook up into the GDK
>> to quickly iterate through BATs and BUNs and use Lisp to write
>>
> that is feasible and has been done more often. The pitfall here is
> that
> you may easily screw-up the property infrastructure
What is the property infrastructure?
Can I screw it up by using GDK functions?
> Actually, if your input/updates are fixed, you may consider a much
> faster scheme directly interfacing with updates on the BAT structures.
> The bulk loading routines may give you the hints how to do this.
Just so that I understand you correctly, what do you mean by fixed
input/updates?
Thanks, Joel
--
http://wagerlabs.com/uptick
Folks,
The real-time trading platform that I'm building requires running
strategies on historical data at very high speed. There could be
hundreds of thousands of such runs with different permutations
of arguments.
I do not want to force my customers to write trading systems in MIL,
and MIL, according to its description, might not be suitable anyway.
So I was wondering whether I should 1) try to translate trading systems
from Lisp into MIL, or 2) build my own Lisp kernel for trading-system
execution. There's also 3) the option of not running strategies in
MonetDB/MIL at all.
#3 would require using MAPI and extracting data points one by one.
I dislike this as it would not let me take advantage
of vector processing.
#1 would require using MIL to potentially write Fast Fourier Transforms
or something similar. I'm wondering if MIL would be up to the task.
#2 is the option that I would prefer as I could hook up into the GDK
to quickly iterate through BATs and BUNs and use Lisp to write
and debug trading systems. With some coordination I think I could
even use MonetDB-SQL together with my kernel, so long as I leave
all the insertion and update operations outside of Lisp.
What do you think?
Thanks, Joel
--
http://wagerlabs.com/uptick
Folks,
I'm building a real-time, very high-frequency trading platform.
The platform needs a backend for storing exchange ticks (quotes),
and I think MonetDB could be that backend.
What ultimately attracts me is the vertical decomposition that sets
MonetDB apart. It's the approach taken by KDB
http://www.kx.com/products/kdbplusfaq.php
http://cs.nyu.edu/cs/faculty/shasha/papers/sigmodpap.pdf
and it has already proven immensely effective.
My personal need is searching for matching subsequences
in other time series (k-NN?), which would require me to compute
the Euclidean distance between points, etc.
Martin suggested that I build a time series extension module,
which is fine with me (see http://wagerlabs.com/resume.pdf).
I have several questions before I embark on this project...
I don't have a clear understanding of how MonetDB manages indexes.
If I'm dealing with timestamp, price, and volume, and need to search
on any of them, would I need to build indexes on all three columns?
I believe MonetDB supports range queries. Could I just store
timestamp as an integer such as YYYYMMDDHHSS and do a range query
on that?
How would I go about building the time series extension module?
Would I need to add custom search accelerators to build
a specialized index on the fly?
Thanks, Joel
--
http://wagerlabs.com/uptick