Python UDFs with blob column
Hello,
I've recently been working with Python UDFs in MonetDB and noticed that returning a blob column from an array of Python objects is not allowed. A look at the code confirmed that this feature is not implemented yet: "FIXME: check for byte array/or pickle object to string". I have therefore written an implementation of this feature. Could you please tell me how I can submit my code so that you can review it and consider applying it? Thanks.
Best regards,
Florestan
Hi Florestan,
> On 12 Jun 2018, at 16:56, Florestan De Moor florestan.de-moor@ens-rennes.fr wrote:
> I've recently been working with Python UDFs in MonetDB and noticed that returning a blob column from an array of Python objects is not allowed. A look at the code confirmed that this feature is not implemented yet: "FIXME: check for byte array/or pickle object to string". I have therefore written an implementation of this feature.
Great!
> Could you please tell me how I can submit my code so that you can review it and consider applying it? Thanks.
I suggest you send a patch (hg diff) to this mailing list.
Best,
Hannes
Hi Hannes,
Thank you for your answer! The patch is attached.
Best,
Florestan
developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
Hello Florestan,
Your patch has been applied and will be part of the next feature release (https://dev.monetdb.org/hg/MonetDB/rev/7aaeaa80867f). Thank you very much for the contribution.
Best regards, Panos.
Florestan De Moor @ 2018-06-12 16:33 GMT:
diff -r d18b0a317120 sql/backends/monet5/UDF/pyapi/conversion.c
--- a/sql/backends/monet5/UDF/pyapi/conversion.c  Tue Jun 12 09:45:28 2018 +0200
+++ b/sql/backends/monet5/UDF/pyapi/conversion.c  Tue Jun 12 12:23:25 2018 -0400
@@ -825,24 +825,39 @@
         bool *mask = NULL;
         char *data = NULL;
         blob *ele_blob;
-        size_t blob_fixed_size = -1;
+        size_t blob_fixed_size = ret->memory_size;
+        PyObject *pickle_module = NULL, *pickle = NULL;
+        bool gstate = 0;
+
         if (ret->result_type == NPY_OBJECT) {
-            // FIXME: check for byte array/or pickle object to string
-            msg = createException(MAL, "pyapi.eval",
-                  SQLSTATE(PY000) "Python object to BLOB not supported yet.");
-            goto wrapup;
+            // Python objects, we may need to pickle them, so we
+            // may execute Python code, we have to obtain the GIL
+            gstate = Python_ObtainGIL();
+            pickle_module = PyImport_ImportModule("pickle");
+            if (pickle_module == NULL) {
+                msg = createException(MAL, "pyapi.eval",
+                      SQLSTATE(PY000) "Can't load pickle module to pickle python object to blob");
+                Python_ReleaseGIL(gstate);
+                goto wrapup;
+            }
+            blob_fixed_size = 0; // Size depends on the objects
         }
+
         if (ret->mask_data != NULL) {
             mask = (bool *)ret->mask_data;
         }
         if (ret->array_data == NULL) {
             msg = createException(MAL, "pyapi.eval", SQLSTATE(PY000) "No return value stored in the structure.");
+            if (ret->result_type == NPY_OBJECT) {
+                Py_XDECREF(pickle_module);
+                Python_ReleaseGIL(gstate);
+            }
             goto wrapup;
         }
         data = (char *)ret->array_data;
         data += (index_offset * ret->count) * ret->memory_size;
-        blob_fixed_size = ret->memory_size;
         b = COLnew(seqbase, TYPE_sqlblob, (BUN)ret->count, TRANSIENT);
         b->tnil = 0;
         b->tnonil = 1;
@@ -850,26 +865,68 @@
         b->tsorted = 0;
         b->trevsorted = 0;
         for (iu = 0; iu < ret->count; iu++) {
+            char* memcpy_data;
             size_t blob_len = 0;
+            if (ret->result_type == NPY_OBJECT) {
+                PyObject *object = *((PyObject **)&data[0]);
+                if (PyByteArray_Check(object)) {
+                    memcpy_data = PyByteArray_AsString(object);
+                    blob_len = pyobject_get_size(object);
+                } else {
+                    pickle = PyObject_CallMethod(pickle_module, "dumps", "O", object);
+                    if (pickle == NULL) {
+                        msg = createException(MAL, "pyapi.eval",
+                              SQLSTATE(PY000) "Can't pickle object to blob");
+                        Py_XDECREF(pickle_module);
+                        Python_ReleaseGIL(gstate);
+                        goto wrapup;
+                    }
+                    memcpy_data = PyBytes_AsString(pickle);
+                    blob_len = pyobject_get_size(pickle);
+                    Py_XDECREF(pickle);
+                }
+                if (memcpy_data == NULL) {
+                    msg = createException(MAL, "pyapi.eval",
+                          SQLSTATE(PY000) "Can't get blob pickled object as char*");
+                    Py_XDECREF(pickle_module);
+                    Python_ReleaseGIL(gstate);
+                    goto wrapup;
+                }
+            } else {
+                memcpy_data = data;
+            }
             if (mask && mask[iu]) {
                 ele_blob = (blob *)GDKmalloc(offsetof(blob, data));
                 ele_blob->nitems = ~(size_t)0;
             } else {
                 if (blob_fixed_size > 0) {
                     blob_len = blob_fixed_size;
-                } else {
-                    assert(0);
                 }
                 ele_blob = GDKmalloc(blobsize(blob_len));
                 ele_blob->nitems = blob_len;
-                memcpy(ele_blob->data, data, blob_len);
+                memcpy(ele_blob->data, memcpy_data, blob_len);
             }
-            if (BUNappend(b, ele_blob, false) != GDK_SUCCEED) {
+            if (BUNappend(b, ele_blob, FALSE) != GDK_SUCCEED) {
+                if (ret->result_type == NPY_OBJECT) {
+                    Py_XDECREF(pickle_module);
+                    Python_ReleaseGIL(gstate);
+                }
                 goto bunins_failed;
             }
             GDKfree(ele_blob);
             data += ret->memory_size;
         }
+
+        // We are done, we can release the GIL
+        if (ret->result_type == NPY_OBJECT) {
+            Py_XDECREF(pickle_module);
+            Python_ReleaseGIL(gstate);
+        }
+
         BATsetcount(b, (BUN)ret->count);
         BATsettrivprop(b);
     } else {
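The copy step in the patch (allocate a blob, set nitems, memcpy the payload, or write only a header for a NULL value) can be illustrated outside MonetDB. The following is a minimal sketch, not MonetDB code: the sketch_blob struct is a simplified, assumed stand-in for the real blob type, malloc replaces GDKmalloc, and the names sketch_blobsize, make_nil_blob and make_blob are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for MonetDB's blob type (an assumption, not the
 * real definition): a byte count followed directly by the payload. */
typedef struct {
    size_t nitems; /* payload length in bytes; ~(size_t)0 marks a NULL value */
    char data[];   /* flexible array member holding the payload */
} sketch_blob;

/* Bytes needed for a blob carrying `len` payload bytes.
 * Plays the role that blobsize() plays in the patch. */
static size_t sketch_blobsize(size_t len) {
    return offsetof(sketch_blob, data) + len;
}

/* NULL value: allocate the header only and store the all-ones marker,
 * as the masked branch of the patch does. */
static sketch_blob *make_nil_blob(void) {
    sketch_blob *b = malloc(sketch_blobsize(0));
    if (b != NULL)
        b->nitems = ~(size_t)0;
    return b;
}

/* Regular value: allocate header plus payload and copy `len` bytes from
 * `src`. In the patch, `src` would be the bytearray contents or the
 * pickled bytes of the Python object. */
static sketch_blob *make_blob(const char *src, size_t len) {
    assert(src != NULL || len == 0);
    sketch_blob *b = malloc(sketch_blobsize(len));
    if (b == NULL)
        return NULL;
    b->nitems = len;
    if (len > 0)
        memcpy(b->data, src, len);
    return b;
}
```

The caller owns the returned pointer and frees it after appending, which corresponds to the GDKfree(ele_blob) call in the loop of the patch.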
participants (3)
- Florestan De Moor
- Hannes Mühleisen
- Panagiotis Koutsourakis