Python UDFs with blob column

Panagiotis Koutsourakis kutsurak at monetdbsolutions.com
Thu Jul 19 10:52:13 CEST 2018


Hello Florestan,

Your patch has been applied and will be part of the next feature release
(https://dev.monetdb.org/hg/MonetDB/rev/7aaeaa80867f). Thank you very
much for the contribution.

Best regards,
Panos.

Florestan De Moor @ 2018-06-12 16:33 GMT:

> Hi Hannes,
>
> Thank you for your answer! The patch is attached.
>
> Best,
>
> Florestan
>
>
> On 12/06/2018 11:39, Hannes Mühleisen wrote:
>> Hi Florestan,
>>
>>> On 12 Jun 2018, at 16:56, Florestan De Moor <florestan.de-moor at ens-rennes.fr> wrote:
>>>
>>> I've recently been working with Python UDFs in MonetDB and noticed that returning a blob column from an array of Python objects is not allowed.
>>> A look at the code confirmed that this feature is not implemented yet: "FIXME: check for byte array/or pickle object to string".
>>> I have therefore written an implementation of this feature.
>> Great!
>>
>>> Could you please tell me how I can submit my code for you to review and consider applying it? Thanks.
>> I suggest you send a patch (hg diff) to this mailing list.
>>
>> Best,
>>
>> Hannes
>>
>>
>> _______________________________________________
>> developers-list mailing list
>> developers-list at monetdb.org
>> https://www.monetdb.org/mailman/listinfo/developers-list
>
> diff -r d18b0a317120 sql/backends/monet5/UDF/pyapi/conversion.c
> --- a/sql/backends/monet5/UDF/pyapi/conversion.c	Tue Jun 12 09:45:28 2018 +0200
> +++ b/sql/backends/monet5/UDF/pyapi/conversion.c	Tue Jun 12 12:23:25 2018 -0400
> @@ -825,24 +825,39 @@
>  		bool *mask = NULL;
>  		char *data = NULL;
>  		blob *ele_blob;
> -		size_t blob_fixed_size = -1;
> +		size_t blob_fixed_size = ret->memory_size;
> +
> +		PyObject *pickle_module = NULL, *pickle = NULL;
> +		bool gstate = 0;
> +
>  		if (ret->result_type == NPY_OBJECT) {
> -			// FIXME: check for byte array/or pickle object to string
> -			msg = createException(MAL, "pyapi.eval",
> -								  SQLSTATE(PY000) "Python object to BLOB not supported yet.");
> -			goto wrapup;
> +			// Python objects, we may need to pickle them, so we
> +			// may execute Python code, we have to obtain the GIL
> +			gstate = Python_ObtainGIL();
> +			pickle_module = PyImport_ImportModule("pickle");
> +			if (pickle_module == NULL) {
> +				msg = createException(MAL, "pyapi.eval",
> +									  SQLSTATE(PY000) "Can't load pickle module to pickle python object to blob");
> +				Python_ReleaseGIL(gstate);
> +				goto wrapup;
> +			}
> +			blob_fixed_size = 0; // Size depends on the objects
>  		}
> +
>  		if (ret->mask_data != NULL) {
>  			mask = (bool *)ret->mask_data;
>  		}
>  		if (ret->array_data == NULL) {
>  			msg = createException(MAL, "pyapi.eval",
>  								  SQLSTATE(PY000) "No return value stored in the structure.");
> +			if (ret->result_type == NPY_OBJECT) {
> +				Py_XDECREF(pickle_module);
> +				Python_ReleaseGIL(gstate);
> +			}
>  			goto wrapup;
>  		}
>  		data = (char *)ret->array_data;
>  		data += (index_offset * ret->count) * ret->memory_size;
> -		blob_fixed_size = ret->memory_size;
>  		b = COLnew(seqbase, TYPE_sqlblob, (BUN)ret->count, TRANSIENT);
>  		b->tnil = 0;
>  		b->tnonil = 1;
> @@ -850,26 +865,68 @@
>  		b->tsorted = 0;
>  		b->trevsorted = 0;
>  		for (iu = 0; iu < ret->count; iu++) {
> +
> +			char* memcpy_data;
>  			size_t blob_len = 0;
> +
> +			if (ret->result_type == NPY_OBJECT) {
> +				PyObject *object = *((PyObject **)&data[0]);
> +				if (PyByteArray_Check(object)) {
> +					memcpy_data = PyByteArray_AsString(object);
> +					blob_len = pyobject_get_size(object);
> +				} else {
> +					pickle = PyObject_CallMethod(pickle_module, "dumps", "O", object);
> +					if (pickle == NULL) {
> +						msg = createException(MAL, "pyapi.eval",
> +											  SQLSTATE(PY000) "Can't pickle object to blob");
> +						Py_XDECREF(pickle_module);
> +						Python_ReleaseGIL(gstate);
> +						goto wrapup;
> +					}
> +					memcpy_data = PyBytes_AsString(pickle);
> +					blob_len = pyobject_get_size(pickle);
> +					Py_XDECREF(pickle);
> +				}
> +				if (memcpy_data == NULL) {
> +					msg = createException(MAL, "pyapi.eval",
> +										  SQLSTATE(PY000) "Can't get blob pickled object as char*");
> +					Py_XDECREF(pickle_module);
> +					Python_ReleaseGIL(gstate);
> +					goto wrapup;
> +				}
> +			} else {
> +				memcpy_data = data;
> +			}
> +
>  			if (mask && mask[iu]) {
>  				ele_blob = (blob *)GDKmalloc(offsetof(blob, data));
>  				ele_blob->nitems = ~(size_t)0;
>  			} else {
>  				if (blob_fixed_size > 0) {
>  					blob_len = blob_fixed_size;
> -				} else {
> -					assert(0);
>  				}
>  				ele_blob = GDKmalloc(blobsize(blob_len));
>  				ele_blob->nitems = blob_len;
> -				memcpy(ele_blob->data, data, blob_len);
> +				memcpy(ele_blob->data, memcpy_data, blob_len);
>  			}
> -			if (BUNappend(b, ele_blob, false) != GDK_SUCCEED) {
> +			if (BUNappend(b, ele_blob, FALSE) != GDK_SUCCEED) {
> +				if (ret->result_type == NPY_OBJECT) {
> +					Py_XDECREF(pickle_module);
> +					Python_ReleaseGIL(gstate);
> +				}
>  				goto bunins_failed;
>  			}
>  			GDKfree(ele_blob);
>  			data += ret->memory_size;
> +
>  		}
> +
> +		// We are done, we can release the GIL
> +		if (ret->result_type == NPY_OBJECT) {
> +			Py_XDECREF(pickle_module);
> +			Python_ReleaseGIL(gstate);
> +		}
> +
>  		BATsetcount(b, (BUN)ret->count);
>  		BATsettrivprop(b);
>  	} else {
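
For readers following the thread, here is a minimal Python sketch of the per-element conversion that the patch above appears to perform for NPY_OBJECT results. The helper name to_blob_bytes and its masked parameter are hypothetical and exist only for illustration; they are not part of MonetDB. The sketch assumes the behaviour visible in the diff: a bytearray is copied as raw bytes, anything else is serialized with pickle.dumps, and a masked entry becomes a nil blob (represented here as None).

```python
import pickle


def to_blob_bytes(obj, masked=False):
    """Hypothetical helper mirroring the NPY_OBJECT branch of the patch."""
    if masked:
        # The C code stores a nil blob for masked entries; None stands in for it.
        return None
    if isinstance(obj, bytearray):
        # Corresponds to the PyByteArray_Check branch: bytes are copied unchanged.
        return bytes(obj)
    # Every other object, including a plain bytes object, is pickled.
    return pickle.dumps(obj)


raw = to_blob_bytes(bytearray(b"\x00\x01\x02"))
pickled = to_blob_bytes({"answer": 42})
print(raw)
print(pickle.loads(pickled))
```

One consequence worth noting, if the sketch matches the C code: a UDF that returns plain bytes objects would get pickled blobs rather than the raw bytes, so a consumer would need pickle.loads to recover them, whereas bytearray values round-trip unchanged.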
