Hello,

I have the following MAL function definition:

command batstemmer.stem(terms:bat[:oid,:str], stemmer_name:str):bat[:oid,:str]
address CMDbatstem
comment "Wrapper for snowball stemmer";

which internally uses the snowball stemmer (http://snowball.tartarus.org/).


When the BAT to be stemmed is large enough, the mitosis optimizer splits it into chunks and calls the function "stem" on each chunk, possibly in parallel.

The problem is that the snowball stemmer implementation appears not to be thread-safe, so the concurrent calls end in a SIGSEGV.

Indeed, using the no_mitosis_pipe solves the issue. However, this solution is suboptimal, because it gives up the splitting for the whole query rather than just for my function.

Another solution I found is to mark the MAL signature as {unsafe}. This works, although it does something a bit silly: it splits the table into chunks, then repacks everything, and finally runs my function on the repacked BAT (basically wasting effort on a useless split + repack).
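
For reference, the {unsafe} variant looks roughly like this. I am assuming the property goes in braces right after the function name, as in other MAL signatures; everything else is unchanged from the definition above:

command batstemmer.stem{unsafe}(terms:bat[:oid,:str], stemmer_name:str):bat[:oid,:str]
address CMDbatstem
comment "Wrapper for snowball stemmer";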

Now, my question is: is there a more focused property I could use? {unsafe} covers thread-unsafety, but it is actually stronger than that. For example, it also implies that the function may have side effects, so its result cannot be recycled. In my case, instead, the result is perfectly safe to reuse.

Thanks for any tips.

Roberto