Skip to main content

Export SAM/BAM data

There are two ways to export query results into SAM files: using the server-side sam_export() function, or using the client-side SAM formatter. The server side export function works independent of which client interface is used, however, it can only write data to files to which the MonetDB server has access. With the client side export function, one can export data to any files the client interface has access to, independent of where the MonetDB server resides. However, each client interface needs to be extended with its own SAM formatter. Currently, only mclient is supported.

Server-side SAM export

  • bam.sam_export(output_path STRING)

    This procedure exports all records in the table "bam.export" to the file "output_path". If "output_path" is a relative path, the file is created in the database farm currently served by this MonetDB server (see the man-pages monetdbd and mserver5 for the definition of "dbfarm" and "dbpath"). If the file "output_path" already exists, it is overwritten. This procedure will also delete all records from the table "bam.export" after the export.

The queries below show how to i) populate the "bam.export" table, ii) export data, iii) load exported data into the database, and finally iv) conduct some checks. Note that when insert data into the "bam.export" table, one needs to select the individual columns. This is because the "bam.export" table does not contain a column for the virtual offsets, because the virtual offsets can only be computed afterwards when constructing BAM files.

Populate the export table with a sequentially stored file

sql> INSERT INTO bam.export ( 
SELECT qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual
FROM bam.alignments_1);
sql> CALL bam.sam_export('OUTPUT_1'); -- Export data to SAM file
sql> CALL bam.bam_loader_file('OUTPUT_1', 0); -- Load exported data back into a sequential table
sql> -- Data inside original table should be exactly the same as the newly imported file
sql> SELECT qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual
FROM bam.alignments_1
EXCEPT
SELECT qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual
FROM bam.alignments_13; -- assume OUTPUT_1 is loaded into bam.alignments_13
sql> SELECT * FROM bam.export; -- Verify that the export table is now empty

Populate the export table with a pairwise stored file

sql> INSERT INTO bam.export (
    SELECT qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual
    FROM bam.unpaired_all_alignments_3
);
sql> CALL bam.sam_export('OUTPUT_2'); -- Export data to SAM file
sql> CALL bam.bam_loader_file('OUTPUT_2', 0); -- Load exported data back into a sequential table
sql> -- Data inside original table should be exactly the same as the newly imported file
sql> SELECT qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual
FROM bam.unpaired_all_alignments_3
EXCEPT
SELECT qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual
FROM bam.alignments_14; -- assume OUTPUT_2 is loaded into bam.alignments_14
sql> SELECT * FROM bam.export; -- Verify that the export table is now empty

Client-side SAM export

If you are using mclient to communicate with MonetDB, you can use the built-in SAM formatter to export query results to a SAM formatted file. Note that this requires the output columns to have the appropriate names, i.e., the renderer looks for the column names as defined by the alignments_N table as defined in the sequential schema (see SAM/BAM storage schemas). If there are columns that cannot be mapped, you will receive a notification about this and the renderer will simply discard the data in the not recognised columns.

To use the SAM formatter, type in mclient:

sql> \f sam

Now, if you execute a query using mclient, the result will be displayed in SAM format.

To write the query result to a SAM file, e.g., "/tmp/out.sam", type the following command in mclient:

sql> \> /tmp/out.sam

Now, results of all subsequent queries will be appended to this SAM file.

You can easily generate multiple SAM files by passing the "\>" multiple output file names:

sql> \> /tmp/out1.sam
sql> SELECT * FROM bam.alignments_1;
sql> \> /tmp/out2.sam
sql> SELECT * FROM bam.alignments_2;