How to Work with Mercurial

Author: Sjoerd Mullender
Organization: CWI, MonetDB B.V.

Abstract

The MonetDB repository has moved from CVS on Sourceforge to Mercurial on dev.monetdb.org. In this document, we give an introduction to the use of Mercurial, specifically for MonetDB.

Introduction

A nice tutorial about how to use Mercurial, aimed at Subversion (SVN) users (although CVS users will understand the Subversion references fine) can be found at http://hginit.com/.

There is also a Mercurial book. You can read it online at http://hgbook.red-bean.com/.

The main website for Mercurial is https://www.mercurial-scm.org/. There is also a lot of information there, including information about all sorts of nifty extensions.

The Basics

Mercurial (and git, for that matter) works around the concept of changesets, whereas both CVS and SVN work around the concept of revisions. A revision is basically what the project (or a particular file in the case of CVS) looks like at a particular point in time. A changeset is basically a set of changes. Both CVS and SVN are centralized systems where there is one "truth", the central repository, and every developer has one particular revision on their disk (usually the latest). When you make a change, you upload the new version to the server (known as a commit). If another developer was there first, the commit fails. You first need to update your own copy from the central repository, merging any changes in the process, and, when that is successful, try the commit again.

Mercurial works differently.

Every developer has a complete copy (a clone) of the repository on their disk. When you make a change, you can commit it to your local copy of the repository, independently of any other developers. Since this work is independent of others, you can commit smaller changes that are not yet finished and that do not yet work. When you are satisfied with the changes, you can share them with others by pushing the changes to the central repository.

Before you push changes, you may first have to pull in changes from others that had already been pushed by them. Pushing and pulling are mirror operations. What they do is compare the collections of changes (the changesets) in the originating and destination repositories, and copy over all changes that are in the originating repository but not yet in the destination. After you pull in changes, you need to update your working files. In the process you may have to resolve conflicting changes.

Because every developer has their own copy of the repository, and every developer makes changes and gets other developers' changes in a different order, the histories of different copies of the repository are different. You can see that in the log of the repository. In Mercurial, changesets are identified by two numbers: a relatively small number that is easy to use (the local revision number), and a large, hexadecimal number (the global revision id). The local revision number is per clone. It indicates the order in which changesets were added to that clone, but has no other significance. The global revision id identifies the changeset across clones. Different developers will have different local revision numbers for the same change, but they will have the same global revision id.
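Both identifiers can be printed side by side with a log template. The following is a minimal sketch, assuming it is run inside a clone; the helper name show_ids is made up for this example, while {rev} and {node} are standard Mercurial template keywords.

```shell
# Print the local revision number and the global revision id of a changeset.
# show_ids is a hypothetical helper name used only for illustration.
show_ids() {
    # Default to "." (the parent changeset of the working files).
    hg log -r "${1:-.}" --template 'local: {rev}  global: {node}\n'
}
```

Running show_ids on the same changeset in two different clones would typically print different local numbers but the same global id.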

Each changeset depends on other changes that came before. However, a changeset does not depend on all previous changes, since some changes were made independently from each other. A changeset records its parent changeset. In fact, in Mercurial a changeset can have up to two parents. When a changeset has two parents, it is the result of a merge.

A changeset that is not the parent of other changesets (i.e. it doesn't have any children) is called a head. Mercurial wants to minimize the number of heads in a clone, so it will warn when an operation creates an extra head and will often refuse to perform the operation altogether. Combining heads is called a merge.
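To check how many heads a clone currently has, something like the following sketch could be used. The helper name count_heads is an assumption for this example. When hg heads is given a revision or branch name, it only reports the open heads of the branch that revision belongs to, so the argument defaults to "." (the branch of the working files).

```shell
# Count the open heads on the branch of the given revision.
# count_heads is a hypothetical helper name.
count_heads() {
    # One output line per head; count the lines and strip padding that
    # some wc implementations add.
    hg heads --template '{node|short}\n' "${1:-.}" | wc -l | tr -d ' '
}
```

A result larger than 1 for a single branch usually means there is something to merge.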

As far as Mercurial is concerned, each changeset you add (i.e. each commit) depends on the last changeset. However, this is of course not always really true since often different changes are orthogonal to each other. It is possible to tell Mercurial to change the parent of a changeset to some other changeset. This is called rebasing the changeset.

Configuration

Before you do anything else, it is a good idea to take care of some necessary configuration.

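Mercurial reads a number of configuration files, in this order: any system-wide files, the user-specific file ($HOME/.hgrc), and the clone-specific file (.hg/hgrc inside the clone). Configuration settings in later files override settings in earlier ones.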

The configuration files all share the same syntax, the Windows INI file syntax, consisting of sections headed by a header in square brackets, and within each section key=value pairs.

See the manual hgrc(5) and hg help config for more information.

If you're using Mercurial only for MonetDB, you can add all the configurations that are mentioned below to the file $HOME/.hgrc. Otherwise you need to decide which configuration settings can be used for all your Mercurial clones and which should be specific to a clone. If you choose to use clone-specific configuration settings, you do need to make sure that the settings are added to each clone after they have been created.
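When several configuration files are in play, it can be unclear which file a setting finally comes from. A small sketch of how to find out, with where_is_setting as a made-up helper name:

```shell
# Show the effective value of one setting and the file that supplied it.
# where_is_setting is a hypothetical helper name.
where_is_setting() {
    # With --debug, "hg config" prints the source (file name and line
    # number) in front of each item.
    hg config --debug "$1"
}
```

For example, where_is_setting ui.username run inside a clone shows whether the name comes from $HOME/.hgrc or from the clone's own .hg/hgrc.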

Important Configuration

Your Name

The minimal configuration consists of telling Mercurial who you are. This is typically done in the user-specific configuration file. Add the following (with appropriate changes) to your configuration file:

[ui]
username = My Name <my.address@example.org>

Note that the name is actually used by the system in the From address in mails that are sent to the checkin mailing list to notify people of changes and thus the address is public.

Line Endings

On Windows, text files use the CRLF (Carriage Return - Line Feed) combination as end-of-line indicators, whereas Unix/Linux (including MacOS) uses only LF (Line Feed). On both sets of systems, many programs can deal with either convention, but some programs get confused, and if a file with one type of line ending is edited on a system using the other type, it may well be that all line endings are converted. This would cause a massive change to the file, affecting all lines. This would be very bad.

For this reason, it is important that Mercurial converts line endings for you, if needed. However, binary files, such as images, must not be converted.

Mercurial uses the eol extension and a file, .hgeol, in the root directory of the working set. This file contains file name patterns with declarations of the type of file (BIN, CRLF, LF, or native). The MonetDB repository already contains this file.

In order to enable the line ending conversion, the following configuration needs to be added in your configuration file (this can usually be added to the user-specific file). This should be done on any system, not just on Windows.

[extensions]
eol =

File Name Restrictions

If you are on a Linux system, you are urged to fix your configuration so that file names that are not compatible with Windows won't be added. Add the following to your configuration file:

[ui]
portablefilenames = abort

White Space Checks

You are strongly urged to add the following configuration to your .hg/hgrc file in all of your MonetDB clones (or to your $HOME/.hgrc file if you don't use Mercurial for other projects). First get a copy of a non-standard extension:

hg clone https://dev.monetdb.org/hg/check_whitespace/

This will create a directory check_whitespace in whatever directory you execute this in (preferably not within a clone of an existing repository). Then add the following to your configuration file:

[hooks]
pretxncommit.whitespace = python:<path-to-check_whitespace-directory>/check_whitespace.py:hook

This is a pre transaction hook which will check whether any of the added or changed lines contains incorrect white space. Incorrect white space is currently only defined for Python source code. In Python source code, it is not allowed to have trailing white space or to have TAB characters. The hook will also check for and refuse left over conflict resolution markers and non-empty files not ending with a newline.
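The settings from this section can be checked in one go. The following is a sketch under the assumption that it is run inside a MonetDB clone, so that clone-specific settings are seen too; check_monetdb_hgrc is a hypothetical helper name.

```shell
# Report which of the settings described above are still missing.
# check_monetdb_hgrc is a hypothetical helper name.
check_monetdb_hgrc() {
    missing=0
    for key in ui.username extensions.eol ui.portablefilenames \
               hooks.pretxncommit.whitespace; do
        # "hg config NAME" exits with a non-zero status if NAME is not set.
        if ! hg config "$key" >/dev/null 2>&1; then
            echo "not configured: $key"
            missing=1
        fi
    done
    return $missing
}
```

The function prints one line per missing setting and returns a non-zero status if anything is missing.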

Optional Configuration

Some other configuration ideas are (use hg help extensions for more information about extensions):

[extensions]
# temporarily save changes
shelve =

There used to be an extension called autopager which would only use the pager if the output was more than a screen full. This same effect can be had by using the correct options to the program less. To do so, add a pager section:

[pager]
pager = less -FXR

Merge Conflicts

Vim users might want to use the following to resolve conflicts:

[merge-tools]
vimdiff.args = -X $other $local $base
vimdiff.priority = 1
vimdiff.premerge = keep

Using this, for each file that has a conflict, vimdiff will be started with three open files. From left to right the files are: the other file (the version that is being merged in), the local file (the one that you should be editing), and the base file (the common ancestor of your version and the incoming version). Because of the premerge setting, Mercurial will first attempt to merge automatically and will start vimdiff only if there are conflicts. In the middle pane you will see the conflict markers which you must resolve.

(X)Emacs users might, instead, want to use this (Replace emacs with xemacs if you're into XEmacs):

[merge-tools]
emacs.args = --eval "(ediff-merge-files-with-ancestor
                     \""$local"\" \""$other"\" \""$base"\" nil \""$output"\")"
emacs.priority = 1

Quoting in this and the following examples is important to get right.

Using this, for each file that has a conflict, (X)Emacs will be started and the Emacs command ediff-merge-files-with-ancestor will be run. In the default configuration this will show two smaller windows at the top, with on the left your original file and on the right the incoming file. Below this there is a full-width window with the local file. This latter file contains the result of the merge with markers where there are conflicts that you must resolve. Below that is the Ediff Control Panel.

The following two versions show the same window configuration as the above.

When using Emacs with emacsclient, you can try the following:

[merge-tools]
emacsclient.args = -a emacs -c \
                  --eval "(ediff-merge-files-with-ancestor
                           \""$local"\" \""$other"\" \""$base"\" nil \""$output"\")"
emacsclient.priority = 2

Here the -c option is essential. It causes emacsclient to ask emacs for a new frame, but, and this is the important bit, it also causes emacsclient (and hence Mercurial) to wait until the frame is closed.

When using XEmacs with gnuserv, you can try the following. It works, sort of:

[merge-tools]
gnuclient.args = --eval "(progn
                         (select-frame (make-frame))
                         (ediff-merge-files-with-ancestor
                          \""$local"\" \""$other"\" \""$base"\" nil \""$output"\"))" \
                       $output
gnuclient.priority = 2

The final $output argument is needed to make gnuclient (and hence Mercurial) wait until you're done editing the file.

Work flow

The normal work flow consists of the following work items.

Note that there is extensive help available for each command. You can always do

hg help <subcommand>

to get help about the subcommand. To see a short list of basic commands, do

hg

To see a complete list of all available commands, do

hg help

Many commands can be abbreviated or have aliases. Use hg help <subcommand> to see which aliases are available. Generally, a command can be abbreviated to the shortest non-ambiguous prefix of the command.

Clone

First you need to make a copy (clone) of another version of the repository. The clone can be made from the central clone, or from one you made earlier (i.e., you can have cascading clones). The command to do this is:

hg clone <URL-of-originating-repository> <new-directory>

The <new-directory> argument is optional. It defaults to the basename (last component) of the <URL-of-originating-repository> argument. The <URL-of-originating-repository> can be the read-only http-style URL or the updatable ssh-style URL, but it can also be a local copy. When making a clone of a local repository, Mercurial will use hard links to the files in the originating clone (on capable operating systems) so that this is an efficient operation.

The possible URLs for the central MonetDB repository are:

https://dev.monetdb.org/hg/MonetDB/
ssh://hg@dev.monetdb.org/MonetDB/

The former is read-only and open to anyone, the latter is updatable and open only to the core developers.
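Cascading clones could be set up along the following lines. This is only a sketch: clone_monetdb and the default directory names are assumptions, and core developers would substitute the ssh-style URL.

```shell
# Clone the central repository once, then make a cheap local clone of it.
# clone_monetdb and the default directory names are hypothetical.
clone_monetdb() {
    master=${1:-MonetDB}
    work=${2:-MonetDB-work}
    # Read-only URL from the list above.
    hg clone https://dev.monetdb.org/hg/MonetDB/ "$master" || return 1
    # Local clone: fast, because hard links are used where possible.
    hg clone "$master" "$work"
}
```

Note that the second clone pulls from and pushes to the first clone by default, not the central repository.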

History

There are several commands to look at the history of a clone.

To see a list of log messages, use:

hg log

In order to get the same list of log messages, but with a (line) graphic to better indicate the relation between changesets, you can use

hg log -G

In order to make this a little easier, you can use an alias so that you can use hg glog instead:

[alias]
glog = log -G

To see which line of a file was last modified in which changeset, you can use

hg annotate <file>

An alias for annotate is blame, as in, who is to blame for a particular line of code.

Another way of viewing the repository is by using your favorite Internet browser. First run

hg serve

then point your browser to

http://localhost:8000/

When you're done browsing, just kill (interrupt) the hg serve command.

Add, Remove, and Rename

When you want to add a file to the repository, create the file, and then use the following command to tell Mercurial about it:

hg add <file>

When a file is to be deleted from the repository (of course, its history will not be removed), you can use the command:

hg remove <file>

Note that there is also a command hg addremove. This command will add all new files and remove all missing files. Since often there will be extra files in the working directory that should not be added, it is not recommended to use this command.

When you want to rename a file or want to move it to another directory, use

hg rename <oldname> <newname>

Note that it is important to use this command rather than a combination of hg add and hg remove since only with hg rename will the change history be maintained over the file rename. You may need to use the --follow option in hg log to see the history, though.
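Put together, a rename followed by a look at the full history might be sketched as follows. The helper name rename_tracked and the commit message are made up for this example.

```shell
# Rename a tracked file, commit the rename, and show its complete history.
# rename_tracked is a hypothetical helper name.
rename_tracked() {
    hg rename "$1" "$2" || return 1
    # Commit only the two names involved in the rename.
    hg commit -m "Renamed $1 to $2." "$1" "$2" || return 1
    # --follow continues the history across the rename.
    hg log --follow "$2"
}
```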

Status and Diff

In order to see which files you have changed or added but not yet committed, you can use the command:

hg status

To see the actual changes, use:

hg diff

In order to not see lots of temporary files in the output of hg status you can create a file with patterns of file names that are to be ignored by Mercurial. See the file .hgignore in the MonetDB clone for more instructions.

Commit

After having changed something, you can commit your changes to your local repository using the command

hg commit

You need to give a message describing the change. This can be done on the command line using the -m option, or you can have Mercurial ask for it by not using the -m option.

Since Mercurial uses the first line of commit messages in summaries, it is a good idea to use the first line of your message as a summary of the change, and to use the following lines as an elaboration (if needed).
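A message with a summary line and an elaboration can also be passed on the command line. A minimal sketch, with commit_with_details as a made-up helper name:

```shell
# Commit with a one-line summary followed by a longer explanation.
# commit_with_details is a hypothetical helper name.
commit_with_details() {
    summary=$1
    details=$2
    # First line: summary. Then an empty line, then the elaboration.
    hg commit -m "$(printf '%s\n\n%s' "$summary" "$details")"
}
```

For example: commit_with_details "Fix crash on empty input." "The parser assumed at least one token."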

An alias for commit is ci.

If you have made multiple independent changes which you want to commit, you can use the --interactive option to interactively select which changes to commit.

There is a deprecated record extension which provides a record command to do the same. You can also use an alias for this:

[alias]
record = commit --interactive

Note that commit requires you to supply a log message. In other words, if you keep your log message empty, the commit will be aborted.

It is good practice to commit separate changes in separate commits. If in the process of fixing a bug you also do some refactoring, commit the two sets of changes separately.

Push

A commit only affects the current repository. If a change needs to be shared with another repository, it needs to be pushed to that repository. Pushing involves comparing the changesets of the current and remote repositories and sending the changesets that are missing on the remote repository there. Before pushing the changes, it's a good idea to see what will get pushed:

hg outgoing

The command to actually push the changes is:

hg push

When there are changes on the remote repository that are not yet on the local repository, Mercurial will warn you about that and suggest that you use hg push -f. Don't follow that advice! Instead, first pull those changes from the other repository and merge them with your own changes, and then push again normally. See Pull and Update for extra information.

Pushing is only possible if you have write access to the remote repository. In the case of the MonetDB repository, only core developers have write access, and only if they have cloned the repository using the ssh scheme.
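The advice above can be wrapped in a small helper that never forces a push. This is a sketch only; careful_push is a hypothetical name. It relies on the exit statuses of hg incoming and hg outgoing, which are 0 when there are changesets to transfer and 1 when there are none.

```shell
# Push only when the remote repository has nothing we still need to pull.
# careful_push is a hypothetical helper name.
careful_push() {
    rc=0
    hg incoming --quiet >/dev/null 2>&1 || rc=$?
    case $rc in
    0)  echo "remote has new changesets: pull and merge first" >&2
        return 1 ;;
    1)  ;;                      # nothing incoming, safe to continue
    *)  echo "could not query the remote repository" >&2
        return 2 ;;
    esac

    # Show what would be pushed.
    rc=0
    hg outgoing || rc=$?
    case $rc in
    0)  hg push ;;
    1)  echo "nothing to push" ;;
    *)  return 2 ;;
    esac
}
```

Somebody else can still push between the check and the actual push. In that case hg push refuses as described above, and the remedy is the same: pull, merge, and push again.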

Notifications

When you push to the central repository, an e-mail message is sent for each changeset that you push. These messages are sent to the mailing list checkin-list@monetdb.org. Please consider subscribing to this mailing list by going to https://mail.monetdb.org/mailman/listinfo/checkin-list/. Alternatively, you can keep informed by subscribing to the RSS feed at https://dev.monetdb.org/hg/MonetDB/rss-log/.

Pull and Update

If there are changes in a remote repository that you want to get into your own repository, you need to pull them in. In order to see which changes are available, use the command

hg incoming

The command to do the actual pull is

hg pull

Pulling by itself does not affect the working files; it only adds the changesets to the repository. After pulling you need to update your working files. This is done using the command

hg update

The two steps can be combined into one by instead using

hg pull -u

During an update (either with hg update or hg pull -u), you may have to resolve conflicts. If you do, Mercurial will tell you. After resolving the conflicts in a file, you need to mark the file as resolved and then commit the resolution:

hg resolve --mark <file>
hg commit

It is recommended to not use the (deprecated) fetch extension. If you have enabled the fetch extension, you could use:

hg fetch

This command will do a pull and possibly a merge and commit. The problem is that you don't have enough control over how the merge and commit are done.

If you have done local commits (i.e. in your private clone), and you then pull in changes from a remote clone, you will get a new head which you will have to merge. This will result in extra changesets that only exist because unrelated changesets have to be combined. If you enable the rebase extension, you can use

hg pull --rebase

This command will rebase your local commits on top of the incoming changesets. This means that you don't need to create a separate changeset to merge your changes with the incoming changes.

To enable the rebase extension, add the following to your configuration file:

[extensions]
rebase =

Updating the working set involves merging the changes that came from the remote repository with the changes made locally. Usually that can be done automatically, but sometimes there will be conflicts. If there are conflicts, Mercurial will start a program to help resolve the conflicts. Which program it starts depends on your configuration and on the programs that are installed on the system. By default, Mercurial is configured to try a number of different programs, one of which is bound to work. See Optional Configuration for some suggestions for a merge command.
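If a merge was interrupted, or you have lost track of which files still need attention, the list kept by hg resolve can help. A sketch, with unresolved_files as a made-up helper name:

```shell
# List the files that still have unresolved conflicts.
# unresolved_files is a hypothetical helper name.
unresolved_files() {
    # "hg resolve --list" prints "U <file>" for unresolved files and
    # "R <file>" for resolved ones; keep only the names of the former.
    hg resolve --list | sed -n 's/^U //p'
}
```

When the function prints nothing, no unresolved conflicts are left.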

Smart Updating and Merging

In order to keep the history of the repository easier to understand it is highly recommended to rebase your outgoing changesets on top of new incoming changesets. This is briefly described above. Rebasing may be hard to do if you also have locally modified files. If you copy the following code into a file and use that file as a shell script whenever you pull in changes, all the hard work will be done for you:

: # -*-shell-script-*-
shelve=
if [ -n "$(hg -q out --limit 1 --template '{node|short}\n')" ]; then
    # we have outgoing changes, move them on top of incoming changes
    u=--rebase
    if [ -n "$(hg status -mard)" ]; then
        # we have modified files, temporarily move the changes out of
        # the way
        shelve=temp-shelve-$RANDOM
        hg shelve -n $shelve
    fi
else
    u=-u
fi
hg pull $u
if [ -n "$shelve" ]; then
    # restore our changes
    hg unshelve $shelve
fi

Don't forget to add the rebase and shelve extensions to your configuration file:

[extensions]
rebase =
shelve =

Ad-hoc Collaboration

Pulling and Pushing

You can pull from other clones than the one you originally cloned from. This is done by specifying a source (i.e. remote clone) in the command:

hg pull <source>

The source can be the central repository (if you cloned from another clone which you want to bypass):

hg pull https://dev.monetdb.org/hg/MonetDB/

You can also use this technique to pull updates from a collaborator. If your collaborator is on a shared file system, you just need to be able to read their repository:

hg pull /path/to/collaborator/clone/

If you have SSH access to your collaborator's system, you can use an SSH-style scheme in the URL:

hg pull ssh://login@host.example.org/path/to/clone/

The /path/to/clone is actually interpreted relative to the home directory of the remote user. If the /path/to/clone starts with two slashes, it is interpreted as an absolute path name. This mechanism could be used in combination with having your collaborator temporarily add your OpenSSH public key to their .ssh/authorized_keys file.
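The difference between the relative and the absolute form is easy to get wrong, so here is a small sketch that builds the URL. The helper name ssh_clone_url and the example names are assumptions.

```shell
# Build an ssh-style URL for a clone on a collaborator's system.
# ssh_clone_url is a hypothetical helper name.
ssh_clone_url() {
    login=$1
    host=$2
    dir=$3
    # One slash separates host and path. A path that itself starts with
    # a slash therefore yields "//", which marks it as absolute.
    printf 'ssh://%s@%s/%s\n' "$login" "$host" "$dir"
}
```

With these assumptions, ssh_clone_url me host.example.org work/MonetDB gives a path relative to the remote home directory, whereas passing /data/MonetDB gives the double-slash, absolute form.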

Another way of getting access to another repository is to have your collaborator execute

hg serve

in their repository, and then accessing the server through an HTTP-style scheme in the URL:

hg pull http://host.example.org:8000/

You do need to be able to reach the server through any firewalls there may be.

Sending E-mails

It is also possible to collaborate over e-mail. Basically, you identify a changeset in your clone that you want to communicate, and then you send that by e-mail. For this to work, you need to set up some configuration first.

In your configuration file add the following:

[extensions]
patchbomb =
[email]
from = My Name <my.address@example.org>

If you want a copy of the mail you send, add this to the [email] section:

bcc = my.address@example.org

(or use cc instead of bcc)

Once the configuration is in order, you can send changesets using the command

hg email -a -r <revision>

This command will ask for the intended recipient and send the mail. The patch will be sent as an attachment (by virtue of the -a option).

The recipient of the message can save the attachment and then apply the contained patch using the command

hg import <patchfile>

where <patchfile> is the name of the saved attachment.

For more information see:

hg help patchbomb
hg help email
man hgrc

Debugging

Sometimes you know that a particular version of the software did not have some bug, whereas a later version does have the bug. The command hg bisect can help find the changeset that introduced the bug. The command works by first specifying a known good changeset and a known bad changeset. If you can also provide a (shell) command that can automatically find out whether a particular revision exhibits the bug, then Mercurial can do the rest:

hg bisect -g <good-changeset>
hg bisect -b <bad-changeset>
hg bisect -c <command-to-test-revision>

Checking whether a revision contains the bug can also be done manually. You need to run either hg bisect -g or hg bisect -b after a manual check to indicate whether the checked version was good or bad.

When you're done searching, reset the bisect state with

hg bisect -r

Note that this procedure assumes that the bug occurs in all versions after it was first introduced.
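The command given to hg bisect -c reports its verdict through its exit status: 0 means good, 125 means the revision cannot be tested and should be skipped, 127 aborts the bisection, and any other non-zero status means bad. A sketch of a test function that follows this convention is shown below. BUILD_CMD and TEST_CMD are placeholders for whatever builds the software and reproduces the bug; they are not part of Mercurial or MonetDB.

```shell
# Decide whether the currently checked-out revision is good or bad.
# bisect_check, BUILD_CMD, and TEST_CMD are hypothetical names.
bisect_check() {
    # A revision that cannot be built cannot be judged: skip it.
    sh -c "$BUILD_CMD" || return 125
    # Map every test failure to 1, so that a stray 125 or 127 from the
    # test itself is not mistaken for "skip" or "abort".
    sh -c "$TEST_CMD" || return 1
    return 0
}
```

To use it, the function could be put in a script that ends by calling bisect_check, with that script passed to hg bisect -c.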

Working with Branches

Branches in Mercurial happen all over the place. Usually branches in Mercurial are anonymous. Sometimes you will see a message about multiple heads. These are, in fact, end points of, usually anonymous, branches. Usually these multiple heads need to be merged by using hg merge.

In MonetDB we use branches for development and release versions. These branches are not anonymous. This section explains how to work with these branches.

Use of Branches in MonetDB

In MonetDB we make extensive use of branches. Three branches are of particular interest. They have the nicknames development, candidate, and stable. The development branch is always the default branch ("default" is an official Mercurial term for the otherwise nameless main branch). The stable branch is the branch from which the last release of the MonetDB suite was created. The candidate branch is the branch from which the next feature release will be created. There is not always a candidate branch.

Apart from these three branches, there are usually a whole host of other branches active. Those branches are for actual development that can then happen more-or-less in isolation until the development is ready to be merged into the development (default) branch.

There is a strict hierarchy among the branches in MonetDB. Changes that happen in a particular branch are propagated (manually) to branches higher in the hierarchy. A bug should be fixed on the lowest branch in the hierarchy in which it occurs.

Fixing Bugs

It is worth reiterating the above point.

Bugs should be fixed in the lowest branch in which they occur. They then get propagated to (merged into) the next higher branch, usually when somebody needs the fix there or a number of changes have accumulated on the lower branch.

Switching Branches

If a branch already exists, you can switch the working files to it by using

hg update -r <branchname>

To see which branch is being used, use

hg branch

To see which branches are available, use

hg branches

If a branch does not have a head you will see a comment (inactive). This can happen if all changes on the branch have already been propagated to the default (development) branch.

To create a branch, use

hg branch <new-branchname>

After switching the working files to a branch, you can use the normal Mercurial commands to make and commit changes, push and pull changesets, and resolve conflicts.

The development branch is the one that was not created explicitly. Mercurial calls this the default branch, and so switching to the default branch is done using

hg update -r default

If you add the -C switch to the hg update command (as is suggested in various places), you discard any local and uncommitted changes you may have.
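A more careful alternative to -C is to refuse to switch while there are uncommitted changes. A sketch, with switch_branch as a made-up helper name:

```shell
# Switch the working files to another branch, but only if nothing is
# uncommitted.  switch_branch is a hypothetical helper name.
switch_branch() {
    # Any modified, added, removed, or deleted file counts as a change.
    if [ -n "$(hg status -mard)" ]; then
        echo "uncommitted changes present: commit or set them aside first" >&2
        return 1
    fi
    hg update -r "$1"
}
```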

Propagating Changes Between Branches

When a bug is fixed in a release branch, typically that change needs to be propagated to the development branch. In Mercurial this is done by selecting the destination branch and then issuing the command

hg merge <branchname>

This command will merge all changesets on the named branch that have not yet been merged with the current branch. This will of course only work if all changesets are available in the clone, so if you're doing this in different clones, make sure that the clones are updated.

After a successful hg merge, you need to hg commit (and possibly hg push) the result of the merge.
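The whole propagation step might be sketched as follows. The helper name propagate_branch and the commit message are assumptions; the branch names are whatever hg branches reports in your clone.

```shell
# Merge everything from one branch into another and commit the result.
# propagate_branch is a hypothetical helper name.
propagate_branch() {
    from=$1
    to=$2
    # Select the destination branch first.
    hg update -r "$to" || return 1
    # Stop here on conflicts, so they can be resolved and committed by hand.
    hg merge "$from" || return 1
    hg commit -m "Merge with $from branch."
}
```

Pushing is left out on purpose, so that the result of the merge can be inspected first.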

Backporting Changes

If you want to backport a change that was made on the default branch to a release branch, you need the graft command. Use:

hg graft --log <rev>

where <rev> is the revision number of the changeset you want to backport.
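Since graft applies the change to the branch of the working files, the release branch has to be selected first. A sketch, with backport as a made-up helper name:

```shell
# Copy a single changeset onto a release branch.
# backport is a hypothetical helper name.
backport() {
    rev=$1
    branch=$2
    # Select the branch that should receive the change.
    hg update -r "$branch" || return 1
    # --log records in the new commit message which changeset was grafted.
    hg graft --log "$rev"
}
```

Unlike hg merge, a successful hg graft commits the result itself, so no separate hg commit is needed.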