This post is an extended discussion of removing paths from a Subversion repository, a topic covered more succinctly in the Subversion FAQ.
Before you start removing paths from your repository, consider this: Subversion was not designed to have paths removed. That is why you cannot remove paths using the standard client interface; in fact, you need to be an administrator with direct access to the repository’s physical file system. Removing paths from a repository is a last resort. (There is a proposal for an svn obliterate feature to provide path removal through the standard client, and a functional specification for it has been placed in the Subversion development notes. Until it is implemented, the following is the only mechanism available for permanently removing paths from a repository.)
Why might you want to permanently remove a path from your repository? Legitimate reasons include:
- Archiving old material from a very large repository to save space.
- Removing sensitive information from a repository (perhaps added accidentally).
Strictly speaking, we do not actually remove paths from a repository; we build a completely new repository. To do this we use three commands: svnadmin dump, svndumpfilter and svnadmin load.
svnadmin dump
The svnadmin dump command creates a stream of data that describes all of the data in the repository’s virtual file system.
svnadmin dump /path/to/repository
This command will produce a stream of data describing the content of the repository at path /path/to/repository. This stream of data is sent to STDOUT, usually the screen by default.
For example, the output from svnadmin dump can be redirected into a file.
svnadmin dump /svn/oldrepos > dump.file
This command will create a dump file called dump.file containing all of the output from the svnadmin dump command.
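As a minimal sketch, the redirection can be wrapped in a small shell function that refuses to overwrite an existing dump file and cleans up after a failed dump. The name dump_to_file is made up for illustration and is not part of Subversion. The -q (--quiet) option suppresses the progress messages that svnadmin dump writes to STDERR.

```shell
# Sketch only: dump_to_file is a hypothetical helper, not a Subversion command.
dump_to_file() {
    repo=$1
    out=$2
    # Refuse to clobber an existing dump file.
    if [ -e "$out" ]; then
        echo "refusing to overwrite $out" >&2
        return 1
    fi
    # -q suppresses the progress messages written to STDERR.
    if ! svnadmin dump -q "$repo" > "$out"; then
        rm -f "$out"    # do not leave a partial dump behind
        return 1
    fi
}
```

It would be called as: dump_to_file /svn/oldrepos dump.file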
svnadmin load
The svnadmin load command performs the opposite function to svnadmin dump. It takes a stream of data describing the content of a repository and loads it into a repository database. The stream is read from STDIN, usually the keyboard by default.
svnadmin load /path/to/repository
This command will read a stream of data from STDIN and place it into the repository at /path/to/repository.
The repository at path /path/to/repository must exist before you run the svnadmin load command.
Similarly to the svnadmin dump command, input can be redirected.
svnadmin load /svn/newrepos < dump.file
This command feeds the file dump.file to the STDIN of the svnadmin load command. In this way you can process the dump file and then load it into another repository.
If you wanted to copy a repository you could use dump and load as follows.
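Because the target repository must exist before loading, a script may want to create it on demand. The following is a rough sketch under two assumptions: the helper name load_into is invented for this example, and a missing directory is taken to mean the repository has not been created yet.

```shell
# Sketch only: load_into is a hypothetical helper, not a Subversion command.
load_into() {
    repo=$1
    dump=$2
    # Assumption: if the directory is missing, the repository does not exist yet.
    if [ ! -d "$repo" ]; then
        svnadmin create "$repo" || return 1
    fi
    # -q suppresses the progress output of svnadmin load.
    svnadmin load -q "$repo" < "$dump"
}
```

It would be called as: load_into /svn/newrepos dump.file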
svnadmin create /svn/newrepos
svnadmin dump /svn/oldrepos | svnadmin load /svn/newrepos
Here the output from the svnadmin dump command is being piped directly into the svnadmin load command with no intermediate dump file. We will use this technique a little later to create one command line to build a new repository with paths filtered out from the original.
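One quick sanity check after a copy like this is to compare the youngest (latest) revision number of the two repositories with svnlook youngest. This is only a sketch: same_youngest is a made-up helper name, and matching revision numbers suggest, but do not prove, that the copy is complete.

```shell
# Sketch only: same_youngest is a hypothetical helper, not a Subversion command.
same_youngest() {
    old_rev=$(svnlook youngest "$1") || return 1
    new_rev=$(svnlook youngest "$2") || return 1
    echo "old: r$old_rev, new: r$new_rev"
    # Succeeds only when both repositories report the same youngest revision.
    [ "$old_rev" = "$new_rev" ]
}
```

It would be called as: same_youngest /svn/oldrepos /svn/newrepos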
svndumpfilter
svndumpfilter reads input from STDIN and writes output to STDOUT, including or excluding specified paths as it encounters them.
This allows us to svnadmin dump a repository, filter the data with svndumpfilter and then insert the filtered data into another repository with svnadmin load.
svndumpfilter has two sub-commands, include and exclude. As their names suggest, these specify paths we want to include in, or exclude from, the data processed through svndumpfilter.
svndumpfilter exclude trunk
This command will read data from STDIN, remove any references to paths beginning with trunk, and write the resulting data to STDOUT.
If we have a dump file called dump.file from which we want to remove all paths beginning with trunk, and we want to create a new dump file called newdump.file, we use the following command.
svndumpfilter exclude trunk < dump.file > newdump.file
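To see which paths a dump file mentions, before and after filtering, we can pull out the Node-path headers that the dump format writes for each changed path. This is a rough sketch: list_dump_paths is an invented helper name, and a line of file content that happens to begin with "Node-path: " would be reported too.

```shell
# Sketch only: list_dump_paths is a hypothetical helper, not a Subversion command.
list_dump_paths() {
    # Print the value of every Node-path header, then remove duplicates.
    sed -n 's/^Node-path: //p' "$1" | sort -u
}
```

Running list_dump_paths dump.file and then list_dump_paths newdump.file should show the trunk paths in the first listing only.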
We can bypass the need to use these intermediate files and chain all three commands together.
Suppose we have a repository at /svn/repos and we want to remove all paths starting with trunk/mistake. We create a new, empty repository to receive the processed data, then use svnadmin dump, svndumpfilter and svnadmin load to do all the heavy lifting.
svnadmin create /svn/newrepos
svnadmin dump /svn/repos | \
svndumpfilter exclude trunk/mistake | \
svnadmin load /svn/newrepos
[Note: The '\' indicates a line continuation. The last three lines in the previous example are all one command line]
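The same pipeline could be wrapped in a function with a basic check at the end. This is a sketch under stated assumptions: rebuild_without is an invented helper name, and the check relies on svndumpfilter keeping emptied revisions by default (that is, without --drop-empty-revs), so both repositories should end up with the same youngest revision.

```shell
# Sketch only: rebuild_without is a hypothetical helper, not a Subversion command.
rebuild_without() {
    old=$1
    new=$2
    prefix=$3
    svnadmin create "$new" || return 1
    # The exit status of a pipeline is that of its last command (svnadmin load),
    # so a failure in dump or filter is not detected on this line.
    svnadmin dump -q "$old" |
        svndumpfilter exclude "$prefix" |
        svnadmin load -q "$new" || return 1
    # Extra guard: the revision counts should match if all history got through.
    [ "$(svnlook youngest "$old")" = "$(svnlook youngest "$new")" ]
}
```

It would be called as: rebuild_without /svn/repos /svn/newrepos trunk/mistake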
Once this is complete, we have two repositories: /svn/repos and /svn/newrepos. You now need to replace /svn/repos with /svn/newrepos (the subject, perhaps, of another post).
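As a very rough sketch of that replacement, one option is to move the original aside and move the new repository into its place. Everything here is an assumption rather than a recommended procedure: swap_repos is an invented helper name, all access to the repository is assumed to be stopped first, and the hooks and conf directories are copied by hand because dump and load carry only the versioned history.

```shell
# Sketch only: swap_repos is a hypothetical helper, not a Subversion command.
# Assumes nobody is accessing either repository while it runs.
swap_repos() {
    live=$1
    rebuilt=$2
    backup=$live.old
    if [ -e "$backup" ]; then
        echo "$backup already exists" >&2
        return 1
    fi
    # Hook scripts and server configuration are not part of a dump stream.
    cp -Rp "$live/hooks/." "$rebuilt/hooks/" || return 1
    cp -Rp "$live/conf/." "$rebuilt/conf/" || return 1
    # Keep the original as a backup, then put the rebuilt repository in place.
    mv "$live" "$backup" || return 1
    mv "$rebuilt" "$live"
}
```

It would be called as: swap_repos /svn/repos /svn/newrepos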