This is the old version of the documentation.
ChIP-Seq Analysis: Step 1, Creating a "Tag Directory"
To facilitate the analysis of ChIP-Seq (or any other type of short read
re-sequencing data), it is useful to first transform the sequence
alignment into a platform-independent data structure representing the
experiment, analogous to loading the data into a database. HOMER
does this by placing all relevant information about the experiment into
a "Tag Directory", which is essentially a directory on your computer
that contains several files describing your experiment.
To create a "Tag Directory", you must have alignment files in one of
the following formats:
- BED format
- *.eland_result.txt or *_export.txt format from the Illumina pipeline
- bowtie output format
If your alignment is in a different format, it is recommended that you
convert it into BED format:
Column1: chromosome
Column2: start position
Column3: end position
Column4: Name (or strand +/-)
Column5: Number of reads at this position
Column6: Strand +/-
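As an illustration of the conversion step, the following Python sketch formats one aligned read as a tab-separated BED line using the column order listed above. The function name to_bed_line and the example read are hypothetical and not part of HOMER. How you extract the fields from your own alignment format is up to you.

```python
def to_bed_line(chrom, start, end, name, count, strand):
    """Format one aligned read as a six-column, tab-separated BED line.

    Column order follows the list above: chromosome, start, end,
    name, number of reads at this position, strand.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if end < start:
        raise ValueError("end position must not be smaller than start position")
    fields = [chrom, str(start), str(end), name, str(count), strand]
    return "\t".join(fields)


# Hypothetical read: 36 bp on the plus strand of chr1, seen once.
print(to_bed_line("chr1", 1000, 1036, "read_1", 1, "+"))
```

Writing one such line per read to a text file gives a BED file that can be passed to makeTagDirectory as an alignment file.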
Alternatively (or in combination), you can make tag directories from
existing tag directories or from tag files (explained below).
To make a tag directory, run the following command:
makeTagDirectory <output directory> [options] <alignment file1> [alignment file 2] ...
The first argument must be the output directory (required).
If it does
not exist, it will be created. If it does exist, it will be
Several additional options exist for
makeTagDirectory. The program attempts to guess the format of
your alignment files, but if it is unsuccessful, you can force the
format with "-format <X>".
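If you drive HOMER from a script, the command described above can be assembled as an argument list. The Python sketch below is a minimal illustration, not part of HOMER. The helper name build_make_tag_directory_command and the file names are hypothetical. It only builds and prints the command, and assumes makeTagDirectory is on your PATH if you choose to run it.

```python
import shlex


def build_make_tag_directory_command(output_dir, alignment_files, fmt=None):
    """Build the makeTagDirectory argument list.

    output_dir comes first, as required. If fmt is given, it is passed
    through "-format <X>" to force the alignment format; otherwise the
    program is left to guess the format itself.
    """
    if not alignment_files:
        raise ValueError("at least one alignment file is required")
    command = ["makeTagDirectory", output_dir]
    if fmt is not None:
        command += ["-format", fmt]
    command += list(alignment_files)
    return command


# Hypothetical directory and file names, for illustration only.
command = build_make_tag_directory_command(
    "my_experiment_tags", ["lane1.bed", "lane2.bed"]
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) to execute it.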
To combine tag directories, for example when merging two separate
experiments into one, do the following:
makeTagDirectory <output directory> -d <tag directory 1> <tag directory 2> ...
What does makeTagDirectory do?
makeTagDirectory basically parses through the alignment file and splits
the tags into separate files based on their chromosome. As a result,
several *.tags.tsv files are created in the output directory. These files
are made so that the data can be accessed very efficiently during downstream
analysis. This also helps speed up the analysis of very large
data sets without running out of memory.
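The idea of splitting reads by chromosome can be sketched in a few lines of Python. This is only an illustration of the principle, not HOMER's implementation. The function names are hypothetical, the files written here contain the raw BED lines rather than HOMER's actual tag file layout, and naming each file <chromosome>.tags.tsv is an assumption.

```python
import os
from collections import defaultdict


def split_by_chromosome(bed_lines):
    """Group BED lines by the chromosome named in their first column."""
    groups = defaultdict(list)
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom = line.split("\t", 1)[0]
        groups[chrom].append(line)
    return dict(groups)


def write_groups(groups, output_dir):
    """Write one file per chromosome into output_dir and return the paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for chrom, lines in sorted(groups.items()):
        # Assumed naming scheme, for illustration only.
        path = os.path.join(output_dir, f"{chrom}.tags.tsv")
        with open(path, "w") as handle:
            handle.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

Because each chromosome lives in its own file, a later analysis step can load one chromosome at a time instead of holding the whole data set in memory.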
In the end, your output directory will contain several *.tags.tsv
files, as well as a file named "tagInfo.txt". This file contains
information about your sequencing run, including the
total number of tags considered. This file is used by later
programs to quickly reference information about the experiment, and can
be manually modified to set certain parameters for analysis.
makeTagDirectory also performs several quality control steps, which are
covered in the next section.
Command line options of makeTagDirectory command:
makeTagDirectory <directory> <alignment file 1> [file 2] ... [options]
Creates a platform-independent 'tag directory' for later analysis.
Currently BED, Eland, and bowtie files are accepted. The program will
try to automatically detect the alignment format if not specified.
Existing tag directories can be added or combined to make a new one
using -d/-t.
If more than one format is needed and the program cannot auto-detect it
properly, make separate tag directories by running the program
separately, then combine them.
(specify genome for later analysis)
genomes, run "??"
(optional, names the experiment)
X can be: (with column specifications
from bowtie (run with --best -k 2 options)
from basic eland
from illumina pipeline (22 columns total)
mapping with bowtie)
mapping of each read regardless if multiple equal
column of BED file contains stupid values, like
of tags, then ignore this column)
-d <tag directory> [tag directory 2] ... (add tag directory to new tag directory)
-t <tag file> [tag file 2] ... (add tag file i.e. *.tags.tsv to new tag directory)