We retrieved the mapped sequenced reads of nucleosome fragments and ChIP-Seq mapped read data for H2A.Z, 20 histone methylations and 18 histone acetylations. Similarly, we analyzed DNA methylation data from the same cell type. These datasets contained all reads that mapped to a unique position in the genome with up to two mismatches. To minimize sequence amplification bias, we removed identical reads. We shifted the start position of each read by 75 bp in the direction of sequencing (75 bp is approximately half the length of the isolated DNA fragments), thereby converting read start positions into nucleosome dyad positions. All datasets were rescaled to 10 million uniquely mapped nucleosome fragments.
To generate the average chromatin profiles shown in the figures, we counted the number of dyads that fall at each position along the region surrounding the gene start site. Smoothed lines were generated from the per-base-pair averaged, position-shifted read counts using the loess regression function in R (with a 180-bp span). The predict.loess R function was used to calculate 95% confidence intervals. For the background-subtracted chromatin profiles included in Additional file 3, we used a 75-bp window sliding by 1 bp and calculated the difference between the number of shifted reads from the histone modification (or H2A.Z) and the number of shifted reads from the nucleosome occupancy data. At each position relative to the transcription start site, we then calculated the mean and the standard error of the background-subtracted values, assuming a normal distribution. Heatmaps were generated using Java Tree View 1.1.5r2.
We repeated all chromatin profiles using data from a fetal lung fibroblast cell line (IMR90) generated by the NIH Roadmap Epigenomics Project [64, 65]. We downloaded the mapped reads provided as BED files. Because these reads were mapped to human genome version hg19, we converted the coordinates of all gene promoters from hg18 to hg19 using the liftOver tool. These profiles are shown in Additional file 9. The accession identifiers of the samples used for these profiles are included in the figure legend.
Regions of statistically significant CTCF binding in CD4+ T cells (used in Additional file 10), based on the data from Barski et al., were retrieved from Ensembl (regulatory build of Ensembl release 68). We defined distal CTCF binding sites as those not overlapping any annotated Ensembl gene. The coordinates of CTCF peaks were converted from human genome assembly hg19 to hg18 using the liftOver tool.
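The read-processing steps described in this section (removal of identical reads, the 75-bp shift to dyad positions, rescaling to 10 million fragments, per-position dyad counting around the gene start site, and the 75-bp sliding-window background subtraction) can be illustrated with a minimal Python sketch. This is not the code used for the analysis: the function names, the input tuples, the 1,000-bp flank, and the assumption that "identical reads" means reads sharing chromosome, 5' position and strand are all hypothetical choices made for illustration. The loess smoothing performed in R is not reproduced here.

```python
from collections import Counter

HALF_FRAGMENT = 75         # bp, about half the isolated fragment length
TARGET_DEPTH = 10_000_000  # all datasets rescaled to 10 million fragments


def dyad_positions(reads, shift=HALF_FRAGMENT):
    """Convert reads to estimated dyad positions.

    reads: iterable of (chrom, five_prime_pos, strand) tuples (assumed format).
    Identical reads are collapsed first; the shift follows the direction
    of sequencing, so minus-strand reads move toward lower coordinates.
    """
    unique_reads = set(reads)  # assumption: identical = same chrom, pos, strand
    dyads = []
    for chrom, pos, strand in unique_reads:
        offset = shift if strand == "+" else -shift
        dyads.append((chrom, pos + offset))
    return dyads


def tss_profile(dyads, tss_list, flank=1000, target_depth=TARGET_DEPTH):
    """Mean rescaled dyad count at each position relative to the gene start.

    tss_list: list of (chrom, tss, strand) tuples (assumed format).
    Assumption: the scaling factor uses the deduplicated dyad count.
    """
    counts = Counter(dyads)
    scale = target_depth / len(dyads)
    profile = {}
    for rel in range(-flank, flank + 1):
        total = 0
        for chrom, tss, strand in tss_list:
            # Orient positions so that positive offsets are downstream.
            pos = tss + rel if strand == "+" else tss - rel
            total += counts.get((chrom, pos), 0)
        profile[rel] = scale * total / len(tss_list)
    return profile


def windowed_difference(signal, background, window=75):
    """Difference of window sums (signal minus background), sliding by 1 bp.

    signal, background: equal-length lists of per-base shifted-read counts.
    """
    if len(signal) != len(background):
        raise ValueError("signal and background must have equal length")
    if len(signal) < window:
        raise ValueError("region is shorter than the window")
    diff = [s - b for s, b in zip(signal, background)]
    running = sum(diff[:window])
    out = [running]
    for i in range(1, len(diff) - window + 1):
        # Add the base entering the window, drop the base leaving it.
        running += diff[i + window - 1] - diff[i - 1]
        out.append(running)
    return out
```

In this sketch, `tss_profile` would be computed separately for each histone modification and for the nucleosome occupancy data, and `windowed_difference` would then be applied per gene before taking the mean and standard error across genes at each position.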
Additional file 9: Chromatin profiles of expression-matched CpG and non-CpG promoter genes in IMR90 cells. Note that the CD4+ T-cell chromatin profiles shown in the main paper and the IMR90 chromatin datasets shown here were generated with different chromatin immunoprecipitation protocols. In the case of CD4+ T cells, nucleosomes were isolated by micrococcal nuclease digestion before immunoprecipitation. In the case of IMR90 cells, chromatin was sonicated before immunoprecipitation. These experimental differences may account for some of the differences between the CD4+ T-cell chromatin profiles and the IMR90 chromatin profiles. The GEO accession numbers for the datasets used here are GSE2672 (a), GSM521890 (b), GSM521915 (c), GSM521904 (d), GSM752986 (e), GSM521899 (f), GSM521895 (g), GSM521866 (h), GSM521881 (i), GSM521885 (j), GSM469975 (k). Promoters are annotated according to CpG islands downloaded from UCSC. (PDF 2 MB)