International HapMap Project

About the International HapMap Project


The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The Project is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States. All of the information generated by the Project will be released into the public domain.

The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. [See What is the HapMap?] By making this information freely available, the Project will help biomedical researchers find genes involved in disease and responses to therapeutic drugs. In the initial phase of the Project, genetic data are being gathered from four populations with African, Asian, and European ancestry. Ongoing interactions with members of these populations are addressing potential ethical issues and providing valuable experience in conducting research with identified populations.

Public and private organizations in six countries are participating in the International HapMap Project. Data generated by the Project can be downloaded with minimal constraints.

Access the Data

Contents of the Data

  • Genotypes: Individual genotype data submitted to the DCC to date. Phase 3 data is available in PLINK format and HapMap format.
  • Frequencies: Allele and genotype frequencies compiled from genotyping data submitted to the DCC to date. These have also been submitted to dbSNP and should be available in the next dbSNP build.
  • LD Data: Linkage disequilibrium properties (D', LOD, r²) compiled from the genotype data to date.
  • Phasing Data: Phasing data generated using the PHASE software, compiled from the genotype data to date.
  • Allocated SNPs: dbSNP reference SNP clusters that have been picked and prioritized for genotyping according to several criteria (see info on how SNPs were selected). The file 00README contains per-chromosome SNP counts and further details.
  • Recombination rates and Hotspots: Recombination rates and hotspots compiled from the genotyping data.
  • SNP assays: Details about assays submitted to the DCC to date, including PCR primers, extension probes, etc., specific to each genotyping platform.
  • Perlegen amplicons: Details for mapping Perlegen amplicons to HapMap assayLSID. For primer sequences, see Perlegen's Long Range PCR Amplicon data.
  • Raw data: Raw signal intensity data from HapMap genotypes. Currently includes data from the Affymetrix GeneChip 100K and 500K Mapping Arrays and the Genome-Wide Human SNP Array 6.0 (HapMap 3 samples only).
  • Inferred genotypes: Genotypes inferred using the method of Burdick et al. (Nat Genet 38:1002-4).
  • Mitochondrial and chrY haplogroups: Classification of phase I HapMap samples into mtDNA and chrY haplogroups. The distribution shown in Table 4 of the HapMap phase I paper (Nature 437:1299-1320) corresponds to unrelated parents in each of the populations analyzed.
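
For readers unfamiliar with the linkage disequilibrium measures named in the LD Data entry, the following is a minimal sketch of how D' and r² can be computed for one pair of biallelic SNPs from phased haplotypes. It is an illustration only, not the Project's own pipeline: the function name `ld_stats`, the two-character haplotype strings, and the returned dictionary keys are all assumptions made for this example, and the LOD score is not computed here.

```python
def ld_stats(haplotypes):
    """Compute D, |D'| and r^2 for two biallelic SNPs.

    `haplotypes` is a list of 2-character strings, one per phased
    chromosome (hypothetical input format): the first character is the
    allele at SNP 1, the second is the allele at SNP 2.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")

    # Pick one allele at each SNP as the reference; the choice is
    # arbitrary and does not change |D'| or r^2.
    a, b = haplotypes[0][0], haplotypes[0][1]

    p_a = sum(h[0] == a for h in haplotypes) / n    # freq. of a at SNP 1
    p_b = sum(h[1] == b for h in haplotypes) / n    # freq. of b at SNP 2
    p_ab = sum(h == a + b for h in haplotypes) / n  # freq. of haplotype ab

    # D: deviation of the observed haplotype frequency from the value
    # expected if the two SNPs were independent.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")

    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    return {
        "d": d,
        "d_prime": abs(d) / d_max,
        "r2": d * d / denom,
    }


if __name__ == "__main__":
    # Eight phased chromosomes, made up for illustration.
    sample = ["AG", "AG", "AG", "AT", "CG", "CT", "CT", "CT"]
    print(ld_stats(sample))
```

In this made-up sample the two SNPs are partially correlated, so both measures fall between 0 and 1; a sample containing only the haplotypes AG and CT would give |D'| = 1 and r² = 1.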