Software for Haplotype Inference

Reference:
Huang, Y.-T., Chao, K.-M., and Chen, T. An Approximation Algorithm for Haplotype Inference by Maximum Parsimony,
Journal of Computational Biology, 12: 1261-1274, 2005.

 

Given a set of genotypes, we wish to find a minimum subset of haplotypes that can resolve these genotypes. This problem has been shown to be NP-hard.

bulletFor short genotypes, we design and implement a software called SDPHapInfer.

bulletFor long genotypes, we design and implement a software called LongHapInfer.

Note that the zip files are compressed using WinZip. If you encounter any problem, please contact Prof. Kun-Mao Chao.

SDPHapInfer

SDPHapInfer is implemented in MatLab. The SDPHapInfer formulates the haplotype inference problem as the following integer quadratic programming (IQP) problem. The objective function is to minimize the number of haplotypes, and the constraint functions are to resolve each genotype.

To solve the IQP problem, we propose an iterative SDP-relaxation algorithm which finds a solution within a factor of O(log n) of the optimal solution. The SDP problem is solved via an external MatLab library called SDPA-M.

bulletDownload SDPHapInfer. This zip file contains four MatLab scripts: SDPHapInfer.m, ReadGenoFile.m, Invert.m, Chk_Resolve.m, and Find_Index.m.

bulletInput file format:
First line: #_of_genotypes #_of_SNPs. Second line: first genotype; Third line: second genotype; ...
For each locus on the genotype, we represent the homozygous wild type as 0, homozygous mutant as 1, and heterozygous site as 2. The following is an example containing 3 genotypes with 7 SNPs.
3 7
1022101
1001222
2211002

bulletTo run this program, type "SDPHapInfer" under the MatLab command line, and you will see a prompt for input of the file name.
>>SDPHapInfer
****************************************
*********** SDPHapInfer v0.4 ***********
****************************************
Please specify the input file name: test.dat
...

bulletThe SDPHapInfer will generate an output file *.out (e.g., test.dat.out) that contains solutions.
2211002 = 0011000 + 1111001 ;
1022101 = 1001101 + 1010101 ;
1001222 = 1001101 + 1001010 ;

Number of iterations: 1
Number of selected haplotypes: 5
List of selected haplotypes:
0011000
1001010
1001101
1010101
1111001

 

LongHapInfer

LongHapInfer is implemented in MatLab. This algorithm partitions the genotypes into segments, infers subhaplotypes for each segment, and concatenates subhaplotypes in different segments for the final solution. LongHapInfer requires an external MatLab library called SDPA-M.

bulletDownload LongHapInfer. This zip file contains four MatLab scripts: LongHapInfer.m, ReadGenoFile.m, Invert.m, Chk_Resolve.m, and Find_Index.m.

bulletInput file format:
First line: #_of_genotypes #_of_SNPs. Second line: first genotype; Third line: second genotype; ...
For each locus on the genotype, we represent the homozygous wild type as 0, homozygous mutant as 1, and heterozygous site as 2. The following is an example containing 3 genotypes with 7 SNPs.
3 7
1022101
1001222
2211002

bulletTo run this program, type "LongHapInfer" under the MatLab command line, and you will see a prompt for input of the file name.
>>LongHapInfer
****************************************
*********** LongHapInfer v0.1 ***********
****************************************
Please specify the input file name: test.dat
...

bulletThe LongHapInfer will generate an output file *.out (e.g., test.dat.out) that contains solutions.

 

 

This site was last updated 05/30/06