Small-size InDel variant calling

First, InDels (insertions and deletions) shorter than 10 bp were extracted from the gapped-extension alignment between the genome assembly and the reference, produced using LASTZ (Version 1.01.50). Second, unreliable InDels with an N base within 50 bp upstream or downstream were removed, as were InDels with more than two mismatches within a total of 20 bp upstream and downstream. Finally, the candidate InDels were verified by comparing sample reads covering the region surrounding each InDel (100 bp on each side) with the reference
sequence using BWA (Version 0.5.8) [20].

Synteny analysis

The LCT-EF258 target sequences were ordered according to the reference sequence based on MUMmer. Then, the X and Y axes of the two-dimensional synteny graphs and the upper and lower axes of the linear synteny graphs were constructed after both sequences were scaled down in length by the same proportion. The protein set P1 of the target sequence was aligned with the protein set P2 of the reference sequence using BLASTP (e-value ≤ 1e-5, identity ≥ 85%; the best hit of each protein was selected). Finally, the best-hit results were retained and the average of the two consistent values was obtained.

Transcriptome sequencing and comparison

Sequencing and filtering

Total
RNAs were purified using TRIzol (Invitrogen) and rRNA was removed. Then, cDNA synthesis was performed with random hexamers and Superscript II reverse transcriptase (Invitrogen). Double-stranded cDNAs were then purified with a Qiaquick PCR purification kit (Qiagen) and sheared with a nebuliser (Invitrogen) to ~200 bp fragments. After end repair and poly(A) addition, the cDNAs were ligated to the Illumina paired-end adapter oligo mix, and suitable fragments were selected as templates by gel purification. Next, the libraries were PCR amplified and sequenced on the Illumina HiSeq 2000 platform using the paired-end sequencing
module. The filtration consisted of three steps: removing reads containing 1 bp or more of N bases, removing reads containing 40 bp or more of low-quality (≤Q20) bases, and removing adapter contamination. Additionally, reads that mapped to the reference (LCT-EF90) rRNA sequences were removed. All gene expression data generated in this study have been deposited under accession numbers SRR922447 and SRR922448 (https://trace.ddbj.nig.ac.jp/DRASearch/).

Gene expression value statistics

Gene coverage was evaluated by mapping clean reads to the reference genes using SOAPaligner, and the gene expression value was calculated with the RPKM (Reads Per kb per Million reads) formula, following the method described in Ali et al. [21]. The RPKM method eliminates the influence of gene length and of differences in sequencing depth on the calculated gene expression value.
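
As an illustration of how the RPKM normalisation works, the following minimal sketch applies the commonly used definition, RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to a gene, N is the total number of mapped reads, and L is the gene length in bp. The exact implementation used in this study is not given in the text, so the function name, gene names, and read counts below are hypothetical and serve only to show the arithmetic.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per kb per Million mapped reads (standard definition).

    read_count          reads mapped to the gene (C)
    gene_length_bp      gene length in base pairs (L)
    total_mapped_reads  total reads mapped in the sample (N)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> million reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: two genes with the same read count but different lengths.
example_genes = {
    "geneA": {"reads": 500, "length_bp": 1000},
    "geneB": {"reads": 500, "length_bp": 2000},
}
total_mapped = 10_000_000  # assumed total number of mapped reads

rpkm_values = {
    name: rpkm(info["reads"], info["length_bp"], total_mapped)
    for name, info in example_genes.items()
}

for name, value in rpkm_values.items():
    print(f"{name}: RPKM = {value:.1f}")
```

In this toy case the gene that is twice as long receives half the RPKM for the same read count, which is the length correction described above; dividing by the total number of mapped reads plays the same role for sequencing depth.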