Eventually, 2,229 CRF01_AE sequences were included for subsequent analysis. To exclude the influence of convergent evolution at drug resistance mutation sites on the phylogenetic analysis, 45 sites in protease (PR) and reverse transcriptase (RT) were removed before phylogenetic analysis. HIV-1 drug resistance mutations were determined using the Stanford University HIV Drug Resistance Database tool HIVdb Program: Sequence Analysis (http://sierra2.stanford.edu/sierra/servlet/JSierra?action=sequenceInput) and the most recently updated (Oct-Nov 2015) guidelines from the International AIDS Society Resistance Testing-USA panel [27]. HIV-1 transmitted drug resistance (TDR) mutations were determined based on the WHO 2009 list of mutations for surveillance of TDR HIV strains (http://hivdb.stanford.edu/pages/WHOResistanceList.html).

Phylogenetic analyses. FastTree 2.3 [28] was used to estimate an approximately-maximum-likelihood phylogenetic tree for pol sequences using the GTR + G + I nucleotide substitution model. The reliability of the phylogenetic tree was assessed with local support values based on the Shimodaira-Hasegawa (SH) test [29], and the tree was presented using FigTree v1.3.1 (http://beast.bio.ed.ac.uk). Monophyletic groups with bootstrap support of at least 0.9 were identified as lineages.

Bayesian skyline plot analysis and divergence time estimation. Bayesian skyline plot (BSP) analysis was conducted to explore changes in the effective population size of CRF01_AE among MSM in Shanghai over time. The models selected were GTR + relaxed clock (uncorrelated) + Bayesian skyline. The tMRCAs of CRF01_AE lineages were estimated using a Bayesian inference approach to explore the timescale of CRF01_AE expansion among Shanghai MSM. Rates of evolution (in units of nucleotide substitutions/site/year) were estimated simultaneously. Phylogenies were inferred using BEAST v.1.7.2.
The Markov chain Monte Carlo (MCMC) analysis was run for 200 million generations, sampling every 1,000 steps, and the output was assessed for convergence by means of the effective sample size (ESS) after a 20% burn-in using Tracer. To minimize the effects of standard errors, only parameter estimates with ESS values of at least 200 were accepted.

Identification and analysis of genetic transmission networks. The flowchart of the genetic transmission network analysis is shown in Supplementary material 8. Briefly, transmission clusters were extracted from the phylogenetic tree using the software Cluster Picker [30]. Transmission clusters were defined by a node support threshold greater than 95% and an intra-cluster maximum pairwise genetic distance of less than 3.0% nucleotide substitutions per site. As a significant proportion of the sequences (26.3%) were presumably from patients with long-standing infection (details below), we chose a genetic distance threshold of 3.0% [31–33] in order to identify relevant transmission clusters [30]. The pairwise genetic distances of all sequences within the identified clusters were calculated. The minimum genetic distance algorithm was used to define the linkages within a cluster (Supplementary material 9). For network visualization and analysis, the network data were processed using a custom R script utilizing the network package in R [34].

Analysis of individuals with potential transmission links. Three groups were compared: (1) individuals with no link to others, (2) individuals linked to one other individual, and (3) individuals linked to 2 or more others. The chi-square test was used to determine linking-associated factors amo.
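The cluster definition and linkage step described for the transmission networks can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the text does not specify the distance model or the details of the minimum genetic distance algorithm (these are in Supplementary material 9), so the uncorrected p-distance, the rule "link each sequence to its nearest neighbour(s) in the cluster", all function names, and the toy sequences below are assumptions made purely for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites (assumed distance measure).

    Positions where either sequence has a gap or an N are skipped.
    """
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def pairwise_distances(seqs: Dict[str, str]) -> Dict[FrozenSet[str], float]:
    """Distance for every unordered pair of sequences in one cluster."""
    return {
        frozenset((i, j)): p_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


def passes_distance_threshold(dists: Dict[FrozenSet[str], float],
                              max_dist: float = 0.03) -> bool:
    """True if the maximum intra-cluster distance is below 3.0%."""
    return max(dists.values()) < max_dist


def minimum_distance_links(ids: Iterable[str],
                           dists: Dict[FrozenSet[str], float]) -> Set[FrozenSet[str]]:
    """Assumed linkage rule: connect each sequence to its closest neighbour(s)."""
    ids = list(ids)
    links: Set[FrozenSet[str]] = set()
    for i in ids:
        to_others = {j: dists[frozenset((i, j))] for j in ids if j != i}
        nearest = min(to_others.values())
        links.update(frozenset((i, j)) for j, d in to_others.items() if d == nearest)
    return links


def link_counts(links: Set[FrozenSet[str]], ids: Iterable[str]) -> Dict[str, int]:
    """Number of links per individual (0, 1, or 2 and more define the groups)."""
    counts = {i: 0 for i in ids}
    for link in links:
        for i in link:
            counts[i] += 1
    return counts


# Toy 100-nt sequences (hypothetical, not study data).
base = "ACGT" * 25
cluster = {
    "A": base,
    "B": "T" + base[1:],
    "C": "T" + base[1:50] + "A" + base[51:],
}
dists = pairwise_distances(cluster)
links = minimum_distance_links(cluster, dists)
print(passes_distance_threshold(dists), link_counts(links, cluster))
```

In a real analysis the cluster membership would come from Cluster Picker rather than being supplied by hand, and the per-individual link counts would feed the three comparison groups.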
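The chi-square comparison of the three link groups can be sketched as a test of independence on a contingency table. The example below is a hedged illustration only: the counts are invented toy numbers, the binary characteristic in the columns is hypothetical, and the function names are not from the source. The closed-form p-value used here holds only for 2 degrees of freedom, which is what a table of three groups by two categories gives.

```python
import math
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat: float) -> float:
    """Upper-tail p-value; exp(-x/2) is exact only when dof == 2."""
    return math.exp(-stat / 2.0)


# Rows: no link, linked to one other, linked to 2 or more others.
# Columns: a hypothetical binary characteristic (present, absent).
# Toy counts for illustration, not study data.
toy_table = [
    [30, 70],
    [20, 30],
    [25, 25],
]
stat, dof = chi_square_independence(toy_table)
p_value = p_value_two_dof(stat) if dof == 2 else float("nan")
print(f"chi2={stat:.3f}, dof={dof}, p={p_value:.4f}")
```

For tables with other dimensions, a statistics library would be needed to obtain the p-value from the chi-square distribution.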