Application of Data Dimension Reduction Method in High-dimensional Data based on Single-cell 3D Genomic Contact Data

Authors

  • Zilin Wang, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  • Ping Zhang, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; School of Computer, BaoJi University of Arts and Sciences, Baoji 721016, China
  • Weicheng Sun, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  • Dongxu Li, School of Computer, BaoJi University of Arts and Sciences, Baoji 721016, China

DOI

https://doi.org/10.52810/TC.2021.100043

Keywords:

dimensionality reduction, single-cell Hi-C, PCA, t-SNE, LDA

Abstract

The volume and dimensionality of data in many fields, especially biology, are growing rapidly, yet existing analytical methods are difficult to apply directly to high-dimensional data such as single-cell Hi-C data. Here we apply the unsupervised methods PCA and t-SNE to reduce the dimensionality of such data for visualization. We then evaluate how much information the decomposed components retain by using an LDA classifier. Our results suggest that these methods can capture and present information that cannot be observed directly.
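
The workflow described in the abstract (PCA and t-SNE for reduction, then an LDA classifier to check what the components retain) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the toy data generator, the function names, the feature encoding (log-transformed upper triangle of a per-cell contact matrix), and all parameter values are assumptions made for the example, and scikit-learn is assumed as the library.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score


def make_toy_contact_features(n_cells_per_type=40, n_bins=20, seed=0):
    """Hypothetical stand-in for single-cell Hi-C data.

    Each cell is represented by the flattened upper triangle of a
    symmetric n_bins x n_bins contact matrix (an assumed encoding).
    """
    rng = np.random.default_rng(seed)
    n_features = len(np.triu_indices(n_bins)[0])
    features, labels = [], []
    for cell_type in range(2):
        # Each simulated cell type gets its own mean contact profile.
        base = rng.gamma(shape=2.0, scale=1.0, size=n_features)
        for _ in range(n_cells_per_type):
            counts = rng.poisson(base)          # sparse, noisy contact counts
            features.append(np.log1p(counts))   # log transform to damp outliers
            labels.append(cell_type)
    return np.asarray(features, dtype=float), np.asarray(labels)


def reduce_and_evaluate(X, y, n_components=10, seed=0):
    """Reduce with PCA, embed with t-SNE, and score the PCA components with LDA."""
    # Linear reduction: keep the leading principal components.
    X_pca = PCA(n_components=n_components, random_state=seed).fit_transform(X)

    # Nonlinear 2-D embedding of the PCA scores, intended for plotting only.
    tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=seed)
    X_tsne = tsne.fit_transform(X_pca)

    # Information retention: can LDA recover the labels from the components?
    lda = LinearDiscriminantAnalysis()
    accuracy = cross_val_score(lda, X_pca, y, cv=5).mean()
    return X_pca, X_tsne, accuracy


if __name__ == "__main__":
    X, y = make_toy_contact_features()
    X_pca, X_tsne, accuracy = reduce_and_evaluate(X, y)
    print("PCA shape:", X_pca.shape, "t-SNE shape:", X_tsne.shape)
    print("Mean 5-fold LDA accuracy on PCA components:", round(accuracy, 3))
```

In this sketch PCA is fitted on all cells before cross-validation; because PCA does not use the labels this is a common shortcut, but a stricter evaluation would fit it inside each fold.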

Author Biographies

Zilin Wang, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

Zilin Wang is currently a first-year BS student at Huazhong Agricultural University, with a research focus on bioinformatics.

Ping Zhang, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; School of Computer, BaoJi University of Arts and Sciences, Baoji 721016, China

Ping Zhang is currently a first-year PhD student at Huazhong Agricultural University and a lecturer at Baoji University of Arts and Sciences. His current research interests include bioinformatics, machine learning, and graph neural networks.

Weicheng Sun, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

Weicheng Sun is currently a first-year BS student at Huazhong Agricultural University, with research interests in bioinformatics, machine learning, and graph neural networks.

Dongxu Li, School of Computer, BaoJi University of Arts and Sciences, Baoji 721016, China

Dongxu Li pursued a B.S. in the Department of Computer at Baoji University of Arts and Sciences from 2018 to 2022. His current research interests include machine learning and computer vision.

Published

2021-07-02

How to Cite

Wang, Z., Zhang, P., Sun, W., & Li, D. (2021). Application of Data Dimension Reduction Method in High-dimensional Data based on Single-cell 3D Genomic Contact Data. ASP Transactions on Computers, 1(2), 1–6. https://doi.org/10.52810/TC.2021.100043

Section

Regular Paper