China National Center for Bioinformation
Guangdong-Hong Kong-Macao Greater Bay Area

Published: 21 Mar. 2021 | Version 2.0

Description of this dataset

This dataset of chest X-ray images (CXRs) is constructed from cohorts of the China Consortium of Chest X-ray Image Investigation (CC-CXRI). All CXRs are classified as COVID-19 pneumonia caused by SARS-CoV-2 infection, other viral pneumonia, bacterial pneumonia, other lung disorders, or normal controls.

ChestDx and ChestDx-PE were used to develop AI models for identifying common chest diseases, with labels for 14 common thoracic pathologies: atelectasis, cardiomegaly, consolidation, edema, effusion, emphysema, fibrosis, hernia, infiltration, nodule, mass, pleural thickening, pneumonia, and pneumothorax. ChestDx consists of patients from hospital visits; ChestDx-PE consists of additional patients who underwent a routine annual physical examination.

For pneumonia diagnosis and triage, with an application to COVID-19 pneumonia, the CC-CXRI-P dataset consists of viral pneumonia (including COVID-19 pneumonia), other types of pneumonia, and normal controls.

This dataset and the AI code are available globally, with the aim of assisting clinicians and researchers in combating the COVID-19 pandemic.


Please cite this paper

Guangyu Wang, Xiaohong Liu, Jun Shen, et al. Kang Zhang, Weimin Li, Tianxin Lin. (2021). A deep-learning pipeline for the diagnosis and discrimination of viral, non-viral and COVID-19 pneumonia from chest X-ray images. Nature Biomedical Engineering,




Experiment data files

The data files are listed in the table below. They are organized as follows:

  1. to : 51 zip files from ChestDx, containing CXRs of common thoracic diseases.
  2. to : 23 zip files from ChestDx-PE, containing CXRs of common thoracic diseases.
  3. to : 4 zip files from CC-CXRI-P, containing CXRs of normal controls.
  4. to : 2 zip files from CC-CXRI-P, containing CXRs of viral pneumonia.
  5. to : 2 zip files from CC-CXRI-P, containing CXRs of other pneumonia.
  6. chest_dx.csv : a file for the ChestDx dataset, with one row for each image, including labels of common thoracic diseases.
  7. viral_pneumonia_COVID-19.csv : a file for the CC-CXRI-P dataset, with one row for each image, including a COVID-19 tag.
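
Because each CSV file holds one row per image, a quick way to get oriented is to read the header and count the rows before writing any training code. The following is a minimal sketch in Python using only the standard library. The column names are not documented on this page, so the function makes no assumption about them and simply reports whatever header the file contains. The function name summarize_csv and the "utf-8-sig" encoding are assumptions made for illustration.

```python
import csv


def summarize_csv(path, max_preview=3, encoding="utf-8-sig"):
    """Return the column names, row count, and first few rows of a CSV file.

    The encoding is an assumption; change it if the file fails to decode.
    """
    preview = []
    row_count = 0
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        for row in reader:
            # Keep only the first few rows so large files stay cheap to inspect.
            if row_count < max_preview:
                preview.append(row)
            row_count += 1
        # fieldnames is None for a completely empty file.
        columns = list(reader.fieldnames or [])
    return {"columns": columns, "rows": row_count, "preview": preview}
```

For example, summarize_csv("chest_dx.csv") returns a dictionary whose "columns" entry shows the actual label columns and whose "rows" entry should equal the number of images described by the file.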
Category         File                          Size   MD5 summary                       Link
ChestDx                                        573MB  c5ef5205687caca0d4520372a522ec57  download
                                               543MB  702821de09bdb9b45602d2a010ac904b  download
                                               584MB  b12ea5dc7e4253c8e4c06a0f1cb3245b  download
                                               545MB  4fe9a583654f72950cd811b52eb9013c  download
                                               570MB  3cfcbcacf5b900654ada873bf9f01401  download
                                               576MB  d0988bffed859035e8ee7fc648ffc3e9  download
                                               570MB  4c9a1de3531ec4269678fec1fd128852  download
                                               576MB  b5c5be09d8c9ab098842d3be141eaf80  download
                                               588MB  6c3b147c74227a1eb605d7d746a20164  download
                                               556MB  a271700875ea47062fd858991c70f804  download
                                               535MB  cf0ce3a405924f8fef8a8017617a9969  download
                                               543MB  681032edb0319d0b8a36e7f5f2da8c3d  download
                                               578MB  5280721be6616119b225a0fd1b4a62b7  download
                                               552MB  7b381b7cb0268bb3805be7af9e8060fc  download
                                               574MB  95a103ae1418051fb30bc8a9be2588cd  download
                                               548MB  8eb71b9d2a7b97c31a81fbcdb5ab9ba7  download
                                               543MB  0edce345b65f5be27b3e7e5dded37b2a  download
                                               609MB  116d25588c4fae52230f9558034539a3  download
                                               547MB  f8092348304a37bf67eb45b8b395279c  download
                                               573MB  05285cc69b1e4780d1057b18ef487cba  download
                                               566MB  07a5ef8007577211d1032ba9bd52a818  download
                                               591MB  ab281a2d44d1fc2d9b0b643802bab205  download
                                               564MB  34d810ee2124ea522dd17fa4a7450fe3  download
                                               560MB  5c1886488351d37d3f62e7e54ddee563  download
                                               543MB  234546e6a7572163d0a0afc88d80a4a9  download
                                               550MB  134a3536ced2540921823d68367c2a16  download
                                               540MB  3fdbba6bb5551403e33d1d3cbad41291  download
                                               592MB  174c6a4942cb24e964ee9c3796b8035d  download
                                               556MB  d663782a7fdf3aa5f85b9b210d4a68da  download
                                               551MB  e2d07a42e0b0657466753a3beb1cb6e8  download
                                               551MB  78766adcf2fb4f6de2b59b4baa87ce95  download
                                               567MB  4906e7262ceffb643ac24ef2d23ebe5c  download
                                               553MB  75caeb01b45f94b215f9d06d9753df5c  download
                                               544MB  7908df7404562d10a0abacec4e928485  download
                                               576MB  3cd32814b2ba6c80496cbae901482ff1  download
                                               575MB  98a531e3aafea2eaaf817b6607fd0413  download
                                               565MB  0131bdc2fdfa3239fb95824b1b92cca7  download
                                               561MB  08a4498fa90caca6385b7aff5474f420  download
                                               536MB  3fb8beb9690961ced3859f0301d5e45f  download
                                               575MB  3223dc1c190c9db3de048519c9307ce8  download
                                               515MB  4a8d0e7e7860d2897326cbf17468a197  download
                                               545MB  ef626826246a8e9f0ea1d975b1d11659  download
                                               581MB  7d3a0ab58f8546be2e661884aa976e8b  download
                                               545MB  a4a65eb5e6d254183d6d79024eeb73cc  download
                                               556MB  bf7d22cc9eae4a1cd69221b54ebb66b1  download
                                               558MB  f5204c8756329d644a1b6c36f4a9ff07  download
                                               563MB  ba2b4cd985ca757ed33301aaccfed704  download
                                               564MB  fba01a2e4f4bf3c49f502352da8abefa  download
                                               552MB  c0306b3a8eebcfcfc6ae9645ac118e1e  download
                                               542MB  2dc9df334137bf9ce69cc59744a61f93  download
                                               515MB  8b00a342ed0e70654bb583db24a34970  download
ChestDx_PE                                     576MB  56bdf48d6b9bedb5be3267143edb0ae1  download
                                               552MB  1c58bd30316a2a3ab7f8f8d748aaaf31  download
                                               569MB  cb13fc9290e5be4bd54fae704627a78d  download
                                               554MB  87be0cedd6f51d136b80255c3ea347ef  download
                                               550MB  f2870171cf1825680d50d7d489676923  download
                                               553MB  8f4e57925fffe6b10e5bd4f89c9e530d  download
                                               568MB  63d72cbc7f9760703c1d5dfbafe32524  download
                                               555MB  7a205c7731c28a637c039b74386c81aa  download
                                               591MB  4fb4665acbd07ede7d24d26a9f3c4e67  download
                                               550MB  42069f031de169ab9922fd8eb9d16f5d  download
                                               554MB  4dcb412b91e21a1868164fef96ade18f  download
                                               577MB  5fce47cb82d0c52d9159fb40103d05ed  download
                                               570MB  dd6a0a37bb476c27ded46e2bfe10f60a  download
                                               579MB  ea0f083d3f80a7bb22bc43c462740f00  download
                                               572MB  89157192de5fdfedc531a66cc1a4b1c1  download
                                               564MB  38591a9c5235370c9b198d78adb6a681  download
                                               553MB  4cfa7ec9e35351e73cd4bc93c5f8aa28  download
                                               572MB  c9499960575d21a086433a392674b3d1  download
                                               541MB  563818f8050935a6900a2a909a686901  download
                                               559MB  59dc72a4ef4207e6b8a796dd31aa74bb  download
                                               591MB  d2a9cbff38e17b74e929d7bb9d9e961a  download
                                               560MB  877545ce7bd7bf5f8aac854ccc98e630  download
                                               579MB  61025bca73fff5b71d9e447803b07326  download
                 chest_dx.csv                  11MB   5d6a62877294be00eafffbbc03d27ca0  download
Normal controls                                223MB  a86fdf00c562b835b7cd07eccbe668af  download
                                               206MB  f22464709a8bc9dec3d480307eb015da  download
                                               213MB  6540cd2c7a7ededd35c3e4419a5ecbed  download
                                               210MB  7e4f3e1538141c98aa98d57bfd89909f  download
Viral pneumonia                                157MB  6a060d8bb142f835437c749831be9b10  download
                                               166MB  e52aa3a81746f929f451e418a2d44e92  download
                 viral_pneumonia_COVID-19.csv  59K    244dac585baf283e4d95e34b4b1c6a76  download
Other pneumonia                                184MB  6a060d8bb142f835437c749831be9b10  download
                                               95MB   9b7dd54372ca1ad0a28ebb4676f4dc43  download
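
The MD5 summaries listed for each file can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, assuming the file has already been saved locally. The helper names md5_of_file and verify_download are hypothetical and are not part of the released AI code. The file is read in chunks so that archives of several hundred megabytes do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the value listed for it."""
    # Normalize whitespace and case so a copy-pasted value compares cleanly.
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, verify_download("chest_dx.csv", "5d6a62877294be00eafffbbc03d27ca0") should return True for an intact copy of that file. A False result suggests the file should be downloaded again.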

Change log

Version 2.0

Released 2021-03-21

  1. Updated the ChestDx, ChestDx-PE, and CC-CXRI-P sets with CSV files.

Version 1.0

Released 2021-01-17

  1. Released the base dataset and AI code.