
Healthcare Data Science Books

An attempt to identify a comprehensive list of books focused on, or relevant to, healthcare data science.  Over time I'll attempt to review as many of these books as possible, but in the meantime I provide a short note on my impression of each.

 

As data science becomes more pervasive in healthcare, books are multiplying like bunnies. To stay focused, I don't include every book that has the words 'health data' in the title.  There are plenty of books focused on health IT, clinical quality, visualization, etc., that I'm excluding because I think a reasonable person could quickly recognize they're not a foundational data science text.  I'm trying to identify all the apparent general-purpose books that a novice might investigate to learn healthcare data science - if I've missed something relevant, let me know!

​

Source is Amazon.

Books identified using the search terms 'healthcare data science' and 'ehr data'.

Focused on methods applied to certain healthcare datasets like MEPS and NHANES using R.  Several references to using this as a textbook for graduate school methods courses.

​

HEAVILY, if not EXCLUSIVELY, focused on process control charts and related methods.  This book seems to focus on one subset of methods for the domain of 'quality' or 'performance improvement' with some specific healthcare examples.

​

Check out my review of this book.  My opinion: not a book about healthcare, but a book about using Python for analysis.  This Amazon review sums up my opinion:  "This is just a generic overview of some machine learning methods. The case studies are not that good either. I was expecting at least some practical examples to work through but no luck. No math, no code. An entry level book with a specialized title."

​

Haven't read it, but look forward to it.  Does seem to have a focus on data rather than methods, although some of that focus is VERY high level (i.e. tabular data vs time series).  This book also appears written for the EU market, with a section on GDPR and data standards (does the EU use HL7 and FHIR??).  Potentially interesting introductory book to data in general, with some focus on healthcare data.

​

Another book I haven't read, but looks interesting for advanced data scientists focused on expanding methods or doing projects in new areas.  This is an academic-anthology style book, where the chapters are written by different (academic) authors and then the chapters edited together.  I find those books are often most geared towards other academics/researchers or other very advanced users.  The scope is large and includes discussions on analysis of images, sensors, signals, text, genomics, biomedical literature, social media, etc, although I imagine each chapter is more of an overview than a cookbook of how to get up and running on that topic.  Of note, the first chapter is explicitly on data sources (restricted to EHR) and comprises only about 30 pages according to the TOC.  

​

It's like the publisher or editor wanted to jam every possible concept into the title to maximize SEO.  This is another academic/anthology edited book that appears to focus much more on advanced concepts than foundational ones.  There are a few early chapters that appear more about broad-strokes EHR information (databases, querying, data pre-processing) before jumping into methods and causal inference.

​

The title alone should make it clear this is a STONE COLD health IT / standards / interoperability tome.  Interfacing and standards are not my thing, but if they were - this book would probably be under my pillow.  Seriously - there's a section on 'Comparison of HL7 vs ISO/TS 18308' which sounds as exciting as comparing Linux kernels, but at least it's some real information.  This book does spend about 40 pages covering coding systems, which could make it worth a look (and even includes a section on the UMLS, which you don't see every day), but it also seems like they don't fully explain some basics like procedural coding - ICD is only diagnostic and CPT is only procedures?  I haven't read the book, but those sections would require a little investigation...

​​

To my knowledge, this was really the first book focused on EMR analysis.  Since, at the time, it was the only EMR analysis book, I used it as the quasi-textbook for the EMR data course I taught.  This is another academic, anthology book, but I believe all (or almost all) of the authors are in and around the MIT physiology lab that creates, uses, and curates the MIMIC datasets.  Many of the chapters are devoted to the specific projects/papers that PIs, postdocs, etc., associated with the lab have published.  To me, this is a pretty good book if you're already an academic experienced with healthcare data or a really experienced, advanced healthcare data scientist.  I don't think I could recommend this book as an introduction to the topics of healthcare data or EMR data.

​

​

It looks like this is a book that goes along with a course the authors teach in the UK, possibly focused on clinicians without programming skills who want to learn analysis.  Exclusive focus on using R and RStudio.  I imagine the book employs some healthcare examples, but this is clearly one of the books that focuses purely on the 'programming R' style approach to making a data scientist.

​

Only recently found this book after some digging, published back in 2007.  Seems like a good, practical guide to a bunch of commonly used research datasets used in public health like HCUP, MEPS, National Immunization Survey, BRFSS, National Maternal and Infant Health Survey, and Medicare and Medicaid data.  At 137 pages I imagine most of this information can be found from existing, published documentation, but just compressed and organized a little more neatly.  ​

​

Very limited information and no reviews to determine what the content is.  Here's a description from the publisher:  "This book can be used in introductory courses on hypothesis testing, intermediate courses on regression, and advanced courses on causal analysis. It can also be used to learn SQL language. Its extensive online instructor resources include course syllabi, PowerPoint and video lectures, Excel exercises, individual and team assignments, answers to assignments, and student-organized tutorials."​

​

Not available until March 26, 2021 - currently no real information on the content of this book.  Here's a description from the publisher:  "The authors present the challenges faced by the healthcare industry, including capturing, storing, searching, sharing and analyzing data. This book illustrates the challenges in the applications of Big Data and suggests ways to overcome them, with a primary emphasis on data repositories, challenges, and concepts for data scientists, engineers and clinicians."

​

Seems like more of a 'soft' analytics book geared towards executives and a non-technical audience.

​

Another of the academic, anthology style books on a bunch of advanced applications (presumably largely from papers published by the authors).  Not gonna learn the introductory material to healthcare data science from this book.  

​

I don't really understand the purpose of these short SpringerBriefs books.  The whole book is 100 pages... the first chapter covers 'healthcare, data analytics, and business intelligence' in 12 pages, and that includes an introduction, conclusion, and references.  Is there really a market for these books with almost no content?

​
