Stocking the Data Analyst’s Bookshelf

Many years ago, when I was teaching in a statistics department, I had my first consulting gig. Two psychology researchers didn’t know how to analyze their paired rank data. Unfortunately, I didn’t either. I asked a number of statistics colleagues (who didn’t know either), then finally borrowed a nonparametrics book. The answer was right there. (If you’re curious, it was a Friedman test.)

But the bigger lesson for me was the importance of a good reference library. No matter how much statistical training and experience you have, you won’t remember every detail about every statistical test. And you don’t need to. You just need to have access to the information and be able to understand it.

My statistics library consists of a collection of books, software manuals, articles, and websites. Yet even in the age of Google, the heart of my library is still books. I use Google when I need to look something up, but it’s often not as quick as I’d hoped, and I don’t always find the answer. So I rely on my collection of good reference books, which I continually add to, because I KNOW they will have the answer I’m looking for.

Not all statistics books are equally helpful in every situation. I divide books into four categories: Reference Books, Statistical Software Books, Applied Statistics Books, and Statistical Analysis Books. My library has all four, and yours should too, if data analysis is something you’ll be doing long-term. I’ve included an example of each type, all relevant to running logistic regression in SAS, so you can compare the four types.

1. Reference Books are often textbooks. They are filled with formulas, theory, and exercises, as well as explanations. As a data analyst, not a student, you can skip most of it and go right for the explanation or formula you need. While I find most textbooks aren’t useful for learning HOW to do a new statistical method on your own, they are great references for already-familiar methods.

While I have a few favorites, the best one is often the one you already own and are familiar with, i.e., the textbooks you used in your stats classes. Hopefully, you didn’t sell back your stats textbooks (or worse, have the post office lose them in your cross-country move, like I did).

Example: Alan Agresti’s Categorical Data Analysis.

2. Statistical Software Books focus on using a software package. They tend to be general, often starting from the beginning, and cover everything from entering and manipulating data to advanced statistical techniques. This is the type of book to use when learning a new package or a new area of a package. They don’t, however, usually tell you much about the actual statistics: what it means, why to use it, or when different options make sense. And these are not manuals. They are usually written by users of the software, and are much better than manuals for learning a software program. (I think of learning a software program from its manual as like learning French from a French dictionary: not so good.)

Example: Ron Cody & Jeffrey Smith’s Applied Statistics and the SAS Programming Language

3. Applied Statistics Books are written for researchers. The focus is not on formulas, as it is in textbooks, but on the meaning and use of the statistics. Good applied statistics books are fabulous for learning a new technique when you don’t have time for a semester-length class, but you will need a reasonably strong statistical background to read or use them well. They aren’t for beginners. The nice thing about applied statistics books is that they are not tied to any piece of software, so they’re useful to anyone. That is also their limitation, though: they won’t guide you through the actual analysis in your package.

Example: Scott Menard’s Applied Logistic Regression Analysis

4. Statistical Analysis Books are a hybrid of applied statistics and statistical software books. They explain both the steps in the software AND what it all means. There aren’t many of these, but many of the ones that exist are great. The only problem is that they are often published by the software companies, so each one covers only one software package. If it’s not the one you use, they’re less useful. But they are often great anyway as Applied Statistics books.

Example: Paul Allison’s Logistic Regression Using the SAS System: Theory and Application

If you are without reference books you like, buy them used. Unlike students, you don’t need the latest edition. Most areas of statistics don’t change that much. Linear regression isn’t getting new assumptions, and factor analysis isn’t getting new rotations. Unless it’s in an area of statistics that is still developing, like multilevel modeling and missing data, you’re pretty safe with a 10-year-old edition.

And it does help to buy them. Use your institution’s library to supplement your personal library, not to replace it. Even if it’s a great library, getting to it is an extra barrier, and waiting a few weeks for a recall or interlibrary loan is sometimes too long.

I have bought used textbooks for $10. Menard’s book, and all of the excellent Sage series, are only $17 new. So it doesn’t have to cost a fortune to build a library. Even so, paying $70 for a book is sometimes completely worth it. Having the information you need will save you hours, or even days, of work. How much is your time and energy worth? If you plan to do data analysis long term, invest a little each year in statistical reference books.

The full list of all four types of books Karen recommends is on The Analysis Factor Bookshelf page.

If you know of any other great books we should recommend, comment below. I’m always looking for good books to recommend.


Reader Interactions


  1. Caleb says

    Hi Karen,

    I landed a database analyst job. I would like a few books to get me started. Information on SQL and SSRS (maybe some Microsoft Access) and on how to find and pick the data I’m looking for would be helpful. I will be analyzing marketing campaigns to determine if they were successful or not. Thanks!

    • Karen says

      Hi Caleb,

      Congrats on the new job. 🙂

      You know, I don’t know much about databases or SQL, believe it or not. So I’m not sure I have a good answer. I usually start with the data once it’s in stat software.

      Can anyone else answer Caleb?

      • chand says

        Hi Caleb,
        Congrats on the job 😀
        I don’t know your academic background.
        There are many books on SQL and SSRS, but what we need is understanding and application. Reading books gives knowledge, and applying them in our day-to-day duties builds skill.

        Have a happy start!!!

  2. Raghavendra V B says

    Could anyone please suggest some really good books on data analysis, as I am new to this field? I need to study the complete data analysis process, which includes data acquisition, visualization, optimization, transformation, etc.

    • Karen says

      Hi Raghavendra, Can you tell us the field you’re working in? That may help me make suggestions, as the types of analyses done in each field differ.

      • chand says

        Hi Karen,
        I am also a data analyst. I use SQL to extract data and MS Excel to prepare my reports (using formulas and charts).
        I would like to know what tools or software I can use to present and analyze the extracted data. Excel is helpful, but only up to a point: when the data is huge, Excel takes a long time and sometimes hangs.
        I would also like books, sources, or sites for learning those tools.

        Thanks in anticipation!

  3. Jessica says

    Hi Karen,

    I was looking for the full list of books you recommend in each of these categories, but the link at the end of the article doesn’t seem to be working. I’m particularly interested in good “Applied Statistics” and “Statistical Analysis” books, if that helps. Could you point me to the list with your recommendations?


  4. Dinesh says

    Hi, I found your blog extremely interesting. I am a former business analyst stepping into a level 2 data analyst role. I need some simple and understandable books to get me started in this career. Could you suggest some?

    • Karen says

      Hi Dinesh,

      I can if you give me a bit more information.

      1. What kind of statistics do you need to do in your role? Data mining? Regression?
      2. What is your current stat background? I’m not sure what a business analyst would know already. Are you a beginner? You’ve taken classes, but many years ago? You know the basics but need to move beyond it?
      3. What stat software do you use/plan to use? The best books incorporate examples using one or more software programs.

