<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Analysis Factor</title>
	<atom:link href="http://www.theanalysisfactor.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.theanalysisfactor.com</link>
	<description>Statistical Consulting, Resources, and Statistics Workshops for Researchers in Psychology, Sociology, and other Social and Biological Sciences</description>
	<lastBuildDate>Fri, 18 May 2012 15:08:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Can a Regression Model with a Small R-squared Be Useful?</title>
		<link>http://www.theanalysisfactor.com/small-r-squared/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=small-r-squared</link>
		<comments>http://www.theanalysisfactor.com/small-r-squared/#comments</comments>
		<pubDate>Mon, 14 May 2012 20:21:21 +0000</pubDate>
		<dc:creator>Karen Grace-Martin</dc:creator>
				<category><![CDATA[Linear Regression]]></category>
		<category><![CDATA[Power and Sample Size]]></category>
		<category><![CDATA[effect size]]></category>
		<category><![CDATA[R-squared]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2619</guid>
		<description><![CDATA[R² is such a lovely statistic, isn't it?  Unlike so many of the others, it makes sense--the percentage of variance in Y accounted for by a model. 

I mean, you can actually understand that.  So can your grandmother.  And the clinical audience you're writing the report for.

A big R² is always big (and good!) and a small one is always small (and bad!), right?

Well, maybe.]]></description>
			<content:encoded><![CDATA[<p></p><p>R² is such a lovely statistic, isn&#8217;t it?  Unlike so many of the others, it makes sense&#8211;the percentage of variance in Y accounted for by a model.</p>
<p>I mean, you can actually understand that.  So can your grandmother.  And the clinical audience you&#8217;re writing the report for.</p>
<p>A big <a href="http://www.theanalysisfactor.com/assessing-the-fit-of-regression-models/">R²</a> is always good and a small one is always bad, right?</p>
<p>Well, maybe.<span id="more-2619"></span></p>
<p>I&#8217;ve seen a lot of people get upset about small R² values, or any small <a href="http://www.theanalysisfactor.com/effect-size/">effect size</a>, for that matter.  I recently heard a comment that no regression model with an R² smaller than .7 should even be interpreted.</p>
<p>Now, there may be a context in which that rule makes sense, but as a general rule, no.</p>
<p>Just because effect size is small doesn&#8217;t mean it&#8217;s bad, unworthy of being interpreted, or useless.  It&#8217;s just small.  Even small effect sizes can have scientific or clinical significance.  It depends on your field.</p>
<p>For example, in a dissertation I helped a client with many years ago, the research question was whether religiosity predicts physical health.  (If you&#8217;ve been in any of my workshops, you&#8217;ll recognize this example&#8211;it&#8217;s a great data set.)  The model used frequency of religious attendance as an indicator of religiosity and included a few personal and demographic control variables, such as gender, poverty status, and depression levels.</p>
<p>The model R² was about .04, although the model was significant.</p>
<p>It&#8217;s easy to dismiss the model as being useless.  You&#8217;re only explaining 4% of the variation?  Why bother?</p>
<p>But think about this.  If you think about all of the things that might affect someone&#8217;s health, do you really expect religious attendance to be a <em>major</em> contributor?</p>
<p>Even though I&#8217;m not a health researcher, I can think of quite a few variables that I would expect to be much better predictors of health.  Things like age, disease history, stress levels, family history of disease, job conditions.</p>
<p>And putting all of them into the model would indeed give better predicted values.  If the <em>only</em> point of the model was prediction, my client&#8217;s model <em>would</em> do a pretty bad job.  (Perhaps the comment about an R² of .7 came from someone who only runs prediction models.)</p>
<p>But it wasn&#8217;t.  The point was to see if there was a small, but reliable relationship.  And there was.</p>
<p>Do small effect sizes <a href="http://www.theanalysisfactor.com/5-ways-to-increase-power-in-a-study/">require larger samples</a> to find significance?  Sure.  But this data set had over 5000 people.  Not a problem.</p>
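<p>To see how a small R² and a clearly significant effect can coexist at this sample size, here is a minimal simulation sketch.  It does not use the dissertation data.  The function name, the seed, and the single-predictor setup are assumptions made purely for illustration.</p>

```python
import math
import random


def simulate_small_r2(n=5000, true_r2=0.04, seed=1):
    """Simulate one predictor with a small true R-squared and test its slope."""
    rng = random.Random(seed)
    # Slope that yields the target R-squared when x and the noise both have variance 1.
    slope = math.sqrt(true_r2 / (1 - true_r2))
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    r = sxy / math.sqrt(sxx * syy)
    r2 = r ** 2
    # t statistic for the slope in simple regression, on n - 2 degrees of freedom.
    t_stat = r * math.sqrt((n - 2) / (1 - r2))
    return r2, t_stat


r2, t_stat = simulate_small_r2()
print(f"R-squared = {r2:.3f}, t = {t_stat:.1f}")
```

<p>With 5000 simulated cases, the estimated R² stays close to .04 while the t statistic for the slope lands far beyond the usual cutoffs.  Small and reliable are not contradictory.</p>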
<p>Many researchers turned to using effect sizes because evaluating effects using p-values alone can be misleading.  But effect sizes can be misleading too if you don&#8217;t think about what they mean within the research context.</p>
<p>Sometimes being able to easily improve an outcome by 4% is clinically or scientifically important.  Sometimes it&#8217;s not even close enough.   Sometimes it depends on how much time, effort, or money would be required to get a 4% improvement.</p>
<p>As much as we&#8217;d all love to have straight answers to what&#8217;s big enough, that&#8217;s not the job of any statistic.  You&#8217;ve got to think about it and interpret accordingly.</p>
<div class="SPOSTARBUST-Related-Posts"><H3>Related Posts</H3><ul class="entry-meta"><li class="SPOSTARBUST-Related-Post"><a title="A Comparison of Effect Size Statistics" href="http://www.theanalysisfactor.com/effect-size/" rel="bookmark">A Comparison of Effect Size Statistics</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="What Happened to R squared?: Assessing Model Fit for Logistic, Multilevel, and Other Models that use Maximum Likelihood Webinar" href="http://www.theanalysisfactor.com/june-webinar-what-happened-to-r-squared-assessing-model-fit-for-logistic-multilevel-and-other-models-that-use-maximum-likelihood/" rel="bookmark">What Happened to R squared?: Assessing Model Fit for Logistic, Multilevel, and Other Models that use Maximum Likelihood Webinar</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="How to Calculate Effect Size Statistics" href="http://www.theanalysisfactor.com/calculate-effect-size/" rel="bookmark">How to Calculate Effect Size Statistics</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="Assessing the Fit of Regression Models" href="http://www.theanalysisfactor.com/assessing-the-fit-of-regression-models/" rel="bookmark">Assessing the Fit of Regression Models</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="5 Ways to Increase Power in a Study" href="http://www.theanalysisfactor.com/5-ways-to-increase-power-in-a-study/" rel="bookmark">5 Ways to Increase Power in a Study</a></li>
</ul></div>]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/small-r-squared/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Sample Size Estimates for Multilevel Randomized Trials</title>
		<link>http://www.theanalysisfactor.com/sample-size-randomized-trials/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=sample-size-randomized-trials</link>
		<comments>http://www.theanalysisfactor.com/sample-size-randomized-trials/#comments</comments>
		<pubDate>Tue, 01 May 2012 17:21:59 +0000</pubDate>
		<dc:creator>Karen Grace-Martin</dc:creator>
				<category><![CDATA[Mixed and Multilevel Models]]></category>
		<category><![CDATA[Power and Sample Size]]></category>
		<category><![CDATA[Multilevel Models]]></category>
		<category><![CDATA[Optimal Design]]></category>
		<category><![CDATA[Power Analysis]]></category>
		<category><![CDATA[Randomized Trials]]></category>
		<category><![CDATA[Sample Size Calculations]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2573</guid>
		<description><![CDATA[But there are many design issues that affect power in a study that go way beyond a z-test.  Like:

    repeated measures
    clustering of individuals
    blocking
    including covariates in a model

Regular sample size software can accommodate some of these issues, but not all.  And there is just something wonderful about finding a tool that does just what you need it to.

Especially when it's free.]]></description>
			<content:encoded><![CDATA[<p></p><p>If you learned much about calculating power or sample sizes in your statistics classes, chances are, it was on something very, very simple, like a z-test.</p>
<p>But there are many <a href="http://www.theanalysisfactor.com/5-ways-to-increase-power-in-a-study/">design issues that affect power</a> in a study that go way beyond a z-test.  Like:</p>
<ul>
<li>repeated measures</li>
<li>clustering of individuals</li>
<li>blocking</li>
<li>including covariates in a model</li>
</ul>
<p>Regular sample size software can accommodate some of these issues, but not all.  And there is just something wonderful about finding a tool that does just what you need it to.</p>
<p>Especially when it&#8217;s free.</p>
<p>Enter <em>Optimal Design Plus Empirical Evidence</em> software.<span id="more-2573"></span></p>
<p>Optimal Design is software for <a href="http://www.theanalysisfactor.com/power-and-sample-size-calculations/">power calculations</a> on individual and group randomized trials.  It was developed by a group of statistical researchers, headed by Jessaca Spybrook at Western Michigan University.  It was funded by a grant from the William T. Grant Foundation, so it is available for download free of charge.  (Link below).</p>
<p>I found it recently when working with a client who is planning a study for the effectiveness of an educational intervention.  The design options in the software were exactly the design issues we needed to consider for developing the best design to maximize power while accounting for data collection limitations and the budget.</p>
<p>There are many design options for a randomized trial.  If randomized trial isn&#8217;t a term used in your field, this is the basic idea.</p>
<p>The outcome is measured at the individual level, and repeated measures on individuals are possible.  There is some sort of randomization to treatment groups, and this randomization can occur at the individual level or at a cluster level.  The point of the study is to compare the mean of the outcome in the treatment groups.</p>
<p>For example, educational studies are often conducted on students clustered within classrooms.  In a study comparing two teaching formats for effectiveness, usually an entire classroom of students is assigned to one format condition.  It&#8217;s just not possible to randomly assign individuals within classrooms.  So the clusters (classrooms) are randomly assigned to treatment, not the individual student.</p>
<p>So both Person-Randomized and Group-Randomized trials are possible, and the level of randomization affects power.</p>
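<p>One rough way to see why the level of randomization matters is the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC, where m is the cluster size and the ICC (intraclass correlation) is the share of outcome variance that lies between clusters.  The sketch below is only an illustration of that textbook formula, not the calculation Optimal Design performs.  The classroom counts and the ICC value are hypothetical.</p>

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing equal-sized clusters instead of individuals."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is roughly worth."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical study: 20 classrooms of 25 students each, ICC of 0.10.
n_students = 20 * 25
print(design_effect(25, 0.10))
print(effective_sample_size(n_students, 25, 0.10))
```

<p>Even a modest ICC shrinks the effective sample size considerably, which is why a group-randomized design usually needs more participants than a person-randomized one to reach the same power.</p>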
<p>It is also possible to include a third level of grouping, if classrooms are nested within schools, and a fourth, if schools are blocked within districts.</p>
<p>The randomization to treatments and the measurement of covariates can occur at any level.</p>
<p>If this is starting to get overwhelming, it&#8217;s not as bad as it sounds.  The software comes with one of the best-written statistical software manuals I&#8217;ve seen.</p>
<p>The manual explains in great detail, with excellent examples, what each design criterion means, so that you&#8217;ll be able to recognize it in your own design.</p>
<p>Here is an excerpt from a table in the manual that explains some of the available designs:</p>
<p style="padding-left: 30px;"> <a href="http://www.theanalysisfactor.com/wp-content/uploads/2012/05/od-designs.png"><img class="size-full wp-image-2576 alignleft" title="od-designs" src="http://www.theanalysisfactor.com/wp-content/uploads/2012/05/od-designs.png" alt="Design Options for Individual Level Outcome Measures" width="555" height="302" /></a></p>
<p>&nbsp;</p>
<p>Beyond the very simplest, all of these designs will require using multilevel or mixed models to analyze the data, once you&#8217;ve got it.  And to run the <a href="http://www.theanalysisfactor.com/5-reasons-sample-size-calculations/">prospective power analyses</a>, you will have to have estimates for some of the design effects&#8211;an intraclass correlation for the clustering, blocking effects, correlations between covariates and the outcome variable&#8211;in addition to the usual standard deviation estimates that you need for any power analysis.</p>
<p>But Optimal Design doesn&#8217;t cover sample size estimates for just any multilevel analysis.  The effect size it requires is a mean difference for a comparison of two treatments.</p>
<p>This works great if you really are doing this type of intervention study&#8211;it&#8217;s exactly what you need. And within that context, the design options are plentiful.</p>
<p>Like any specialized tool, it works very, very well for what it&#8217;s designed for, and not much else.  It&#8217;s also very easy to use and well documented.</p>
<p>You can download the <a href="http://sitemaker.umich.edu/group-based/optimal_design_software" target="_blank">Optimal Design Plus software and documentation here</a>.</p>
<p>&nbsp;</p>
<p>If you&#8217;d like to learn more about power and sample size estimates, take a look at our online workshop:  <a title="Calculating Power and Sample Size Workshop" href="http://www.theanalysisinstitute.com/workshops/CPSS/index.html" target="_self"><strong>Calculating Power and Sample Size</strong></a>.  We’ll go over the logic, the info you need, where to get it, how to do the steps, and how to use power software to get good estimates. We’ll also go over what these estimates really tell you, and what they don’t.</p>
<div class="SPOSTARBUST-Related-Posts"><H3>Related Posts</H3><ul class="entry-meta"><li class="SPOSTARBUST-Related-Post"><a title="Multilevel Models with Crossed Random Effects" href="http://www.theanalysisfactor.com/multilevel-models-with-crossed-random-effects/" rel="bookmark">Multilevel Models with Crossed Random Effects</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="Concepts in Linear Regression you need to know before learning Multilevel Models" href="http://www.theanalysisfactor.com/concepts-in-linear-regression-you-need-to-know-before-learning-multilevel-models/" rel="bookmark">Concepts in Linear Regression you need to know before learning Multilevel Models</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="Confusing Statistical Terms #3: Levels of a Factor in Multilevel Models Measured at a Nominal Level" href="http://www.theanalysisfactor.com/levels-of-a-factor-in-multilevel-models-measured-at-a-nominal-level/" rel="bookmark">Confusing Statistical Terms #3: Levels of a Factor in Multilevel Models Measured at a Nominal Level</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="Some Good References for Multilevel Regression Models" href="http://www.theanalysisfactor.com/some-good-references-for-multilevel-regression-models/" rel="bookmark">Some Good References for Multilevel Regression Models</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="Specifying Fixed and Random Factors in Mixed Models" href="http://www.theanalysisfactor.com/specifying-fixed-and-random-factors-in-mixed-models/" rel="bookmark">Specifying Fixed and Random Factors in Mixed Models</a></li>
</ul></div>]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/sample-size-randomized-trials/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Confusing Statistical Term #6: Factor</title>
		<link>http://www.theanalysisfactor.com/confusing-statistical-term-6-factor/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=confusing-statistical-term-6-factor</link>
		<comments>http://www.theanalysisfactor.com/confusing-statistical-term-6-factor/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 14:37:16 +0000</pubDate>
		<dc:creator>Karen</dc:creator>
				<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[Confusing Statistical Terms]]></category>
		<category><![CDATA[Factor Analysis]]></category>
		<category><![CDATA[Categorical Independent Variable]]></category>
		<category><![CDATA[Factor Score]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2483</guid>
		<description><![CDATA[Factor is tricky much in the same way as hierarchical and beta, because it too has different meanings in different contexts.  Factor might be a little worse, though, because its meanings are related.

In both meanings, a factor is a variable.  But a factor has a completely different meaning and implications for use in two different contexts.

Factor analysis 

In factor analysis, a factor is an unmeasured, latent variable, that expresses itself through its relationship with other measured variables.]]></description>
			<content:encoded><![CDATA[<p></p><p>Factor is confusing much in the same way as <a title="Hierarchical Regression vs. Hierarchical Model" href="../confusing-statistical-term-4-hierarchical-regression-vs-hierarchical-model/" target="_self">hierarchical</a> and <a title="Alpha and Beta" href="../confusing-statistical-terms-1-alpha-and-beta/" target="_self">beta</a>, because it too has different meanings in different contexts.  Factor might be a little worse, though, because its meanings are related.</p>
<p>In both meanings, a factor is a variable.  But a factor has a completely different meaning and implications for use in two different contexts.</p>
<h3>Factor in Factor Analysis</h3>
<p>In factor analysis, a factor is a latent (unmeasured) variable that expresses itself through its relationship with other measured variables.</p>
<p>Take for example a variable like leadership. We may want to measure a person&#8217;s or an organization&#8217;s leadership style, but this is the kind of construct that would be impossible to measure using a single variable. It&#8217;s just too abstract and multifaceted, although it does represent a single concept.</p>
<p>So instead, you may have to develop a scale with many items, each of which measures some more measurable part of leadership. The idea would be that there is an underlying unmeasurable factor, leadership, that causes people to respond in certain patterns on the many items on the scale.</p>
<p>The purpose of factor analysis is to analyze these patterns of response as a way of getting at this underlying factor. Factor analysis also allows you to use the weighted item responses to create what are called factor scores.  These represent a single score for each person on the factor.</p>
<p>Factor scores are nice because they allow you to use a single variable as a measure of the factor in the other analyses, rather than a set of items.</p>
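<p>As a rough sketch of the idea, a factor score can be thought of as a weighted sum of each person&#8217;s standardized item responses.  In a real analysis the weights come from the fitted factor model.  The weights and responses below are made up for illustration, and the function names are hypothetical.</p>

```python
import statistics


def standardize(values):
    """Convert one item's responses to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def factor_scores(responses, weights):
    """responses: one row per person, one column per item; weights: one per item."""
    columns = list(zip(*responses))
    z_columns = [standardize(col) for col in columns]
    z_rows = zip(*z_columns)
    return [sum(w * z for w, z in zip(weights, row)) for row in z_rows]


# Three people answering three leadership items (hypothetical data and weights).
responses = [[1, 2, 1], [3, 3, 3], [5, 4, 5]]
weights = [0.5, 0.3, 0.2]
print(factor_scores(responses, weights))
```

<p>Each person ends up with one number, which can then stand in for the whole set of items in later analyses.</p>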
<h3>Factor as a Categorical Predictor Variable</h3>
<p>Contrast that to the use of a factor in a linear model or a linear mixed model.  In this context, a factor is still a variable, but it refers to a categorical independent variable. So you may have heard of fixed factors and random factors. In both cases, those are referring to a categorical independent variable.</p>
<p>Like covariates, factors in a linear model can be either control variables or important independent variables. The model uses them the same way in either case. The only difference is how you are going to interpret the results.</p>
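<p>In practice, a categorical factor usually enters a linear model as a set of dummy (0/1) variables, one for each category except a reference category.  Here is a minimal sketch of that coding.  The function name and the teaching-format categories are assumptions for illustration, and most statistical software does this step for you.</p>

```python
def dummy_code(values, reference):
    """Return the non-reference levels and one row of 0/1 indicators per observation."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical factor with three categories; "control" is the reference.
condition = ["control", "lecture", "online", "control"]
levels, rows = dummy_code(condition, reference="control")
print(levels)
for row in rows:
    print(row)
```

<p>The reference category is the one coded as all zeros, so each coefficient compares its category with the reference.</p>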
<p>This all gets especially tricky when the continuous factor scores from a factor analysis are used as predictors in a linear model.  Technically, since they are continuous, they wouldn&#8217;t be factors in the model, in the second definition. They would be <a href="http://www.theanalysisfactor.com/confusing-statistical-terms-5-covariate/">covariates</a>.</p>
<p>———————————————————————————-</p>
<p>Read other posts in the series on <a title="Series on Confusing Statistical Terms" href="http://www.theanalysisfactor.com/series-on-confusing-statistical-terms/">Confusing Statistical Terms.</a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<div class="SPOSTARBUST-Related-Posts"><H3>Related Posts</H3><ul class="entry-meta"><li class="SPOSTARBUST-Related-Post"><a title="SPSS GLM: Choosing Fixed Factors and Covariates" href="http://www.theanalysisfactor.com/spss-glm-choosing-fixed-factors-and-covariates/" rel="bookmark">SPSS GLM: Choosing Fixed Factors and Covariates</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA" href="http://www.theanalysisfactor.com/when-unequal-sample-sizes-are-and-are-not-a-problem-in-anova/" rel="bookmark">When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="Can Likert Scale Data ever be Continuous?" href="http://www.theanalysisfactor.com/can-likert-scale-data-ever-be-continuous/" rel="bookmark">Can Likert Scale Data ever be Continuous?</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="Interpreting Interactions:  When the F test and the Simple Effects disagree." href="http://www.theanalysisfactor.com/interpreting-interactions-when-the-f-test-and-the-simple-effects-disagree/" rel="bookmark">Interpreting Interactions:  When the F test and the Simple Effects disagree.</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="The 3 Stages of Mastering Statistical Analysis" href="http://www.theanalysisfactor.com/the-3-stages-of-mastering-statistical-analysis/" rel="bookmark">The 3 Stages of Mastering Statistical Analysis</a></li>
</ul></div>]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/confusing-statistical-term-6-factor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Five Extensions of the General Linear Model</title>
		<link>http://www.theanalysisfactor.com/extensions-general-linear-model/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=extensions-general-linear-model</link>
		<comments>http://www.theanalysisfactor.com/extensions-general-linear-model/#comments</comments>
		<pubDate>Fri, 13 Apr 2012 16:02:09 +0000</pubDate>
		<dc:creator>Karen Grace-Martin</dc:creator>
				<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[Mixed and Multilevel Models]]></category>
		<category><![CDATA[Regression models]]></category>
		<category><![CDATA[GEE]]></category>
		<category><![CDATA[General Linear Model]]></category>
		<category><![CDATA[generalized estimating equations]]></category>
		<category><![CDATA[generalized linear mixed models]]></category>
		<category><![CDATA[Generalized Linear Model]]></category>
		<category><![CDATA[Marginal Mod]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2382</guid>
		<description><![CDATA[Generalized linear models, linear mixed models, generalized linear mixed models, marginal models, GEE models.  You’ve probably heard of more than one of them and you’ve probably also heard that each one is an extension of our old friend, the general linear model.

This is true, and they extend our old friend in different ways, particularly in regard to the measurement level of the dependent variable and the independence of the measurements.  So while the names are similar (and confusing), the distinctions are important.]]></description>
			<content:encoded><![CDATA[<p></p><p>Generalized linear models, linear mixed models, <a href="http://www.theanalysisfactor.com/mixed-models-logistic-regression-in-spss/">generalized linear mixed models</a>, <a href="http://www.theanalysisfactor.com/repeated-and-random-2/">marginal models</a>, GEE models.  You’ve probably heard of more than one of them and you’ve probably also heard that each one is an extension of our old friend, the <a href="http://www.theanalysisfactor.com/general-linear-model-anova-regression-same-model/">general linear model</a>.</p>
<p>This is true, and they extend our old friend in different ways, particularly in regard to the <a href="http://www.theanalysisfactor.com/6-types-of-dependent-variables-that-will-never-meet-the-glm-normality-assumption/">measurement level of the dependent variable</a> and the independence of the measurements.  So while the names are similar (and confusing), the distinctions are important.</p>
<p>It’s important to note here that I am glossing over many, many details in order to give you a basic overview of some important distinctions.  These are complicated models, but I hope this overview gives you a starting place from which to explore more.<span id="more-2382"></span></p>
<p><strong>General Linear Models</strong></p>
<p>The general linear model has this basic form:</p>
<p><img src="http://www.theanalysisfactor.com/wp-content/uploads/2012/03/equation-1.gif" alt="" width="200" height="74" /></p>
<p>It has these <a href="http://www.theanalysisfactor.com/assumptions-of-linear-models/">assumptions (among others)</a>:</p>
<ol>
<li>the residuals are independent of each other</li>
<li>the residuals are normally distributed</li>
<li>the relationship between Y and the model parameters is linear</li>
</ol>
<p>So let&#8217;s see how some of the different model types extend this model in different ways.</p>
<p><strong>Generalized linear models</strong></p>
<p>Generalized linear models relax the last two assumptions. They generalize the possible distributions of the residuals to a family of distributions called the exponential family.  This family includes the normal as well as the <a href="http://www.theanalysisfactor.com/binary-ordinal-multinomial-logistic/">binomial</a>, <a href="http://www.theanalysisfactor.com/poisson-regression-analysis-for-count-data/">Poisson</a>, negative binomial, and gamma distributions, among others. You are probably familiar with common examples like logistic, Poisson, and probit models.</p>
<p><img src="http://www.theanalysisfactor.com/wp-content/uploads/2012/03/equation-2.gif" alt="" /></p>
<p>When you change the distribution of the residuals, it turns out that the relationship between Y and the model parameters is no longer linear. However, for each distribution in the exponential family, there exists at least one function of the mean of Y whose relationship with the model parameters is linear. This function is called the link function.</p>
<p>The link function you choose will depend on which distribution you are choosing for the outcome variable. For example, a binomial residual can use a probit or a logit link function. A Poisson residual uses a log link function.</p>
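<p>To make the link function idea concrete, here is a small sketch of the logit and log links and their inverses.  The linear predictor is linear in the parameters, and the inverse link maps it back to the scale of the mean of Y.  The coefficient values are hypothetical.</p>

```python
import math


def logit(p):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))


def log_link(mu):
    """Log link: maps a positive mean (such as a count) to the real line."""
    return math.log(mu)


def inv_log(eta):
    """Inverse log link: maps a linear predictor back to a positive mean."""
    return math.exp(eta)


# Hypothetical intercept and slope; the same linear predictor under two links.
b0, b1 = -1.0, 0.5
for x in (0, 2, 4):
    eta = b0 + b1 * x
    print(x, eta, round(inv_logit(eta), 3), round(inv_log(eta), 3))
```

<p>The linear predictor changes by the same amount for every unit of X, but the implied probability or expected count does not.  That is the sense in which the relationship is linear only on the scale of the link.</p>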
<p><strong>Marginal Models</strong></p>
<p><a href="http://www.theanalysisfactor.com/repeated-measures-approaches/">Marginal models</a> are a type of linear model that accounts for <a href="http://www.theanalysisfactor.com/longitudinal-repeated-measures/">repeated response measures</a> on the same subject. They extend the general linear model by allowing and accounting for non-independence among the observations of a single subject.</p>
<p>They do this by estimating one or more parameters that capture the covariance among the residuals.  So rather than having a single constant variance and zero covariance for all residuals, observations from the same subject are allowed to have different variances and nonzero covariances. The pattern of variances and covariances is known as the covariance structure of the R matrix.</p>
<p>They still assume that observations from different subjects are independent, and linear marginal models still assume residuals are normally distributed.</p>
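<p>As an illustration of what a covariance structure looks like, the sketch below builds two commonly used structures for the residuals of one subject: compound symmetry (equal covariance between every pair of time points) and first-order autoregressive (covariance that fades as time points get farther apart).  These are only two of the available options, and the variance and correlation values are hypothetical.</p>

```python
def compound_symmetry(n_times, variance, rho):
    """Same variance at every time point, same covariance for every pair."""
    return [
        [variance if i == j else rho * variance for j in range(n_times)]
        for i in range(n_times)
    ]


def ar1(n_times, variance, rho):
    """Same variance at every time point, covariance shrinking with the time lag."""
    return [
        [variance * rho ** abs(i - j) for j in range(n_times)]
        for i in range(n_times)
    ]


# One subject measured at three time points (hypothetical variance and correlation).
for row in compound_symmetry(3, 4.0, 0.5):
    print(row)
for row in ar1(3, 4.0, 0.5):
    print(row)
```

<p>Each structure needs only two parameters here, a variance and a correlation, which is far fewer than estimating every variance and covariance separately.</p>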
<p><strong>GEE Models</strong></p>
<p>Generalized estimating equation models are generalized linear marginal models.  That is, they combine the generalized linear model for a non-normal residual with the repeated measures of a marginal model. You would use these when you have repeated measures on each subject and need to run a logistic, multinomial, Poisson or other generalized linear regression model.</p>
<p><strong>Linear Mixed Models</strong></p>
<p>Like marginal models, linear mixed models account for non-independence among clustered observations, but they do it in a different way.</p>
<p>Instead of estimating nonzero correlations among residuals, linear mixed models account for the fact that clustered observations are similar by estimating the variance among cluster means and among observations within a cluster.  They literally partition the variance in Y into cluster-level and observation-level parts.</p>
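<p>Here is a minimal sketch of that partitioning for balanced data, using the simple ANOVA (method-of-moments) estimator.  Mixed model software typically uses maximum likelihood or REML instead, so treat this as an illustration of the idea rather than what your software does.  The function name and the data are hypothetical.</p>

```python
import statistics


def variance_components(clusters):
    """Split variance into between-cluster and within-cluster parts.

    clusters: a list of equally sized lists of observations.
    Returns (between_variance, within_variance, icc).
    """
    m = len(clusters[0])
    cluster_means = [statistics.mean(c) for c in clusters]
    # Pooled within-cluster variance (the within mean square for balanced data).
    ms_within = statistics.mean([statistics.variance(c) for c in clusters])
    # Between mean square: cluster size times the variance of the cluster means.
    ms_between = m * statistics.variance(cluster_means)
    between = max(0.0, (ms_between - ms_within) / m)
    icc = between / (between + ms_within)
    return between, ms_within, icc


# Three hypothetical clusters of two observations each.
between, within, icc = variance_components([[1, 3], [5, 7], [9, 11]])
print(between, within, icc)
```

<p>The ratio of the cluster-level variance to the total is the intraclass correlation, which summarizes how similar observations within the same cluster are.</p>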
<p>Because of the way they account for variation among subjects, linear mixed models are much more flexible than marginal models.</p>
<p>For example, they can accommodate three levels of repeated measures or clustering, like repeated measurements on patients clustered within hospitals, and they can estimate subject-level effects that go beyond variation among subject means.</p>
<p>They can accomplish these feats because they include parameters to measure the random effects of the clusters&#8211;by treating the variation among clusters as another sort of residual variation.  The Mixed in the name comes from the fact that they estimate both fixed and random effects.</p>
<p>Like all linear models, linear mixed models assume residuals are normally distributed and the relationship between Y and the model parameters is linear.</p>
<p><strong>Generalized Linear Mixed Models</strong></p>
<p>You probably know by now where this one is going.</p>
<p>Generalized Linear Mixed Models are mixed models in which the residuals follow a distribution from the same exponential family.  They require the same link functions as generalized linear models <em>and</em> at least one random effect.</p>
<p>Both generalized linear models and linear mixed models can be computationally intensive, especially as the number of random effects to be estimated goes beyond one or two.  Putting them together can be especially so.  I’ve run GLMMs that took hours to run on not very large data sets.  They require special care and should not be undertaken lightly.</p>
<p><strong>How all the models are the same</strong></p>
<p>I’ve focused on how these models differ, but they also have underlying similarities.</p>
<ol type="1" start="1">
<li>The structure is the same: they all are models of the relationship between a single response variable Y, and one or more predictor variables X.  The variation around the model is estimated in the residual.</li>
<li>They generally all use some form of maximum likelihood estimation.  Even OLS estimation, used in the general linear model, is a special case of maximum likelihood.</li>
<li><a href="http://www.theanalysisfactor.com/spss-glm-choosing-fixed-factors-and-covariates/">Fixed effects</a> work the same in all these models.  The function of Y may differ, and the residual structure may differ, but the X variables work the same in every one of these models.  Dummy and effect coding, continuous predictors, interactions, quadratic terms have the same inherent meaning and can be used in any of these models.</li>
<li>The General Linear Model is a subset of each of these other models.  You could, if you really wanted to, run a general linear model in a software procedure designed for any of these other models by choosing the right options.  The reverse is not true.</li>
</ol>
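<p>The second point, that OLS is a special case of maximum likelihood, can be checked numerically.  The sketch below uses made-up data and assumes normally distributed residuals with a fixed standard deviation; under that assumption the least-squares estimates should give a higher log-likelihood than any nudged version of them.</p>
<pre><code class="language-python">import math

# Hypothetical data for a single-predictor regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def ols_fit(xs, ys):
    """Closed-form least-squares intercept and slope."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def normal_log_likelihood(intercept, slope, sigma, xs, ys):
    """Log-likelihood of the data if residuals are Normal(0, sigma^2)."""
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

b0, b1 = ols_fit(x, y)
sigma = 0.5  # held fixed; any positive value gives the same ranking of the slopes
at_ols = normal_log_likelihood(b0, b1, sigma, x, y)
nudged = normal_log_likelihood(b0, b1 + 0.1, sigma, x, y)
print(f"log-likelihood at OLS estimates: {at_ols:.3f}")
print(f"log-likelihood with slope + 0.1: {nudged:.3f}")
</code></pre>
<p>Because the normal log-likelihood depends on the coefficients only through the sum of squared residuals, minimizing that sum and maximizing the likelihood pick out the same intercept and slope.</p>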
<p style="padding-left: 30px;"> If you want to learn more about linear mixed models, check out the recording of my Random Intercept and Random Slope Models webinar. These two models are the basic building blocks of all mixed models.</p>
<p style="padding-left: 30px;"><a title="Random Intercept and Random Slope Models Webinar" href="http://www.theanalysisfactor.com/random-intercept-and-random-slope-models-webinar/" target="_self"><strong>Get it all here</strong></a>. It’s free.</p>
<div class="SPOSTARBUST-Related-Posts"><H3>Related Posts</H3><ul class="entry-meta"><li class="SPOSTARBUST-Related-Post"><a title="Mixed Models for Logistic Regression in SPSS" href="http://www.theanalysisfactor.com/mixed-models-logistic-regression-in-spss/" rel="bookmark">Mixed Models for Logistic Regression in SPSS</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="A Sneak Peak at SPSS 19" href="http://www.theanalysisfactor.com/a-sneak-peak-at-spss-19/" rel="bookmark">A Sneak Peak at SPSS 19</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="The General Linear Model, Analysis of Covariance, and How ANOVA and Linear Regression Really are the Same Model Wearing Different Clothes" href="http://www.theanalysisfactor.com/general-linear-model-anova-regression-same-model/" rel="bookmark">The General Linear Model, Analysis of Covariance, and How ANOVA and Linear Regression Really are the Same Model Wearing Different Clothes</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="3 Reasons Psychology Researchers should Learn Regression" href="http://www.theanalysisfactor.com/3-reasons-psychology-researchers-should-learn-regression/" rel="bookmark">3 Reasons Psychology Researchers should Learn Regression</a></li>
<li class="SPOSTARBUST-Related-Post"><a title="3 Situations when it makes sense to Categorize a Continuous Predictor in a Regression Model" href="http://www.theanalysisfactor.com/3-situations-when-it-makes-sense-to-categorize-a-continuous-predictor-in-a-regression-model/" rel="bookmark">3 Situations when it makes sense to Categorize a Continuous Predictor in a Regression Model</a></li>
</ul></div>]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/extensions-general-linear-model/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>When to leave insignificant effects in a model</title>
		<link>http://www.theanalysisfactor.com/insignificant-effects-in-model/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=insignificant-effects-in-model</link>
		<comments>http://www.theanalysisfactor.com/insignificant-effects-in-model/#comments</comments>
		<pubDate>Thu, 05 Apr 2012 19:57:18 +0000</pubDate>
		<dc:creator>Karen Grace-Martin</dc:creator>
				<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[Regression models]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=1549</guid>
		<description><![CDATA[You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model. One effect of leaving in insignificant predictors is on p-values&#8211;they use up precious df in small samples. But if your sample isn&#8217;t small, the effect is negligible. The bigger effect [...]]]></description>
			<content:encoded><![CDATA[<p>You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model.</p>
<p>One effect of leaving in insignificant predictors is on p-values&#8211;they use up precious df in small samples. But if your sample isn&#8217;t small, the effect is negligible.</p>
<p>The bigger effect is on interpretation, and really the cases below are about whether it aids interpretation to leave them in. Models can get so cluttered that it’s hard to figure out what’s going on, and it makes sense to eliminate effects that aren&#8217;t serving a purpose, but even insignificant effects can have a purpose.<span id="more-1549"></span></p>
<p>So here are three situations where it makes sense to keep specific predictors in the model, estimate their coefficients, and show that they were not significant:</p>
<h4>1. Expected control variables.  You need to show that you&#8217;ve controlled for them.</h4>
<p>In many fields, there are control variables that everyone expects to see.</p>
<ul>
<li>Age in medical studies</li>
<li>Race, income, education in sociological studies</li>
<li>Socioeconomic status in education studies</li>
</ul>
<p>The examples go on and on.</p>
<p>If you take these expected <a href="http://www.theanalysisfactor.com/confusing-statistical-terms-5-covariate/">controls</a> out, you will just get criticism for not including them.  And it may be interesting to show that in this sample and with these variables, these controls weren&#8217;t significant.</p>
<h4>2. Predictors you have specific hypotheses about.</h4>
<p>Another example is if the point of a model is to specifically test a predictor–you have a hypothesis about a predictor and it’s meaningful to show that it’s not significant. In that case, I would leave it in, even if not significant.</p>
<h4>3. Items involved in higher-order terms</h4>
<p>When you take out a term that is involved in something higher, like a <a href="http://www.theanalysisfactor.com/interaction-association/">two-way interaction</a> that is part of a three-way interaction, you actually change the meaning of the higher-order term.  The sum of squares for each higher-order term is based on comparisons to specific means and represents variation around those means.</p>
<p>If you take out the lower-order term, that variation has to be covered somewhere, and it&#8217;s usually not where you expect it.  For example, a two-way interaction represents the variation in cell means around the main effect means.  But if the variation between the main effect means isn&#8217;t measured with a main effect term, it ends up in the interaction, and that interaction no longer reflects the variation it would if the main effect were in the model.</p>
<p>So it&#8217;s not that it&#8217;s <em>wrong</em>, but it changes the meaning of the interaction.  For that reason, most people recommend leaving those lower-order effects in.</p>
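<p>Here is a small sketch of where that variation goes, using a hypothetical balanced 2x2 design in Python.  The cell means are made up and purely additive, so there is no true interaction; the only assumptions are equal cell sizes and the usual sums-of-squares decomposition for a balanced design.</p>
<pre><code class="language-python"># Hypothetical balanced 2x2 design: cell means are purely additive
# (no true interaction), with n observations per cell.
n = 5
cell_means = {
    ("A1", "B1"): 10.0, ("A2", "B1"): 14.0,
    ("A1", "B2"): 20.0, ("A2", "B2"): 24.0,
}

def marginal_means(cells, position):
    """Average the cell means over the other factor."""
    levels = {}
    for key, value in cells.items():
        levels.setdefault(key[position], []).append(value)
    return {level: sum(v) / len(v) for level, v in levels.items()}

grand = sum(cell_means.values()) / len(cell_means)
a_means = marginal_means(cell_means, 0)
b_means = marginal_means(cell_means, 1)

# Each marginal mean rests on n observations per cell times the number
# of levels of the other factor.
ss_a = sum(len(b_means) * n * (m - grand) ** 2 for m in a_means.values())
ss_b = sum(len(a_means) * n * (m - grand) ** 2 for m in b_means.values())
ss_cells = sum(n * (m - grand) ** 2 for m in cell_means.values())

ss_ab_full = ss_cells - ss_a - ss_b  # interaction when A and B are both in the model
ss_ab_no_a = ss_cells - ss_b         # "interaction" when the A main effect is dropped
print("SS for A:", ss_a)
print("interaction SS with A in the model:", ss_ab_full)
print("interaction SS with A removed:", ss_ab_no_a)
</code></pre>
<p>With the main effect of A in the model, the interaction sum of squares is zero.  Drop A, and everything that belonged to A shows up in the interaction term instead.</p>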
<p>The main point here is there are often good reasons to leave insignificant effects in a model.  The p-values are just one piece of information.  You may be losing important information by automatically removing everything that isn&#8217;t significant.<br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/insignificant-effects-in-model/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Data Mining Webinar with Peter Bruce, President, Statistics.com</title>
		<link>http://www.theanalysisfactor.com/data-mining-peter-bruce/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=data-mining-peter-bruce</link>
		<comments>http://www.theanalysisfactor.com/data-mining-peter-bruce/#comments</comments>
		<pubDate>Wed, 04 Apr 2012 16:40:56 +0000</pubDate>
		<dc:creator>Karen</dc:creator>
				<category><![CDATA[The Craft of Statistical Analysis Webinars]]></category>
		<category><![CDATA[Upcoming]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2385</guid>
		<description><![CDATA[Data Mining methods lie at the center of the constellation of techniques under the umbrella of &#8220;business analytics.&#8221;  These techniques deal with analysis of large existing datasets (as opposed to controlled experiments, or sample surveys). This webinar will give an overview of data mining techniques, which include: In predictive modeling, we build a model to [...]]]></description>
			<content:encoded><![CDATA[<p>Data Mining methods lie at the center of the constellation of techniques under the umbrella of &#8220;business analytics.&#8221;  These techniques deal with analysis of large existing datasets (as opposed to controlled experiments, or sample surveys).</p>
<p>This webinar will give an overview of data mining techniques, which include:</p>
<ul>
<li>In <span style="color: #000000;">predictive modeling</span>, we build a model to predict the known value of a variable of interest using &#8220;training&#8221; data, then apply the model to data where the value is unknown.</li>
<li>In <span style="color: #000000;">clustering</span>, we seek to identify groups of similar customers, records, etc. Clustering is the statistical component in customer segmentation.</li>
<li>A <span style="color: #000000;">recommender system</span> identifies, statistically, &#8220;what goes with what.&#8221;  These systems lie behind the notices advising you that &#8220;customers who bought X also bought Y.&#8221;</li>
<li>Additional methods include <span style="color: #000000;">graphical techniques</span> and <span style="color: #000000;">text analytics</span> (the most rapid growth is in text &#8211; Twitter feeds, Facebook contents, emails, etc.).</li>
</ul>
<p><strong>Date:</strong> May 30, 2012</p>
<p><strong>Time:</strong> 1pm Eastern Time UTC -4 (12pm Central, 11am Mountain, 10am Pacific)</p>
<p><strong>Where:</strong> Anywhere you have a fast internet connection</p>
<p><strong>Length of Program:</strong> An Hour</p>
<p><strong>Cost:</strong> Always FREE</p>
<p><strong>Space is limited.</strong></p>
[upcoming-webinar-optin]
<h3>About Our Guest</h3>
<p>Mr. Peter Bruce <img class="alignright" src="http://www.statistics.com/uploads/images/pete_profiler_1.jpg" alt="" width="100" height="131" /> is President of The Institute for Statistics Education at Statistics.com. He is the developer of Resampling Stats software (originated by Julian Simon in the 1970s), and has taught resampling statistics at the University of Maryland and in a variety of short courses. He is the co-author of <em>Data Mining for Business Intelligence</em> (Wiley, 2006, 2nd ed. 2010), as well as a number of journal articles.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/data-mining-peter-bruce/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Difference Between Interaction and Association</title>
		<link>http://www.theanalysisfactor.com/interaction-association/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=interaction-association</link>
		<comments>http://www.theanalysisfactor.com/interaction-association/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 15:07:31 +0000</pubDate>
		<dc:creator>Karen Grace-Martin</dc:creator>
				<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[Linear Regression]]></category>
		<category><![CDATA[Association]]></category>
		<category><![CDATA[Correlation]]></category>
		<category><![CDATA[interaction]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2362</guid>
		<description><![CDATA[Interaction is different.  Whether two variables are associated says nothing about whether they interact in their effect on a third variable.  Likewise, if two variables interact, they may or may not be associated.]]></description>
			<content:encoded><![CDATA[<p></p><p>It’s really easy to mix up the concepts of association (a.k.a. correlation) and interaction.  Or to assume if two variables interact, they must be associated.  But it’s not actually true.</p>
<p>In statistics, they have different implications for the relationships among your variables, especially when the variables you’re talking about are predictors in a regression or ANOVA model.</p>
<p><strong>Association</strong></p>
<p>Association between two variables means the values of one variable relate in some way to the values of the other.  Association is usually measured by correlation for two continuous variables and by cross tabulation and a Chi-square test for two categorical variables.</p>
<p>Unfortunately, there is no nice, descriptive measure for association between one <span id="more-2362"></span>categorical and one continuous variable, but either one-way analysis of variance or logistic regression can test an association (depending upon whether you think of the categorical variable as the independent or the dependent variable).</p>
<p>Essentially, association means the values of one variable generally co-occur with certain values of the other.</p>
<p><strong>Interaction</strong></p>
<p>Interaction is different.  Whether two variables are associated says nothing about whether they interact <em>in their effect on a third variable</em>.  Likewise, if two variables interact, they may or may not be associated.</p>
<p>An interaction between two variables means the effect of one of those variables on a third variable is not constant—the effect differs at different values of the other.</p>
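<p>A quick way to see the difference in your own data is to check two separate things: whether the mean of X1 differs across groups of X2 (association), and whether the slope of Y on X1 differs across groups of X2 (interaction).  The Python sketch below does both on made-up numbers in which X1 and X2 are unrelated but do interact; the variable names and values are illustrative only.</p>
<pre><code class="language-python">from statistics import mean

# Hypothetical data: X1 has the same values in both X2 groups (no association),
# but its slope on Y differs between the groups (interaction).
data = {
    0: {"x1": [1, 2, 3, 4, 5], "y": [2.5, 3.0, 3.5, 4.0, 4.5]},
    1: {"x1": [1, 2, 3, 4, 5], "y": [3.0, 5.0, 7.0, 9.0, 11.0]},
}

def slope(xs, ys):
    """Least-squares slope of y on x within one group."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Association check: does the mean of X1 differ across X2 groups?
x1_means = {group: mean(d["x1"]) for group, d in data.items()}
# Interaction check: does the slope of Y on X1 differ across X2 groups?
slopes = {group: slope(d["x1"], d["y"]) for group, d in data.items()}

print("mean of X1 by X2 group:", x1_means)
print("slope of Y on X1 by X2 group:", slopes)
</code></pre>
<p>Equal means with unequal slopes is interaction without association.  Unequal means with equal slopes would be the reverse.</p>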
<p><strong>What Association and Interaction Describe in a Model</strong></p>
<p>The following examples show three situations for three variables: X1, X2, and Y. X1 is a continuous <a href="http://www.theanalysisfactor.com/the-many-names-of-independent-variables/">independent variable</a>, X2 is a <a href="http://www.theanalysisfactor.com/spss-glm-choosing-fixed-factors-and-covariates/">categorical independent variable</a>, and Y is the dependent variable.  I chose these types of variables to make the plots easy to read, but any of these variables could be either categorical or continuous.</p>
<p>In scenario 1, X1 and X2 are associated.  If you ignore Y, you can see the mean of X1 is lower when X2=0 than when X2=1.  But they do not interact in how they affect Y—the regression lines are parallel.  X1 has the same effect on Y (the slope) for both X2=1 and X2=0.</p>
<p>A simple example is the relationship between height (X1) and weight (Y) in male (X2=0) and female (X2=1) teenagers.  There is a relationship between height (X1) and gender (X2), but for both genders, the relationship between height and weight is the same.</p>
<p>This is the situation you’re trying to take care of by including <strong><a href="http://www.theanalysisfactor.com/confusing-statistical-terms-5-covariate/">control variables</a></strong>.  If you didn’t include gender as a control, a regression would fit a single line to all these points and attribute all variation in weights to differences in heights.  This line would also be steeper, as it tried to fit all the points using one line, and it would overestimate the size of the unique effect of height on weight.</p>
<p>&nbsp;</p>
<p><a href="http://www.theanalysisfactor.com/wp-content/uploads/2011/12/interaction-graphic-1.gif"><img class="aligncenter size-full wp-image-2274" title="interaction-graphic-1" src="http://www.theanalysisfactor.com/wp-content/uploads/2011/12/interaction-graphic-1.gif" alt="Association without Interaction" width="400" height="332" /></a></p>
<p>In a second scenario, X1 and X2 are not associated—the mean of X1 is the same for both categories of X2.  But how X1 affects Y differs for the two values of X2—the definition of an interaction.  The slope of X1 on Y is greater for X2=1 than it is for X2=0, in which it is nearly flat.</p>
<p>An example of this would be an experiment in which X1 was a pretest score and Y a posttest score.  Imagine participants were randomly assigned to a control (X2=1) or a training (X2=0) condition.</p>
<p>If randomization is done well, the assigned condition (X2) should be unrelated to the pretest score (X1).  But they do interact—the relationship between pretest and posttest differs in the two conditions.</p>
<p>In the control condition, without training, the pretest and posttest scores would be highly correlated, but in the training condition, if the training worked well, pretest scores would have less effect on posttest scores.</p>
<p>&nbsp;</p>
<p><a href="http://www.theanalysisfactor.com/wp-content/uploads/2011/12/interaction-graphic-2.gif"><img class="aligncenter size-full wp-image-2275" title="interaction-graphic-2" src="http://www.theanalysisfactor.com/wp-content/uploads/2011/12/interaction-graphic-2.gif" alt="Interaction without Association" width="400" height="350" /></a></p>
<p>In the third scenario, we’ve got <em>both</em> an association <em>and</em> an <strong><a href="http://www.theanalysisfactor.com/clarifications-on-interpreting-interactions-in-regression/">interaction</a></strong>.   X1 and X2 are associated: once again the mean of X1 is lower when X2=0 than when X2=1.  They also interact in their effect on Y: the slopes of the relationship between X1 and Y are different when X2=0 and X2=1.  So X2 affects the relationship between X1 and Y.</p>
<p>A good example here would be if Y is the number of jobs in a county, X1 is the percentage of the workforce that holds a college degree, and X2 is whether the county is rural (X2=0) or metropolitan (X2=1).</p>
<p>It’s clear rural counties have, on average, lower percentages of college-educated citizens than metropolitan counties.  They also have fewer jobs.</p>
<p>It’s also clear that the workforce’s education level in metropolitan counties is related to how many jobs there are.  But in rural counties, it doesn’t matter at all.</p>
<p>This situation is also what you would see if the randomization in the last example did not go well or if randomization was not possible.</p>
<p><a href="http://www.theanalysisfactor.com/wp-content/uploads/2011/12/graph-interaction-3.gif"><img class="aligncenter size-full wp-image-2263" title="graph-interaction-3" src="http://www.theanalysisfactor.com/wp-content/uploads/2011/12/graph-interaction-3.gif" alt="Interaction and Association" width="400" height="336" /></a>The differences between Interaction and Association will become clearer as you analyze more data. It&#8217;s always a good idea to stop and explore your data through graphs or by trying different terms in your model to figure out exactly what&#8217;s going on with your variables.</p>
<p>
If you want more information on using and interpreting interactions, get the recording from my webinar: <a href="../learning/teletraining4.html">Interpreting Regression Coefficients: A Walk Through Output</a>. It&#8217;s free.</p>
<div class="SPOSTARBUST-Related-Posts"><H3>Related Posts</H3><ul class="entry-meta"><li class="SPOSTARBUST-Related-Post"><a title="Interpreting Lower Order Coefficients When the Model Contains an Interaction" href="http://www.theanalysisfactor.com/interpreting-lower-order-coefficients-when-the-model-contains-an-interaction/" rel="bookmark">Interpreting Lower Order Coefficients When the Model Contains an Interaction</a></li>
</ul></div>]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/interaction-association/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When To Fight For Your Analysis and When To Jump Through Hoops</title>
		<link>http://www.theanalysisfactor.com/when-to-fight-for-your-analysis-and-when-to-jump-through-hoops/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=when-to-fight-for-your-analysis-and-when-to-jump-through-hoops</link>
		<comments>http://www.theanalysisfactor.com/when-to-fight-for-your-analysis-and-when-to-jump-through-hoops/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 11:00:09 +0000</pubDate>
		<dc:creator>Karen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2326</guid>
		<description><![CDATA[In the world of data analysis, there&#8217;s not always one clearly appropriate analysis for every research question. There are so many factors to take into account, including the research question to be answered, the measurement of the variables, the study design, data limitations and issues, the audience, practical constraints like software availability, and the purpose [...]]]></description>
			<content:encoded><![CDATA[<p>In the world of data analysis, there&#8217;s not always one clearly appropriate analysis for every research question. There are so many factors to take into account, including the research question to be answered, the measurement of the variables, the study design, data limitations and issues, the audience, practical constraints like software availability, and the purpose of the data analysis.</p>
<p>So what do you do when a reviewer rejects your choice of data analysis? This reviewer can be your boss, your dissertation committee, a co-author, or journal reviewer or editor. What do you do?</p>
<p>There are ultimately only two choices: You can redo the analysis their way. Or you can fight for your analysis. How do you choose?</p>
<p>The one absolute in this choice is that you have to honor the integrity of your data analysis and yourself. Do not be persuaded to do an analysis that will produce inaccurate or misleading results, especially when readers will actually make decisions based on these results. (If no one will ever read your report, this is less crucial).</p>
<p>But even within that absolute, there are often choices. Keep in mind the two goals in data analysis:</p>
<ol>
<li>The analysis needs to accurately reflect the limits of the design and the data, while still answering the research question.</li>
<li>The analysis needs to communicate the results to the audience.</li>
</ol>
<p>So first and foremost, if your reviewer is asking you to do an analysis that does not appropriately take into account the design or the variables, you need to fight.</p>
<p>For example, a few years ago I worked with a researcher who had a study with repeated measurements on the same individuals. It had a small sample size and an unequal number of observations on each individual. It was clear that to take into account the design and the unbalanced data, the appropriate analysis was a linear mixed model.</p>
<p>The researcher&#8217;s co-author questioned the use of the linear mixed model, mainly because he wasn&#8217;t familiar with it, and thought the researcher was attempting something fishy. His suggestion was to use an ad hoc technique of averaging over the multiple observations for each subject.</p>
<p>This was a situation where fighting was worth it. Unnecessarily simplifying the analysis to please people who were unfamiliar with an appropriate method was not an option because the simpler model would have violated assumptions. This was particularly important because the research was being submitted to a high-level journal.</p>
<p>So it was the researcher&#8217;s job to educate not only his coauthor, but the readers, in the form of explaining the analysis and its advantages, with citations, right in the paper.</p>
<p>In contrast, often the reviewer is not really asking for a completely different analysis, but a different way of running the same analysis, or reporting different specific statistics.</p>
<p>For example, a confirmatory factor analysis can be run either in standard statistical software like SAS, SPSS, or Stata, or in structural equation modeling software like Amos or Mplus. The analysis is essentially the same, but the two types of software will report different statistics.</p>
<p>If your committee members are familiar with structural equation modeling, they probably want to see the type of statistics that structural equation modeling software will report, including overall model fit statistics like RMSEA or model chi-squares.</p>
<p>This is a situation where it may be easier, and produces no ill-effects, to jump through the hoop. Running the factor analysis in the software they prefer, assuming you have access to the software, won&#8217;t violate any assumptions or produce inaccurate results. Especially if the reviewer is someone who has the ability to stop your research in its tracks, it may be worth it to rerun the analysis to get the statistics they want to see reported.</p>
<p>Now you do have to decide whether the cost of jumping through the hoop, in terms of time, money, and emotional energy, is worth it. If the request is relatively minor, it usually is. If it&#8217;s a matter of rerunning every analysis you&#8217;ve done to indulge a committee member&#8217;s pickiness, it may be worth standing up for yourself and your analysis.</p>
<p>When you&#8217;re dealing with anonymous reviewers, the situation can get sticky.  You cannot ask them to clarify their concerns and you have limited ability to explain the reasons for choosing your analysis. It may be harder to discern if they are being overly picky, don&#8217;t understand the statistics themselves, or have a valid point.</p>
<p>If you choose to stand up for yourself, be well armed. Research the issue until you are absolutely confident in your approach (or until you&#8217;re convinced that you were missing something). A few hours in the library or talking with a trusted expert is never a wasted investment. Compare that to running an unpublishable analysis to please a committee member or coauthor.</p>
<p>Often, the problem is actually not in the analysis you did, but in the way you explained it. It&#8217;s your job to explain why the analysis is appropriate and, if it&#8217;s unfamiliar to readers, what it does. Rewrite that section, making it very clear. Ask colleagues to review it. Cite other research that uses or explains that statistical method.</p>
<p>Whatever you choose, be confident that you made the right decision, then move on.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/when-to-fight-for-your-analysis-and-when-to-jump-through-hoops/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Understanding Probability, Odds, and Odds Ratios in Logistic Regression</title>
		<link>http://www.theanalysisfactor.com/understanding-probability-odds/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=understanding-probability-odds</link>
		<comments>http://www.theanalysisfactor.com/understanding-probability-odds/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 17:57:51 +0000</pubDate>
		<dc:creator>Karen</dc:creator>
				<category><![CDATA[Recordings]]></category>
		<category><![CDATA[The Craft of Statistical Analysis Webinars]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2312</guid>
		<description><![CDATA[Odds ratios are the bane of many data analysts. Interpreting them can be like learning a whole new language. This webinar will go over an example to show how to interpret the odds ratios in binary logistic regression. You will learn: how probability and odds both measure the same thing on different scales the meaning [...]]]></description>
			<content:encoded><![CDATA[<p>Odds ratios are the bane of many data analysts. Interpreting them can be like learning a whole new language. This webinar will go over an example to show how to interpret the odds ratios in binary logistic regression. You will learn:</p>
<ul>
<li>how probability and odds both measure the same thing on different scales</li>
<li>the meaning of odds</li>
<li>how to interpret an odds ratio for continuous and categorical predictors in logistic regression</li>
</ul>
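<p>As a small preview of the first two points, here is a sketch in Python that moves between the probability and odds scales and then forms an odds ratio.  The two probabilities are made up, and the function names are only for illustration.</p>
<pre><code class="language-python">import math

def odds_from_probability(p):
    """Odds = probability of the event divided by probability of no event."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Convert odds back to the probability scale."""
    return odds / (1 + odds)

# Hypothetical probabilities of an event in two groups.
p_treated, p_control = 0.75, 0.5
odds_treated = odds_from_probability(p_treated)
odds_control = odds_from_probability(p_control)
odds_ratio = odds_treated / odds_control

# In logistic regression a coefficient is a log odds ratio,
# so exponentiating it recovers the odds ratio.
coefficient = math.log(odds_ratio)

print("odds, treated group:", odds_treated)
print("odds, control group:", odds_control)
print("odds ratio:", odds_ratio)
print("exp(coefficient):", math.exp(coefficient))
</code></pre>
<p>A probability of 0.75 and odds of 3 to 1 describe the same chance of the event, just on different scales.</p>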
<p><strong>[webinar-recording-optin]<br />
</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/understanding-probability-odds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When Can Count Data be Considered Continuous?</title>
		<link>http://www.theanalysisfactor.com/count-data-considered-continuous/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=count-data-considered-continuous</link>
		<comments>http://www.theanalysisfactor.com/count-data-considered-continuous/#comments</comments>
		<pubDate>Fri, 13 Jan 2012 19:19:48 +0000</pubDate>
		<dc:creator>Karen Grace-Martin</dc:creator>
				<category><![CDATA[Poisson and Negative Binomial Regression Models]]></category>

		<guid isPermaLink="false">http://www.theanalysisfactor.com/?p=2295</guid>
		<description><![CDATA[Q: How high does the count scale have to be before you can consider it continuous?

I suspect you're getting at the same issue as in the last question. It's certainly true that when you get into very large numbers, many of the issues with count variables aren't issues anymore.]]></description>
			<content:encoded><![CDATA[<p>Last month I did a <a href="http://www.theanalysisfactor.com/poisson-and-negative-binomial-regression/">webinar on Poisson and negative binomial models for count data</a>. With a few hundred participants, we ran out of time to get through all the questions, so I&#8217;m answering some of them here on the blog.</p>
<p>These questions all relate to when it&#8217;s appropriate to treat count data as continuous and run the more familiar and simpler linear model.</p>
<p><strong>Q: Do you have any guidelines or rules of thumb as far as how many discrete values an outcome variable can take on before it makes more sense to just treat it as continuous?</strong></p>
<p>The issue usually isn&#8217;t a matter of how many values there are.  I see what you mean: a discrete scale that goes from <span id="more-2295"></span>0 to 8, for example, feels more discrete because there are only nine possible values, compared to a discrete scale that goes from 0 to 200.  Having 201 values just feels more continuous.</p>
<p>But that&#8217;s not really the issue in most <a href="http://www.theanalysisfactor.com/poisson-regression-analysis-for-count-data/">count models</a>. The issue with <a href="http://www.theanalysisfactor.com/6-types-of-dependent-variables-that-will-never-meet-the-glm-normality-assumption/">count variables is that they are bounded at zero</a>. This wreaks havoc on the <a href="http://www.theanalysisfactor.com/assumptions-of-linear-modelsassumptions-of-linear-models/">assumptions of a linear model</a>, which require continuous data.</p>
<p>If none of your data are near zero, it would be less of an issue.  Treating that count variable as continuous would give you predicted values that are non-integers, but perhaps that&#8217;s not a big issue in your particular data set.</p>
<p><strong>Q: How high does the count scale have to be before you can consider it continuous?</strong></p>
<p>I suspect you&#8217;re getting at the same issue as in the last question. It&#8217;s certainly true that when you get into very large numbers, many of the issues with count variables aren&#8217;t issues anymore.</p>
<p>For example, most incomes are not measured using decimals, just whole numbers. You could consider them a count of the number of dollars. Likewise, demographic variables like the number of children vaccinated in a state over the course of the year are truly counts, but the smallest values are likely to be in the hundreds of thousands, or even millions.</p>
<p>As long as there are no data along the bound of zero, and you don&#8217;t mind predicted values that include decimals, there&#8217;s no problem treating it as continuous.</p>
<p><strong>Q: For count data whose distribution is not skewed, but symmetric or even normal in shape, are Poisson and NB still the best choice?</strong></p>
<p>Sometimes.  The Poisson distribution is noticeably skewed only when the mean is small. By the time the mean gets up to only about 10, the distribution is already close to symmetric and bell shaped.</p>
<p>Depending on the effects of the predictors and the actual range of the data, i.e. whether there are actual zeros or not, you may get essentially identical results from running a linear model compared to a Poisson or negative binomial model.</p>
<p>If you do run a linear model, it will be possible to get predicted values below zero, and you need to consider whether that&#8217;s problematic in your situation. If the point of your model is prediction, it may be more of an issue.</p>
<p><strong>Q: If count data can be normalized by log transformation, would you recommend using Poisson or linear regression?</strong></p>
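<p>To put a number on that, the skewness of a Poisson distribution is 1 divided by the square root of its mean. The sketch below is only an illustration: it computes the skewness numerically from the probability mass function and prints it next to that formula. The function names and the cutoff of 500 are choices made for this example, not anything from the webinar.</p>

```python
import math

def poisson_pmf(k, mean):
    """Probability of observing count k under a Poisson distribution."""
    # Work on the log scale so large k does not overflow the factorial.
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_skewness(mean, upper=500):
    """Skewness computed numerically from the pmf (upper is an arbitrary cutoff)."""
    ks = range(upper)
    probs = [poisson_pmf(k, mean) for k in ks]
    mu = sum(k * p for k, p in zip(ks, probs))
    var = sum((k - mu) ** 2 * p for k, p in zip(ks, probs))
    third = sum((k - mu) ** 3 * p for k, p in zip(ks, probs))
    return third / var ** 1.5

# Compare the numerical skewness with the closed form 1 / sqrt(mean).
for mean in (0.5, 2, 10, 50):
    print(mean, round(poisson_skewness(mean), 3), round(1 / math.sqrt(mean), 3))
```

<p>At a mean of 10 the skewness is already down to roughly 0.3, and it keeps shrinking as the mean grows.</p>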
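<p>Here is a small sketch of how a negative predicted value can come about. The data are made up for illustration, and the straight line is fit with the ordinary least squares formulas written out by hand rather than with any particular statistics package.</p>

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical counts that sit near zero for small x and then climb.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 1, 3, 6]

intercept, slope = fit_line(xs, ys)
predictions = [intercept + slope * x for x in xs]

# The fitted line dips below zero at the low end, which no count can do.
for x, y, pred in zip(xs, ys, predictions):
    print(x, y, round(pred, 2))
```

<p>With these numbers the prediction at x = 0 is negative and none of the predictions are whole numbers. Whether either of those matters depends on what you need the model for.</p>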
<p>It&#8217;s never wrong to run a Poisson model, so what you&#8217;re asking is if the increased accuracy is worth the trouble of running the more complicated model.  There are certainly cases where running a linear model simplifies things a lot and still gives you the same results.  (You just won&#8217;t know you have the same results unless you run both.)</p>
<p>When the mean count is very small, and zero is the most common value in the data set, it will be impossible to normalize using a log transformation.  It just won&#8217;t work. The mode will always be at the lowest value.  In that situation, you have no choice.</p>
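<p>A quick sketch of why the transformation can&#8217;t help in that case. The counts are made up, and because the log of zero is undefined, the example assumes the common workaround of taking log(y + 1). Any transformation that preserves order leaves the pile of zeros as a pile at the lowest value.</p>

```python
import math
from collections import Counter

# Hypothetical counts where zero is the most common value.
counts = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5]

# Assumption for this example: log(y + 1), since log(0) is undefined.
logged = [math.log(y + 1) for y in counts]

mode_raw = Counter(counts).most_common(1)[0][0]
mode_logged = Counter(logged).most_common(1)[0][0]

# In both versions the most common value is also the smallest value.
print(mode_raw == min(counts), mode_logged == min(logged))
```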
<p>However, if the mean count is a little bit larger, zero may not be the most common value. When the mode is not the lower bound, it will be possible to use a log transformation to normalize the data. In fact, not too many years ago, when Poisson and negative binomial models were not readily available in software, textbooks did suggest this approach.  You may still have some on your shelf that do so.</p>
<p>It&#8217;s not necessarily a bad approach.  You may very well get the exact same results. If so, and if, for example, you are writing a report for an audience without the statistical sophistication to understand the Poisson model, it may be a better choice.</p>
<p>However, it&#8217;s not exactly the same thing. A Poisson model uses a <a href="http://www.theanalysisfactor.com/interpreting-regression-coefficients-in-models-other-than-ordinary-linear-regression/">log link function</a>, which applies the log to the mean&#8211;not each individual data point. So it&#8217;s harder to back transform coefficients when you&#8217;ve got a log transformation.  So if there are not major advantages to running a linear model, you are usually better off with a more sophisticated and more accurate Poisson model, or one of its derivatives.</p>
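<p>The difference between logging the mean and logging each data point is easy to see with a few numbers. This is just an arithmetic illustration with made-up counts, not a fitted model.</p>

```python
import math

# Hypothetical counts, chosen so the arithmetic is easy to follow.
counts = [1, 10, 100]

# What a log link works with: the log of the mean.
log_of_mean = math.log(sum(counts) / len(counts))

# What a log transformation works with: the mean of the logged values.
mean_of_logs = sum(math.log(y) for y in counts) / len(counts)

print(round(log_of_mean, 3), round(mean_of_logs, 3))

# Back-transforming each one gives two different "averages".
print(round(math.exp(log_of_mean), 1), round(math.exp(mean_of_logs), 1))
```

<p>Exponentiating the log of the mean returns the ordinary mean of 37, while exponentiating the mean of the logs returns the geometric mean of 10. That gap is why coefficients from a model on log-transformed data don&#8217;t back transform to the mean count the way Poisson coefficients do.</p>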
<p><em>If you&#8217;d like to learn more about the different models available for count data, you can download a recording of the webinar: <strong><a href="”">Poisson and Negative Binomial Regression for Count Data.</a></strong> It’s free.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.theanalysisfactor.com/count-data-considered-continuous/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

