The Analysis Factor

Statistical Consulting, Resources, and Statistics Workshops for Researchers


Why do I need to have knowledge of multiple regression to understand SEM?


by Manolo Romero Escobar

The General Linear Model (GLM) is a tool for understanding and analyzing linear relationships among variables. It is an umbrella term for many of the techniques taught in most statistics courses: ANOVA, multiple regression, and so on.

In its simplest form it describes the relationship between two variables: “y” (the dependent variable, outcome, and so on) and “x” (the independent variable, predictor, etc.). These variables can both be categorical (how many?), both continuous (how much?), or one of each.

Moreover, there can be more than one variable on each side of the relationship. One convention is to use capital letters to refer to sets of variables: Y for multiple dependent variables and X for multiple independent variables. The best-known equation representing a GLM is:

Y = BX + E

Y represents the “dependent” variables (a.k.a. “outcomes”), which in the simplest (i.e. univariate) case has only one element.

B contains the weights (“loadings”, “effects”) of the “independent” variables used to predict the Ys. In the simplest of cases it contains:

β0, the “intercept”: the mean value of y when x takes a value of zero.

β1, the weight or effect of the first predictor (x1).

X contains the “independent” variables; in our simple case it has two elements, 1 and x1.

Finally, E contains the errors (a.k.a. disturbance terms), which are what is left unexplained in the Ys after accounting for the intercept and the effects of the Xs.
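To make these pieces concrete, here is a small sketch in NumPy (with made-up numbers, not taken from the article) that builds the univariate case: a design matrix X whose column of ones carries the intercept, a weight vector B = (β0, β1), and errors E:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
beta0, beta1 = 2.0, 0.5   # true intercept and slope (made-up values)
x1 = rng.normal(size=n)   # one continuous predictor
e = rng.normal(size=n)    # errors: mean 0, variance sigma^2 = 1

# Design matrix X: a column of ones (paired with beta0) next to x1
X = np.column_stack([np.ones(n), x1])
B = np.array([beta0, beta1])

# The GLM equation in matrix form: Y = XB + E
y = X @ B + e
```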

The nature of the error term is one of the fundamental aspects of the GLM. The usual method for estimating the β parameters, Ordinary Least Squares (OLS), minimizes the sum of squared residuals; the errors themselves are assumed to be normally distributed with a mean of zero and a variance (the “residual variance” or “error variance”) of σ².
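A minimal OLS fit, sketched with NumPy's least-squares solver on simulated data (the true values 2.0 and 0.5 are invented for illustration), shows the estimator recovering the β parameters while the residuals average out to zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data from a known model: y = 2.0 + 0.5*x1 + e
n = 500
x1 = rng.normal(size=n)
y = 2.0 + 0.5 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1])

# OLS chooses beta_hat to minimize sum((y - X @ beta)**2)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print(beta_hat)          # close to the true values [2.0, 0.5]
print(residuals.mean())  # essentially zero when an intercept is included
```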

Below is an example of a path diagram representing a multiple regression of an outcome “y” regressed on four predictors “x”.

You will notice that each predictor x has a path drawn to y, representing its effect on y. This effect is measured by its coefficient.
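The same machinery handles the diagram's four-predictor regression; each fitted coefficient is the effect carried by one path from an x to y (a sketch on simulated data, with invented path weights):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 1000
true_effects = np.array([0.4, -0.3, 0.2, 0.1])  # one invented weight per path

X4 = rng.normal(size=(n, 4))                    # four observed predictors x1..x4
y = 1.0 + X4 @ true_effects + rng.normal(size=n)

# Prepend the intercept column and fit by least squares
X = np.column_stack([np.ones(n), X4])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta_hat[0] is the intercept; beta_hat[1:] are the four path coefficients
print(np.round(beta_hat, 2))
```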


And how does this way of presenting GLM help us understand SEM?

SEM makes it very clear, visually, which variables are the predictors and which are the outcomes. In a linear regression, all the variables in our equation are directly measured and depicted in rectangles. The only one that isn’t is the error term, ε, which is drawn in an oval.

Like all latent variables, the error isn’t directly measured in our data set. It is inferred from the observed residuals, and we can estimate its variance.

By setting up the regression model this way, we can now test many different relationships among the variables by expanding the model.  We could add more paths to test mediation or add in other latent variables, measured by observed indicators.
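For instance, a simple mediation chain (x → m → y) can be approximated with two ordinary regressions, multiplying the two path coefficients to get the indirect effect. This is a hand-rolled sketch with invented effect sizes, not a full SEM estimator:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 2000
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)             # path a: x -> m
y = 0.5 * m + 0.2 * x + rng.normal(size=n)   # path b: m -> y, plus a direct effect

def ols(predictors, outcome):
    """Least-squares coefficients, with an intercept column prepended."""
    Z = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(Z, outcome, rcond=None)
    return beta

a = ols(x, m)[1]                        # effect of x on m
b = ols(np.column_stack([m, x]), y)[1]  # effect of m on y, controlling for x
indirect = a * b                        # mediated effect of x on y

print(round(indirect, 2))               # near the true 0.6 * 0.5 = 0.3
```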

These models can be very complicated, but the building blocks of predictors, outcome variables, error terms, and effects measured by coefficients are always present.

Thus, understanding the GLM, and multiple regression in particular, is one of the requirements for successfully fitting SEMs to your data.

********


E. Manolo Romero Escobar is a Senior Psychometrician at Multi-Health Systems Inc (a psychological test publishing company) in Toronto.

He has extensive expertise in factor-analytic and latent-trait methods of measurement, as well as applications of linear mixed effects models to nested, longitudinal, and unbalanced data.


Tagged With: GLM, Multiple Regression, SEM, Structural Equation Modeling



Comments

  1. Denis de Crombrugghe says

    March 28, 2016 at 7:15 pm

    Dear Karen,
I think you should be careful to prevent confusion between the error term or disturbance and the residuals. The former (denoted with the Greek epsilon in your figure) is a latent variable. The latter is its calculated (and therefore observed) sample estimate. This is not just a semantic question, as is implicitly acknowledged by the following sentence:
“the error isn’t directly measured in our data set. It is inferred from the set of residuals and we’re able to measure its variance”
(where the last “measure” actually means “estimate”). You can only infer something from the “residuals” if they are indeed observable.

    Also, IMO it is not very correct to state “OLS minimizes the errors”; OLS has no control over the errors. It would be clearer and more accurate to state “OLS minimizes the sum of squared RESIDUALS”.



Copyright © 2008–2021 The Analysis Factor, LLC. All rights reserved.
