Chapter 9 Regression analysis to study the influence of socio-economic and regional characteristics on the Internet and Broadband access
In Chapters 3 to 8 of this paper, a range of regional and socio-economic variables were analysed to understand the patterns of Internet and Broadband access in Australia in 2006. Although some clear patterns were observed, it is difficult to isolate the key drivers of connectivity from the many socio-economic and regional variables examined, because cross-tabular analyses do not control for the influence of factors other than those being tabulated.
To deal with this shortcoming of cross-tabular analyses, multiple regression analysis was carried out. By disentangling the effects of multiple factors, multiple regression techniques estimate the separate effect of each independent variable (such as age) on the dependent variable (connection to the Internet or Broadband), holding other variables (income, education etc.) constant. They therefore help identify the key factors influencing Internet and Broadband access. The methodology underlying the regression analysis in this study broadly replicates that adopted in the Australia Online study (Lloyd and Bill, 2004).
A binomial logistic regression model was used. Binomial logistic regression is a form of regression used when the dependent variable is dichotomous (value set to one or zero) and the independent variables are either dichotomous or continuous. It can be used to predict the probability of an event occurring on the basis of continuous and/or categorical independent variables.
The odds associated with a particular event (e.g. having Broadband access) indicate how likely that event is to take place: they are the ratio of the probability of the event occurring to the probability of it not occurring. The log of the odds of an event relative to a non-event (e.g. having Internet access versus not having Internet access) can be expressed as follows:
$$\ln\left(\frac{P_i}{1-P_i}\right) = a_1 + b_1 X_{i1} + \dots + b_k X_{ik}$$
where $\ln$ is the natural log, $P_i$ is the probability of the event for person $i$, $a_1$ is the intercept parameter, $b_1, \dots, b_k$ are the regression parameters, and $X_{i1}, \dots, X_{ik}$ are a set of $k$ explanatory variables representing individual $i$'s observed characteristics. The parameters $b_1, \dots, b_k$ can be estimated using standard maximum likelihood techniques.
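The equation above gives the log of the odds; a probability is recovered by inverting it. The following is a minimal sketch of that inversion, not the code used in the study. The function names and the two-variable example values are hypothetical and chosen only for illustration.

```python
import math


def linear_predictor(intercept, coefficients, x):
    """Compute a1 + b1*x1 + ... + bk*xk for one individual."""
    return intercept + sum(b * xi for b, xi in zip(coefficients, x))


def probability_from_log_odds(log_odds):
    """Invert the logit: P = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical example: intercept 0.5, two dummy variables with
# coefficients 1.0 and -2.0; the individual has the first characteristic only.
log_odds = linear_predictor(0.5, [1.0, -2.0], [1, 0])  # 0.5 + 1.0 = 1.5
p = probability_from_log_odds(log_odds)
```

A log-odds of zero corresponds to a probability of one half; positive values give probabilities above one half and negative values give probabilities below it.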
The most common way of interpreting a logit coefficient is to convert it to an odds ratio using the exponential function. The odds ratio for a category compares the odds for that category with the odds for the reference category (within each variable one category is chosen to be the reference category), and can range from zero to infinity. If the odds ratio is greater than one, that category is more likely than the reference category to have Internet/Broadband access. If the odds ratio is less than one, that category is less likely than the reference category to have Internet/Broadband access.
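As an illustration of this conversion, the sketch below exponentiates two coefficient estimates taken from Table 18. The function name is hypothetical; the only operation involved is the exponential function.

```python
import math


def odds_ratio(coefficient):
    """Convert a logit coefficient to an odds ratio."""
    return math.exp(coefficient)


# Coefficient estimates copied from Table 18 (any Internet connection).
inner_regional = odds_ratio(-0.1814)    # negative coefficient, ratio below one
income_2000_plus = odds_ratio(1.3596)   # positive coefficient, ratio above one

print(round(inner_regional, 3), round(income_2000_plus, 3))
```

A coefficient of zero gives an odds ratio of exactly one, which is why each reference category appears in the tables with an odds ratio of 1.000.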
Logistic regression models the log-odds as a linear function of the explanatory variables, so odds ratios are a useful tool for displaying results. In this chapter the odds ratios are presented as an indicator of the likelihood that an individual has access to the Internet/Broadband with respect to each socio-economic and regional variable when the other independent variables are held constant.
The sign of the regression coefficient indicates whether the selected independent variable has a positive or negative impact on home Internet or Broadband connection. A positive coefficient for a particular category of a variable suggests that an individual with that characteristic is more likely than one in the reference category to have access to the Internet/Broadband, holding all other variables constant. A negative coefficient suggests that the individual is less likely than one in the reference category to have access, holding all other variables constant.
9.1. The data set
In creating the data set for the regression analysis, the following were excluded, in line with the Australia Online study based on the 2001 Census data:
- persons under 15 years of age
- people in migratory and off-shore areas
- overseas visitors
- non-private dwellings, unoccupied private dwellings and migratory and off-shore dwellings
- people who did not respond to the Internet question
- people who did not state their labour force status
- people who did not state their indigenous status
- people who did not state their level of non-school highest education attainment
- people in households where weekly equivalised household income was "not stated" or was only partially stated.
After these omissions, the total number of observations used in the regression analyses was 12,691,410 out of a total of 20,061,646 persons counted on Census night.
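The exclusions listed above amount to a filter applied to each person record. The sketch below shows one way such a filter could be written; the record layout and every field name are hypothetical, since the structure of the Census file is not described here. It also computes the share of counted persons retained, using the two totals quoted above.

```python
def in_scope(person):
    """Return True if a (hypothetical) person record survives all exclusions."""
    return (
        person["age"] >= 15
        and not person["migratory_or_offshore"]
        and not person["overseas_visitor"]
        and person["dwelling_type"] == "occupied_private"
        and person["internet_response"] is not None
        and person["labour_force_status"] is not None
        and person["indigenous_status"] is not None
        and person["qualification"] is not None
        and person["equivalised_income"] is not None  # fully stated income only
    )


# Totals quoted in the text.
OBSERVATIONS_USED = 12_691_410
PERSONS_COUNTED = 20_061_646
share_retained = 100.0 * OBSERVATIONS_USED / PERSONS_COUNTED

print(round(share_retained, 1))
```

On these totals, a little under two thirds of the persons counted on Census night remain in the regression data set.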
9.2. Regression modelling
Stepwise multiple regression analysis was carried out separately for Broadband access and any Internet access. Table 17 gives a summary of the explanatory variables used and the categories included for each variable, together with the reference category. The explanatory (independent) variables were chosen based on the Australia Online study (Lloyd and Bill, 2004). The results of the regression analyses for each variable are expressed with respect to the reference category. Strong correlations between states/territories and remoteness areas in some states/territories yielded unrealistic results. Because analyses with respect to remoteness were considered of more interest than state/territory variables from an equity of access perspective, the states/territories variable was excluded from this analysis. For convenience, broad income groups were used for weekly equivalised household income.
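Expressing results "with respect to the reference category" usually corresponds to dummy coding, in which each categorical variable is replaced by one 0/1 indicator per category and the indicator for the reference category is omitted. The sketch below assumes that coding scheme (it is not confirmed by the text) and uses the remoteness categories as an example; the function name is hypothetical.

```python
REMOTENESS = [
    "Major cities",
    "Inner Regional",
    "Outer Regional",
    "Remote",
    "Very Remote",
]


def dummy_code(value, categories, reference):
    """Return a 0/1 indicator for every category except the reference."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(c == value) for c in categories if c != reference}


remote_person = dummy_code("Remote", REMOTENESS, "Major cities")
city_person = dummy_code("Major cities", REMOTENESS, "Major cities")
```

A person in the reference category has every indicator set to zero, so that person's log-odds are determined by the intercept and the remaining variables alone.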
The results of the multiple regression analyses are given in Tables 18 and 19. Examination of the likelihood ratio Chi-square showed that the final models for both Broadband and Internet are highly significant at the 5% significance level. The maximum rescaled R-square values greater than 0.2 and percent concordant values greater than 70% also indicate that the two models can be considered as having a reasonable goodness of fit.
(a) Includes Aboriginal, Torres Strait Islander, and both Aboriginal and Torres Strait Islander people
* Low skill - broad category of machinery operators, drivers and labourers
** High skill - remainder of occupation categories. Please see the explanatory notes for details of these categories
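The two goodness-of-fit measures quoted in this section can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: percent concordant is taken to be the share of (event, non-event) pairs in which the event has the higher predicted probability, and the maximum rescaled R-square is taken to be the Cox and Snell R-square divided by its upper bound. The function names and toy data are hypothetical.

```python
import math


def percent_concordant(probabilities, outcomes):
    """Share of (event, non-event) pairs where the event has the higher probability."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    pairs = len(events) * len(non_events)
    concordant = sum(1 for pe in events for pn in non_events if pe > pn)
    return 100.0 * concordant / pairs


def max_rescaled_r_square(ll_null, ll_model, n):
    """Cox and Snell R-square divided by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / upper_bound


# Toy data: five people, three with access (1) and two without (0).
toy_probabilities = [0.9, 0.8, 0.3, 0.6, 0.2]
toy_outcomes = [1, 1, 1, 0, 0]
toy_concordant = percent_concordant(toy_probabilities, toy_outcomes)
```

In the toy data, five of the six event and non-event pairs are concordant. Comparing every pair directly is only practical for small samples; with millions of observations, statistical software typically groups the predicted probabilities first.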
9.3. Results of the regression analysis of Internet access
Nearly all the explanatory variables were highly significant at the 5% significance level. Table 18 gives a summary of the stepwise regression modelling output.
Table 18: Regression Analysis Results for Individuals with Home Access to any Internet Connection in 2006 (a)

| Variable and category | Coefficient estimate | P value | Odds ratio |
| --- | --- | --- | --- |
| Intercept | 0.4865 | <0.0001 |  |
| Major cities (reference) |  |  | 1.000 |
| Inner Regional | -0.1814 | <0.0001 | 0.834 |
| Outer Regional | -0.3460 | <0.0001 | 0.708 |
| Remote | -0.4299 | <0.0001 | 0.651 |
| Very Remote | -0.9514 | <0.0001 | 0.386 |
| Nil or negative income | 0.3850 | <0.0001 | 1.470 |
| $1-$599 (reference) |  |  | 1.000 |
| $600-$999 | 0.5216 | <0.0001 | 1.685 |
| $1000-$1999 | 0.9990 | <0.0001 | 2.715 |
| $2000 or more | 1.3596 | <0.0001 | 3.895 |
| Couple family without any children (reference) |  |  | 1.000 |
| Couple family with dependent children | 1.2916 | <0.0001 | 3.639 |
| Couple family without dependent children | 0.6425 | <0.0001 | 1.901 |
| Single parent with dependent children | 0.8290 | <0.0001 | 2.291 |
| Single parent without dependent children | 0.2582 | <0.0001 | 1.295 |
| Other family | 0.1698 | <0.0001 | 1.185 |
| Age 15 to 17 | 1.1478 | <0.0001 | 3.151 |
| Age 18 to 24 | 0.4812 | <0.0001 | 1.618 |
| Age 25 to 34 | -0.0933 | <0.0001 | 0.911 |
| Age 35 to 44 (reference) |  |  | 1.000 |
| Age 45 to 54 | 0.1280 | <0.0001 | 1.137 |
| Age 55 to 64 | -0.0387 | <0.0001 | 0.962 |
| Age 65 to 74 | -0.4950 | <0.0001 | 0.610 |
| Age 75 plus | -1.1296 | <0.0001 | 0.323 |
| Female not married | -0.5868 | <0.0001 | 0.556 |
| Male not married | -0.5627 | <0.0001 | 0.570 |
| Male married (reference) |  |  | 1.000 |
| Female married * |  |  |  |
| Post graduate qualifications | 1.3615 | <0.0001 | 3.902 |
| Graduate diploma or certificate | 1.0384 | <0.0001 | 2.825 |
| Bachelor degree | 0.9051 | <0.0001 | 2.472 |
| Advanced diploma and diploma | 0.7695 | <0.0001 | 2.159 |
| Certificate level | 0.2004 | <0.0001 | 1.222 |
| No post school qualification (reference) |  |  | 1.000 |
| Not in the labour force | -0.4165 | <0.0001 | 0.659 |
| Unemployed | -0.3020 | <0.0001 | 0.739 |
| Employed in low skill occupations | -0.4632 | <0.0001 | 0.629 |
| Employed in high skill occupations (reference) |  |  | 1.000 |
| English proficiency well/very well (reference) |  |  | 1.000 |
| English proficiency not well | -0.4866 | <0.0001 | 0.615 |
| English proficiency not at all | -0.0444 | <0.0001 | 0.957 |
| Non-Indigenous (reference) |  |  | 1.000 |
| Indigenous | -1.1761 | <0.0001 | 0.308 |

(a) N = 12,691,410; Max-rescaled R-square = 0.31; Percent concordant = 80%
* Not significant at 5% level
9.3.1. Remoteness
As with Broadband, access to any Internet connection differed markedly between the major cities and the other remoteness areas. With other variables held constant, people living in Inner Regional areas are about 17% less likely than those in Major cities (reference group) to have any Internet access, and people living in Very Remote areas are about 61% less likely.
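The percentages quoted in this section describe differences in odds, which are not the same as differences in probability. The sketch below illustrates the distinction using the intercept and remoteness coefficients from Table 18. It assumes that the intercept gives the log-odds for a person who is in the reference category of every variable; that assumption, and the function name, are not taken from the study.

```python
import math

# Coefficient estimates copied from Table 18 (any Internet connection).
INTERCEPT = 0.4865
INNER_REGIONAL = -0.1814
VERY_REMOTE = -0.9514


def probability(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Assumed baseline: a person in the reference category of every variable.
p_reference = probability(INTERCEPT)
p_inner = probability(INTERCEPT + INNER_REGIONAL)
p_very_remote = probability(INTERCEPT + VERY_REMOTE)

# The ratio of the two odds recovers the published odds ratio.
ratio = (p_inner / (1 - p_inner)) / (p_reference / (1 - p_reference))
```

Under that assumption the predicted probability falls from roughly 0.62 in Major cities to roughly 0.58 in Inner Regional areas and roughly 0.39 in Very Remote areas, while the odds fall by about 17% and 61% respectively. The size of the gap in probability depends on the baseline chosen; the odds ratio does not.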
9.3.2. Weekly equivalised household income
Weekly equivalised household income showed a strong positive relationship with having access to any Internet: the higher the income, the greater the likelihood of a person having any Internet connection. When all other variables are held constant, the odds of having any Internet access for persons with an equivalised household income of $1000 to $1999 are about 2.7 times those of people in the reference income group ($1-$599). The odds are about 3.9 times those of the reference group for people in the $2000 or more income group.
9.3.3. Family composition
The likelihood of having any Internet access for people in couple families with dependent children is much higher (about 3.6 times) than for those in couple families without any children (reference group).
9.3.4. Age
People in the 15 to 17 years age group are about 3.2 times as likely, and people in the 18 to 24 years group about 1.6 times as likely, to have any Internet connectivity at home as those in the reference age group (35 to 44 years). From 55 years onwards the odds of having access to any Internet connection fall with increasing age, from an odds ratio of 0.96 for the 55 to 64 years group to 0.32 for those aged 75 years or more.
9.3.5. Gender and Marital status
Compared to the reference group (married males), unmarried males are about 43% less likely, and unmarried females about 44% less likely, to have any Internet connectivity at home. The result for married females was not significant at the 5% significance level.
9.3.6. Highest level of post-school qualifications
People with post graduate qualifications are about 3.9 times as likely to have any Internet access as those in the reference group (no post school qualifications). People with a graduate diploma or certificate, a bachelor degree, or an advanced diploma or diploma are more than twice as likely as those in the reference group. People with certificate level qualifications are about 1.2 times as likely to have access to any Internet. The results show that the level of post school qualifications has a strong impact on an individual's likelihood of having access to any Internet.
9.3.7. Labour force status and occupation
People not in the labour force are about 34% less likely to have any Internet connection than people in the reference group (employed in high skill occupations). Unemployed persons are about 26% less likely than those in the reference group. Although this is an i2% increase from the 2001 Census results (Lloyd and Bill, 2004), it is interesting to note that people employed in low skill occupations are about 37% less likely to have any Internet connection than those in the reference group.
9.3.8. Proficiency in spoken English
People with good spoken English proficiency (well/very well) are the reference group. Those who speak English not well are about 38% less likely to have any Internet connection than those in the reference group. Those with very poor spoken English proficiency (not at all) are about 4% less likely than those in the reference group.
9.3.9. Indigenous status
Indigenous people are about 69% less likely than non-Indigenous people (reference group) to have any Internet connection. This result is similar to that produced from the 2001 Census data (Lloyd and Bill, 2004).
9.4. Results of the regression analysis of Broadband access
Nearly all the explanatory variables were highly significant at the 5% significance level. Therefore all variables are included in the final output. Table 19 gives a summary of the regression modelling output.
Table 19: Regression Analysis Results for Individuals with Home Access to Broadband Connection in 2006 (a)

| Variable and category | Coefficient estimate | P value | Odds ratio |
| --- | --- | --- | --- |
| Intercept | -0.5201 | <0.0001 |  |
| Major cities (reference) |  |  | 1.000 |
| Inner Regional | -0.5107 | <0.0001 | 0.600 |
| Outer Regional | -0.7730 | <0.0001 | 0.464 |
| Remote | -0.7590 | <0.0001 | 0.468 |
| Very Remote | -0.8877 | <0.0001 | 0.412 |
| Nil or negative income | 0.4964 | <0.0001 | 1.643 |
| $1-$599 (reference) |  |  | 1.000 |
| $600-$999 | 0.3416 | <0.0001 | 1.407 |
| $1000-$1999 | 0.7485 | <0.0001 | 2.114 |
| $2000 or more | 1.2223 | <0.0001 | 3.395 |
| Couple family without any children (reference) |  |  | 1.000 |
| Couple family with dependent children | 0.8677 | <0.0001 | 2.381 |
| Couple family without dependent children | 0.4638 | <0.0001 | 1.590 |
| Single parent with dependent children | 0.6086 | <0.0001 | 1.838 |
| Single parent without dependent children | 0.0845 | <0.0001 | 1.088 |
| Other family | 0.1657 | <0.0001 | 1.180 |
| Age 15 to 17 | 0.8474 | <0.0001 | 2.334 |
| Age 18 to 24 | 0.5519 | <0.0001 | 1.737 |
| Age 25 to 34 | 0.0299 | <0.0001 | 1.030 |
| Age 35 to 44 (reference) |  |  | 1.000 |
| Age 45 to 54 | 0.1538 | <0.0001 | 1.166 |
| Age 55 to 64 * |  |  |  |
| Age 65 to 74 | -0.4337 | <0.0001 | 0.648 |
| Age 75 plus | -0.9700 | <0.0001 | 0.379 |
| Female not married | -0.4202 | <0.0001 | 0.657 |
| Male not married | -0.2269 | <0.0001 | 0.797 |
| Male married (reference) |  |  | 1.000 |
| Female married | -0.0178 | <0.0001 | 0.982 |
| Post graduate qualifications | 0.5955 | <0.0001 | 1.814 |
| Graduate diploma or certificate | 0.3845 | <0.0001 | 1.469 |
| Bachelor degree | 0.4298 | <0.0001 | 1.537 |
| Advanced diploma and diploma | 0.3594 | <0.0001 | 1.432 |
| Certificate level | 0.0317 | <0.0001 | 1.032 |
| No post school qualification (reference) |  |  | 1.000 |
| Not in the labour force | -0.2157 | <0.0001 | 0.806 |
| Unemployed | -0.1459 | <0.0001 | 0.864 |
| Employed in low skill occupations | -0.3101 | <0.0001 | 0.733 |
| Employed in high skill occupations (reference) |  |  | 1.000 |
| English proficiency well/very well (reference) |  |  | 1.000 |
| English proficiency not well | -0.2445 | <0.0001 | 0.783 |
| English proficiency not at all | 0.0824 | <0.0001 | 1.086 |
| Non-Indigenous (reference) |  |  | 1.000 |
| Indigenous | -0.7376 | <0.0001 | 0.478 |

(a) N = 12,691,410; Max-rescaled R-square = 0.22; Percent concordant = 73%
* Not significant at 5% level
9.4.1. Remoteness
Broadband access differs significantly between Major cities and the other remoteness areas when other variables are held constant. Compared with people in Major cities, people living in Inner Regional areas are about 40% less likely to have Broadband access, those in Outer Regional areas about 54% less likely, those in Remote areas about 53% less likely, and those in Very Remote areas about 59% less likely.
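The "per cent less likely" figures quoted for remoteness follow directly from the odds ratios in Table 19: an odds ratio of 0.600 corresponds to odds that are 40% lower than those of the reference category. A minimal sketch of that arithmetic is given below; the function and variable names are hypothetical.

```python
# Odds ratios copied from Table 19 (Broadband connection).
BROADBAND_REMOTENESS_ODDS_RATIOS = {
    "Inner Regional": 0.600,
    "Outer Regional": 0.464,
    "Remote": 0.468,
    "Very Remote": 0.412,
}


def percent_change_in_odds(odds_ratio):
    """Percentage difference in odds relative to the reference category."""
    return (odds_ratio - 1.0) * 100.0


changes = {
    area: round(percent_change_in_odds(ratio))
    for area, ratio in BROADBAND_REMOTENESS_ODDS_RATIOS.items()
}
```

Negative values indicate lower odds than the reference category (Major cities) and positive values indicate higher odds.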
9.4.2. Weekly equivalised household income
The reference group is those with a weekly equivalised household income of $1 to $599. As seen in the tabular presentations in Chapter 4, persons with nil or negative income show higher odds of having Broadband access than this lowest positive income group, even with other variables held constant. People in the higher income groups are also more likely to have Broadband access than people in the reference group. Across the positive income groups, the odds of having access to Broadband increase with increasing income, indicating that income is a significant determinant of a person's Broadband connectivity.
9.4.3. Family composition
Families with dependent children (children under 15 and/or dependent students) are more likely to have access to Broadband. The reference category for this variable is couple families without any children. The odds for a person in a couple family with dependent children are around 2.4 times the odds of the reference category, and the odds for a person in a single parent family with dependent children are around 1.8 times the odds of the reference category. Thus the results indicate that dependent children in a family increase the likelihood of home Broadband access.
9.4.4. Age
In comparison with the reference category (35 to 44 years), persons in the 15 to 34 years age groups are more likely to have Broadband access, while those aged 65 years or more are less likely. A person aged between 15 and 17 is about 2.3 times as likely, and a person aged between 18 and 24 about 1.7 times as likely, to have Broadband access as a person in the reference age group. Persons aged 45 to 54 are slightly more likely (about 1.2 times) than the reference group, and the result for the 55 to 64 years group was not significant at the 5% level. The likelihood of Broadband connectivity falls markedly from 65 years onwards.
9.4.5. Gender and Marital status
Married females are almost as likely as married males to have Broadband access (odds ratio 0.98). Unmarried females are about 34% less likely, and unmarried males about 20% less likely, to have Broadband access than married males.
9.4.6. Highest level of post-school qualifications
The results indicate that, broadly, the higher the level of post-school qualifications of an individual, the higher the likelihood of having Broadband access. Compared to people with no post school qualifications, people with post graduate qualifications are about 1.8 times as likely to have Broadband access, and an individual with a bachelor degree is about 1.5 times as likely.
9.4.7. Labour force status and occupation
People in high skill occupations are more likely to have Broadband access than those in low skill occupations, the unemployed and those not in the labour force. Compared to those employed in high skill occupations, unemployed people are about 14% less likely, and people not in the labour force about 19% less likely, to have Broadband access. People employed in low skill occupations, such as labourers, machinery operators and drivers, are about 27% less likely to have Broadband access.
9.4.8. Proficiency in spoken English
People who speak English not well are about 22% less likely to have Broadband access than people with good proficiency in English. Surprisingly, people with no spoken English proficiency at all were found to be about 9% more likely to have Broadband access than those with good spoken English proficiency, when other variables are held constant. This could be due to the importance of the Internet to international communication and research.
9.4.9. Indigenous status
Compared to non-Indigenous people, Indigenous people (Aboriginal, Torres Strait Islander, or both Aboriginal and Torres Strait Islander) are about 52% less likely to have Broadband access.
9.5. Key conclusions from results of regression analysis
The results of the regression analyses of both any Internet and Broadband connection suggest that level of income, having dependent children in the family, attainment of post school qualifications and age have a significant influence on an individual's likelihood of having any Internet or Broadband connection at home. There was also a clear difference in the likelihood of having any Internet or Broadband connection between those living in the major cities and those in other geographic areas. Compared with the 2004 study based on Census 2001 data, the results reveal similar relationships between Internet access and explanatory variables such as income, education and age, although there are differences in the magnitudes of the odds ratios.
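Because the model is linear in the log-odds, the effects of several characteristics add on the log-odds scale and therefore multiply on the odds scale. The sketch below illustrates this with three coefficient estimates from Table 19. It assumes the model contains main effects only, with no interaction terms, which the tables suggest but the text does not state; the function and variable names are hypothetical.

```python
import math

# Coefficient estimates copied from Table 19 (Broadband connection).
COEFFICIENTS = {
    "$2000 or more": 1.2223,
    "Couple family with dependent children": 0.8677,
    "Post graduate qualifications": 0.5955,
}


def combined_odds_ratio(coefficients):
    """Odds ratio versus a person in the reference category of each variable."""
    return math.exp(sum(coefficients))


combined = combined_odds_ratio(COEFFICIENTS.values())

# Equivalent calculation: multiply the individual odds ratios.
product_of_ratios = math.prod(math.exp(b) for b in COEFFICIENTS.values())
```

Under that assumption, a person with all three characteristics has odds of Broadband access between 14 and 15 times those of an otherwise identical person in the reference category of each of the three variables.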