6401.0.60.003 - Information Paper: Making Greater Use of Transactions Data to compile the Consumer Price Index, Australia, 2016  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 29/11/2016  First Issue

CRITERIA FOR ASSESSING MULTILATERAL METHODS


INTRODUCTION

4.1 This publication presents four multilateral methods for producing temporal indexes from transactions data, as well as four different methods for extending the index without revising it. It also notes that there is no broad consensus at present about the best choice of methods in this context. In the absence of such consensus, the ABS has developed a framework to guide the choice of method to use to make greater use of transactions data.

4.2 A natural starting point is the ABS Data Quality Framework (DQF) (ABS 2009), which prompts the consideration of statistical quality more broadly than focussing on a single aspect (e.g. accuracy). The framework includes seven dimensions of statistical quality:

  • Institutional Environment - pertains to the institutional and organisational context in which a statistical producer operates
  • Relevance - pertains to how well a statistic meets user needs
  • Timeliness - pertains to how quickly and frequently the statistic is published
  • Accuracy - pertains to how well a statistic measures the desired concept
  • Coherence - pertains to how consistent the statistic is with sources of related information
  • Interpretability - pertains to the information available to provide insight into the statistic
  • Accessibility - pertains to ease of access to the statistic

4.3 The benefits and issues associated with using multilateral methods, as discussed earlier, can be linked to dimensions of this quality framework. These links are summarised in Table 4.1, and discussed further in the next few paragraphs.

Table 4.1: Framework for assessing multilateral methods

Consideration | Quality dimensions

Resources: does this method help facilitate more effective use of human and information resources? | Institutional Environment, Timeliness
Theoretical properties: what conceptual properties does the index method have, and how well do these align with the CPI purpose? | Accuracy
Transitivity: to what extent is the index transitive? | Accuracy, Coherence
Characteristicity: to what extent are price comparisons relevant to the time periods being compared? | Accuracy, Relevance
Flexibility: what scope is there to use or adapt the method for new statistical products or data sources? | Coherence, Institutional Environment
Interpretability: how easy is it to understand the method and the price movements it calculates? | Interpretability



4.4 This publication considers multilateral methods with the aim of maximising the use of transactions data in the CPI. As mentioned in the Introduction of this publication, these methods also offer opportunities for automating routine manual processes - making better use of human resources - and consequently making it more feasible to produce higher-frequency outputs. These considerations are linked to the Institutional Environment and Timeliness dimensions; they will be important as the ABS moves towards implementing a method for routine use in producing the CPI.

4.5 The main advantage of maximising the use of transactions data is to improve the accuracy of the index, by reducing sampling error and biases associated with traditional sampling and weighting. However, the methods considered in this publication have different properties, both in theory and in practice. The next section explores the differences between the methods' theoretical properties and considers their advantages and disadvantages in a CPI context.

4.6 When proposing multilateral indexes for temporal aggregation, Ivancic, Fox and Diewert (2011) note a tension between transitivity - as chained bilateral indexes may drift over time - and characteristicity - as prices or preferences from distant periods may unduly influence multilateral comparisons. This links to the Accuracy dimension, but can also be seen as a trade-off between Coherence and Relevance, which depends more on the choices of window length and extension method than the multilateral method.

4.7 Other considerations relevant to multilateral methods include flexibility - how well a method can be adapted for new statistical products or data sources - and interpretability - how easy it is to understand a method in general as well as the index movements it produces. Both of these considerations are addressed in this section. Interpretability is already one of the quality dimensions, whereas flexibility links to Coherence, as well as Institutional Environment (as it reduces the potential complexity of systems).


THEORETICAL PROPERTIES

4.8 ILO (2004) assesses bilateral price indexes both from axiomatic/test (Chapter 16) and economic approaches (Chapters 17-18). The methods that emerge best from these assessments are the Fisher and Törnqvist indexes, which closely approximate one another in normal circumstances (ILO 2004: Chapter 17).
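The closeness of the Fisher and Törnqvist indexes can be illustrated with a minimal sketch. The formulas are the standard textbook definitions; the function names and the three-product dataset are hypothetical and are not taken from ILO (2004) or from ABS data.

```python
import math

def laspeyres(p0, p1, q0):
    """Laspeyres index: period-0 basket priced in both periods."""
    return sum(b * q for b, q in zip(p1, q0)) / sum(a * q for a, q in zip(p0, q0))

def paasche(p0, p1, q1):
    """Paasche index: period-1 basket priced in both periods."""
    return sum(b * q for b, q in zip(p1, q1)) / sum(a * q for a, q in zip(p0, q1))

def fisher(p0, p1, q0, q1):
    """Fisher index: geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

def tornqvist(p0, p1, q0, q1):
    """Törnqvist index: price relatives weighted by average expenditure shares."""
    e0 = [p * q for p, q in zip(p0, q0)]
    e1 = [p * q for p, q in zip(p1, q1)]
    s0 = [e / sum(e0) for e in e0]
    s1 = [e / sum(e1) for e in e1]
    return math.exp(sum(0.5 * (a + b) * math.log(y / x)
                        for a, b, x, y in zip(s0, s1, p0, p1)))

# Hypothetical prices and quantities for three products in two periods.
p0, p1 = [2.00, 5.00, 1.50], [2.20, 4.80, 1.65]
q0, q1 = [100, 40, 250], [90, 45, 240]

print(fisher(p0, p1, q0, q1), tornqvist(p0, p1, q0, q1))
```

For modest price and quantity changes such as these, the two results agree to several decimal places; both formulas also satisfy time reversal, so swapping the periods gives the reciprocal index.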

4.9 Similar approaches to assessing multilateral indexes in a spatial context have been developed and presented in several papers, especially Diewert (1999) and Balk (2001). These approaches reveal differences between methods; they are examined in detail for temporal applications in the next two sub-sections.


Test approach

4.10 Diewert (1999) and Balk (2001) propose similar sets of tests for spatial multilateral indexes. These Tests are expressed in terms of volume shares, which are equivalent to (normalised) multilateral quantity indexes. Table 4.2 contains the Tests proposed by both authors, expressed in terms of price and quantity comparisons between time periods. The adaptation of these Tests for the temporal context is discussed further in Appendix 2.

4.11 Table 4.2 indicates which Tests are satisfied by each multilateral method considered in this publication, and whether the Test performance is preserved after the multilateral index is extended using the methods described earlier. The results for the GEKS and GK methods, as well as the TPD method for Tests 1 to 9, are taken from Balk (2001) and Diewert (1999); the other results are derived or discussed in Appendix 2.

Table 4.2: Tests for multilateral comparisons

Test | GEKS | TPD | GK | QAUV_TPD | Preserved after extension

1 Positivity and continuity test: price and volume indexes are normalised, positive and continuous functions of (positive) prices and (nonnegative) quantities | Y | Y | Y | Y | Y
2 Weak proportionality test: if prices and quantities in all periods are proportional, price and volume comparisons depend only on those proportions (Balk only) | Y | Y | Y | Y | Y
2x If quantities in all periods are proportional, volume comparisons depend only on those proportions | Y | N | Y | Y | Y
2p If prices in all periods are proportional, price comparisons depend only on those proportions | Y | Y | Y | Y | Y
3 Homogeneity in quantities test: rescaling the quantities in some period does not alter the price comparisons if relative prices are unchanged | Y* | Y | N | Y | Y
4 Monetary units test: rescaling the prices in some period does not alter the volume comparisons if relative quantities are unchanged | Y | Y | Y | Y | Y
5 Commensurability test: changing the units in which all prices and quantities are measured does not alter the system of comparisons | Y | Y | Y | Y | Y
6 Symmetric treatment of entities test: reordering the periods does not alter the system of comparisons | Y | Y | Y | Y | N
7 Symmetric treatment of commodities test: reordering the commodities does not alter the system of comparisons | Y | Y | Y | Y | Y
8 Partitioning test: if there is a group of two or more periods with proportional prices and quantities, those proportions determine price and volume comparisons within the group, and aggregating price and quantities across periods within the group does not alter comparisons between periods outside the group | N | N | Y | N | ?
9 Irrelevance of tiny periods test: as the aggregate volume in a period approaches zero, its influence on comparisons between other periods vanishes | N* | N | Y | N | Y
10 Monotonicity in quantities test: each period’s volume share is an increasing function of its quantities | Y | ? | N | ? | ?
11 Bilateral consistency in aggregation test: if we can group all periods into two groups such that prices and quantities in all periods in a group are proportional to a group-specific pair of reference price and quantity vectors, aggregate price and volume comparisons between groups are equal to Fisher price and quantity comparisons between the pairs of reference vectors (Diewert only) | Y | ? | N | ? | ?
12 Additivity test: the system of comparisons is additive (Diewert only) | N | N | Y | Y | N

Note: * Balk (2001) considers a weighted GEKS, which satisfies test 9 but not test 3, whereas the opposite is true for the unweighted GEKS considered in this publication.


4.12 If we first consider the performance of the multilateral methods without any extension, Table 4.2 reveals that no method satisfies all of these Tests. This leads both Diewert (1999) and Balk (2001) to conclude that the importance of different Tests, and hence the most appropriate method, depends on the situation.

4.13 Several Tests are satisfied by all four multilateral methods (1, 2, 4, 5, 6 and 7). Test 6 implies that the indexes are free from chain drift, which is the main motivation for using multilateral instead of chained bilateral indexes.
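Chain drift, and its absence in a multilateral index, can be illustrated with a small sketch. It assumes the GEKS is built from bilateral Törnqvist indexes over a fixed window in which every product is sold in every period; the function names and the four-period dataset (a temporary sale followed by a return to the original prices and quantities) are hypothetical.

```python
import math

def tornqvist(p0, p1, q0, q1):
    """Bilateral Törnqvist price index from period 0 to period 1."""
    e0 = [p * q for p, q in zip(p0, q0)]
    e1 = [p * q for p, q in zip(p1, q1)]
    s0 = [e / sum(e0) for e in e0]
    s1 = [e / sum(e1) for e in e1]
    return math.exp(sum(0.5 * (a + b) * math.log(y / x)
                        for a, b, x, y in zip(s0, s1, p0, p1)))

def chained_tornqvist(prices, quantities):
    """Index levels obtained by multiplying period-on-period Törnqvist links."""
    levels = [1.0]
    for t in range(1, len(prices)):
        link = tornqvist(prices[t - 1], prices[t], quantities[t - 1], quantities[t])
        levels.append(levels[-1] * link)
    return levels

def geks(prices, quantities):
    """GEKS levels relative to period 0: the geometric mean, over every link
    period l in the window, of the indirect comparison 0 -> l -> t."""
    n_periods = len(prices)
    levels = []
    for t in range(n_periods):
        log_sum = 0.0
        for l in range(n_periods):
            via_l = (tornqvist(prices[0], prices[l], quantities[0], quantities[l])
                     * tornqvist(prices[l], prices[t], quantities[l], quantities[t]))
            log_sum += math.log(via_l)
        levels.append(math.exp(log_sum / n_periods))
    return levels

# Hypothetical data: product 1 goes on sale in period 1, is under-purchased
# in period 2, and everything returns to the period-0 state in period 3.
prices = [[1.0, 1.0], [0.5, 1.0], [1.0, 1.0], [1.0, 1.0]]
quantities = [[10, 10], [50, 10], [2, 10], [10, 10]]

print(chained_tornqvist(prices, quantities)[-1])  # drifts away from 1
print(geks(prices, quantities)[-1])               # returns to 1
```

Because period 3 repeats period 0 exactly, a drift-free index should return to its starting level. The chained index does not, whereas the GEKS does within this single window.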

4.14 The Tests that differentiate the multilateral methods are of varying importance and relevance in the context of comparing prices across time. Arguably, the most interesting of these Tests are 3, 9 and 12.

4.15 The GK method's failure of Test 3 means that it gives greater weight to the price structures in periods with larger aggregate volumes. While Chessa (2016) argues that this is an advantage - because it gives out-of-season prices in seasonal product classes lower weight - it is not ideal to treat time periods unequally in temporal comparisons. Ensuring product classes are sufficiently broad that a substantial quantity of products are sold year round should both reduce the severity of the GK method's failure of this Test and reduce the risk that out-of-season prices have large weights. This also reduces the risk that there are any time periods with negligible volumes, and hence the other methods' failure of Test 9 should not be of great concern.

4.16 Test 12 is presented by de Haan (2015) and Chessa (2016) as an advantage of additive multilateral methods: they extend the notion of a unit value price to classes of broadly comparable products. If the reference prices used in the GK and QAUV_TPD methods adequately account for quality differences between products (that is, if they make products homogeneous) then this unit value approach seems appropriate. On the other hand, additive methods make strong assumptions about substitutability which are contentious from an economic perspective, as described in the next sub-section.
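The unit value interpretation of an additive method can be sketched for the GK method. The sketch below assumes the Geary-Khamis system as it is usually written in the literature (reference prices are quantity-share-weighted averages of deflated prices, and the index is expenditure divided by the volume valued at reference prices), solved by simple fixed-point iteration. The function name, tolerance and data are hypothetical, and the formulation used elsewhere in this publication is not reproduced here.

```python
def geary_khamis(prices, quantities, tol=1e-12, max_iter=1000):
    """Return (index levels with period 0 = 1, reference prices).

    prices[t][i] and quantities[t][i] refer to product i in period t.
    Assumes every product has a price in every period and is sold at least once.
    """
    n_periods, n_products = len(prices), len(prices[0])
    total_q = [sum(quantities[t][i] for t in range(n_periods))
               for i in range(n_products)]

    def reference_prices(index):
        # Quantity-share-weighted average of each product's deflated prices.
        return [sum(quantities[t][i] * prices[t][i] / index[t]
                    for t in range(n_periods)) / total_q[i]
                for i in range(n_products)]

    index = [1.0] * n_periods
    for _ in range(max_iter):
        ref = reference_prices(index)
        # Expenditure divided by the volume valued at reference prices.
        new = [sum(p * q for p, q in zip(prices[t], quantities[t]))
               / sum(v * q for v, q in zip(ref, quantities[t]))
               for t in range(n_periods)]
        new = [x / new[0] for x in new]  # normalise period 0 to 1
        converged = max(abs(a - b) for a, b in zip(new, index)) < tol
        index = new
        if converged:
            break
    return index, reference_prices(index)

# Hypothetical data: two products, three periods.
prices = [[1.0, 2.0], [1.1, 2.1], [1.2, 2.0]]
quantities = [[10, 5], [9, 6], [8, 7]]
index, ref = geary_khamis(prices, quantities)
volumes = [sum(v * q for v, q in zip(ref, quantities[t])) for t in range(3)]
print(index, volumes)
```

Additivity shows up in the last step: each period's volume is a plain sum of quantities valued at the same reference prices, so volumes for sub-groups of products add up to the total.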

4.17 The monotonicity Test 10 is also interesting. The performance of several methods against this Test is unknown. A test for monotonicity in prices (which is not implied by monotonicity in quantities) may be more relevant for temporal price comparisons. Also, it is common for a product to be observed in only one period in a multilateral window, and it is not clear how monotonicity should apply in this scenario. This is an area for further study.

4.18 Of the remaining Tests, the TPD method's failure of Test 2x is not critical in this context, given it satisfies the corresponding price comparison Test 2p and the weak proportionality Test 2. Tests 8 and 11 both relate to how the system of multilateral comparisons behaves when groups of similar entities (e.g. countries in a bloc) are aggregated or split up. NSOs are unlikely to aggregate or disaggregate time periods in this way. Conceivably, however, NSOs may wish to produce indexes at different frequencies (e.g. monthly, quarterly and annually), and it seems important that corresponding price movements, and trends over time, are broadly similar. An intermediate value test (footnote 1) may be more appropriate here.
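The intermediate value test described in footnote 1 is simple to check mechanically. The following is a minimal sketch, assuming monthly index levels are available for both quarters; the function name and the figures are hypothetical.

```python
def passes_intermediate_value_test(months_a, months_b, quarterly_ratio):
    """True if the quarterly comparison lies between the smallest and largest
    comparisons of any month in quarter A with any month in quarter B."""
    ratios = [b / a for a in months_a for b in months_b]
    return min(ratios) <= quarterly_ratio <= max(ratios)

# Hypothetical monthly index levels for two quarters.
quarter_a = [100.0, 101.0, 102.0]
quarter_b = [103.0, 104.0, 106.0]

# One possible quarterly comparison: the ratio of average monthly levels.
ratio = (sum(quarter_b) / 3) / (sum(quarter_a) / 3)
print(passes_intermediate_value_test(quarter_a, quarter_b, ratio))
```

A quarterly index compiled independently of the monthly index (for example, from quarterly aggregated transactions) would be passed in as `quarterly_ratio` in the same way.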

4.19 Finally, note that extending these multilateral methods using the methods described earlier undermines their Test performance. This is because they involve splicing together price movements from successive multilateral windows, each of which makes use of slightly different price information. Of particular importance is that Tests 6 and 12 are prone to failure: i.e. extended indexes are not guaranteed to be free from chain drift, and extended additive indexes may only be approximately additive. Moreover, the differences between extension methods relate to how they deal with temporal ordering, both in constructing multilateral windows, and in combining certain price comparisons from successive windows to update the index. As such, different extension methods may yield different indexes: Krsinich (2016) and Chessa (2016) find substantial differences for some (but not all) datasets and product types. Empirical comparisons using data available to the ABS are presented in the next section.
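The splicing step can be sketched generically. The sketch assumes a rolling window that moves forward one period at a time and that the multilateral index has already been computed within each window. The `link_offset` parameter is a hypothetical device for selecting the link period: 1 corresponds to a movement splice, the window length minus one to a window splice, and roughly half the window length to a half-window splice, as those terms are commonly used in the literature. The direct extension method is not covered.

```python
def splice(windows, link_offset):
    """Extend an index by splicing successive rolling windows.

    windows[k] holds the multilateral index levels for periods k .. k+W-1.
    Each new period is linked to the published series `link_offset` periods
    earlier, using the movement measured within the newest window.
    """
    window_length = len(windows[0])
    if not 1 <= link_offset <= window_length - 1:
        raise ValueError("link_offset must be between 1 and window length - 1")
    # The first window is published as is, normalised to 1 in period 0.
    series = [x / windows[0][0] for x in windows[0]]
    link_pos = window_length - 1 - link_offset  # link period's position in window
    for win in windows[1:]:
        series.append(series[-link_offset] * win[-1] / win[link_pos])
    return series

# Hypothetical windows of length 4 that disagree slightly about history.
windows = [[1.00, 1.10, 1.20, 1.30],
           [1.00, 1.10, 1.15, 1.30]]
print(splice(windows, link_offset=1))  # movement splice
print(splice(windows, link_offset=3))  # window splice
```

When successive windows agree on every historical comparison, all choices of `link_offset` return the same series. Differences between extension methods arise only because each window uses slightly different price information, as in the example above.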


Economic approach

4.20 The economic approach to justifying consumer price indexes assumes that consumers optimise their basket of purchases to maximise utility for a given budget, or minimise cost for a given level of utility. In this context, price (quantity) comparisons should reflect differences between optimal baskets of fixed utility (cost) in different time periods. This approach is addressed in more detail in the ILO CPI manual (2004: chapters 17 and 18).

4.21 An assumption underpinning the economic approach is that some function exists to express the utility of a basket of goods and services. In particular, this function describes the impact on utility of substitution between different products. Intuitively, if the price of one product grows more slowly than others, it may be advantageous for a consumer to direct a larger share of their budget towards that product over time.

4.22 Multilateral comparison methods assume that the utility function is uniform across all economic entities that are being compared. In the temporal context, there is a risk that this assumption fails due to changes in preferences over time. This risk can be mitigated by limiting the number of connecting time periods that are included in the multilateral window, though of course using too short a window increases the risk that Test 6 fails. This does not help to discriminate between methods.

4.23 However, the multilateral methods involve different assumptions about the shape or "functional form" of the utility function, and consequently the substitutability of different products. Diewert (1999) shows that the GEKS method is exact for a "flexible" functional form - that is, it expresses the price differences experienced by optimising consumers without imposing restrictive assumptions about how they can substitute between products. In contrast, the GK and other additive methods are consistent only with linear utility functions, so they may suffer substitution bias if consumer preferences are more complex. To the best of our knowledge, the TPD and QAUV_TPD methods have not yet been assessed rigorously from this economic approach.

4.24 Examining differences between the GEKS and other methods in the temporal context is an ongoing area of research. In practice, empirical testing does not reveal compelling evidence of substantial substitution bias in temporal GK indexes or other reference price indexes. Chessa (2016) finds little difference between methods. The ABS has used the method proposed by Hill (2000) to test for substitution bias in temporal GK, TPD and QAUV_TPD indexes but has found no clear evidence of such bias.


FLEXIBILITY

4.25 This publication deals primarily with methods for producing temporal price indexes from transactions datasets of a certain type that are available to the ABS. These datasets include information on quantities, expenditures, and descriptive information in a format that allows the identification and classification of products but makes it difficult to extract detailed product characteristics.

4.26 In evaluating index methods, it is an advantage for a method to be flexible enough to be used for a range of purposes and dataset types. If we prefer the properties of one method for aggregating transactions data, it makes sense to use variants of the same method for different data types and statistical products; this promotes coherence, aids interpretation, and reduces the potential complexity of processing systems.

4.27 In future, multilateral methods may facilitate the production of statistics the ABS does not currently produce, such as spatial indexes. This is feasible for any of the multilateral indexes considered in this publication, which are all based on spatial comparison methods.

4.28 The ABS may also acquire price datasets of other types, such as high frequency price datasets without weighting information (i.e. web scraped data) or transactions datasets that contain detailed product characteristics.

4.29 All of the methods described in this publication can be adapted for datasets without weighting information. This effectively treats each product as being of equal economic importance, which may be contentious; however, the question in this case is not whether the methods are capable of handling unweighted data, but whether it is appropriate to use unweighted data in the first place.

4.30 Finally, if transactions datasets contain detailed product characteristics, the GEKS, TPD and QAUV_TPD methods can be adapted to produce hedonic indexes using Time Dummy Hedonic indexes to substitute for the bilateral link formula (in the GEKS) or the TPD model (in the other methods). These hedonic indexes have the potential for better quality adjustment than the basic multilateral methods. In contrast, there is no established way of adapting the GK method to produce hedonic indexes, though Chessa (2016) suggests treating each combination of characteristics as a single product identifier. It is unclear whether this approach performs as well as a hedonic index in adjusting for quality change: this is an area of ongoing research.


INTERPRETABILITY

4.31 The ABS places a high value on transparency by understanding and explaining the statistics published, and describing and justifying the methods used. This is of critical importance for the CPI as a Main Economic Indicator (MEI). Two aspects of interpretability need consideration: first, to what extent the methods themselves are easy for index practitioners and users to understand; second, whether it is easy to understand the price movements each index produces, especially which products have the greatest influence on these movements and why.


Interpreting the methods

4.32 All of the multilateral methods considered here are more complicated than standard bilateral indexes. Nevertheless, all of them can be expressed in reasonably simple ways:
  • The GEKS is perhaps the easiest to grasp, as it is based on traditional price index theory: the multilateral movements are derived by combining superlative bilateral indexes.
  • The TPD has a simple model representation, reflecting the relationship it models between prices, products and time; and as it is equivalent to the Rao method (Balk 2001; Rao 2005) it can also be presented using simultaneous equations or matrices.
  • The GK and QAUV_TPD methods appeal to the notions of homogeneity and unit values: first, these methods calculate a set of adjustment factors to express all products in the same terms (make them homogeneous); second, they calculate a unit value index over all these products. The GK can be presented using simultaneous equations or matrices (see Collier 1999; Balk 2001). The QAUV_TPD is perhaps the most complicated because it involves calculating one set of price comparisons in estimating a TPD model, and then another set of price comparisons through the QAUV formula (footnote 2). It is necessary to consider the trade-off between this complexity and the additivity that is induced by applying the QAUV formula.
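The simultaneous-equation presentation of the TPD can be sketched as follows. The sketch assumes an expenditure-share-weighted TPD regression of log price on time and product effects, and solves its normal equations by alternating between the two sets of effects. The weighting scheme, function name and data are assumptions made for illustration and may differ from the specification used elsewhere in this publication.

```python
import math

def tpd_index(prices, quantities, tol=1e-12, max_iter=10000):
    """Index levels (period 0 = 1) from a weighted time-product dummy model.

    Model: log p[t][i] = delta[t] + gamma[i] + error, weighted by the
    expenditure share of product i in period t. A product with zero quantity
    in a period is treated as unobserved. Assumes every period has sales and
    every product is sold at least once.
    """
    n_periods, n_products = len(prices), len(prices[0])
    shares, logp = [], []
    for t in range(n_periods):
        spend = [p * q for p, q in zip(prices[t], quantities[t])]
        shares.append([e / sum(spend) for e in spend])
        logp.append([math.log(p) if q > 0 else 0.0
                     for p, q in zip(prices[t], quantities[t])])

    delta = [0.0] * n_periods
    for _ in range(max_iter):
        # Product effects given time effects (share-weighted averages).
        gamma = [sum(shares[t][i] * (logp[t][i] - delta[t]) for t in range(n_periods))
                 / sum(shares[t][i] for t in range(n_periods))
                 for i in range(n_products)]
        # Time effects given product effects (shares sum to 1 in each period).
        new = [sum(shares[t][i] * (logp[t][i] - gamma[i]) for i in range(n_products))
               for t in range(n_periods)]
        new = [d - new[0] for d in new]  # normalise period 0 to zero
        converged = max(abs(a - b) for a, b in zip(new, delta)) < tol
        delta = new
        if converged:
            break
    return [math.exp(d) for d in delta]

# Hypothetical data: product 2 is not sold in period 1.
prices = [[2.0, 5.0, 1.5], [2.2, 5.0, 1.6], [2.1, 5.5, 1.7]]
quantities = [[10, 4, 20], [8, 0, 25], [12, 5, 18]]
print(tpd_index(prices, quantities))
```

A regression package would give the same estimates directly. The alternating form is shown because it makes the two-way structure of the model (time effects and product effects) explicit.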

4.33 Note that the extension methods may also affect interpretability: the movement splice and direct extension methods are perhaps easier to understand than the window or half-window splice methods. Ultimately, however, interpretability is somewhat subjective, and different methods may appear intuitive to different audiences.


Interpreting the index movements

4.34 Price movements can be analysed using a range of analytical techniques. A useful technique is to decompose the aggregate index movement into a simple function (sum or product) of contributions from individual products. This allows the identification of the products with the greatest influence on the index movement, which is useful for validating and explaining (and managing the quality of) the index movements.

4.35 Standard bilateral index movements are relatively easy to decompose, as they are expressed in terms of product prices and weights in two periods. In contrast, multilateral price movements depend on prices and weights across the multilateral window so they are more complicated to decompose.
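As an illustration of the bilateral case, a Törnqvist movement factors exactly into a product of per-product contributions. The sketch below uses the standard Törnqvist formula with hypothetical names and data; it does not reproduce the multilateral decomposition methods developed by the ABS.

```python
import math

def tornqvist_contributions(p0, p1, q0, q1):
    """Multiplicative contribution of each product to the Törnqvist movement:
    its price relative raised to its average expenditure share."""
    e0 = [p * q for p, q in zip(p0, q0)]
    e1 = [p * q for p, q in zip(p1, q1)]
    s0 = [e / sum(e0) for e in e0]
    s1 = [e / sum(e1) for e in e1]
    return [(y / x) ** (0.5 * (a + b)) for a, b, x, y in zip(s0, s1, p0, p1)]

# Hypothetical prices and quantities for three products in two periods.
p0, p1 = [2.00, 5.00, 1.50], [2.20, 4.80, 1.65]
q0, q1 = [100, 40, 250], [90, 45, 240]

contributions = tornqvist_contributions(p0, p1, q0, q1)
index = math.prod(contributions)  # the product of contributions is the index
print(contributions, index)
```

A contribution above 1 pushes the index up and one below 1 pulls it down, so sorting products by the distance of their contribution from 1 identifies those with the greatest influence on the movement.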

4.36 In recent years, some progress towards the decomposition of multilateral indexes has been made. Van der Grient (2010) and de Haan and Hendriks (2013) show how both GEKS and TPD index movements can be decomposed into a bilateral Törnqvist term as well as a number of other factors. The ABS has further developed and tested methods to express GEKS and TPD movements as a product of contributions from individual items. These methods make it easy to identify the items and product groups with the greatest influence on index movements.

4.37 The existence of these decomposition methods is advantageous for the GEKS and TPD. However, as the QAUV_TPD and GK are additive methods, the implicit quantity index has a natural additive decomposition, and there may well be decompositions for the associated price indexes. This is an area for further research.


SUMMARY OF COMPARISONS BETWEEN METHODS

4.38 This section presents a framework for assessing multilateral methods, and explores differences between methods based on their theoretical properties, flexibility and interpretability.

4.39 In particular, the Theoretical Properties sub-section distinguishes between methods on the basis of five factors:
  • Whether they weight time periods equally or based on their volumes;
  • Whether they are additive;
  • Whether the price and quantity indexes are monotonic;
  • Whether they produce consistent indexes at different frequencies;
  • What assumptions they involve about uniformity over time and substitutability.

4.40 Operational decisions may reduce the differences between methods: for instance, constructing product classes of approximately consistent volumes means that all methods weight time periods approximately equally. Some properties, such as additivity, are not strictly preserved as the index is extended, and consequently their satisfaction should not be given too much weight in comparing methods. Other properties, such as monotonicity, require further study.

4.41 The Flexibility sub-section describes how all four methods can be used for new statistical products and data sources, noting that there is no established way of adapting the GK method to produce hedonic indexes, although it can make use of characteristic information in defining product identifiers.

4.42 The Interpretability sub-section argues that the four multilateral methods can be explained in reasonably simple ways, although they are all more complicated than bilateral index methods. It also notes the usefulness of movement decomposition methods for understanding index movements. Decomposition methods have been developed for GEKS and TPD methods but would require further work to develop for the GK and QAUV_TPD methods.

4.43 The ABS will consider these factors when assessing which is the most appropriate method for our purposes.

1 For instance, that the (quarterly) price comparison between two quarters lies between the minimum and maximum price comparisons between any month in the first quarter and any month in the second quarter.
2 De Haan and Krsinich (2014) and de Haan (2015) argue that the TPD and QAUV_TPD are approximately equal.