EaP Index Methodology
The Eastern Partnership Index is a set of individual and composite indicators that measure the extent to which the six Eastern European neighbour countries of the European Union have established sustainable democratic institutions and practices, and the level of their integration with the EU. Below, we explain the methodology on which the EaP Index is based.
How is the Eastern Partnership Index assembled?
The Eastern Partnership Index combines indicators from existing sources with first-hand empirical information gathered by local country experts within the networks underpinning the EaP Civil Society Forum (CSF). This general design makes it possible to use the best existing knowledge and to improve this body of knowledge through focused, systematic data collection that benefits from the CSF’s unique in-country insights and access to local knowledge in the EaP countries.
However, expert surveys are prone to subjectivity. Many existing expert surveys are characterised by a mismatch between “soft”, potentially biased, expert opinions and “hard” coding and aggregation practices that suggest a degree of precision rarely matched by the more complex underlying reality and its narrative representation in country reports. The expert survey underlying the Eastern Partnership Index therefore avoids broad judgments, and instead consists of specific and detailed fact-based questions, following a methodological strategy pioneered by the World Bank’s Doing Business surveys.
Most survey questions ask for a “Yes” or “No” response to induce experts to take a clear position and to minimise misclassification errors. All questions invite experts to explain and thus to contextualise their responses. In addition, experts are requested to substantiate their assessment by listing sources.
The survey is completed by local experts and overseen by sectoral co-ordinators who supervise and assist the data collection and evaluation. Firstly, local experts with in-depth sectoral knowledge of EaP countries evaluate the situation in their country of expertise on the basis of the questionnaire. These experts and the sectoral co-ordinators co-operate to ensure cross-nationally consistent assessments. Secondly, the sectoral co-ordinators review the scores and underlying rationales provided by the local experts. These reviews serve to clarify assessments where necessary, to compare the ratings across countries, and to revise ratings in consultation with local experts. This process facilitates a mutual understanding between experts and co-ordinators in order to improve the reliability and validity of the assessments. Thirdly, sectoral co-ordinators draft narrative reports comparing the assessments for each country and, across all countries, for each sector.
The Index team and the contributing experts also hold focus groups, in which the sectoral reports are discussed in order to validate findings and scores. Finally, the data scores and narrative reports are reviewed and edited by the Index team.
How are the Index scores calculated?
As a rule, all questions to be answered with yes or no by the country experts are coded 1 if the answer is yes or positive with regard to, for example, EU integration and convergence, and 0 if it is negative with regard to integration and convergence (labelled “1-0”). If the expert comments and consultations with experts suggest an intermediate score, the assessment is coded as 0.5. For items requiring numerical data (quantitative indicators), the figures are coded through a linear transformation that uses the information they contain about distances between country scores. The same approach is taken in assessing the other sector categories, e.g. deep and sustainable democracy or sustainable development. The transformation uses the following formula:
$$y = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the value of the raw data, $y$ is the corresponding score on the 0-1 scale, and $x_{\max}$ and $x_{\min}$ are the endpoints of the original scale, also called “benchmarks”. We preferred this linear transformation over other possible standardisation techniques (e.g., z-transformation) since it is the simplest procedure.
For items scored 1, 0 or the intermediate 0.5, benchmarks are derived from the questions, assigning 1 and 0 to the best and worst possible performance. Since benchmarks for quantitative indicators are often not intuitively evident, the upper benchmark has been defined as the value recorded for a new EU member state.
How were the benchmarks chosen?
Lithuania was chosen as the benchmark country because it shares a post-Soviet legacy with EaP countries and, as the largest Baltic state, resembles EaP countries most with regard to population size. In addition, the selection of Lithuania reflects the idea that the target level for EaP countries should neither be a top performer nor a laggard, but rather an average new EU member state with both strengths and weaknesses. Being the sixth among 13 new EU member states in terms of economic wealth (per capita GDP in purchasing power standards in 2015 according to Eurostat), Lithuania epitomises this idea relatively well. Moreover, considerations of data availability favoured the choice of a single country rather than determining median values for all new EU member states.
The lower benchmark is defined by the value of the worst-performing EaP country in 2014. To enable developments to be tracked over time, we chose 2014 as the base year for defining benchmark values. This year represents a critical juncture for the EaP countries because three countries signed Association Agreements with the EU, and Ukraine was fundamentally transformed by the Revolution of Dignity, the annexation of Crimea, and the war in its eastern parts. In the rare cases where the value of an EaP country exceeded the upper benchmark or fell below the lower benchmark, its score was set to 1 or 0 respectively. All benchmark values and standardisation procedures are documented in an Excel file that is available on the EaP Index website.
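The benchmark-based scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the Index team's own calculation: it assumes the linear transformation takes the usual min-max form y = (x - x_min) / (x_max - x_min), and the function name and all benchmark values are hypothetical stand-ins rather than figures from the Index.

```python
def linear_score(x: float, lower: float, upper: float) -> float:
    """Map a raw value onto the 0-1 scale defined by two benchmarks.

    `lower` is the benchmark scored 0 and `upper` the benchmark scored 1.
    Values outside the benchmarks are capped at 0 and 1.
    """
    if upper == lower:
        raise ValueError("the two benchmarks must differ")
    y = (x - lower) / (upper - lower)
    return min(1.0, max(0.0, y))


# Hypothetical benchmarks for an illustrative indicator (not Index data):
LOWER = 20.0  # stand-in for the worst-performing EaP country in 2014
UPPER = 60.0  # stand-in for the value of the benchmark EU member state

for raw in (10.0, 30.0, 60.0, 75.0):
    print(raw, linear_score(raw, LOWER, UPPER))
```

Under this reading, the same function also covers indicators where a smaller raw value is better: passing the worst value as `lower` and the best value as `upper` reverses the direction of the scale.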
How are the different subcategories aggregated?
The Eastern Partnership Index 2021 measures the situation of EaP countries as of August 2021, or the latest data available up until that point. Thus, the measurement is status-oriented, making it possible to identify the positions of individual countries compared with other countries for the different sectors and questions.
The main EaP Indices summarise a wealth of detailed evidence in a few scores. Since their compact and concise format implies a loss of important information contained in individual items and assessments, the aggregation has to be carefully considered and justified. Any aggregation method necessitates decisions about the relative weight of items, and these decisions need to be explained.
The EaP Indices are based on a conceptual framework that establishes a hierarchy of concepts, descending from general and abstract concepts – democracy and good governance, policy convergence and sustainable development – to specific, tangible and more measurable concepts such as energy efficiency. The conceptual framework groups these concepts within higher-level concepts, places concepts at the same level and distinguishes them from other concepts. These structuring and placement decisions entail assumptions about the salience of aspects (lower-level concepts) for the realisation of the higher-level concepts measured by the Indices.
Reflecting this conceptual framework, the EaP Indices assign equal weights to those aspects of a concept that are placed on the same level of the conceptual hierarchy. The main rationale for this weighting principle is that aspects have been classified on the same level because they are considered to be as important as the other aspects on that level. Thus, the equal weighting of aspects is backed by the assumption that these aspects have equal conceptual status. One consequence of this assumption is to refrain from distinguishing between essential and auxiliary aspects. While all aspects should be present for the full realisation of the aggregate concept, a single dysfunctional aspect does not necessarily preclude its realisation. Put differently, equal weighting suggests considering the components of a concept as partially substitutable.
Partial substitutability also leads to a method of aggregation that allows for some balancing between items, or, more generally, aspects forming aggregate concepts. An arithmetical aggregation of aspects is, strictly speaking, possible only if these are measured on an interval level, that is, if the scores contain information on distances. Most numerical data are measured at interval level: in these cases, we know, for example, that a share of EU exports amounting to 40% of GDP is twice a share of 20% and that this ratio is equal to the ratio between 60% and 30%. For the yes-no questions and items measured with other ordinal scales, there is only information about the ordering of scores, not about the distances between scores.
For example, the distance between a yes and a no for the question regarding political parties’ equitable access to state-owned media is not known. Neither do we know whether the difference between yes and no for this question is equivalent to the difference between yes and no for the question asking whether political parties are provided with public funds to finance campaigns.
In principle, this uncertainty would require aggregation techniques that use the ranks of countries rather than the distances between them, that is, an aggregation by calculating the median rather than the arithmetic mean. This would, however, imply omitting the more detailed information contained in the numerical items. To use this information and to put more emphasis on larger differences between countries, quasi-interval level scores are constructed by adding the scores of items measured at ordinal level. This has been a standard practice in many indices and can also be justified by the rationale behind equal weighting.
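A small sketch can illustrate why a median-based aggregation discards information that a sum-based one retains. The two sets of item codings below are hypothetical and are not taken from the Index; they only show how two countries with different numbers of positive answers can share the same median.

```python
from statistics import mean, median

# Hypothetical codings of five yes-no items (1 = yes, 0 = no) for two countries.
country_a = [1, 1, 1, 0, 0]
country_b = [1, 1, 1, 1, 1]

# The median only reflects the middle item, so both countries look identical.
median_a, median_b = median(country_a), median(country_b)

# Adding the items and dividing by their number keeps the difference visible.
mean_a, mean_b = mean(country_a), mean(country_b)

print("median:", median_a, median_b)
print("mean:  ", mean_a, mean_b)
```

In this example the medians coincide while the means differ, which is the kind of difference between countries that the additive approach is meant to preserve.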
Since the number of aspects representing aggregate concepts differs, and since the EaP Index applies equal weighting, aggregate scores are standardised by dividing them by the number of aspects. Thus, the aggregate scores range between 0 and 1 and express the share of items evaluated positively in terms of the aggregate concept. The resulting proportions allow a range of aggregation techniques at higher levels of aggregation. The most important methods are multiplication and addition. Multiplication assigns more weight to individual aspects, emphasising the necessity of aspects for a concept; in contrast, addition facilitates the compensation of weaker scores on some aspects by stronger scores on other aspects, emphasising the substitutability of aspects for a concept.
The EaP Index applies an additive aggregation of aspects because this approach fits the method used on the item level, reflects the substitutability of aspects, and is less sensitive to deviating values on individual aspects. To standardise the aggregate sums and ensure equal weighting, arithmetic means are calculated.
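The contrast between the two aggregation methods can be made concrete with a short sketch. It assumes that "multiplication" means the plain product of the aspect scores and "addition" the arithmetic mean; the function names and the aspect scores are hypothetical and serve only as an illustration.

```python
from math import prod


def additive(aspects):
    """Arithmetic mean: weak aspects can be offset by strong ones."""
    return sum(aspects) / len(aspects)


def multiplicative(aspects):
    """Plain product: every aspect acts as a necessary condition."""
    return prod(aspects)


# Hypothetical aspect scores on the 0-1 scale.
balanced = [0.6, 0.6, 0.6]
uneven = [1.0, 1.0, 0.0]

for name, aspects in (("balanced", balanced), ("uneven", uneven)):
    print(name, round(additive(aspects), 3), round(multiplicative(aspects), 3))
```

With these numbers the uneven profile scores higher than the balanced one under addition, because two strong aspects compensate for the failing one, whereas under multiplication the single zero drives the aggregate to zero.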
How are the different questions weighted?
Since the number of items differs from subcategory to subcategory, and since we want to apply equal weighting, we standardised the subcategory scores by dividing them by the number of items. Thus, the subcategory score ranges between 0 and 1 and expresses the share of yes-no questions answered positively in terms of the aggregate concept (and/or the extent to which numerical items or ordinal-level items are evaluated positively).
Quasi-interval level scores allow a range of aggregation techniques at higher levels of aggregation (subcategories, categories, sections and dimensions). The most important methods are multiplication and addition. Multiplication assigns more weight to individual subcategories, emphasising the necessity of subcategories for a concept; in contrast, addition facilitates the compensation of weaker scores on some subcategories by stronger scores on other subcategories, emphasising the substitutability of subcategories for a concept.
We apply an additive aggregation of subcategories, categories and sections because this approach fits the method used on the item level, reflects the substitutability of subcategories, and is less sensitive to deviating values on individual subcategories. To standardise the aggregate sums and ensure equal weighting, arithmetic means are calculated. An aggregate score is thereby calculated for each of the two dimensions of Linkage (phased out from 2020) and Approximation. This method reflects the conceptual idea that the two dimensions are interdependent and jointly necessary for progress in European integration and sustainable democratic development.
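The level-by-level averaging described above can be sketched as a recursive mean over a nested structure. This is an illustrative reading rather than the Index's actual spreadsheet logic: the `aggregate` function, the category and subcategory names, and all item scores are hypothetical.

```python
from statistics import fmean


def aggregate(node):
    """Score a node of the conceptual hierarchy.

    A dict is a concept whose children are equally weighted, a list holds
    the item scores of a subcategory, and a bare number is a single score.
    """
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return float(node)
    return fmean([aggregate(child) for child in children])


# Hypothetical dimension with two categories (not Index data).
dimension = {
    "category_x": {
        "subcategory_x1": [1, 1, 0, 0.5],  # four items, mean 0.625
        "subcategory_x2": [1, 0],          # two items, mean 0.5
    },
    "category_y": {
        "subcategory_y1": [0.25, 0.75],    # two items, mean 0.5
    },
}

print(aggregate(dimension["category_x"]))
print(aggregate(dimension))
```

Because each level is averaged separately, a subcategory with four items carries the same weight within its category as one with two items, which is the practical effect of equal weighting at each level of the hierarchy.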
Aggregation levels, aggregate scores, individual scores and the underlying raw data are documented in the Excel files.