EaP Index Methodology
The Eastern Partnership Index is a set of individual and composite indicators that measure the extent to which the six Eastern European neighbour countries of the European Union have established sustainable democratic institutions and practices, and the level of their integration with the EU. The following sections explain the methodology on which the EaP Index is based.
How is the Eastern Partnership Index assembled?
The Eastern Partnership Index combines indicators from existing sources with first-hand empirical information gathered by local country experts within the networks underpinning the EaP Civil Society Forum (CSF). This general design makes it possible to use the best existing knowledge and to improve this body of knowledge by focused, systematic data-collection that benefits from the CSF’s unique in-country insights and access to local knowledge in the EaP countries.
However, expert surveys are prone to subjectivity. Many existing expert surveys are characterised by a mismatch between “soft”, potentially biased, expert opinions and “hard” coding and aggregation practices that suggest a degree of precision rarely matched by the more complex underlying reality and its narrative representation in country reports. The expert survey underlying the Eastern Partnership Index therefore avoids broad judgments, and instead consists of specific and detailed fact-based questions, following a methodological strategy pioneered by the World Bank’s Doing Business surveys.
Most survey questions ask for a “Yes” or “No” response to induce experts to take a clear position and to minimise misclassification errors. All questions invite experts to explain and thus to contextualise their responses. In addition, experts are requested to substantiate their assessment by listing sources.
The survey is implemented by six country and six sectoral co-ordinators who supervise and assist the data collection and evaluation in the following sectors: deep and sustainable democracy (democracy and human rights); EU integration and convergence; sustainable development; international security, political dialogue and co-operation; sectoral co-operation and trade flows; citizens in Europe.
Firstly, the country co-ordinators ask local experts to evaluate the situation in their country on the basis of the questionnaire. These experts and the sectoral co-ordinators co-operate to ensure that assessments are consistent across countries.
Secondly, the sectoral and country co-ordinators review the ratings and underlying rationales provided by the local experts. These reviews serve to clarify assessments where necessary, to compare the ratings across countries, and to revise ratings in consultation with local experts. This process facilitates a mutual understanding between experts and co-ordinators in order to improve the reliability and validity of the assessments.
Thirdly, sectoral and country co-ordinators draft narrative reports comparing the assessments for each country and, across all countries, for each sector. These drafts and the data scores are reviewed by a set of peer reviewers for each country. Finally, the data scores and narrative reports are reviewed and edited by the Index core team.
How are the Index scores calculated?
As a rule, all questions to be answered with yes or no by the country experts are coded 1 = yes or positive with regard, for example, to EU integration and convergence, and 0 = negative with regard to integration and convergence (labelled “1-0”). If the expert comments and consultations with experts suggest intermediate scores, such assessments are coded as 0.5. For items requiring numerical data (quantitative indicators), the figures are coded through a linear transformation, using the information they contain about distances between country scores (the same approach is taken with regard to assessing the other sector categories, e.g. deep and sustainable democracy or sustainable development). The transformation uses the following formula:
$$ y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} $$

where $x$ refers to the value of the raw data; $y$ is the corresponding score on the 0-1 scale; $x_{\max}$ and $x_{\min}$ are the endpoints of the original scale, also called “benchmarks”. We preferred this linear transformation over other possible standardisation techniques (e.g., z-transformation) since it is the simplest procedure.
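The coding rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the Index team's own tooling: the function names and the answer label "intermediate" are assumptions made for the example, and the numbers are invented.

```python
def code_yes_no(answer: str) -> float:
    """Map an expert answer to the 1-0 coding, with 0.5 for intermediate cases.

    The labels "yes", "no" and "intermediate" are hypothetical; the source
    only states that the resulting codes are 1, 0 and 0.5.
    """
    codes = {"yes": 1.0, "intermediate": 0.5, "no": 0.0}
    return codes[answer.strip().lower()]


def linear_score(x: float, x_min: float, x_max: float) -> float:
    """Linear transformation of a raw value x onto the 0-1 scale."""
    if x_max == x_min:
        raise ValueError("the two benchmarks must differ")
    return (x - x_min) / (x_max - x_min)


print(code_yes_no("Yes"))               # 1.0
print(linear_score(30.0, 10.0, 50.0))   # 0.5, halfway between the benchmarks
```

Because the transformation is linear, equal distances between raw values remain equal distances between scores, which is the property the text relies on.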
For items scored with 0-1 or the intermediate 0.5, benchmarks are derived from the questions, assigning 1 and 0 to the best and worst possible performance. Since benchmarks for quantitative indicators are often not intuitively evident, the upper benchmark has been defined as the value of a new EU member state.
How were the benchmarks chosen?
Lithuania was chosen as the benchmark country because it shares a post-Soviet legacy with EaP countries and, as the largest Baltic state, resembles EaP countries most with regard to population size. In addition, the selection of Lithuania reflects the idea that the target level for EaP countries should neither be a top performer nor a laggard, but rather an average new EU member state with both strengths and weaknesses. Being the sixth among 13 new EU member states in terms of economic wealth (per capita GDP in purchasing power standards in 2015 according to Eurostat), Lithuania epitomises this idea relatively well. Moreover, considerations of data availability favoured the choice of a single country rather than determining median values for all new EU member states.
The lower benchmark is defined by the value of the worst-performing EaP country in 2014. To enable tracking of developments over time, we chose 2014 as the base year for defining benchmark values. This year represents a critical juncture for the EaP countries because three countries signed Association Agreements with the EU, and Ukraine was fundamentally transformed by the Revolution of Dignity, the annexation of Crimea, and the war in its eastern parts. In those rare cases when the value of an EaP country exceeded the upper benchmark or fell below the lower benchmark, the score was set to 1 or 0 respectively. All benchmark values and standardisation procedures are documented in an Excel file that is available on the EaP Index website.
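The benchmark rule, including the capping of values that fall outside the benchmarks, might look like the following sketch. The function name and all figures are hypothetical; the actual benchmark values are those documented in the Excel file mentioned above.

```python
def benchmark_score(x: float, lower: float, upper: float) -> float:
    """Score a raw value against a lower and an upper benchmark.

    lower: value of the worst-performing EaP country in the base year
    upper: value of the benchmark EU member state
    Values beyond either benchmark are capped at 0 or 1.
    """
    if upper == lower:
        raise ValueError("the two benchmarks must differ")
    raw = (x - lower) / (upper - lower)
    return min(1.0, max(0.0, raw))


# Invented benchmarks for an indicator where higher values are better.
LOWER, UPPER = 20.0, 60.0

print(benchmark_score(40.0, LOWER, UPPER))  # 0.5
print(benchmark_score(70.0, LOWER, UPPER))  # 1.0, capped at the upper benchmark
print(benchmark_score(10.0, LOWER, UPPER))  # 0.0, capped at the lower benchmark
```

In this sketch the same function also covers indicators where a lower raw value is better, since the "worst" benchmark may be numerically larger than the "best" one; whether the Index handles such indicators this way is an assumption.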
How are the different subcategories aggregated?
The Eastern Partnership Index 2017 measures the situation of EaP countries as of December 2017, or the latest data available up until that point. Thus, the measurement is status-oriented, making it possible to identify the positions of individual countries compared with other countries for the different sectors and questions.
Aggregating scores is necessary to arrive at an Index or composite indicator. However, aggregation implies decisions about the relative weight of subcategories that need to be explained. The Eastern Partnership Index consists of two dimensions, which are further disaggregated in sections, subsections, categories, subcategories and items. The different levels of disaggregation are designated by numbers such as 1.1, 1.1.1, etc.
This hierarchical structure reflects theoretical assumptions about the subcategories and boundaries of concepts. One could, for example, argue that free and fair elections constitute the core of democracy and should therefore be given a higher weight than the category of Freedom of Speech and Assembly. Conversely, one could also argue that democracy in most EaP countries is mainly impaired by unaccountable governments and the lack of independent media, while elections are more or less well organised.
For example, we define the section “Deep and Sustainable Democracy (Democracy and Human Rights)” as consisting of nine subcategories:
- Democratic Rights and Elections, including Political Pluralism
- Human Rights and Protection against Torture
- State Accountability
- Independent Media
- Freedom of Speech and Assembly
- Independent Judiciary
- Equal Opportunities and Non-Discrimination
- Fight Against Corruption
- Public Administration
The weights of the nine subcategories should depend on the importance each subcategory has for the normative dimension of Deep and Sustainable Democracy. One could, for example, argue that free and fair elections constitute the core of democracy and therefore Democratic Rights and Elections, including Political Pluralism, should be given a higher weight than the category of State Accountability.
Since it would be difficult to establish a clear priority of one or several subcategories over others, we decided to assign equal weights to all subcategories. Equal weighting of subcategories is also intuitively plausible since this method corresponds to the conceptual decision of conceiving, for example, the concept of democracy as composed of a variety of attributes placed on the same level. Equal weighting assumes that all subcategories of a concept possess equal conceptual status and that subcategories are partially substitutable by other subcategories.
An arithmetical aggregation of subcategories is, strictly speaking, possible only if subcategories are measured on an interval level, that is, we know that the scores of items, subcategories, categories, sections and dimensions contain information on distances. Most numerical data are measured at interval level: in these cases, we know, for example, that a share of EU exports amounting to 40% of GDP is twice a share of 20% and that this ratio is equal to the ratio between 60% and 30%. For the yes-no questions and items measured with other ordinal scales, we have information only about the ordering of scores, not about the distances between scores.
For example, we do not know the distance between a yes and a no for the question regarding parties’ equitable access to state-owned media. Neither do we know whether the difference between yes and no for this question is equivalent to the difference between yes and no for the question asking whether political parties are provided with public funds to finance campaigns.
In principle, this uncertainty would limit us to determining aggregate scores by selecting the median rank out of the ranks a country has achieved for all subcategories (assuming equal weighting). This would, however, imply omitting the more detailed information contained in the numerical items. To use this information and to put more emphasis on big differences between countries, we have opted to construct quasi-interval level scores by adding the scores of items measured at ordinal level. This has been a standard practice in many indices and can also be justified by the rationale behind equal weighting.
Given the frequent uncertainty about the importance of subcategories for aggregate concepts, the safest strategy seems to be assigning equal status to all subcategories. Equal status suggests assuming that a score of 1 used to code a positive response for one question equals a score of 1 for another positive response. Moreover, equal status means that all subcategories constituting a concept are partially substitutable. The most appropriate aggregation technique for partially substitutable subcategories is addition.
How are the different questions weighted?
Since the number of items differs from subcategory to subcategory, and since we want to apply equal weighting, we standardised the subcategory scores by dividing them by the number of items. Thus, the subcategory score ranges between 0 and 1 and expresses the share of yes-no questions answered positively in terms of the aggregate concept (and/or the extent to which numerical items or ordinal-level items are evaluated positively).
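As an illustration of this standardisation, a subcategory score is simply the sum of its item scores divided by the number of items. The item values below are invented and the function name is an assumption.

```python
def subcategory_score(item_scores: list[float]) -> float:
    """Sum the item scores and divide by the number of items."""
    if not item_scores:
        raise ValueError("a subcategory needs at least one item")
    return sum(item_scores) / len(item_scores)


# Two hypothetical subcategories with different numbers of items.
four_items = [1.0, 0.0, 0.5, 1.0]   # yes, no, intermediate, yes
two_items = [1.0, 1.0]              # yes, yes

print(subcategory_score(four_items))  # 0.625
print(subcategory_score(two_items))   # 1.0
```

Dividing by the item count keeps both results on the same 0-1 scale, so a subcategory with many items does not outweigh one with few.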
Quasi-interval level scores allow a range of aggregation techniques at higher levels of aggregation (subcategories, categories, sections and dimensions). The most important methods are multiplication and addition. Multiplication assigns more weight to individual subcategories, emphasising the necessity of subcategories for a concept; in contrast, addition facilitates the compensation of weaker scores on some subcategories by stronger scores on other subcategories, emphasising the substitutability of subcategories for a concept.
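The difference between the two techniques can be made concrete with a small comparison. The sketch below uses the arithmetic mean for addition and, as one possible multiplicative form, the geometric mean; the choice of the geometric mean is an assumption for illustration, as are the scores.

```python
import math


def additive(scores: list[float]) -> float:
    """Arithmetic mean: weak scores can be compensated by strong ones."""
    return sum(scores) / len(scores)


def multiplicative(scores: list[float]) -> float:
    """Geometric mean: a single very low score drags the aggregate down."""
    return math.prod(scores) ** (1 / len(scores))


balanced = [0.5, 0.5, 0.5]
uneven = [1.0, 0.5, 0.0]

for name, scores in (("balanced", balanced), ("uneven", uneven)):
    print(name, round(additive(scores), 3), round(multiplicative(scores), 3))
```

Both profiles have the same additive score, whereas the multiplicative score of the uneven profile collapses to zero because one subcategory scores 0. This is the sense in which multiplication treats each subcategory as necessary and addition treats subcategories as substitutable.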
We apply an additive aggregation of subcategories, categories and sections because this approach fits the method used on the item level, reflects the substitutability of subcategories, and is less sensitive to deviating values on individual subcategories. To standardise the aggregate sums and ensure equal weighting, arithmetical means are calculated. An aggregate score is thereby calculated for each of the two dimensions of Linkage and Approximation. This method reflects the conceptual idea that the two dimensions are interdependent and jointly necessary for progress in European integration and sustainable democratic development.
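Equal-weight additive aggregation across the hierarchy amounts to taking the mean of means at each level. The following sketch assumes a nested dictionary as the data layout; the structure, the section and dimension names marked as examples, and all scores are hypothetical and do not reproduce the actual Index data.

```python
def aggregate(node) -> float:
    """Return the equal-weight arithmetic mean of a node.

    A dict is an inner level (dimension, section, category, ...) whose
    children are aggregated first; a list holds item scores on the 0-1 scale.
    """
    if isinstance(node, dict):
        values = [aggregate(child) for child in node.values()]
    else:
        values = list(node)
    if not values:
        raise ValueError("cannot aggregate an empty level")
    return sum(values) / len(values)


# Hypothetical dimension with two sections of unequal size.
example_dimension = {
    "Deep and Sustainable Democracy": {
        "Independent Media": [1.0, 0.0],                 # mean 0.5
        "Independent Judiciary": [1.0, 1.0, 0.5, 0.5],   # mean 0.75
    },
    "Example section": {
        "Example subcategory": [0.25, 0.75],             # mean 0.5
    },
}

print(aggregate(example_dimension))  # 0.5625
```

Each level contributes equally to its parent regardless of how many items sit beneath it, which is why the first section (mean 0.625) and the second (mean 0.5) average to 0.5625.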
Aggregation levels, aggregate scores, individual scores and the underlying raw data are documented in the Excel files.