Custom Data Integration
File Type Description
VoteWorld (JVW) uses four essential file types: CRR, RCL, INF, and DTL.
Each is named for its file extension, e.g. 1e.crr or 28u.dtl. The number
and letter combination before the extension identifies the
session/parliament of the data. For example, US House files follow the
convention “#.ext”, such as 1.dtl for the DTL file of the first session,
while European Parliament files add an ‘e’, giving 1e.dtl for the first
session. Likewise, the naming convention for the UN is “#u.ext” and for
the US Senate is “#s.ext”.
Each of these file types holds different data, as described below.
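As a rough illustration of the naming convention, the following Python
sketch splits a file name into its session number, legislature, and file
type. The function name parse_data_filename and the suffix table are
hypothetical helpers written for this example and are not part of JVW; the
table covers only the four legislatures mentioned above.

```python
import re

# Assumed mapping from the letter before the extension to a legislature,
# taken from the naming conventions described in the text.
SUFFIX_TO_LEGISLATURE = {
    "": "US House",
    "s": "US Senate",
    "e": "European Parliament",
    "u": "UN",
}
FILE_TYPES = {"crr", "rcl", "inf", "dtl"}


def parse_data_filename(filename):
    """Split a name such as '28u.dtl' into (session, legislature, type)."""
    match = re.fullmatch(r"(\d+)([a-z]?)\.([a-z]+)", filename.lower())
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    session, suffix, extension = match.groups()
    if suffix not in SUFFIX_TO_LEGISLATURE or extension not in FILE_TYPES:
        raise ValueError(f"unknown suffix or extension in {filename!r}")
    return int(session), SUFFIX_TO_LEGISLATURE[suffix], extension.upper()
```

For instance, parse_data_filename("1e.crr") would report session 1 of the
European Parliament and the CRR file type.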
Where possible each data value is separated by a “space-comma-space”
combination. For example:
“G , ANGLADE” or “1402 , 6”
If the commas are missing, or if they are not surrounded by spaces, the
data will be considered invalid and the program will not run. The only
exception to this rule is the US Congress data, which was formatted before
the other legislatures and is assumed to be the only such case.
Where data is not available, such as DW-nominate scores, n/a is used.
Where an integer is used to refer to a party id or a country id (as
described below), a code table explaining what the values stand for
should also be provided, usually in an Excel or text format.
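The separator rule can be checked before a file is handed to JVW. The
sketch below is one possible way to do this in Python; split_fields is a
hypothetical helper, not a JVW function, and it assumes that no data value
itself contains a comma. It does not handle the US Congress exception.

```python
def split_fields(line):
    """Split a data line on ' , ' and turn 'n/a' into None.

    Raises ValueError when a comma is missing or lacks surrounding spaces.
    """
    fields = line.strip().split(" , ")
    # A leftover comma inside a field means a separator was written
    # without the required spaces (assumes values never contain commas).
    if len(fields) < 2 or any("," in field for field in fields):
        raise ValueError(f"expected ' , ' between values: {line!r}")
    cleaned = [field.strip() for field in fields]
    return [None if field == "n/a" else field for field in cleaned]
```

Values are returned as strings so that the caller can decide how to
convert each column.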
Custom Data Format
The file format used for custom data for legislatures not already a part
of JVW follows. Once all file types are prepared, they need to be zipped
together for ease of integration into JVW.
CRR files contain personal or specific information about a politician
or a country, such as where the politician is from, party affiliations,
and DW-nominate scores. Each line of the CRR file holds the following
fields:
polid(int) , tokenRep(string) , name with spaces(string) , geographic name(string) , x1float(float) , y1float(float) , descriptor1(int) , descriptor2(int) , descriptor3(int)
50 , G , ANGLADE Magdeleine , Spain , 0.363 , -0.848 , 5 , 1402 , 6
52 , M , ANSART Gustave , Italy , 6 , n/a , n/a , 7 , 1409 , 5
polid - is the unique identification number for each politician. In the
event that the politician is present in multiple sessions, their id never
changes across sessions. This number should be an integer. This value
may not be absent.
tokenRep – is a character representation of the party this politician
belongs to that will be used to display the politician on the ideological
map. This should be a single letter character and may not be absent.
name – the politician’s full name. Spaces are allowed between
first, last, and/or middle name, and no commas should be used to separate
name parts. This value may not be absent.
geographic name – the geographical full name with which to associate
the politician. Spaces are allowed between multiple name parts and no
commas should be used to separate name parts. In the case that a map is
not being used, these values should be unique and not n/a since politicians
need to be grouped in some unique fashion. This value may not be absent.
x1float/y1float – are floating point values for the DW-nominate
scores in the first and second dimension respectively. These values may
be absent, in which case n/a is used.
descriptor1/descriptor2/descriptor3 – are other optional values
by which politicians may be grouped (party, region, etc.) in addition to
the mandatory grouping by geographic name. These values are integers
and are supplied with Legend Tables that decode what each integer stands
for. These values may be absent.
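To make the field layout concrete, here is a minimal Python sketch that
reads one CRR line into a record. The Politician class and parse_crr_line
function are hypothetical names chosen for this example; the sketch
assumes every value is separated by “space-comma-space” and that any
fields after the two scores are descriptors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Politician:
    polid: int
    token: str
    name: str
    geographic_name: str
    x1: Optional[float]
    y1: Optional[float]
    descriptors: List[Optional[int]]


def _optional(text, convert):
    """Convert a field, treating 'n/a' as a missing value."""
    return None if text == "n/a" else convert(text)


def parse_crr_line(line):
    fields = [field.strip() for field in line.split(" , ")]
    if len(fields) < 6:
        raise ValueError(f"too few fields in CRR line: {line!r}")
    polid, token, name, geographic_name = fields[:4]
    if len(token) != 1:
        raise ValueError(f"tokenRep must be a single character: {token!r}")
    return Politician(
        polid=int(polid),
        token=token,
        name=name,
        geographic_name=geographic_name,
        x1=_optional(fields[4], float),
        y1=_optional(fields[5], float),
        # Everything after the two scores is treated as a descriptor.
        descriptors=[_optional(field, int) for field in fields[6:]],
    )
```

The descriptor integers are kept as-is; decoding them requires the
matching Legend Table.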
RCL files contain the actual votes of each politician, and should not
be used to store additional descriptive data about the politician. The
minimum data needed on each line is the politician/country id number,
followed by a list of vote data. The vote data uses the numbers 0 –
9 to describe the votes and follows the convention:
0 – politician was not a member to vote (occurs when a seat is vacated
or taken in the middle of a session)
1 – yes vote
2 – pair yes vote
3 – Ann. yes vote
4 – Ann. no vote
5 – pair no vote
6 – no vote
7 – blank
8 – present but didn’t vote
9 – abstain vote
polid(int) , v1(int) , ....vn(int)
ex: 2 , 6 , 6 , 9 , 9 , 9 , 9 , 9 , 9 , 9 , 1 , 9 , 9 , 9 , 1
polid – the unique integer identification number for the politician.
v1-vn – the first through nth vote of the politician. Note that
there is no terminating comma.
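The vote codes lend themselves to a small lookup table. The Python sketch
below parses the example line and tallies the votes by meaning. The names
VOTE_MEANINGS and parse_rcl_line are hypothetical, and code 4 is treated
here as an announced no vote on the assumption that it mirrors code 3.

```python
from collections import Counter

# Meanings of the vote codes 0-9. Code 4 is assumed to be the "no"
# counterpart of code 3.
VOTE_MEANINGS = {
    0: "not a member",
    1: "yes",
    2: "paired yes",
    3: "announced yes",
    4: "announced no",
    5: "paired no",
    6: "no",
    7: "blank",
    8: "present, did not vote",
    9: "abstain",
}


def parse_rcl_line(line):
    """Return (polid, list of vote codes) for one RCL line."""
    fields = [field.strip() for field in line.split(" , ")]
    polid = int(fields[0])
    votes = [int(field) for field in fields[1:]]
    unknown = [vote for vote in votes if vote not in VOTE_MEANINGS]
    if unknown:
        raise ValueError(f"vote codes outside 0-9: {unknown}")
    return polid, votes


polid, votes = parse_rcl_line(
    "2 , 6 , 6 , 9 , 9 , 9 , 9 , 9 , 9 , 9 , 1 , 9 , 9 , 9 , 1"
)
tally = Counter(VOTE_MEANINGS[vote] for vote in votes)
```

A line that ends with a terminating comma fails the integer conversion in
this sketch, which matches the rule stated above.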
INF files, like CRR files, contain the necessary information about a
particular vote or issue, such as the date, the number of yes/no votes,
and most importantly the cutting line points.
"date" , voteNumber(int) , "title" , numYes(int) , numNo(int) , numAbstain(int) , d1float , z1float , d2float , z2float (floats, may be absent)
"19-Jul-79" , 2 , "Order of Business" , 78 , 218 , 26 , n/a , n/a , n/a , n/a
"19-Jul-79" , 3 , "Order of Business" , 29 , 205 , 6 , 0.528 , -0.234 , 0.253 , 0.127
“date” – is the date, stored as a string literal surrounded by
quotation marks. The dd-Mon-yy components are separated by hyphens, with
no spaces between them. This value may not be absent.
voteNumber – is the vote within the session and not throughout all
sessions. This value is an integer and should not be absent.
“title” – is a short descriptor of the vote, in quotation marks.
Spaces are allowed between words and this value may not be absent.
numYes – is an integer specifying the number of politicians that
voted yes on this issue. This value may not be absent.
numNo - is an integer specifying the number of politicians that voted
no on this issue. This value may not be absent.
numAbstain - is an integer specifying the number of politicians that abstained
on this issue. This value may not be absent.
d1float, z1float, d2float, z2float – are the values used in calculating
the cutting line along the first and second dimensions. d1 is the distance
from z1 to the center of the cutting line along the first dimension, and
d2 is the distance from z2 to the center of the cutting line along the
second dimension. These values may be absent, in which case n/a is used.
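An INF line can be read in much the same way as the other file types. The
Python sketch below is a hypothetical helper (parse_inf_line is not a JVW
function) that assumes each line carries all ten fields, with n/a standing
in for any missing cutting line value. It splits from both ends so that a
title containing a comma does not shift the numeric columns.

```python
def parse_inf_line(line):
    """Parse one INF line into a dictionary of vote information."""
    # The first two fields are the date and the vote number.
    date, vote_number, rest = line.strip().split(" , ", 2)
    # The last seven fields are numeric; whatever precedes them is the title.
    title, *numbers = rest.rsplit(" , ", 7)
    if len(numbers) != 7:
        raise ValueError(f"expected 10 fields in INF line: {line!r}")
    num_yes, num_no, num_abstain = (int(value) for value in numbers[:3])
    # Order of the cutting line values: d1, z1, d2, z2.
    cutting_line = tuple(
        None if value.strip() == "n/a" else float(value)
        for value in numbers[3:]
    )
    return {
        "date": date.strip().strip('"'),
        "vote_number": int(vote_number),
        "title": title.strip().strip('"'),
        "num_yes": num_yes,
        "num_no": num_no,
        "num_abstain": num_abstain,
        "cutting_line": cutting_line,
    }
```

The date is kept as text here; converting it to a date object is left to
the caller.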
Sometimes you would like to give more detailed information about a vote
in the form of sentences, rather than have the user interpret what
particular data codes mean. In this case only, a DTL file accompanies the
INF file, and this is where you can include as detailed a “string literal”
description as you like. Each description must appear on the same line
number as its vote in the INF file. That is, if a vote detail is absent or
not needed, its line in the DTL file should contain “” rather than the
next vote's description.
This one-to-one correspondence between INF and DTL files is essential.
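The line-by-line correspondence can be verified before loading the files.
The following Python sketch pairs each INF line with its DTL description
and rejects files of different lengths; pair_votes_with_details is a
hypothetical helper, and it assumes both files have already been read
into lists of lines.

```python
def pair_votes_with_details(inf_lines, dtl_lines):
    """Pair each INF line with the DTL description on the same line."""
    if len(inf_lines) != len(dtl_lines):
        raise ValueError(
            f"INF has {len(inf_lines)} lines but DTL has {len(dtl_lines)}"
        )
    paired = []
    for inf_line, dtl_line in zip(inf_lines, dtl_lines):
        # An empty pair of quotation marks means "no detail for this vote".
        detail = dtl_line.strip().strip('"')
        paired.append((inf_line.strip(), detail or None))
    return paired
```

A length mismatch is the easiest symptom to detect; a description placed
on the wrong line cannot be caught automatically.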
"some long vote description for each vote on each line"
"1st vote - Order of Business"
"2nd vote - Order of Business"
In the case where the optional descriptors are used in the CRR Files,
JVW will prompt for the files that hold this legend information. The order
in which they are loaded into JVW must match the order of their appearance
in the CRR File. That is, the file holding the code table for descriptor1
should be loaded first, descriptor2 second, and descriptor3 third. Usually,
these files are simple text files with the following format.
integerID(unique integer) , tokenRep(string) , fullDescriptorName(string)
1302 , B , Bloque Nacionalista Gallego
1204 , G , Bündnis 90/Die Grünen
integerID – is the unique integer code used in the CRR file. This
value may not be absent.
tokenRep – is the character token used to depict this descriptor on
the ideological map. This value may not be absent.
fullDescriptorName – the full name of the descriptor (spaces are
allowed). This value may not be absent.
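A legend table maps naturally onto a dictionary keyed by the integer
code. The Python sketch below shows one way to load it; parse_legend is a
hypothetical name, and the sketch assumes the same “space-comma-space”
separator as the other files.

```python
def parse_legend(lines):
    """Build {integerID: (tokenRep, fullDescriptorName)} from legend lines."""
    legend = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Split only twice so the full name may contain any characters.
        code, token, full_name = (
            part.strip() for part in line.split(" , ", 2)
        )
        key = int(code)
        if key in legend:
            raise ValueError(f"duplicate integerID in legend: {key}")
        legend[key] = (token, full_name)
    return legend


legend = parse_legend([
    "1302 , B , Bloque Nacionalista Gallego",
    "1204 , G , Bündnis 90/Die Grünen",
])
```

With this table, a descriptor value of 1204 in a CRR file decodes to
token G and the name Bündnis 90/Die Grünen.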
JVW will allow researchers the option of viewing both ideological and
geographical maps. While no additional material is required on the part
of the researcher to view ideological maps, the option to view geographical
maps will require shapefiles.
Shapefiles are a Geographic Information System (GIS) data format developed
by ESRI. For more information or an introduction to GIS, see www.esri.com
or www.gis.com. Shapefiles, the map files JVW works with, come in three
essential file types: shp, dbf, and shx. Other sources of shapefiles, in
addition to esri.com, include http://nationalatlas.gov. A web search for
“Europe shapefile” or “America shapefile” will find usable map files, but
even then they need to be formatted as follows.
First, you will need to open up the dbf file in Excel. This is a screen
shot of an unformatted dbf file:
You want to format the dbf file so it has only 3 columns. The first column
holds the geographic name and has the header NAME. The second column
may hold any other descriptor for the area, such as the country's capital.
If there is no other descriptor you wish to use, simply copy the contents
of the first column. The third column has the header VOTE. The numbers
in this column should be unique for each entry. A simple approach is to
number the values from 0 to n - 1, where n is the number of rows of data.
This is what a formatted dbf file would look like:
Once you have these three columns, you will want to save your dbf file
with the changes. To do this, you will need to go to Insert->Name->Define.
Select ‘Database’ from the drop down window at the top, then
fill in the row/column parameters at the bottom, or click on the little
square icon on the bottom right of the window, and select all rows and
columns with data.
Click ‘OK’ and save your changes. When you close Excel, say
yes to any prompts about saving changes and overwriting existing files.
Make sure your dbf, shx, and shp files all share the same name (they
usually do by default). The final step in preparing your map files for use
with JVW is to zip the shp, dbf, and shx files together. In addition,
remember to zip all your crr, rcl, inf, and dtl files together.
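The zipping step can also be scripted. The Python sketch below uses the
standard zipfile module to bundle the three shapefile parts that share a
common name; zip_shapefile is a hypothetical helper, and the archive name
is whatever the caller chooses. The same approach would work for the crr,
rcl, inf, and dtl files.

```python
import zipfile
from pathlib import Path


def zip_shapefile(stem, archive_path):
    """Zip stem.shp, stem.dbf, and stem.shx into archive_path.

    'stem' is the shared file name without its extension.
    """
    parts = [Path(f"{stem}.{ext}") for ext in ("shp", "dbf", "shx")]
    missing = [str(part) for part in parts if not part.is_file()]
    if missing:
        raise FileNotFoundError(f"missing shapefile parts: {missing}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store only the file name, not the full directory path.
            archive.write(part, arcname=part.name)
    return [part.name for part in parts]
```

Checking for all three parts first avoids producing an archive that is
missing one of them.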