CHRR Web Investigator User's Manual
1.0 Overview
The CHRR Investigator is a software program designed and written by the Center for Human Resource Research (CHRR) and made available over the Internet using a client/server architecture. It allows the user (client) to connect to a database of variables (on the server) and to extract variables from a specific database. This architecture has several advantages: nothing is installed on the client machine, no hardware or firmware is sent to the user, and the user spends no time setting up the software. It also has potential drawbacks that the user must evaluate. These include the delay in reaching the server and returning query results to the client's machine, and the limitations of the browser used on the client's machine. These drawbacks are 'client-side' shortcomings and must be resolved by the user. The delay depends on the speed and bandwidth of the client's connection to the Web. Browser limitations should not be a concern if the client's browser is Netscape 4.0+, Internet Explorer 4.0+, or equivalent.
Any questions about this manual should be addressed to cc_webmaster@postoffice.chrr.ohio-state.edu.

When you see the arrowhead symbol, perform the action described in that paragraph so that your screen resembles the images in the manual. These operations take you step by step through familiarizing yourself with certain variables. At the same time, they build a tagset of the common variables that you will probably want to include in every tagset you extract.
1.3 Getting Started
To connect to the Investigator server, you must open the software in your browser window. If you have not already done so, open your browser and go to the Data Extraction Web Site.
When you access the Data Extraction Web Site at http://www.chrr.ohio-state.edu/web-investigator/ you will be asked to choose the dataset from a list. Once you make that choice, the Investigator software opens a window that looks similar to Figure 1 below.
Depending on the size of your monitor, it may be beneficial to maximize the Investigator window. If it is not already maximized, maximize it now.
Depending on your browser and the way you have configured it, you should be able to see three frames in the browser window along with the browser's command menu. (If you cannot see the menu commands and three frames, you may need to change your screen resolution. See your system administrator if you do not know how to change the resolution or need help configuring your browser.)

1.4 The Investigator Window and Its Frames
When you open Closed Cases in the Investigator software, the first window you see is the one shown in Figure 1 below. The Investigator runs in a browser window. All the toolbars across the top and bottom of the window belong to the browser and control only browser functions. Your browser's toolbar configuration may differ from the one shown in Figure 1. The commands you are interested in are the Investigator's commands, which are listed in the left frame of the Investigator window.
2.0 The Investigator Menu
The window shown in Figure 1 is divided into three frames:
- The left frame contains the Investigator menu commands.
- The middle frame, Index Terms, displays the terms that make up the selected index.
- The right frame, Individual Variables, displays the individual variables associated with a selected index term.

Any of the frames may require you to scroll up and down or left and right, depending on the size and contents of the frame. Whenever the contents of a frame change, look for the standard Windows scroll bars in that frame to check whether some of the contents are hidden. Each of the frames is discussed in the sections below (see 3.0 The 'Index Terms' Frame and 4.0 The 'Individual Variables' Frame).
The following indexes are provided by the Investigator.
The following items are additional commands on the Investigator menu. They are described briefly here and in detail later in this manual, when it is time to use them. You may skip this brief section and proceed with the tutorial on the indexes and how to use them.
2.5 Area of Interest Index
In the next section, 3.0 The Index Terms Frame, we will look more closely at the Area of Interest and Contextual Word indexes. This manual skips the Survey Year, Reference Number, and Question Name indexes. To find a variable in those indexes, you must already know its year, reference number, or question name, and when you know one of these, the variable is easy to find in the appropriate index. The Contextual Word and Area of Interest indexes, however, are 'textual' approaches. They use each variable's description and 'topical' assignments to provide access to the variables. When you are starting out, these two indexes may give better 'discovery' results when you are looking for a variable. Once you find a useful variable, its documentation in these indexes includes its year, reference number, and question name.
So let's begin now with the Area of Interest index.
When you left-click the Area of Interest index, a list of Index Terms appears in the middle frame. The middle frame consists of two columns (note the horizontal scroll bar at the bottom of the frame). The left column contains the terms of the index. You have to scroll the frame to the right to view the second column, which shows the number of variables in each group. To display the individual variables in a group, open the group by left-clicking on it. Its contents will appear in the rightmost frame.
3.0 The Index Terms Frame
The middle frame displays the contents of the index selected in the menu (left) frame. The contents of an index are the terms (groups of variables) associated with that index. In this case the index selected was "Area of Interest", and its contents are shown in Figure 2. Each index term contains a number of variables (the Count column) associated with that term, so the contents of an index term are the individual variables. To view the contents of an index term, you must open it so that its contents are displayed in the Individual Variables frame. To open a term, left-click on it.
At first, you will probably use the Area of Interest and Contextual Word indexes most often to find variables. They provide the easiest initial access to the variables, although the information they show is abbreviated. Once you have isolated a variable in the Individual Variables frame, you can learn more about it by viewing its Codebook display (see 5.0 The Codebook Window). You may also find supplementary information about an Index Term (especially Area of Interest terms) in the User's Guide and other documentation for the cohort you are using.
The contents of COMMON will be displayed in the right frame of the window, as shown below in Figure 3.
4.0 The Individual Variables Frame
When an index term is opened, its individual variables are displayed in the right frame. These are the 'pieces' that you will identify, extract, and use to perform data analysis. Each piece or datum is really a meta-datum that documents itself. It contains the question asked, the response collected, the response categories if the question was multiple choice, the frequencies of these responses, the previous and next questions in the survey's logic, and so on. (See 5.0 The Codebook Window for more detail.)
In the Investigator software, the contents of the Name field are derived from the Reference Number field with the decimal point (.) removed. The Reference Number (e.g., R00001.00) is a unique, longitudinal identifier for every variable in the NLSY79 and is machine-assigned sequentially. You may change the contents of the Name field (e.g., R0000100) to something more mnemonic in one of the statistical packages, but once it has been changed it may no longer be a unique identifier. Each individual variable is described briefly by five general categories in the Individual Variables frame. The original display is sorted in ascending order by variable Name. The categories, or descriptive headers, are explained below.
To get a more complete description of a variable, including its frequency, the codebook description of it must be opened in the Codebook window.
Before we discuss the Codebook window and its contents, let's look at how to group together the individual variables that resemble one another.
Notice how similar descriptions sort together. The descriptions that begin with the word 'date' are very similar to one another, and in the Question column several of their question names begin with 'SYMBOL!'. Sorting in this way helps you find similar questions and discover the question names of similar variables.
Notice the question names that begin with 'SYMBOL!'. Not all of them are related. There are several groups of these, and you can work out how they relate to each other from the Description column. Now look at the questions that begin with 'NUMKID...'. These questions are related in nature but differ in their descriptions and in the data they collect. You can find more detailed information about any of these variables by opening its Codebook window.
4.2 Variable Formats
You will encounter two forms of reference to the variables in the Investigator: the format displayed in the Individual Variables frame and the Codebook format. The Individual Variables format uses Name (reference number), Question, Description, Year. The Codebook format uses Reference Number (variable Name), Question, Description, Year. The only difference is how the Name and Reference Number are written: the Reference Number contains a decimal point, while the Name does not.
The 'individual variable format' is one step between the raw data and the statistical packages (SAS, SPSS, Stata) that you will use to analyze the data. Individual variables are created after the Codebook. The creation process adds a unique Name to each variable and assigns the variable to various indexes for better access. The Name is the variable's Reference Number without the decimal point. For example, Reference Number R03907.00 becomes the Name R0390700, the identifier that you will use in the statistical packages. Because Reference Numbers are unique, the Name of the variable is also a unique identifier. (You may change this Name in the statistical packages, but if you do, there is no guarantee that the new Name will still be unique.)
In the next section, 5.0 The Codebook Window, we will look in detail at three variables from the Area of Interest index: CASEID, SEX, and RACE. Looking at a variable in the codebook is a good way to become familiar with it. You will probably want to include these three variables in any extract you perform. With them you can identify a particular case (an outlier, for example) and the common individual characteristics of each respondent.
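The rule relating Names and Reference Numbers is simple enough to sketch in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of the Investigator. The forward conversion follows the manual's rule of dropping the decimal point. The reverse conversion assumes that the decimal point always falls before the last two characters, as it does in the manual's examples R00001.00 and R03907.00.

```python
def name_from_reference_number(ref_num: str) -> str:
    """Drop the decimal point, e.g. 'R03907.00' -> 'R0390700'."""
    return ref_num.replace(".", "")


def reference_number_from_name(name: str) -> str:
    """Reverse mapping. ASSUMPTION: the decimal point always sits before
    the last two characters, as in the manual's examples."""
    return f"{name[:-2]}.{name[-2:]}"


for ref in ("R00001.00", "R03907.00"):
    name = name_from_reference_number(ref)
    print(ref, "->", name)
    # Round-trips only while the assumption above holds.
    assert reference_number_from_name(name) == ref
```

Because only the decimal point is removed, each Reference Number maps to exactly one Name. That is why the Name stays unique until you rename it in a statistical package.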
5.0 The Codebook Window
To open an individual variable in the Codebook window, click the Name of the variable in the Individual Variables frame. When you click R0000100, the CASEID variable, the Codebook display opens in a new window, as shown in Figure 4. The codebook entry depicts the receptacle for the raw case datum while the survey is in the field, and it is the meta-datum that documents itself. It contains virtually all 'raw' information about a variable, including its identification characteristics: Reference Number, Question Name, Description, and the Year in which it was asked. The entry also includes:
- the complete question text, if applicable;
- notes about the variable;
- the frequencies of the possible responses, including cases in which the respondent refused to answer or didn't know the answer;
- the logic of the adjacent questions, listed as 'Lead In' and 'Default Next' questions.

With practice, you can even determine the origin and form of a question from the format of its text. The Reference Number determines the order of variables in the Codebook. It is unique, machine-assigned sequentially, and indicates a 'chronological' order for the entire survey across many years.
The CASEID variable is a public identification code assigned to a particular respondent. It identifies an individual respondent without compromising the respondent's privacy. You will see the variable's typical identifying characteristics (Name, Question, Description, Year), and you will notice that the Question is enclosed in square brackets, []. The brackets simply set off the question name from the description so that the two are not confused. The question's text appears below the Survey Year and can be displayed in several forms:
Because CASEID was assigned (not asked), the variable's codebook display is abbreviated and only the 'frequency' (3319) and logic (Lead In and Default Next question) are displayed.
Notice the Name, R0000100, appears in the Tagged Variables box in the left frame. This is the first step in building a tagged set. We will continue to do this for the other two variables discussed below.
Note: You can use the Windows shortcut Ctrl + F to open the 'Find' dialogue box and type 'gender' or 'sex' to quickly find the first occurrence of the word, as shown in Figure 5 below. Don't stop at the first occurrence, however; check for additional ones.
When you open the codebook window for the SEX variable, you should see the display shown in Figure 6. Notice that the question text, "Mom/Dad", is enclosed in square brackets, [], and that there is a statement enclosed in /* */: /* Gender of R */.
Below the notes are the possible responses and their frequencies. The frequencies appear in the left column and the responses (plus their response codes) in the right column. Any conditional branches that depend on the response appear to the right of the response in braces. Below the frequencies and responses are:
- the special responses Refusal and Don't Know;
- a TOTAL of all responses;
- Valid Skips and Non-Interview, with their codes (-4, -5) and frequencies.

Finally, the bottom of the codebook display shows the default logic of the question chain, from Lead In to Default Next questions. (Compare the CASEID frequency of 3319 in Figure 4 with this frequency of 1852. The discrepancy of 1467 is represented in the Valid Skips field.)
Again, the Name R0392700 will appear in the Tagged Variables box in the left frame.
When you open the codebook window for the RACE variable, you should see the display shown in Figure 7. Notice that the question text is not enclosed in square brackets, [], and that it appears in upper case. This means the question was not asked of the respondent but was intended for the interviewer. Also notice the note "ANSWERED BY IN-PERSON INTERVIEWERS ONLY". This note means that telephone interviewers did not 'ask' this question and were unable to ascertain the respondent's race. It also accounts for the drop in frequency to 578 respondents. (Compare the CASEID TOTAL frequency of 3319 in Figure 4 with this frequency of 578. The discrepancy of 2741 is represented in the Valid Skips field.)
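The frequency checks in the SEX and RACE examples are subtractions from the CASEID total. The Python sketch below reproduces them. The helper name is hypothetical. The sketch assumes, as the manual states for these two variables, that every respondent missing from a variable's frequency is accounted for in its Valid Skips field. For other variables, part of the gap may appear elsewhere, for example under Non-Interview.

```python
# Total respondents counted for CASEID (Figure 4).
CASEID_TOTAL = 3319


def discrepancy(variable_frequency: int, total: int = CASEID_TOTAL) -> int:
    """Respondents counted for CASEID but not in this variable's frequency.
    For SEX and RACE the manual attributes this gap to Valid Skips."""
    return total - variable_frequency


print("SEX: ", discrepancy(1852))  # 1467
print("RACE:", discrepancy(578))   # 2741
```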
Below the question text may appear any notes that could help the researcher. Such notes may include:
Before we discuss building the extract in 7.0, the 'process of variable identification' is summarized briefly in 6.0 below. If you are already familiar with this process, you may skip to 7.0 Setting Up the Extract.
6.0 Process of Variable Identification
Chapters 3.0, 4.0, and 5.0 described several processes: opening indexes, sorting variables, and opening individual variables in the codebook. As you begin working with Closed Cases, you will probably spend some time finding the variables you want to use. To find them, you will have to look at each one in detail to see whether it belongs in your extract. Getting the complete picture of a variable may take a series of steps. The index(es), the Index Terms, the variable's description in the Individual Variables frame, the codebook, and perhaps the documentation all contain information about the variable. Below is a summary of the steps you may use to view all perspectives of a variable.
7.0 Setting Up the Extract
Once you have discovered the variables you want to use, you will need to create a tagged set of them. Suppose that you want to create an extract of the three variables discussed above, which you have already tagged. Normally, you would first find the proper index for each variable. You would then click the 'Tag' field for each variable you want to include in the extract, and the variable's name would appear in the Tagged Variables box in the left frame. It's that easy, once you've done the discovery work outlined in chapters 3.0, 4.0, and 5.0 and summarized in chapter 6.0. Next, you would click the Review Tagset button in the left frame to display in the right frame only the variables you selected (see Figure 8 below). After reviewing the variables, if you are satisfied with your selections, you would perform the extract in the steps outlined below:
7.1 Review Tagset
When you click the Review Tagset button, only the variables you selected appear in the right frame, as shown in Figure 8 below. Before an extract can be performed on the remote server, the tagset in the Tagged Variables box must be given a name. The extracted tagset will eventually be delivered to you via email, and the name distinguishes it from others on the server. At this point, if you are not satisfied with your selections, you have two options. You can click an index in the left frame or an index term in the middle frame and select more variables for your tagset. Or you can 'de-select' individual variables by 'un-tagging' them in the right frame.
If you are satisfied with your choices, type a name for the tagset in the Tagset File Name field.
You could now click the Extract Tagset button, but it is a good idea to save the named tagset first. (You can always delete it from your PC if it's not what you wanted.) That way, if anything happens to your Internet connection, your email server, or the Investigator's server, you will still have a saved copy of the tagset. You can call it up on your local PC by browsing for it and then restart the extract process. If you don't save it, you will have to repeat the selection and construction processes. The Save Tagset and Browse buttons at the bottom of the left frame call up standard Windows dialogue boxes that you should already be familiar with. The Open Tagset button opens the file whose name appears in the Tagset File Name field. Files saved to your PC are given the extension '.clocas', for example, yourfilename.clocas.
Now, with the file named and saved, click the Extract Tagset button.
You must specify a valid email address; otherwise, the resulting extract file cannot be delivered to you. You must also select the statistical package(s) for which you want output files. Selecting the ASCII data file excludes the Stata Dictionary file, and vice versa.
After you press the Submit Tagset button, a message appears in the right frame that verifies your email address and tells you how to proceed. From here, go to your email account and follow the prompts as outlined below.
Below is the series of steps you will go through to retrieve the extracted file you specified above. Go to your email account, open it, and look for a message sent by "Nobody". Figure 11 displays a message similar to the one you will receive. Follow its directions. Your extract has not yet begun, so you must click the hyperlink to set it in motion.
When you click the hyperlink, the extract begins to run on the server. When it has finished, you will receive another email notifying you of its completion and telling you how to retrieve the results. The delay between the two emails varies with the size of the extract file.
When you click the hyperlink in this second email, a standard Windows dialogue box (Figure 13 below) opens and asks where on your PC the extracted file should be saved. The file is a zip file. Depending on the statistical package you chose, it contains several data files for use with that package.
Once you have saved the zipped file to your PC, you will need an archive utility, for example, WinZip, to extract and view the data files.
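If you do not have WinZip or a similar utility, Python's standard zipfile module can also unpack the extract. This is a minimal sketch and is not part of the Investigator. The file name yourextract.zip and the output folder are hypothetical placeholders for the name and location you chose in the Save dialogue box.

```python
import zipfile
from pathlib import Path


def unpack_extract(archive: Path, target: Path) -> list[str]:
    """Extract every file in the downloaded extract archive into `target`
    and return the names of the extracted members."""
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(target)
    return names


if __name__ == "__main__":
    # Hypothetical file name: substitute the name and location you chose
    # in the Save dialogue box (Figure 13).
    downloaded = Path("yourextract.zip")
    if downloaded.exists():
        for name in unpack_extract(downloaded, Path("yourextract")):
            print(name)
```

The listing printed at the end shows which data files the archive contained for the statistical package you selected.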