CHRR Web Investigator User's Manual

    Open the Data Extraction Web Site in a new window.
    Comments? Email: editor.


    Table of Contents

    1.0  Overview
        1.1  Delivery of Extracted Variables
        1.2  Supplementary Documentation
        1.3  Getting Started
        1.4  The Investigator Window and Its Frames

    2.0  Investigator Menu
        2.1  The Indexes
              2.11  Contextual Word
              2.12  Area of Interest
              2.13  Survey Year
              2.14  Reference Number
              2.15  Question Name
        2.2  The Advanced Search
        2.3  Viewing a Case
        2.4  Working with Tagsets
              2.41  Tagset Name
              2.42  Tagged Variables
              2.43  Tagset Buttons
              2.44  Tagset File Name (scroll down)
        2.5  Area of Interest Index

    3.0  The Index Terms Frame

    4.0  The Individual Variables Frame
        4.1  Descriptive Headers
              4.11  Name
              4.12  Tag
              4.13  Question
              4.14  Description
              4.15  Year
        4.2  Variable Formats

    5.0  The Codebook Window
        5.1  CASEID Variable
        5.2  SEX Variable
        5.3  RACE Variable

    6.0  Process of Variable Identification

    7.0  Setting Up the Extract
        7.1  Review Tagset
        7.2  Save Tagset
        7.3  Extract Tagset
        7.4  Submit Tagset
        7.5  Tagset Delivery
        7.6  Data Files


    1.0 Overview

    The CHRR Investigator is a client/server software program, designed and written by the Center for Human Resource Research (CHRR), that is available over the Internet. It allows the user (client) to connect to a database of variables (on the server) and to perform extracts of variables in a specific database. This type of architecture has several advantages: there is no installation on the client machine, no hardware or firmware is sent to the user, and the user spends no time setting up the software.

    Several potential drawbacks to this architecture must be evaluated by the user. Among these drawbacks might be the delay time it takes to access the server and return the resulting query to the client's machine or the limitations of the browser used on the client's machine. These drawbacks are 'client-side' shortcomings and must be resolved by the user. The delay time is a function of the speed of the client's access to the Web and its accompanying bandwidth. Browser limitations should not be of concern if the client's browser is Netscape 4.0+ or Internet Explorer 4.0+ or equivalent.

    1.1 Delivery of Extracted Variables

    The purpose of this manual is to briefly demonstrate how to use the Investigator software, build a 'tagset' of variables, and perform an extract of data from it. Once the user has designed and created a set of variables, the set must be extracted from the database, and the resulting file will be delivered to the user. Delivery of the extracted file will occur via e-mail to the user's e-mail account. The file that is delivered will be a zipped file, so the user should have the archive utility WinZip. A full discussion of the delivery procedure can be found in 7.5 Tagset Delivery.

    1.2 Supplementary Documentation

    Supplementary documentation, including this manual, is available on-line. The user should explore the following documents and determine how each can aid in the formation of an extract. Click on the links below to view the documents and bookmark them as necessary. References and links to these documents will appear throughout the manual. Any questions about the contents of the following documents should be addressed to cc_questions@postoffice.chrr.ohio-state.edu.
    • User's Guide
    • Glossary of Terms
    • Created Variables
    • Area of Interest Definitions

    Any questions about this manual should be addressed to cc_webmaster@postoffice.chrr.ohio-state.edu.

    When you see the arrowhead below, it means that you should perform the action described in that paragraph so that your screen will resemble the images in the manual. Performing these operations will take you through a step-by-step process of familiarizing yourself with certain variables, while at the same time, building a tagset of those common variables that you will probably want to include in all of your extracted tagsets.

    Perform the operation stated here in the Investigator software.

    1.3 Getting Started

    To connect to the Investigator server, you must open the software in your browser window. If you have not already done so,

    Open the Investigator software in your browser window now by left-clicking on the URL below.

    When you access the Data Extraction Web Site at http://www.chrr.ohio-state.edu/web-investigator/ you will be asked to choose the dataset from a list. Once you make that choice, the Investigator software opens a window that looks similar to Figure 1 below.

    Click on Closed Cases in the Investigator window now.

    When you choose a data set the window shown in Figure 1 will open. Depending on the size of your monitor, it may be beneficial to maximize the window. If your Investigator window is not maximized,

    Maximize it now by clicking on the square in the upper right corner of your browser.

    Depending on your browser and the way you have configured it, you should be able to see three frames in the browser window along with the browser's command menu. (If you cannot see the menu commands and three frames, you may need to change your screen resolution. See your system administrator if you do not know how to change the resolution or need to configure your browser.)

     

    1.4 The Investigator Window and Its Frames

    When you open Closed Cases in the Investigator software, the first window that you will see is shown in Figure 1 below. The Investigator runs in a browser window and all the tool bars that you see across the top and bottom of the window belong to the browser and control only browser functions. The browser tool bar configuration shown in Figure 1 will differ from your browser's configuration. The commands that you are interested in are the Investigator's list of commands found in the left frame of the Investigator window.



    2.0 The Investigator Menu

    The window shown in Figure 1 is divided into 3 frames: the left frame contains the Investigator menu commands; the middle frame, Index Terms, displays the terms that make up the selected index; and the right frame, Individual Variables, displays the individual variables associated with a selected index term. Any of the frames may require you to scroll up and down or right and left, depending on the size and contents of the frame. Whenever you change the contents of a frame, look for the standard Windows scroll bars in each frame to verify whether some of the contents may be hidden. Each of the frames is discussed in the sections below (3.0 The 'Index Terms' Frame and 4.0 The 'Individual Variables' Frame).

    2.1 The Indexes

    In the Investigator Menu, the indexes provide multifaceted methods of access to variables collected in a survey. Each index assembles Index Terms that are assigned to the individual variables. A variable may have multiple Index Terms assigned to it. An Index Term is a classification that groups variables sharing similar criteria. To open an Index and view its contents in the middle frame, left-click on the index name in the left frame. As you become more familiar with the Investigator, you will want to open each index, view the terms available in it, and familiarize yourself with its contents.


    Figure 1 The Initial Investigator Window

    The following indexes are provided by the Investigator.

    • Contextual Word
      This index groups variables by all the words used in a question's text. A 'word' is considered to be any term, number, or symbol used in the question. These may include the '#' sign, the '$' sign, the '%' sign, '<' or '>', integers, dates, words, etc. Special symbols and numbers sort before alphabetic characters. The number of variables that contain the same contextual word can be found in the Count column in the middle frame. Later in this manual we will learn how and when to use this index to find variables that we need. To view an open index, see Figure 2, middle frame.

    • Area of Interest
      This index groups variables that share a common factor, such as a topic, research use, or source. Variables that have been asked identically over time are stored in organizational units called "areas of interest". Each variable is assigned to an "area of interest" group. In some cases variables have been created from other variables for your convenience. The number of variables assigned to an index term can be found in the Count column in the middle frame. To view an open index, see Figure 2, middle frame. (Later in this manual we will be looking more closely at three variables in the Area of Interest index that are of importance to all users.)

    • Survey Year
      This index groups variables by the year in which they were collected or created. The number of variables collected or created in a year can be found in the Count column in the middle frame. To view an open index, see Figure 2, middle frame.

    • Reference Number
      This index groups variables by reference numbers, i.e., unique, machine-assigned numbers that are used to locate variables across survey years. The number of variables referenced by an index term can be found in the Count column in the middle frame. In order to use this index, you must know the Reference Number you wish to find. The format of reference numbers is 'R' plus 7 digits, or 'Rxxxxxxx'. The two rightmost places in the number may be represented as decimal places ('Rxxxxx.xx') in some of the documentation. (See 5.0 The Codebook Window.)

      Reference number 'index terms' are groups of reference numbers that fall in the 'hundreds' category. Thus, the range of groups begins with the first group, R000, the second R002, to the last grouping, R057. To view an open index, see Figure 2, middle frame.

    • Question Name
      This index groups variables by question name and locates them across survey years. In order to use this index, you must know the Question Name you wish to find.
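
    The Reference Number format described above ('R' plus 7 digits, sometimes written with a decimal point before the last two digits) can be checked mechanically. The following is a minimal sketch, not part of the Investigator software. The function name index_term_for is hypothetical, and the grouping rule (the first three digits after the 'R') is only an assumed reading of the 'hundreds' category.

```python
import re

# 'R' followed by 7 digits, optionally written with a decimal point
# before the last two digits (R0392700 or R03927.00).
REFERENCE_NUMBER = re.compile(r"^R(\d{5})\.?(\d{2})$")


def index_term_for(reference_number: str) -> str:
    """Return the assumed 'hundreds' group (e.g. 'R039') for a reference number."""
    match = REFERENCE_NUMBER.match(reference_number)
    if match is None:
        raise ValueError(f"not a reference number: {reference_number!r}")
    whole_part = int(match.group(1))      # e.g. 3927 for R03927.00
    # Assumption: the index term is the whole part divided by 100,
    # which is the same as the first three digits after the 'R'.
    return f"R{whole_part // 100:03d}"


print(index_term_for("R0392700"))   # R039
print(index_term_for("R00001.00"))  # R000
```

    Under this assumption, the SEX and RACE variables used later in this manual (R0392700 and R0390700) would both be found under the same Reference Number index term.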

    The following items are additional commands on the Investigator menu. They will be briefly described here and discussed in detail later in this manual when it is time to use them. You may skip this brief discussion/section and proceed with the "tutorial" (to the indexes and how to use them). Click here to skip these menu commands and open an index.

    2.2 The Advanced Search

    This search provides the user with the capability to perform a Boolean search for variables across multiple indexes. The indexes that may be searched are Contextual Word, Area of Interest, Survey Year, and Question Name. The process should be self-explanatory when you become more familiar with the variables.
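
    For readers unfamiliar with Boolean searching, the idea can be sketched with sets. This is an illustration only and says nothing about how the Investigator implements its search. The dictionary layout is hypothetical, and the index contents are limited to the three variables discussed in this manual.

```python
# Each index maps an index term to the set of variable Names assigned to it.
# Only the three variables discussed in this manual are included.
area_of_interest = {
    "COMMON": {"R0000100", "R0392700"},
    "Interviewer Remarks": {"R0390700"},
}
survey_year = {
    "1999": {"R0000100", "R0392700", "R0390700"},
}

# AND: variables in COMMON that also belong to survey year 1999.
common_and_1999 = area_of_interest["COMMON"] & survey_year["1999"]

# OR: variables in either Area of Interest term.
either_area = area_of_interest["COMMON"] | area_of_interest["Interviewer Remarks"]

print(sorted(common_and_1999))  # ['R0000100', 'R0392700']
print(sorted(either_area))      # ['R0000100', 'R0390700', 'R0392700']
```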

    2.3 Viewing a Case

    This feature allows the user to view an individual case from the survey. This feature is used most often to investigate an outlier case or, in other words, one that deviates dramatically from other cases in the extract. To view a case, click in the View Case box and enter a 'sequential case number' to view. Then, make a selection in the Index Terms frame or if you are reviewing a data set and want to view a case of only those variables, click on the Review Case button. In order to view a particular case you must know the caseid and display a complete list of caseids in an editor (one case per line) that displays line numbers. When you find the caseid in the list, the line number is the 'sequential case number' you need to enter in the View Case field and then make your selection. It is a good idea to include the caseid in all your extracts so that you can identify aberrant cases. After viewing an outlying case you may want to exclude it from the extracted data set.
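
    If you prefer not to count lines in an editor, the lookup described above can be sketched in a few lines of code. This is a minimal sketch that assumes you already have the caseids in extract order, one per line. The function name and the sample caseids are hypothetical.

```python
def sequential_case_number(caseids: list[str], caseid: str) -> int:
    """Return the 1-based line number of caseid in a one-per-line list."""
    # list.index is 0-based; editor line numbers start at 1.
    return caseids.index(caseid) + 1  # raises ValueError if caseid is absent


# Hypothetical caseids, in the order they appear in the extracted list.
caseids = ["101", "102", "205", "317"]
print(sequential_case_number(caseids, "205"))  # 3
```

    The number printed is the value to enter in the View Case field.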

    2.4 Working with Tagsets

    In order to perform statistical functions on the data, it must first be extracted from the data set. To extract data, a set of variables must be selected and placed into a file called a tagged set (tagset). Multiple tagsets may be created by the user. The information given in this section will be discussed later in detail when it is time to perform an extract of data. The "see" references to the figures below hyperlink to the appropriate detailed sections of this manual. For more detailed information, see 7.0 Setting Up the Extract below.

    • The Tagset Name
      Type the name of the file to be created. Creating the file involves tagging variables then extracting or saving them. See Figure 9. For more detailed information, see Review Tagset below.

    • The Tagged Variables
      Variables that are 'tagged' by placing a check in the check box of the Individual Variables frame are also entered into the Tagged Variables box for easy review. When the selection is complete, use one of three buttons to work on the tagset. For more detailed information, see Review Tagset below.
      • Review Tagset. See Figure 8.
      • Save Tagset. For more detailed information, see Figure 9.
      • Extract Tagged Variables. See Figure 10.

    • Tagset File Name (scroll down)
      Use this entry form to find an already existing Tagset.
      • If you know the tagset name and location, you may type it in and press the Open Tagset button.
      • If you need to find the tagset, you may use a standard Windows dialogue box to browse. Click on the Browse button to open the browse dialogue box. See Figure 9, lower left frame.
     

    2.5 Area of Interest Index

    In the next section, 3.0 The Index Terms Frame, we will look more closely at the Area of Interest and the Contextual Word indexes. We will skip the Survey Year, Reference Number, and Question Name indexes in this manual because they require that you know the year, reference number, or question name of a variable in order to find it in the index. When you know one of these pieces of information, the variable is easy to find in the appropriate index. However, the Contextual Word and Area of Interest indexes are 'textual' approaches to the variables and use the variable's description and 'topical' assignments to provide access to the variables. In the beginning the Area of Interest and Contextual Word indexes may provide better 'discovery' results when looking for a variable, and when a useful variable is found, its 'documentation' in these indexes will include the year, reference number, and question name for it.

    So let's begin, now, with the Area of Interest index.

    Left-click on the Area of Interest index term now to display its contents.

    When you left-click on the Area of Interest index, a list of Index Terms appears in the middle frame. The middle frame consists of two columns (look at the horizontal scroll bar at the bottom of the frame). In the left column are the terms of the index. You have to scroll the frame to the right to view the second column. In the right column are the numbers of like variables in each group. To display the individual variables in a group, you must open the group by left-clicking on it. Its contents will appear in the rightmost frame.



    3.0 The Index Terms Frame

    The middle frame is used to display the contents of an index that was selected in the menu (or left) frame. The contents of an index are the terms (groups of variables) associated with that index. In this case the index selected was "Area of Interest" and its contents are shown in Figure 2. Each index term contains a number of variables (Count column) associated with that term. Thus the contents of an index term are the individual variables. To view the contents of an index term, it must be opened and its contents displayed in the Individual Variables frame. To open a term, left-click on it.

    In the beginning, you will probably use the Area of Interest and Contextual Word indexes the most to find variables. They provide the easiest initial access to the variables, although this access is abbreviated. Once you have isolated a variable in the Individual Variables frame, you can find out more about it by viewing its Codebook display. (See 5.0 The Codebook Window.)

    In addition, you may find supplementary information about an Index Term (especially Area of Interest terms) in the User's Guide and other documentation for the cohort you are using.  


    Figure 2 The Contents of an Index

    Click on the Index Term COMMON to view the individual variables associated with that term.

    The contents of COMMON will be displayed in the right frame of the window as shown below in Figure 3.



    4.0 The Individual Variables Frame

    The individual variables that are displayed in the right frame when one of the index terms is opened are the 'pieces' that you will identify, extract, and use to perform data analysis. Each piece or datum is really a meta-datum that documents itself. It contains the question asked, the response collected, the response categories if multiple choice, the frequencies of these responses, the previous and next questions in the survey's logic, etc. (See, 5.0 The Codebook Window for more detail.)

    For the Investigator software, the contents of the Name field are derived from the Reference Number field minus the decimal point (.). The Reference Number (e.g., R00001.00) is a unique, longitudinal identifier for every variable in the NLSY79 and is machine-assigned, sequentially. You may change the contents of the Name field (e.g., R0000100) to something more mnemonic in one of the statistical packages, but once it has been changed it may no longer be a unique identifier.

    Each individual variable is described briefly by five general categories found in the Individual Variables window. The original display is sorted in ascending order by variable Name. The categories, or descriptive headers, are explained below.

    4.1 Descriptive Headers

    • Name: A 'reference' number (minus the decimal point) that uniquely identifies a variable across years. The contents of a variable's Name field will be used in SAS, SPSS, and Stata Dictionary packages to identify the variable. The 'reference' number is also an index.
    • Tag: A field that allows you to mark the variable for extract.
    • Question: The name of the question asked, that is, a unique identifier that is assigned to a question by a designer during the design process. It may be mnemonic in nature and contain a reference to the section of the survey in which the question appears. The 'question name' is also an index and locates the variables across years, when applicable.
    • Description: A brief description that summarizes the 'essence' of the information contained in the Codebook window.
    • Year: The survey Year in which the question was asked. The Year is also an index.
    If you click on one of the column heads, the list of variables will be re-sorted in ascending order by that identifier. Sorting the variables can be a useful way of bringing together longitudinal data from various years, or variables with similar characteristics (Description, Question Name). Sorting may also show you which variables in a group were created for you and which were responses to questions. This process of sorting the variables can help you identify longitudinal items and the scope of their frequency. It may help you distinguish between questions with like Question Names but differing Descriptions, or it may show a correlation between the Question Names and Descriptions.
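
    The effect of clicking a column head can be sketched outside the Investigator as an ordinary ascending sort on one field. The example below is illustrative only. It uses the three variables listed in 4.2 Variable Formats, and the names COLUMNS and sort_by are hypothetical.

```python
from operator import itemgetter

# The three variables shown in this manual: (Name, Question, Description, Year).
variables = [
    ("R0392700", "SYMBOL!RESP!GENDER", "SEX OF R", 1999),
    ("R0000100", "CASEID", "CASEID (PUBLIC IDENTIFICATION CODE)", 1999),
    ("R0390700", "Q15-1", "INT REMARKS - RACE OF R", 1999),
]

# Position of each descriptive header within a row.
COLUMNS = {"Name": 0, "Question": 1, "Description": 2, "Year": 3}


def sort_by(rows, header):
    """Return rows in ascending order of the chosen column header."""
    return sorted(rows, key=itemgetter(COLUMNS[header]))


for row in sort_by(variables, "Description"):
    print(row)
```

    Sorting by "Description" places the CASEID row first, followed by the INT REMARKS row and then the SEX OF R row.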

    To get a more complete description of a variable, including its frequency, the codebook description of it must be opened in the Codebook window.  


    Figure 3 The Individual Variables

    Before we discuss the Codebook window and its contents, let's look at how to group together the individual variables that resemble one another.

    Left-click on the Description header to sort the variables by Description. Now, scroll down through the list.

    Notice how similar descriptions sort together. When you look at the descriptions that begin with the word 'date' you will notice how similar the descriptions are and that in the question column several question names begin with the word 'SYMBOL!'. This method of sorting is a good way to find similar questions and to discover the question names for similar variables.

    Now, left-click on the Question header to sort the variables by it and scroll down through it.

    Notice the question names that begin with the word 'SYMBOL!'. Not all the question names that begin with 'SYMBOL!' are related. There are several groups of these whose relationship to each other can be determined in the description column. Now, look at the questions that begin with 'NUMKID...'. Notice how the questions are related in nature, but different in their descriptions and the data they collect.

    From these sorts you can find out additional and more detailed information about the variables by opening the Codebook window of each.  

    4.2 Variable Formats

    You will encounter two forms of reference to the variables in the Investigator. The first is the format displayed in the Individual Variables frame and the second is the Codebook format. The Individual Variables format uses Name (reference number), Question, Description, Year. The Codebook format uses Reference Number (variable Name), Question, Description, Year. The only difference is how Name and Reference Number information is formatted: the Reference Number is a decimal number while the Name is not.

    Individual Variables Frame
    Name      Question            Description                          Year
    R0000100  CASEID              CASEID (PUBLIC IDENTIFICATION CODE)  1999
    R0392700  SYMBOL!RESP!GENDER  SEX OF R                             1999
    R0390700  Q15-1               INT REMARKS - RACE OF R              1999

    Codebook Format
    Reference No.  Question              Description                          Year
    R00001.00      [CASEID]              CASEID (PUBLIC IDENTIFICATION CODE)  Survey Year: 1999
    R03927.00      [SYMBOL!RESP!GENDER]  SEX OF R                             Survey Year: 1999
    R03907.00      [Q15-1]               INT REMARKS - RACE OF R              Survey Year: 1999

    The 'individual variable format' is one step in between the raw data and the statistical packages (SAS, SPSS, Stata Dictionary) that you will use to analyse the data. Individual variables are created after the Codebook. The creation process adds a unique Name to the variable and assigns it to various indexes for better access. The unique Name assigned to the variable is derived from the Reference Number of the variable without the decimal point; thus, Reference Number, R03907.00, becomes the Name, R0390700, the identifier that you will use in the statistical packages. Because Reference Numbers are unique, the Name of the variable is also a unique identifier. (You may change this Name in the statistical packages, but if you do, there is no guarantee that the Name will still be a unique identifier.)
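
    The relationship between the two formats is simple enough to express in code. The following is a minimal sketch of the conversion described above; the function names are hypothetical and are not part of the Investigator or of any statistical package.

```python
def reference_number_to_name(reference_number: str) -> str:
    """Drop the decimal point: 'R03907.00' becomes 'R0390700'."""
    return reference_number.replace(".", "")


def name_to_reference_number(name: str) -> str:
    """Re-insert the decimal point before the last two digits."""
    return f"{name[:-2]}.{name[-2:]}"


print(reference_number_to_name("R03907.00"))  # R0390700
print(name_to_reference_number("R0390700"))   # R03907.00
```

    A helper like the second one may be handy when you have a Name from an extract and want to look the variable up in documentation that uses the decimal form.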

    In the next section, 5.0 The Codebook Window, we will look in detail at three variables, CASEID, SEX, and RACE, from the Area of Interest index. Looking at a variable in the codebook is a good way to become familiar with it. The three variables above are ones that you will probably want to include in any extract you perform. With these in an extract you will be able to identify a particular case (one that may be an outlier, for example) and the common individual characteristics of each respondent.

    Click on the R0000100 CASEID variable now, to open the Codebook window for that variable.



    5.0 The Codebook Window

    To open an individual variable in the Codebook window, you must click on the Name of the variable in the Individual Variables window. When you click on R0000100, the CASEID variable, the Codebook display opens in a new window as displayed in Figure 4. The codebook depicts the receptacle for the raw case datum when the survey is in the field, and it is the meta-datum that documents itself. It contains virtually all 'raw' information about a variable, such as the variable's identification characteristics: Reference Number, Question Name, Description, and the Year in which it was asked. In addition, the complete question text is spelled out [if applicable], notes about the variable are included, frequencies of the possible responses are tallied even when the respondent refused to answer or didn't know the answer, and the logic of the adjacent questions is listed as 'lead-in' and 'default next' questions. With practice, even the genesis and form of the question can be determined by the format of the question text. The Reference Number determines the order of variables in the Codebook. It is unique, 'machine-assigned' sequentially, and indicates a 'chronological' order for the entire survey across many years.

    5.1 CASEID

    The CASEID variable is a public identification code assigned to a particular respondent. It is a way of identifying an individual respondent without compromising the privacy of the respondent. You will notice the typical identifying characteristics of the variable (Name, Question, Description, Year), but you will also notice that the Question is enclosed in square brackets, []. The brackets simply set off the question name from the description so that there is no confusion between them. The question's text appears below the Survey Year and can be displayed in several forms:

    1. If the entire text is in square brackets, the question was not asked but assigned (as with the CASEID) or derived from other questions;
    2. If there are no square brackets, this is the actual text of the question asked; if the question text is in lower case, the question was read to the respondent; if in upper case, it was intended for the interviewer alone;
    3. If square brackets appear somewhere within the question text, they denote a substitution [such as the name of a child] that was used in the question's text;
    4. If you see statements surrounded by slash-star and star-slash, which look like /* */ (for example, /* Is the respondent male or female? */), they are used to document and explain the internal check functions (machine questions) used in the survey. These questions are not seen by the interviewer or the respondent.
     


    Figure 4 CASEID Variable: COMMON Area of Interest

    Because CASEID was assigned (not asked), the variable's codebook display is abbreviated and only the 'frequency' (3319) and logic (Lead In and Default Next question) are displayed.

    Now, go back to the Investigator window and in the Individual Variables frame, click on the check box next to the Name R0000100, the CASEID variable.

    Notice the Name, R0000100, appears in the Tagged Variables box in the left frame. This is the first step in building a tagged set. We will continue to do this for the other two variables discussed below.

    Now, find the variable R0392700 the SEX variable and click on it to open it in the Codebook window.

    Note: You can use the Windows convention Ctrl + F to open the Windows 'Find' dialogue box and type in the word 'gender' or 'sex' to quickly find the first occurrence of the word as shown in Figure 5 below. But don't stop with the first occurrence; there may be more, so check for multiple occurrences.


    Figure 5 Using the Browser’s Find Command (CTRL + F)

    5.2 SEX

    When you open the codebook window for the SEX variable, you should see the display shown in Figure 6. Notice that the question text, "Mom/Dad", is enclosed in square brackets, [], and that there is a statement enclosed in slash-star delimiters: /* Gender of R */.


    Figure 6 SEX of Respondent: COMMON Area of Interest

    Below the notes are the possible responses and their frequencies. The frequencies appear in the left column and the responses (plus their response codes) in the right column. If there were any conditional branches depending on the response, they will appear to the right of the response in square brackets.

    Below the frequencies/responses are the special responses of Refusal and Don't Know, a TOTAL of all responses, Valid Skips and Non-Interview along with their codes (-4, -5) and frequencies. Finally at the bottom of the codebook display is the default logic of the question chain from Lead In to Default Next questions. (See, the CASEID Figure 4 frequency of 3319 and this frequency of 1852 which results in a discrepancy of 1467, as represented in the Valid Skips field.)

    Now, go back to the Investigator window and, in the Individual Variables frame, click on the check box next to the Name R0392700, the SEX variable.

    Again, the Name R0392700 will appear in the Tagged Variables box in the left frame.

    Next, in the Index Terms frame (middle frame), open the 'Interviewer Remarks' group by left-clicking on it. In the Individual Variables frame (right frame), find the RACE variable, R0390700, and click on it to open it in the Codebook window.

    5.3 RACE

    When you open the codebook window for the RACE variable, you should see the display shown in Figure 7. Notice that the question text is not enclosed in square brackets, [], and that it is in upper case. This means the question was not asked of the respondent, but rather was intended for the interviewer. Also notice that there is a note: "ANSWERED BY IN-PERSON INTERVIEWERS ONLY". This note means that the telephone interviewers did not 'ask' this question and were unable to ascertain the respondent's race. It also accounts for the drop in frequencies to 578 respondents. (See the CASEID Figure 4 TOTAL frequency of 3319 and this frequency of 578, for a discrepancy of 2741, as represented in the Valid Skips field.)
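
    The discrepancies quoted for the SEX and RACE variables are simple subtractions from the CASEID total. The short check below reproduces that arithmetic using only the frequencies quoted in this manual; the variable and function names are hypothetical.

```python
TOTAL_CASES = 3319  # CASEID frequency quoted for Figure 4

# Frequencies quoted in sections 5.2 (SEX) and 5.3 (RACE).
answered = {"SEX": 1852, "RACE": 578}


def discrepancy(total: int, frequency: int) -> int:
    """Cases not accounted for by the answered frequency."""
    return total - frequency


for variable, frequency in answered.items():
    print(f"{variable}: {TOTAL_CASES} - {frequency} = "
          f"{discrepancy(TOTAL_CASES, frequency)}")
# SEX: 3319 - 1852 = 1467
# RACE: 3319 - 578 = 2741
```

    The same subtraction can be used as a quick sanity check on any variable: the result should match the figure shown in the Valid Skips field of its codebook display.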


    Figure 7 Codebook Display of RACE Variable

    Below the question text may appear any notes that could help the researcher. Such notes may include:

    1. To whom was the question asked: respondent, interviewer, machine question, etc.;
    2. How the variable was derived; and/or
    3. The genesis of the variable or question.

    Now, go back to the Investigator window and, in the Individual Variables frame, click on the check box next to the Name R0390700, the RACE variable, to move its name to the Tagged Variables dialogue box.

    Before we discuss building the extract in Chapter 7, a brief summary of the 'process of variable identification' is outlined below in Chapter 6.0. If you are familiar with this process you may skip to 7.0 Setting Up the Extract.



    6.0 Process of Variable Identification

    In chapters 3.0, 4.0, and 5.0 several processes were described for opening indexes, sorting variables, and opening individual ones in the codebook. As you begin with Closed Cases, you will probably have to spend a certain amount of time finding the variables you want to use. To find the variables, you will have to look at each one in detail to see if it is something for your extract. In order to find out about a variable in detail, you may have to work through a series of steps to get the complete perspective. The Index(es), the Index Terms, the description of the variable in the Individual Variables frame, the codebook, and perhaps the documentation all contain information about the variable. Below is a summary of the steps you may use in order to view all perspectives of a variable.

    1. In the Investigator's menu (left frame), open the Index of choice by left-clicking on it.
    2. Look at the terms or groupings in the Index Terms frame. These groups contain the variables.
    3. Open a group of Index Terms by left-clicking on it. The individual variables will appear in the Individual Variables frame (right frame).
    4. Look through the individual variables in the list displayed in the Individual Variables frame.
    5. Sort them by Description (click on the hyperlinked Description header) and study the textual description of like sorts.
    6. Sort them by Question (click on the Question header), see how the sorts change, and examine the Question names of like sorts.
    7. Left-click on a variable's Name to view its details in the Codebook window.
    8. Study the codebook display and check at the bottom whether more information is hyperlinked under "Documentation Links". If so, follow the link.
    9. Try the corresponding User's Guide to find out if more information on the scope, genesis, etc. is available. Look first in the Table of Contents for a 'topic' of interest.
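
    Steps 5 and 6 above re-order the same list of variables by a different header, so that variables with similar descriptions or question names end up next to each other. The short Python sketch below illustrates the idea only; it is not part of the Investigator. The rows, the X-prefixed names, and the sort_by_header helper are all invented for illustration.

```python
from operator import itemgetter

# Hypothetical sample rows. The keys mirror the frame's headers
# (Name, Question, Description, Year); every value is invented.
variables = [
    {"name": "X0000300", "question": "Q3", "description": "RACE OF RESPONDENT", "year": 1968},
    {"name": "X0000100", "question": "Q1", "description": "CASE IDENTIFICATION", "year": 1968},
    {"name": "X0000200", "question": "Q2", "description": "SEX OF RESPONDENT", "year": 1968},
]


def sort_by_header(rows, header):
    """Return a new list ordered by one header, much as clicking that header would."""
    return sorted(rows, key=itemgetter(header))


# Sorting by Description groups similar wording together;
# sorting by Question groups similar question names together.
by_description = sort_by_header(variables, "description")
by_question = sort_by_header(variables, "question")

for row in by_description:
    print(row["name"], row["description"])
```

    Comparing the two orderings side by side is a quick way to notice which variables belong together before you open each one in the codebook.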



    7.0 Setting Up the Extract

    Once you have discovered the variables you want to use, you will need to create a tagged set of these variables. Let's suppose that you want to create an extract of the three variables discussed above and already tagged by you. Normally, you would first find the proper index for each variable, then click on the 'Tag' field for each variable you want to include in the extract. When you click on the tag field, the variable's name will appear in the Tagged Variables box in the left frame. It's that easy, once you've done the discovery work outlined in chapters 3.0, 4.0, and 5.0 and summarized in chapter 6.0.

    Then you would click on the Review Tagset button in the left frame to display in the right frame only those variables selected by you. See Figure 8 below. After reviewing the variables, if you are satisfied with these selections, you would begin the process of performing the extract outlined in several steps below:

    1. Name the Tagset and Save it. (You may always delete it from your PC if it isn't what you want.)
    2. Extract the Tagset (fill out the pertinent information in the right frame).
    3. Submit the Tagset (press the Submit Tagset button in the right frame).
    4. Go to the e-mail account you specified and follow the directions, which will be a series of point-and-click instructions.
     

    7.1 Review Tagset

    Click on the Review Tagset button in the left frame.

    When you click on the Review Tagset button, only those variables you selected will appear in the right frame, as shown in Figure 8 below. The set of tagged variables that appears in the Tagged Variables box must now be named before an extract can be performed on the remote server: the extracted tagset will eventually be delivered to you via email, and it needs a name to distinguish it from others on the server.

    At this point if you are not satisfied with your selections, simply click on an index in the left frame or click on an index term in the middle frame and begin selecting more variables for your tagset, or 'de-select' individual variables by 'un-tagging' them in the right frame.  


    Figure 8 Review Tagset Button

    If you are satisfied with your choices,

    Type the name of your tagset in the Tagset Name field in the left frame.

     

    7.2 Save Tagset

    You could now click on the Extract Tagset button, but it is a good idea to save the named tagset first. (You can always delete it from your PC if it's not what you wanted.) That way, if anything should happen to your Internet connection, your email server, or the Investigator's server, you will have a saved copy of the tagset, and you can call it up on your local PC by browsing for it and restarting the 'extract' process. If you don't save it, you will have to begin again with the selection and construction processes. The Save Tagset and Browse buttons at the bottom of the left frame call up standard Windows dialogue boxes that you should already be familiar with. The Open Tagset button opens the file whose name appears in the Tagset File Name field.

    Files saved to your PC will be given the extension '.clocas', for example, yourfilename.clocas.

    So, go ahead, now, and click on the Save Tagset button and follow the Windows prompts.

     

    7.3 Extract Tagset

    Now, with the file named and saved,

    Click on the Extract Tagset button and fill out the information requested in the right frame.

    You must specify a valid email address; otherwise, the resulting extract file cannot be delivered to you. You must also select the statistical package options you want for your output file. The ASCII data file and the Stata Dictionary file are mutually exclusive: choosing one excludes the other.
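
    The two rules above (a usable email address, and a choice between the ASCII data file and the Stata Dictionary file) can be written down as a small check. The Python sketch below is only an illustration of those rules, not the Investigator's own validation. The function name, its parameters, and the loose email pattern are assumptions, as is the rule that at least one output format must be chosen.

```python
import re

# Deliberately loose pattern: something@something.tld.
# This is an assumption for illustration, not a full address validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_extract_request(email, ascii_data=False, stata_dictionary=False):
    """Return a list of problems found in a hypothetical extract request."""
    problems = []
    if not EMAIL_PATTERN.match(email):
        problems.append("email address does not look valid")
    if ascii_data and stata_dictionary:
        problems.append("ASCII data file and Stata Dictionary file exclude each other")
    # Assumption: at least one output format has to be selected.
    if not (ascii_data or stata_dictionary):
        problems.append("no output format selected")
    return problems


# An empty list means the request passes all three checks.
print(check_extract_request("user@example.com", ascii_data=True))
```

    Returning a list of problems rather than stopping at the first one lets you correct everything in the form in a single pass.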


    Figure 9 Extract Tagset Button

     

    7.4 Submit Tagset

    Now, click on the Submit Tagset button in the right frame and follow the prompts that appear.
     


    Figure 10 Submit Tagset Button

    After pressing the Submit Tagset button, you will see a message appear in the right frame that verifies your email address and tells you how to proceed. From here you should access your e-mail account and follow the prompts as outlined below.  

    7.5 Tagset Delivery

    Below is the series of steps you will go through to retrieve the extracted file you specified above. Go to your email account, open it, and look for a message sent by "Nobody". Figure 11 displays a message similar to the one you will receive. Follow the directions. Your extract has not yet begun, so you must click on the hyperlink to set it in motion.


    Figure 11 E-mail Message: Run Extract

    When you click on the hyperlink, the extract will begin to run on the server. When it has finished, you will receive another e-mail notifying you of its completion and explaining how to retrieve the results. The delay between the two emails will vary with the size of the extract file.


    Figure 12 E-mail Message: Retrieve Results

    When you click on the above hyperlink, a standard Windows dialogue box, shown in Figure 13 below, will open and ask you to specify a location on your PC where the extracted file should be saved. The file will be a zip file and, depending on the statistical package you chose, will contain several data files for use with that package.


    Figure 13 Save Results to Client PC

    Once you have saved the zipped file to your PC, you will need an archive utility, for example, WinZip, to extract and view the data files.  
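
    If you prefer a script to a graphical archive utility, the saved zip file can also be unpacked with Python's standard zipfile module. The sketch below is one possible way to do this, not a procedure prescribed by the Investigator. The unpack_extract helper is hypothetical, as are the file and folder names in the usage comment.

```python
import zipfile
from pathlib import Path


def unpack_extract(zip_path, target_dir):
    """Extract every file in the archive and return the names it contained."""
    target = Path(target_dir)
    # Create the destination folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Hypothetical usage; substitute the name you gave the saved file:
# names = unpack_extract("myextract.zip", "myextract")
# print(names)
```

    The returned list of names shows which data files your chosen statistical package options produced.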

    7.6 Data Files

    Depending on which packages you chose for the output files of your extract, the two figures below, Figures 14 and 15, show the contents of the zipped container file. You will have to unzip it before you can view the contents of the files. The files are plain text files and should be viewed in Notepad.exe, WordPad.exe, or a similar plain text editor. WordPad or a similar text editor is recommended because it can accommodate large text files.


    Figure 14 ASCII Output for SAS and SPSS Statistical Packages


    Figure 15 Stata Dictionary Output Files
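
    Because the unzipped output files are plain text and can be large, it is sometimes easier to look at only the first few lines than to open a whole file in an editor. The Python sketch below shows one way to do that. The preview helper and the file name in the usage comment are hypothetical, and the sketch assumes the file is ASCII text, as the figure caption for the ASCII output suggests.

```python
from itertools import islice


def preview(path, max_lines=20):
    """Return the first max_lines lines of a text file without reading it all."""
    # errors="replace" keeps the preview going if a stray non-ASCII byte appears.
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, max_lines)]


# Hypothetical usage; substitute the name of one of your unzipped files:
# for line in preview("myextract.dat", max_lines=10):
#     print(line)
```

    Reading through islice stops after the requested number of lines, so the size of the file does not affect how long the preview takes.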

