50 Years of Social Science Data Services: A Case Study from the University of Wisconsin-Madison

The Data and Information Services Center (DISC), formerly known as the Data and Program Library Services (DPLS), has provided learning, teaching, and research support to students, staff, and faculty in the social sciences at the University of Wisconsin-Madison for 50 years. What changes have our organization, collections, and services experienced? How has DISC evolved with the advancement of technology? What role does DISC play in the current and future landscape of social science data services on our campus and beyond? This paper answers these questions and recommends a few simple steps for adding social science data services in academic libraries.

To cite this article: Chou, C.L. (2017). 50 years of social science data services: A case study from the University of Wisconsin-Madison. International Journal of Librarianship, 2(1), 42-52. https://doi.org/10.23974/ijol.2017.vol2.1.23


PROLOGUE
Reading books can prolong your life, according to the Washington Post story "The best reason for reading? Book lovers live longer, scientists say" (Nutt, 2016). This finding comes from a scholarly publication, "A chapter a day: Association of book reading with longevity". Researchers in the School of Public Health at Yale University used the Health and Retirement Study (HRS) to investigate "whether those who read books have a survival advantage over those who do not read books and over those who read other types of materials, and if so, whether cognition mediates this book reading effect" (Bavishi, Slade, & Levy, 2016, p. 44). They found that reading books provided a 23-month survival advantage. This is one example of secondary data analysis in the social sciences. The researchers used two questions asked in the HRS to examine the outcomes of a leisure activity, reading, across different people. Their findings can be applied to policy making and behavior change.
"Secondary analysis is the process of re-examining existing data to address new questions or use methods not previously employed" as defined in the Glossary of Social Science Terms on the Inter-university Consortium for Political and Social Research (ICPSR) website (ICPSR, n.d., #S). Using existing social science datasets to conduct analysis is cheaper and faster than collecting one's own data. The Data and Information Services Center (DISC) provides social science data services to faculty, staff, researchers, and students at the University of Wisconsin-Madison for their secondary data analysis work. This article describes DISC's current role in supporting research, teaching, and learning on our campus. It also explains how our services have evolved with technology over the last five decades. It then presents different levels of social science data services that can be implemented by other libraries.

HISTORY
In the 1960s, the availability of machine-readable social science data began to increase rapidly, and traditional campus libraries did not have the expertise to acquire and manage these data files. Social science faculty members, especially in economics, political science, and sociology, needed a facility for collecting, managing, and preserving social science data. They also wanted the computer analysis and data management programs used with the data to be collected, preserved, and disseminated. The Data and Program Library Services (DPLS) was created as an "experiment" by the University of Wisconsin Graduate School in September of 1966. By the early 1970s, many of Wisconsin's social science departments were particularly strong in quantitative analysis, and multivariate data analysis constituted an essential component of graduate training and, increasingly, of undergraduate education. In 1974, funding support for DPLS was transferred to the College of Letters and Science (L&S). In January of 2007, DPLS was merged with the data services operations in the Center for Demography and Ecology (CDE) and the Center for Demography of Health and Aging (CDHA), two federal grant-supported research centers on campus. The new unit was named the Data and Information Services Center (DISC) and is now part of the Social Science Research Services within L&S.

COLLECTIONS
DISC is a member of the Inter-university Consortium for Political and Social Research (ICPSR), a major social science data archive, and the Roper Center for Public Opinion Research, a leading archive of public opinion datasets. In addition, our collection includes major surveys from other distributors, U.S. Government data, and locally produced archival datasets.
Internet Crossroads is a reference database containing more than 800 annotated links to data-related social science resources on the Internet. It includes sites with social science data, social science statistics, organizations that collect social science data, data libraries, social science research methods, and more. These sites are grouped into 26 categories for easy browsing.
Country Statistical Yearbooks provides links to country statistical yearbooks or similar collections (census, "facts and figures") for 154 countries worldwide. It was viewed over 15,000 times in the 2015 calendar year and was the second most popular library guide on campus.
The DISC Online Data Archive is a subset of datasets from the DISC collection that are made publicly available for direct Internet download. Many of these 40-plus datasets came from research conducted at UW-Madison or cover Wisconsin-related topics.
The BADGIR (Better Access to Data for Global Interdisciplinary Research) Catalog is a data discovery and analysis tool powered by the NESSTAR software suite. NESSTAR's extensive metadata allows users to browse and search data documentation down to the variable level. Users can conduct preliminary checks on the usability and relevancy of variables using its online analysis tools.

SERVICES

Reference and Instruction
The DISC staff is experienced in matching datasets and other data resources with researchers and students who are using data for term papers, course work, dissertations, or research. We discuss the scope of their research with our patrons. We are then able to identify, locate, obtain, and sometimes accession and preserve the most appropriate dataset(s) for their projects. Our librarians regularly provide individual and classroom instruction on where to look for appropriate research data, how to acquire them, how to use them efficiently, and how to access the data with existing software and hardware. Some of the academic courses for which we have guest lectured are Econ 580: Honors Tutorial in Research Project Design, PoliSci 544: Introduction to Survey Research, Journalism 658: Communication Research Methods, Sociology 360: Statistics for Sociologists, and UW Libraries Graduate Support Series Workshops. In addition, since 2001 we have been invited by the School of Library and Information Studies (SLIS) to teach its graduate students about the management of machine-readable numeric data for the social sciences and about data librarianship. DISC News and two weekly research reports are our venues for informing patrons about seminars, workshops, new data products, and current topics related to data issues.

Restricted Data Services
Advances in PC computing power make it much easier to link data from multiple data sources, which presents a real threat of re-identifying business entities or individuals from public-use microdata and tabular data (Chou, 2004). To protect the confidentiality of respondents and reduce disclosure risk, data producers must remove or alter certain aspects of the data in their public-use files. Detailed geography, DNA or other personally identifiable medical data, and administrative data from federal agencies such as the Social Security Administration or Medicare are not available in public-use files. Yet researchers often need richer and more detailed restricted data for their complex statistical analyses. Restricted data are available to researchers only under certain conditions and agreements. In recent years, DISC has added a restricted data service to facilitate researchers' applications for access to restricted data. DISC staff provide assistance with legal contracts, human subjects review by UW-Madison's institutional review boards, and the various application requirements for gaining access to restricted data. We work with the Social Science Computing Cooperative (SSCC) to provide facilities for secure data storage and analysis. In the 2015-2016 academic year, DISC initiated, processed, maintained, and formally closed out restricted data contracts for 19 researchers.

Archival Services
DISC has a long tradition of assisting principal investigators in the production and preservation of public-use files that facilitate the dissemination and use of their research data by others. We work with researchers to decide on the most useful format for their public-use files. To make their data useful to other researchers, their public-use files need to be adequately documented. We ensure that the accompanying data documentation covers study design, methodology, coding procedures, variable descriptions, and other supplementary information. Archived studies are freely disseminated from the DISC Online Data Archive and the BADGIR catalog.

Additional Research Support for CDE and CDHA
Because DISC receives funding from CDE and CDHA, we carry out certain tasks to meet their federal funding requirements. For example, center affiliates' scholarly publications are identified and stored in a bibliographical database. DISC is also the custodian of several datasets from surveys conducted by our affiliates and has provided user support after these projects ended. In the last few years, we have answered over 165 reference questions related to these studies. Most requests came from researchers not affiliated with UW-Madison, and most came from researchers who are new to using secondary data and need some guidance. Data Sources for Research in Aging is compiled to help researchers find cross-sectional studies, time series, contextual data, and other data relevant to their research. Statistical Data Enclave (SDA) provides CDHA and CDE affiliates with secure facilities for storing, accessing, and analyzing restricted data via remote computing or cold rooms.

CAMPUS PARTNERS
UW-Madison is a large R1 doctoral university. Campus units often coordinate and collaborate when they deliver services to over 43,000 enrolled students in 13 schools and colleges. On this large campus, computing and statistical support is decentralized by design to achieve autonomy and efficiency. This model works well for DISC, as we provide custom data services to patrons who need social science data in their learning, teaching, and research. The College of Letters and Science SSCC offers computing and statistical software resources to the faculty, staff, and students in the Social Science division. It has a classroom lab, a drop-in lab, and a mobile lab. Its professional staff provides statistical consultation, training on statistical computing, PC support, and secure data storage. We regularly refer patrons with statistical and methodological questions to the SSCC statistical consultants. Library services at UW-Madison are provided by over 40 libraries. DISC pays for campus-wide subscriptions to ICPSR and the Roper Center. We work closely with other campus librarians on reference referrals. DISC focuses on quantitative social science data, while qualitative data support is provided by the School of Education Library.
The National Science Foundation (NSF) mandated that, starting in January 2011, researchers applying for funding must include a data management plan as part of their grant application (National Science Foundation, 2010). John Holdren, Director of the Office of Science and Technology Policy (OSTP), in his memorandum for the heads of executive departments and agencies on February 22, 2013, directed "each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government" (Office of Science and Technology Policy, 2013, p. 2). Many other federal funding agencies then started to require applicants to include a data management plan in their applications. In recent years, research data management has become a vital service area for academic libraries. Research Data Services (RDS) at UW-Madison is an interdisciplinary organization. Its website provides researchers with tools and resources that guide them through the public access requirements for data and publications funded by grants. RDS consultants help patrons draft and review their data management plans. They offer consultations, training, and referrals on data workflow, file formats, metadata standards, data sharing, digital preservation, and curation. DISC's role in RDS is to provide research data services in the social sciences.

FIVE DECADES OF OPERATION EVOLUTION
As a special library of machine-readable social science data, DISC's operations and services are intimately connected to computing technology. Our data storage, maintenance and processing, administrative record keeping procedures, collection development, reference service, and archival service have evolved with the computing landscape over the last five decades. Reviewing DISC's operation over the years presents a historical perspective on social science data librarianship.
When DPLS was created in September of 1966, data were distributed and stored on half-inch magnetic tapes, and UNIVAC mainframe computers were used to read data off the tapes and to conduct data analysis. Significant time and resources were required to obtain and manage numerical files and tapes in a mainframe computing environment. In the early 1990s, mainframe computers were phased out on campus, and in 1990 we purchased a NeXT Workstation to build a platform to host both data and metadata in-house via a user-friendly interface. Meanwhile, our datasets were migrated from magnetic tapes to 4mm DAT tapes. For almost 20 years, DISC has relied on PCs and network drives to manage our collections and deliver our services. Currently, a copy of our collection of datasets is also kept on two USB flash drives stored off-site. If a disaster strikes our building, our collection can be safely recovered from these external storage devices.

Administrative Record Keeping/Holding Catalog
Numerical datasets have special elements that are quite different from those of books and journals. From the beginning of our operation, a permanent data record (PDR) was created for administrative record keeping whenever a dataset was acquired. Advances in technology and bibliographical control rules have shaped our record keeping practice. In 1978, DPLS adopted the cataloging rules for Machine Readable Data Files (MRDF) in the Anglo-American Cataloging Rules (Gorman & Winkler, 1978) and started to create detailed bibliographic records for our datasets. DISC's holdings catalog has been migrated across various computing platforms over the years. A graphical user interface was built on our Holding Data Base (HDB) system, and it was added to our website in 1996. HDB became dynamic in the early 2000s, when an Active Server Pages (ASP) application was implemented on the database via a web interface. Descriptive metadata in our holdings catalog makes it possible to search for and locate studies from our library website. It also allows us to conduct data curation and delivery support effectively.

Responsibilities Evolve with Technology
In the era of magnetic tapes, our staff spent approximately 25 to 40 hours each week on tape maintenance, plus additional time managing the administrative records of data files. When CD-ROMs became the main storage medium and dissemination tool for datasets, we managed various user interface programs on our PCs, wrote user guides, and conducted one-on-one instruction to help patrons gain access to data on CD-ROMs. As data producers and vendors built online user interfaces on their websites, we worked with the General Library System (GLS) IT department to set up domain authentication to allow campus users to access subscription-based data sources. Our role in locating data sources has been enhanced by search engines and other discovery tools, such as ICPSR's database of publications and variables and the Roper Center's questionnaire database.
In the previous generation, researchers spent their whole careers analyzing data from a handful of studies. As computers have become faster and more powerful, researchers now conduct their data analysis across time and geographic boundaries. They often link public-use data to restricted data and administrative records for their research. In the last few years, we have worked closely with our patrons by assisting them with restricted data applications. In addition to our traditional role of conducting reference interviews and helping to locate datasets, we now help patrons with their research data management and data curation as they adopt open data and data sharing practices. Recently, we have started to assist patrons in bringing their peer-reviewed journal articles into compliance with the public access policies required by federal funding agencies.

Advance of Service Delivery Tools
In the early years, we relied on printed catalogs obtained from various data archives to locate studies for our patrons. We also maintained a collection of methodology books describing survey research, secondary analysis, sample surveys, and more. In 1996, as the Internet grew to become the primary source of information, DPLS started to evaluate, compile, and annotate a list of useful websites called Internet Crossroads in Social Science. By 2002, this list had become too long to browse and too cumbersome to search. We designed a relational database and an ASP application to allow users to query and browse these websites efficiently. Internet Crossroads has been a valuable tool for our visitors and our staff to locate data sources for research topics. It is often linked to by other academic libraries as a valuable reference source for social science data.
DPLS served as the campus repository and reference center for quantitative social science data in its first 35 years. Machine-readable data files were acquired and stored for the social science community on campus. Considerable staff time was spent developing and managing our local collection of datasets. As the Internet propelled information dissemination, ICPSR, our main source of datasets, made its studies available via FTP. We started to weed out our collections stored on magnetic tapes in the late 1990s, as it was no longer economical to maintain a local copy of datasets that were readily available from ICPSR and other archives. ICPSR, the Roper Center, the International Monetary Fund (IMF), the Organisation for Economic Co-operation and Development (OECD), and the World Bank all made their data downloadable to individual users during the early 2000s. DPLS's function in data acquisition was replaced by securing campus-wide IP validation for the subscriptions we maintain. The Internet changed our collection development fundamentally. Instead of obtaining and managing local copies of datasets, we now guide patrons to data producers' websites and help them navigate online data extraction tools. To facilitate patrons' access to popular social science data sites, DISC's website now serves as a portal with link shortcuts to various data sources.
In 1999, DPLS was one of the 13 beta-test sites for an ICPSR project, the Data Documentation Initiative (DDI) Document Type Definition (DTD), which examined the use of metadata in creating structured codebook files. DDI version 1 was released as a standard to contextually describe social science datasets at the study and data levels. DDI's descriptive metadata allows the contents and structure of datasets to be processed by computers. DDI metadata compliant with XML (Extensible Markup Language) permits the development of new web applications for search and discovery, online data analysis, and data visualization (Rasmussen & Blank, 2007). One of these tools is NESSTAR (Networked Social Science Tools and Resources), developed by the Norwegian Social Science Data Services, the UK Data Archive, and the Danish Data Archives (Ryssevik & Musgrave, 2001).
In 2003, we implemented the NESSTAR suite to run our Better Access to Data for Global Interdisciplinary Research (BADGIR) catalog. It has since provided a friendly interface for accessing social science datasets produced by UW-Madison researchers. Within BADGIR, users can browse and search the study descriptions, codebooks, and summary statistics (such as means, variances, and frequency counts) of all the studies. Registered users can select variables, run data analyses, and create subsets. Customized data subsets can be downloaded in Microsoft Excel, SPSS, SAS, dBase, and Stata formats. Nowadays IPUMS, the OECD, the United Nations, the World Bank, and other social science data sites have all implemented metadata-based interfaces to let visitors navigate, explore, and access their data collections.
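The variable-level search that structured codebook metadata makes possible can be illustrated with a minimal sketch. It assumes a simplified XML fragment modeled on DDI Codebook element names (codeBook, stdyDscr, dataDscr, var, labl); the study title, variable names, and helper functions are hypothetical, and real DDI files use XML namespaces and many more elements. It is not a description of how NESSTAR or BADGIR is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified codebook modeled on DDI Codebook element names.
# The study and its variables are invented for illustration only.
CODEBOOK_XML = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Example Household Survey</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent in years</labl></var>
    <var name="READHRS"><labl>Hours spent reading books per week</labl></var>
    <var name="EDUC"><labl>Highest level of education completed</labl></var>
  </dataDscr>
</codeBook>
"""


def study_title(codebook_xml):
    """Return the study title recorded in the study description."""
    root = ET.fromstring(codebook_xml.strip())
    return root.findtext("stdyDscr/citation/titlStmt/titl")


def search_variables(codebook_xml, keyword):
    """Return (name, label) pairs whose label contains the keyword.

    The match is case-insensitive and looks only at variable labels.
    """
    root = ET.fromstring(codebook_xml.strip())
    hits = []
    for var in root.iter("var"):
        label = (var.findtext("labl") or "").strip()
        if keyword.lower() in label.lower():
            hits.append((var.get("name"), label))
    return hits


if __name__ == "__main__":
    print(study_title(CODEBOOK_XML))
    for name, label in search_variables(CODEBOOK_XML, "reading"):
        print(name, "-", label)
```

Because the labels are stored as structured elements rather than free text in a printed codebook, the same few lines can search every variable in every study that follows the same markup.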

SOCIAL SCIENCE DATA SERVICES 101

Professional Development
There are different levels of social science data services in learning, teaching, research, and data curation and preservation, as DISC's story has illustrated. It takes time to appraise the data needs on campus and to design a suitable social science data service model. The first and most important step is to have a professional development program for library staff to advance their competency and expand their skill set. Below are some resources for professional development self-study in social science data services.

Service Levels
After professional development, librarians can confidently expand reference services to include social science data services. A reference librarian can conduct an in-depth interview with a patron to understand his/her research topic and connect him/her to the appropriate data sources. Social science data sources can be identified effectively when a librarian is familiar with data discovery tools and data access procedures. To understand the components of a dataset, check its methodology report and codebook. These two documents provide important information on its research design, data collection method, subject content, and file format. Data citation is a newly required practice in publishing research. Librarians are experienced in helping patrons with citations. They can show their patrons how to cite their data sources appropriately using citation styles like APA and Chicago.
For data analysis support, a library can start with data preparation assistance and statistical analysis referral. A reference librarian can help a patron determine the file format and suggest appropriate software to read a data file. The patron can then be referred to a computing technical support unit on campus. Statistical analysis requires expertise. Find out where statistical consultation is currently offered on campus. Network and team up with these units to establish and facilitate analysis services on behalf of your patrons.
Curation service is an advanced level of data service. It requires special knowledge to appraise, manage, organize, preserve, and disseminate datasets. ICPSR's Guide to Social Science Data Preparation and Archiving: Best Practice throughout the Data Life Cycle, 5th edition, is a comprehensive guide to data curation. DISC's setup has evolved over many years to meet the unique needs of our institution, and it allows us to offer an array of social science data services. Hopefully this case study, with its historical reflection and practical suggestions, can inspire fellow librarians to design their own social science data services to support research, teaching, and learning on their campuses.

EPILOGUE
"…I thank you for all the assistance you gave me so far and any future help I may ask you for. In fact, I am astonished by you being a librarian and taking the role as counsel for data users - something that would not happen in other countries. That is exactly why class analysts usually assign librarians in the US the highest social class whereas in other countries they go for the second highest! On the side note of the PIs being retired, that is also the nice thing with data. If kept alive, it may in future times be gainfully analyzed with a completely different purpose than it was originally collected for. So, thanks a lot for the service and I hope to add to the list of gainful analysis performed with the National Survey of Families and Households (NSFH)."
This compliment from a researcher best demonstrates the importance of social science data librarianship.


ICPSR. (2017). Providing social science data services: Strategies for design and operation. Retrieved from https://www.icpsr.umich.edu/icpsrweb/sumprog/courses/0041
Kellam, L. (2011). Numeric data services and sources for the general reference librarian.
YouTube. (2017a). ICPSR. Retrieved from https://www.youtube.com/user/icpsrweb
YouTube. (2017b). Help! I'm an accidental government information librarian webinars. Retrieved from https://www.youtube.com/channel/UC6CfualeU8N77us06prY10Q

It is beneficial to conduct a needs assessment to examine which social science data services patrons want. Eleanor Read shared her method and findings from the Data Services Awareness and Use Survey she conducted at the University of Tennessee (UT) in Fall 2003. Her article gave an informative overview of social science data services and of how the UT libraries promote the quantitative data component through outreach activities (Read, 2007). Once service needs are identified, you can solicit and secure funding sources to design your data service model. Communication is crucial in developing new social science data services. Collaborate with campus partners and invite them to design and implement the new services identified from your needs assessment exercise.