Polling and Survey Data Transparency

Monday Jul 1st 2019 by David Leininger
Polls and surveys are increasing in use. If you are a DBA, developer, or data scientist, then it is good to understand how to structure your systems for transparency and proper use.

Polls and surveys are used to obtain or confirm opinions. Polling and survey databases provide order for questions, replies, and analysis. Polls and surveys are conducted by news, educational, and commercial organizations such as The Wall Street Journal, CNN, Fox News, Marist College, Quinnipiac University, Pew Research Center, and Hart Research.

How Does a Poll Differ from a Survey?

Polls and surveys are not exactly the same. Polls ask a single, simple question. There are usually multiple predefined answers. Polls are a subset of surveys. Surveys generally seek information through a wide range of questions and statements, including binary, multiple choice and freeform text responses.

Polls are generally conducted via telephone and/or face-to-face interviews. Surveys are often conducted via questionnaires sent by mail or delivered through online applications.

Polls and surveys might be completed anonymously. Demographics appropriate for the poll or survey are the basis for selecting participants from data sources.

Topics for Polls and Surveys

Organizations seek information on a variety of topics. They conduct their own research or engage a firm to conduct polls and surveys. Topics vary by industry; those with an archive of results from which to draw designs include:

  • Brand
  • Climate change
  • Cost of living
  • Crime
  • Economy
  • Education
  • Employment
  • Energy
  • Global activity
  • Healthcare
  • Immigration
  • Morality
  • Product
  • Quality of life
  • Service
  • Terrorism
  • Wages
  • War

It's a best practice to have reusable libraries of code, data models, and content. For a survey database, the reusable content would contain questions and statements for polls and surveys. Configurable attributes might include topic, priority, weight, and more.
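As a rough illustration of such a reusable library, the sketch below stores questions with the configurable attributes mentioned above (topic, priority, weight) and selects the items for one topic. The class name, field names, and sample questions are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryItem:
    """A reusable question or statement (all field names are illustrative)."""
    item_id: str
    text: str
    topic: str
    priority: int = 1    # lower number = asked earlier
    weight: float = 1.0  # relative importance during analysis


def select_items(library, topic):
    """Return the items for one topic, ordered by priority."""
    return sorted((i for i in library if i.topic == topic),
                  key=lambda i: i.priority)


# Hypothetical library content shared by several surveys.
library = [
    LibraryItem("Q1", "How satisfied are you with your current wages?", "Wages", priority=2),
    LibraryItem("Q2", "Has your cost of living risen this year?", "Cost of living"),
    LibraryItem("Q3", "Did your wages change in the last 12 months?", "Wages", priority=1, weight=1.5),
]

wage_items = select_items(library, "Wages")
print([item.item_id for item in wage_items])
```

Keeping the attributes on the library item, rather than on each survey, is one way to let the same question appear in multiple surveys without copying its text.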

Surveys and polls have user experiences for the respondents, authors, analysts, and developers. Each of these audiences expects to interact with a survey for different reasons. Developers see surveys as code and data. Authors create the content for a survey, including questions and statements that might appear in multiple surveys. The metadata for these questions is used by analysts to examine topics, adjust weights, and change priorities within a survey.

Polls and surveys are used to extrapolate positions and trends. A few hundred participants can represent the position of millions of people with a margin of sampling error of a few percentage points. Poll results rarely differ when tens of thousands of respondents are used instead of a few thousand. Accuracy depends on participant sources, anonymity, language, technology, screening, and confirming data.
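The relationship between sample size and sampling error can be sketched with the standard formula for a proportion, z * sqrt(p(1 - p) / n). The sketch assumes a simple random sample from a large population, a 95% confidence level (z = 1.96), and the worst-case proportion p = 0.5; real polls with weighting or clustered samples will differ.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of sampling error for a proportion.

    Assumes simple random sampling from a large population.
    z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (400, 1000, 10000):
    points = 100 * margin_of_error(n)
    print(f"n={n}: +/- {points:.1f} percentage points")
```

Under these assumptions, 400 respondents give roughly 4.9 points, 1,000 give roughly 3.1, and 10,000 give roughly 1.0, which is why adding respondents beyond a few thousand changes results so little.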

Data Modeling

Data modeling to implement a poll or survey system results in conceptual, logical, and physical representations. The conceptual data model is formed from discussions with the people who will interact with the system, including system administrators, survey authors, analysts, and respondents. Such discussions can result in a conceptual model with requirements for Survey Administration, Account Administration, Questions, and Analysis. Further examination of these data concepts follows.

[Figure: Polls and Surveys]

Survey Administration

Individual statements during requirements gathering sessions provide demographics for a given survey. Age is always considered. Address, vocation, family status, and specific activities related to the poll topic are identified. Rinse and repeat for the next poll or survey topic. Over time, many demographic elements are defined. In later data modeling, the demographics become several entities and attributes. The standard operating procedure for addresses, for instance, includes separating the number from the direction, roadway name, and roadway type. So, "764 E. Walnut St." is stored in four fields in an Address table. Should "E." be stored as stated, or transformed to "East"?
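A minimal sketch of the four-field split follows, with a flag that makes the "E." versus "East" decision explicit. The field names and the parsing rules are assumptions for illustration; the parser only handles simple addresses with an abbreviated direction and is not a general address standardizer.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """Four address fields; the names are illustrative."""
    number: str
    direction: str
    roadway_name: str
    roadway_type: str


DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def split_address(line, expand_direction=False):
    """Split a simple street address into four fields (naive sketch)."""
    parts = line.split()
    number, rest = parts[0], parts[1:]
    direction = ""
    # Treat the first token as a direction only if it is a known abbreviation.
    if rest and rest[0].rstrip(".").upper() in DIRECTIONS:
        direction = rest.pop(0)
        if expand_direction:
            direction = DIRECTIONS[direction.rstrip(".").upper()]
    # Assume the last remaining token is the roadway type when a name precedes it.
    roadway_type = rest.pop() if len(rest) > 1 else ""
    return Address(number, direction, " ".join(rest), roadway_type)


print(split_address("764 E. Walnut St."))
print(split_address("764 E. Walnut St.", expand_direction=True))
```

Storing the value as stated preserves what the respondent supplied, while expanding it makes matching across sources easier; either way, the choice is better made once in the model than separately in each survey.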

Attributes might be non-textual, such as image, video, or audio files. These files each have a range of associated metadata such as date/time created, size in bytes, resolution, location, recognized instruments, and known colors. Decisions on modeling and implementing physical models in a particular technology, product, or product version are premature during conceptual modeling.

Surveys carry administrative information. Eventually, storage of the survey name, sponsor, question collection, priority, and dependencies on other activities will be expected.

The dates of survey creation, initiation, first use, and last use, along with date ranges, are valuable to analysts after polls and surveys are completed. How often do you see poll results for a poll taken recently, over a few days?
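The administrative attributes described above could be carried in a record like the following sketch. The class and field names are hypothetical, and the field-period calculation simply counts calendar days between first and last use.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SurveyRecord:
    """Administrative data for one survey (field names are illustrative)."""
    name: str
    sponsor: str
    created: date
    initiated: Optional[date] = None
    first_used: Optional[date] = None
    last_used: Optional[date] = None
    priority: int = 1
    question_ids: List[str] = field(default_factory=list)

    def field_period_days(self):
        """Number of calendar days the survey was in the field, inclusive."""
        if self.first_used is None or self.last_used is None:
            return None
        return (self.last_used - self.first_used).days + 1


survey = SurveyRecord(
    name="Brand survey",
    sponsor="Example Sponsor",  # hypothetical sponsor
    created=date(2019, 6, 1),
    first_used=date(2019, 6, 10),
    last_used=date(2019, 6, 13),
)
print(survey.field_period_days())
```

An analyst reading "a poll taken over a few days" can then recover the exact field period from stored dates rather than from a press release.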

Methodology

Each of the organizations mentioned earlier follows a specific methodology for each poll or survey conducted. The methodology becomes a "pick list" for developers and analysts, as seen in this sample:

  • Each respondent was asked to rate a total of 40 randomly selected brands
  • Each brand received approximately 1,000 ratings
  • Data was weighted to be representative of the entire U.S. population of consumers ages 15 and over, based on:
    • age by sex
    • education
    • race/ethnicity
    • region
    • income
  • Data from respondents ages 18 and over was weighted for their online presence
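To make the weighting step in the sample methodology concrete, here is a simplified sketch of post-stratification on a single variable: each group's weight is its population share divided by its sample share. The age groups, shares, and ratings are invented for illustration, and a methodology that weights on several variables at once would need a more involved procedure than this.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_mean(values, groups, weights):
    """Mean of values where each respondent carries their group's weight."""
    total = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total


# Hypothetical sample: younger respondents are under-represented.
groups = ["15-34"] * 2 + ["35+"] * 8
ratings = [8, 8] + [4] * 8
population_shares = {"15-34": 0.4, "35+": 0.6}  # assumed, not real census data

weights = poststratification_weights(groups, population_shares)
print(sum(ratings) / len(ratings))              # unweighted mean
print(weighted_mean(ratings, groups, weights))  # weighted mean
```

In this made-up sample the unweighted mean rating is 4.8, while the weighted mean is about 5.6, because the under-represented group's replies count for more.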

Account Administration

Surveys are generally conducted on behalf of a sponsor. Details of the organization, industry, size, logos, images, brands, and colors must be maintained. The interactions with the sponsor form an account management process. The people and their roles within the process are defined in this data concept. Note that conceptual data management allows the downstream technology to be defined later. Modern external systems typically provide a way to communicate and deliver data in packets via API calls. The need to extract and transform millions of rows hourly or daily is disappearing.

Questions

Polls and surveys might be conducted in multiple human languages. Questions that are asked in multiple surveys are seen as collections. Collections of questions are useful for meeting time and governance requirements. Formats determine the handling of binary, multiple choice, freeform text, and non-textual replies.
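One possible shape for a question that supports several languages and format-specific reply handling is sketched below. The class, enum, and method names are hypothetical, non-textual replies are left out for brevity, and the binary format is assumed to mean "yes" or "no".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ReplyFormat(Enum):
    BINARY = "binary"
    MULTIPLE_CHOICE = "multiple_choice"
    FREEFORM = "freeform"


@dataclass
class Question:
    """A question with per-language text (field names are illustrative)."""
    question_id: str
    reply_format: ReplyFormat
    text_by_language: Dict[str, str]
    choices: List[str] = field(default_factory=list)

    def text(self, language, fallback="en"):
        """Return the text in the requested language, else the fallback."""
        return self.text_by_language.get(language, self.text_by_language[fallback])

    def accepts(self, reply):
        """Check a reply against the question's format."""
        if self.reply_format is ReplyFormat.BINARY:
            return reply in ("yes", "no")
        if self.reply_format is ReplyFormat.MULTIPLE_CHOICE:
            return reply in self.choices
        return isinstance(reply, str) and reply.strip() != ""


region = Question(
    "Q7",
    ReplyFormat.MULTIPLE_CHOICE,
    {"en": "Which region do you live in?", "es": "¿En qué región vive?"},
    choices=["North", "South"],
)
print(region.text("es"))
print(region.accepts("North"), region.accepts("West"))
```

A collection is then just a named list of question identifiers, so the same question can be reused across surveys in whichever language the respondent needs.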

[Figure: Survey]

Analysis

Aggregating data across surveys is generally invalid because methodologies vary. The requirement to weight replies within a survey surfaces in the analysis stage. Reports and visualizations are popular outputs of data elements. Running a SQL statement against a few tables with the proper joins is a developer task. The physical data likely does not exist in a single technology or network.
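The kind of query described above can be sketched with an in-memory SQLite database. The three tables, their columns, and the sample rows are assumptions made for this example; the query restricts itself to a single survey, in keeping with the caution against aggregating across surveys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one survey has many questions, one question has many replies.
conn.executescript("""
CREATE TABLE survey (survey_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE question (
    question_id INTEGER PRIMARY KEY,
    survey_id INTEGER REFERENCES survey(survey_id),
    text TEXT
);
CREATE TABLE reply (
    reply_id INTEGER PRIMARY KEY,
    question_id INTEGER REFERENCES question(question_id),
    answer TEXT,
    weight REAL
);
INSERT INTO survey VALUES (1, 'Brand survey');
INSERT INTO question VALUES (10, 1, 'Would you recommend this brand?');
INSERT INTO reply VALUES (100, 10, 'yes', 2.0), (101, 10, 'no', 0.75), (102, 10, 'yes', 0.75);
""")

# Weighted reply counts per answer, for one survey only.
rows = conn.execute("""
SELECT q.text, r.answer, SUM(r.weight) AS weighted_count
FROM survey AS s
JOIN question AS q ON q.survey_id = s.survey_id
JOIN reply AS r ON r.question_id = q.question_id
WHERE s.survey_id = ?
GROUP BY q.question_id, r.answer
ORDER BY r.answer
""", (1,)).fetchall()

for row in rows:
    print(row)
```

When the physical data lives in more than one system, the same joins still describe the logic, but each table may have to be fetched from its own source first.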

Logical and Physical Data Models

Logical data modeling for a domain as broad as polls and surveys has been refined for decades by the organizations named herein. New delivery and analysis technologies prompt reviews of those models. Customer needs for modern elements such as audio, video, images, and social media streams are implemented in physical models. Underlying programming concepts bring data values from disparate systems to the analysis layer.

Data Transparency

Who has your data? Is it your data when aggregated without identifying data? This is at the heart of the General Data Protection Regulation (GDPR), a regulation to strengthen and unify data protection for individuals within the European Union. Will it become a standard elsewhere?

Identification, export, and deletion of personal data are tasks to be implemented in any survey containing personal information. Social Security numbers, financial accounts, addresses, birthdates, and interaction dates must be stored in a manner known to the participants whose data is being handled. Distribution of raw data should be noted in the methodology and account management processes.
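The three tasks named above (identify, export, delete) might look like the following sketch over an in-memory respondent store. Which fields count as personal, and the store's layout, are assumptions for illustration; a real system would also need to cover backups, logs, and copies held by other systems.

```python
import json

# Assumed list of fields treated as personal data in this sketch.
PERSONAL_FIELDS = ("name", "address", "birthdate")


def export_personal_data(respondents, respondent_id):
    """Identify and export one respondent's personal fields as JSON."""
    record = respondents[respondent_id]
    personal = {k: record[k] for k in PERSONAL_FIELDS if k in record}
    return json.dumps(personal, sort_keys=True)


def delete_personal_data(respondents, respondent_id):
    """Remove personal fields in place; return the names of removed fields."""
    record = respondents[respondent_id]
    removed = [k for k in PERSONAL_FIELDS if k in record]
    for k in removed:
        del record[k]
    return removed


# Hypothetical respondent store keyed by respondent id.
respondents = {
    "r1": {"name": "A. Respondent", "birthdate": "1980-01-01", "replies": {"Q1": "yes"}},
}

print(export_personal_data(respondents, "r1"))
print(delete_personal_data(respondents, "r1"))
print(respondents["r1"])
```

Here the replies remain after deletion while the identifying fields are gone, which is one way to keep aggregated results usable.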

Next Steps

Polls and surveys are increasing in use. The trust between a customer and the supplier of a service or product enables the supplier to ask the customer for feedback. There is a whole range of studies on the likelihood of a customer clicking a link in an email or text message to provide feedback through an online survey. The surveys obtain or confirm opinions. Polling and survey databases provide order for questions, replies, and analysis. Build a conceptual model with input from participants, and then test the model by building a logical model to manage the objects.

About the Author

Dave Leininger has been a Data Consultant for 30 years. In that time, he has discussed data issues with managers and executives in hundreds of corporations and consulting companies in 20 countries. Mr. Leininger has shared his insights on data warehouse, data conversion, and knowledge management projects with multi-national banks, government agencies, educational institutions and large manufacturing companies. Reach him at dave@box160.com.
