Tuesday 14 March 2023

  • Late late post on Data Science - Online surveys and data analytics in R

    29th October 2020 - Raviteja Gullapalli



    By now, the pandemic has crept into our lives and become the new normal. I hope you are all safe and sound, wherever you are. My last post was a brief overview of my journey towards Data Science. It covered, at a surface level, guiding topics for a non-programmer starting out in Data Science, specifically for people like me with diverse interests and a non-coding background. In this post I'll try addressing the same crowd from a managerial and business point of view.

    Data Science is a never-ending journey in terms of learning. Over the past few months I've dived deep into solving data riddles in surveys using statistics in R. The statistics course st095 helped me understand the basic elements of statistical modelling and hypothesis testing. So, in this post I'll try explaining a simple method that can be applied with minimal coding to understand the effect of certain parameters on a certain behavior, using customer surveys.

    Let me start with examples. Before that, why surveys?

    Surveys are a direct and, when well designed, unbiased way of collecting inputs from customers that help businesses make decisions. From here on, I'm calling us all customers of a hypothetical ice-cream company called 'real-ice'. On average, each of us probably fills in at least 10-20 surveys every year, knowingly or unknowingly, that help such businesses grow. There are days of brainstorming and a lot of minds at work behind the 15-30 second survey you add your opinion to. What we'll be talking about is how this survey data can be analyzed further using statistical methods.

    Before jumping directly into the code, let's look at the steps to follow for analyzing online survey data in R:

    Import the data: Import the survey data into R using the read.csv() function, which reads data from a CSV file. Data kept in a spreadsheet can be exported to CSV first.
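
    As a minimal sketch of the import step, the snippet below writes a tiny made-up CSV to a temporary file and reads it back with read.csv(). The column names (respondent, flavour, rating) and the values are assumptions for illustration only, not real survey data.

```r
# Hypothetical survey export: one row per respondent (column names are assumptions).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "respondent,flavour,rating",
  "1,vanilla,4",
  "2,chocolate,5",
  "3,vanilla,3"
), csv_path)

# Read the file and inspect the column types right after import.
survey <- read.csv(csv_path, stringsAsFactors = FALSE)
str(survey)
```

    With a real export, csv_path would simply be the path to your own file.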

    Data cleaning: Before starting the analysis, it is essential to clean and prepare the data. This involves checking for missing values, duplicates, and outliers, and transforming the data if necessary.
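
    A minimal sketch of the cleaning step in base R could look like this. The data frame is made up, and the rule that a valid rating lies between 1 and 5 is an assumption about the survey scale.

```r
# Hypothetical raw responses with a duplicate row, a missing rating and an outlier.
survey <- data.frame(
  respondent = c(1, 2, 2, 3, 4, 5),
  rating     = c(4, 5, 5, NA, 3, 40)   # 40 is outside the assumed 1-5 scale
)

colSums(is.na(survey))                 # missing values per column
sum(duplicated(survey))                # number of exact duplicate rows

clean <- survey[!duplicated(survey), ]                   # drop duplicate rows
clean <- clean[!is.na(clean$rating), ]                   # drop rows with no rating
clean <- clean[clean$rating >= 1 & clean$rating <= 5, ]  # keep valid scale values
clean
```

    Whether to drop or correct such rows depends on the survey, so treat the filters above as one possible choice.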

    Descriptive analysis: Descriptive statistics can provide an overview of the data, including measures such as mean, median, mode, standard deviation, and frequency distributions. Most of these can be generated using the summary() and table() functions in R, with sd() for the standard deviation.
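
    The descriptive measures mentioned above can be sketched with base R as follows. The ratings and flavours are invented values; base R has no built-in function for the statistical mode, so here it is read off the frequency table.

```r
# Hypothetical answers from seven respondents.
ratings <- c(4, 5, 3, 4, 2, 4, 5)
flavour <- c("vanilla", "chocolate", "vanilla", "mango",
             "vanilla", "chocolate", "mango")

summary(ratings)      # minimum, quartiles, median and mean
sd(ratings)           # standard deviation

freq <- table(flavour)                        # frequency distribution
freq
mode_flavour <- names(freq)[which.max(freq)]  # most frequent answer
mode_flavour
```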

    Inferential analysis: Inferential statistics can help to draw conclusions and make predictions about the survey population. Techniques such as hypothesis testing, regression analysis, and ANOVA can be performed in R using built-in functions or third-party packages.
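
    To make the inferential step concrete, here is a small sketch on simulated data, assuming a numeric rating and a flavour group per respondent (both hypothetical). It runs an unpaired Welch t-test for two groups and a one-way ANOVA for three groups.

```r
set.seed(42)  # reproducible simulated data
survey <- data.frame(
  flavour = rep(c("vanilla", "chocolate", "mango"), each = 20),
  rating  = c(rnorm(20, mean = 3.5), rnorm(20, mean = 4.0), rnorm(20, mean = 3.0))
)

# Two-group comparison: Welch t-test on two of the flavours.
two <- subset(survey, flavour %in% c("vanilla", "chocolate"))
tt  <- t.test(rating ~ flavour, data = two)
tt$p.value

# Three-group comparison: one-way ANOVA across all flavours.
fit <- aov(rating ~ factor(flavour), data = survey)
summary(fit)
```

    A small p-value suggests that the group means differ; the simulated numbers themselves carry no meaning.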

    Data visualization: Data visualization is a crucial step in any analysis, and R offers a wide range of tools for creating charts, graphs, and plots. This helps to explore the data, identify patterns and relationships, and communicate findings to stakeholders.
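
    As a rough sketch of the visualization step, the snippet below uses base R graphics so that it runs without extra packages; ggplot2 would work just as well. The data frame and its column names are assumptions for illustration.

```r
# Hypothetical responses; flavour and rating are assumed column names.
survey <- data.frame(
  flavour = c("vanilla", "chocolate", "vanilla", "mango", "chocolate", "vanilla"),
  rating  = c(4, 5, 3, 4, 5, 2)
)

# Bar chart of how many responses each flavour received.
counts <- table(survey$flavour)
mids <- barplot(counts,
                main = "Responses per flavour",
                xlab = "Flavour", ylab = "Number of responses")

# Box plot showing the spread of ratings within each flavour.
boxplot(rating ~ flavour, data = survey,
        main = "Rating by flavour", xlab = "Flavour", ylab = "Rating (1-5)")
```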

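    Here are the R libraries and code that can be used for your statistical model on survey data. The column names (data, ttest_1, lm_1 and so on) are placeholders for the columns in your own survey file.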

    library(ggplot2)        # plotting & data
    library(dplyr)          # data manipulation
    library(tidyr)          # data re-shaping
    library(magrittr)       # pipe operator
    library(gridExtra)      # provides side-by-side plotting

    getwd()                                        # show the current working directory
    setwd("Path to your survey data folder here")  # folder that contains the CSV file
    get_data <- read.csv("Survey.CSV")

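    # t-test comparing the two groups in the column named ttest_1
    # (paired = TRUE assumes both groups have the same number of rows, in matching order;
    #  recent R versions reject paired = TRUE in the formula interface, so pass two vectors)

    groups <- split(get_data$data, get_data$ttest_1)
    res <- t.test(groups[[1]], groups[[2]], paired = TRUE, alternative = "two.sided")
    res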

    est <- lm(data ~ lm_1, data = get_data)  # build linear regression model on full data
    print(est)
    mean(est$residuals)  # should be close to zero for a least-squares fit with an intercept
    summary(est)

    # moderator: lm_local * moderator expands to both main effects plus their interaction

    est_mod <- lm(data ~ lm_local * moderator, data = get_data)
    est_mod
    summary(est_mod)

    # anova (question should be a categorical column; wrap it in factor() if it is stored as numbers)

    res.aov <- aov(data ~ question, data = get_data)
    res.aov
    summary(res.aov)

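    # reliability test (Cronbach's alpha); alpha() here is from the psych package,
    # called as psych::alpha() because ggplot2 also exports a function named alpha()
    scale1 <- data.frame(get_data)  # ideally keep only the numeric scale-item columns
    psych::alpha(scale1)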
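
    To show what the reliability test measures, here is a hand-rolled sketch of Cronbach's alpha in base R, using the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). The three items q1 to q3 and their answers are made up; in practice a package function would also report further diagnostics.

```r
# Hypothetical answers of six respondents to three items of one scale.
items <- data.frame(
  q1 = c(4, 5, 3, 4, 2, 5),
  q2 = c(4, 4, 3, 5, 2, 5),
  q3 = c(5, 5, 2, 4, 1, 4)
)

k         <- ncol(items)          # number of items in the scale
item_var  <- sapply(items, var)   # variance of each item
total_var <- var(rowSums(items))  # variance of the summed score

cronbach_alpha <- (k / (k - 1)) * (1 - sum(item_var) / total_var)
cronbach_alpha
```

    Values closer to 1 indicate that the items move together and plausibly measure the same underlying construct.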


    Overall, R provides a flexible and powerful platform for analyzing online survey data using statistical methods. By leveraging the capabilities of R, researchers and analysts can gain deeper insights into their survey results and make data-driven decisions.

