MA12009 Exploratory Data and Statistical Analysis of Airbnb and Spotify Datasets

University: University of Bath (UOB)
Subject: MA12009 Data Analysis for Business

Exploratory Data Analysis [20/50]

There are two workbooks available on Moodle:

  1. “bristol_airbnb_listings.xlsx”
    • These data pertain to Airbnb listings in Bristol from the last few months of 2024 (last recorded on 25/12/2024).
    • The Excel file contains two worksheets:
      1. “Data description”: This worksheet includes the legend and description of the variables collected.
      2. “Bristol Airbnb Listings”: This worksheet contains the raw data (approximately 2,700 listings and 51 variables).
  2. “spotify_songs.xlsx”
    • These data describe the technical features and artist details of tracks taken from various Spotify playlists.
    • The Excel file contains two worksheets:
      1. “Data description”: This worksheet includes the legend and description of the variables collected.
      2. “Spotify Songs”: This worksheet contains the raw data (approximately 30,000 tracks and 23 variables).

Choose one of the two datasets and perform an exploratory data analysis on it. Your exploratory data analysis should include at least the following:

  1. Graphical, tabular and/or numerical summaries of the core variables of interest that you consider relevant. Discuss your findings.
  2. An exploration of possible associations between the core variables of interest you have identified, using graphical, tabular and/or numerical summaries. Discuss your findings.
  3. A final section with general conclusions and possible points of discussion for improving the analysis in future. For example, are there any other variables that could be collected? Are there any weaker aspects of the analysis that could be strengthened in future?
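As a rough illustration of items 1 and 2, here is a minimal Python sketch of numerical summaries and a simple association measure. It uses only the standard library and assumes the chosen worksheet has already been loaded into plain lists of numbers (for example the "price" and "number_of_reviews" columns of the Airbnb sheet). The helper names `numeric_summary` and `pearson_r` are hypothetical, not part of any course material.

```python
import math
import statistics


def numeric_summary(values):
    """Basic summary statistics for a list of numbers.

    Missing values (None or NaN) are dropped and counted separately.
    """
    clean = [v for v in values if v is not None and not math.isnan(v)]
    q1, q2, q3 = statistics.quantiles(clean, n=4)  # quartiles, default 'exclusive' method
    return {
        "n": len(clean),
        "missing": len(values) - len(clean),
        "mean": statistics.mean(clean),
        "sd": statistics.stdev(clean),  # sample standard deviation
        "min": min(clean),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(clean),
    }


def pearson_r(xs, ys):
    """Pearson correlation of paired observations, skipping pairs with a missing value."""
    pairs = [
        (x, y)
        for x, y in zip(xs, ys)
        if x is not None and y is not None and not math.isnan(x) and not math.isnan(y)
    ]
    mx = statistics.mean(x for x, _ in pairs)
    my = statistics.mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only (not taken from either workbook).
print(numeric_summary([45, 60, 75, 80, 120, 150, 300, None]))
print(pearson_r([45, 60, 75, 80], [12, 30, 25, 41]))
```

A Pearson correlation only captures linear association and is sensitive to outliers, so for skewed variables such as prices it would be sensible to look at a scatter plot or a rank-based measure as well.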

Probability and Statistical Inference Analysis [30/50]

  1. Let 𝑋 denote the random variable “number_of_reviews” in a year for a randomly selected property from the “bristol_airbnb_listings.xlsx” file. [5/30]
    (a) What is the distribution of 𝑋? (I.e., choose among Binomial, Poisson, Uniform, Exponential, Normal or none of the above.) Explain the reasons for your choice.
    (b) Suppose 𝑋 follows a Poisson distribution with mean 𝜇 = 52 reviews in a year.
      (i) What is the probability that a randomly selected property has more than 60 reviews?
      (ii) What is the probability that a randomly selected property has at most 50 reviews?
      (iii) What is the probability that a randomly selected property has between 40 and 80 reviews?
      (iv) Compare the probabilities in (i), (ii) and (iii) with the empirical probabilities (i.e., relative frequencies) of “number_of_reviews”. Are they comparable or different? Explain any differences you see.
  2. Let 𝑋 denote the random variable that counts the number of superhosts among the 2709 listings in the “bristol_airbnb_listings.xlsx” file. [5/30]
    (a) What is the distribution of 𝑋? (I.e., choose among Binomial, Poisson, Uniform, Exponential, Normal or none of the above.) Explain the reasons for your choice.
    (b) Suppose 𝑋 follows a Binomial distribution with probability 𝑝 equal to the proportion of superhosts in the data set.
      (i) What is the probability that there are more than 350 superhosts?
      (ii) What is the probability that there are fewer than 500 superhosts?
      (iii) What is the average number of superhosts? What is the standard deviation of the number of superhosts?
  3. Let 𝑋 denote the random variable “price”, the daily price of a randomly selected property in the “bristol_airbnb_listings.xlsx” file. [10/30]
    (a) What is the distribution of 𝑋? (I.e., choose among Binomial, Poisson, Uniform, Exponential, Normal or none of the above.) Explain the reasons for your choice.
    (b) What is the distribution of 𝑋̄ (the average daily price of a property)? (I.e., choose among Binomial, Poisson, Uniform, Exponential, Normal or none of the above.) Explain the reasons for your choice.
    (c) Suppose the data in “price” in “bristol_airbnb_listings.xlsx” are the population data. What are the values of the population mean 𝜇 and the population standard deviation 𝜎 of 𝑋?
    (d) Let 𝑋̄ be normally distributed with population mean 𝜇 and standard deviation 𝜎/√𝑛 (i.e., the standard error), where the values of 𝜇 and 𝜎/√𝑛 are defined as in (c).
      (i) What is the probability that the average daily price is above £250?
      (ii) What is the probability that the average daily price is between £150 and £276?
      (iii) What is the minimum average price for the top 1% most expensive properties listed on Airbnb in Bristol?
      (iv) What is the maximum average price for the 1% least expensive properties listed on Airbnb in Bristol?
  4. Now work on “spotify_songs.xlsx”. [10/30]
    (a) Generate a simple random sample of size 𝑛 = 25 of the “danceability” of Spotify’s pop songs.
    (b) Calculate 90%, 95% and 99% confidence intervals for 𝜇, the average danceability of a pop song on Spotify. Do you detect any differences between the confidence intervals? If so, what are they?
    (c) Previous research in the music sector suggests that the average danceability of pop songs should be at least 0.60, and it has been shown that the population standard deviation of danceability is 𝜎 = 0.10. Run a hypothesis test of the null hypothesis 𝐻0: 𝜇 ≥ 0.60 at significance level 𝛼 = 0.05. What is your conclusion?
    (d) The hypothesis test in (c) and the confidence intervals in (b) are reliable if and only if the population from which the sample is taken is normally distributed. Is this true in this case? Discuss.
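For question 1(b), the Poisson probabilities can be computed directly from the probability mass function P(X = k) = e^(-𝜇) 𝜇^k / k!. The minimal Python sketch below uses only the standard library and assumes that "between 40 and 80" includes both endpoints. `empirical_probs` is a hypothetical helper for 1(b)(iv) that expects the "number_of_reviews" column as a list of integers.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed on the log scale to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), by summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, mu) for i in range(k + 1))


def poisson_answers(mu=52):
    """Probabilities for 1(b)(i)-(iii); 'between 40 and 80' is taken as inclusive."""
    return {
        "more_than_60": 1 - poisson_cdf(60, mu),                         # P(X > 60)
        "at_most_50": poisson_cdf(50, mu),                               # P(X <= 50)
        "between_40_and_80": poisson_cdf(80, mu) - poisson_cdf(39, mu),  # P(40 <= X <= 80)
    }


def empirical_probs(reviews):
    """Relative frequencies of the same three events in observed review counts (1(b)(iv))."""
    n = len(reviews)
    return {
        "more_than_60": sum(r > 60 for r in reviews) / n,
        "at_most_50": sum(r <= 50 for r in reviews) / n,
        "between_40_and_80": sum(40 <= r <= 80 for r in reviews) / n,
    }


theory = poisson_answers()
for event, prob in theory.items():
    print(f"{event}: {prob:.4f}")
```

Large gaps between the theoretical and empirical values would suggest that a Poisson model with 𝜇 = 52 fits poorly, for example if the observed counts are overdispersed (sample variance well above the sample mean), which can be checked directly on the data.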
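For question 2(b), the Binomial probabilities with 𝑛 = 2709 can be summed from the probability mass function, again on the log scale because the binomial coefficients are far too large for floating-point arithmetic. The value of 𝑝 has to come from the data; `P_SUPERHOST = 0.3` below is a placeholder, not the actual proportion of superhosts, and the sketch assumes 0 < 𝑝 < 1.

```python
import math

N_LISTINGS = 2709   # number of listings stated in the question
P_SUPERHOST = 0.3   # PLACEHOLDER: replace with the observed proportion of superhosts


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), using log-gamma to avoid overflow."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coef + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def superhost_answers(n=N_LISTINGS, p=P_SUPERHOST):
    """Answers to 2(b)(i)-(iii) for a given n and p."""
    return {
        "more_than_350": 1 - binom_cdf(350, n, p),  # P(X > 350)
        "fewer_than_500": binom_cdf(499, n, p),     # P(X < 500) = P(X <= 499)
        "mean": n * p,                              # E[X] = np
        "sd": math.sqrt(n * p * (1 - p)),           # SD[X] = sqrt(np(1 - p))
    }


print(superhost_answers())
```

Since 𝑛𝑝 and 𝑛(1 - 𝑝) are both large here, a Normal approximation with mean 𝑛𝑝 and variance 𝑛𝑝(1 - 𝑝) should give very similar answers and can serve as a sanity check.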
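For question 3(d), Python's `statistics.NormalDist` (standard library, Python 3.8+) provides both tail probabilities (`cdf`) and percentiles (`inv_cdf`). The inputs `mu`, `sigma` and `n` in the example call are placeholders: 𝜇 and 𝜎 should be the population values found in 3(c), and since the question does not fix 𝑛 it is kept as an explicit parameter.

```python
import math
from statistics import NormalDist


def price_answers(mu, sigma, n):
    """Answers to 3(d)(i)-(iv) when the average daily price is N(mu, sigma / sqrt(n))."""
    xbar = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))  # sampling distribution of the mean
    return {
        "above_250": 1 - xbar.cdf(250),                        # P(Xbar > 250)
        "between_150_and_276": xbar.cdf(276) - xbar.cdf(150),  # P(150 < Xbar < 276)
        "top_1pct_min": xbar.inv_cdf(0.99),                    # 99th percentile
        "bottom_1pct_max": xbar.inv_cdf(0.01),                 # 1st percentile
    }


# Placeholder inputs only; substitute the values obtained from the data.
example = price_answers(mu=150.0, sigma=100.0, n=25)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Note that (iii) and (iv) are percentiles of the distribution of the average price, not of individual listing prices, so they sit much closer to 𝜇 than the corresponding percentiles of "price" itself.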
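For questions 4(a) and 4(b), a minimal sketch follows. It assumes the Spotify sheet has already been read into a list of dictionaries (for instance with `pandas.read_excel(...).to_dict("records")`) and that pop songs are identified by a column named `playlist_genre` equal to `"pop"`; the column name and the seed `12009` are assumptions to check against the "Data description" worksheet. Because 𝜎 is not given in 4(b), the intervals use the t distribution with 24 degrees of freedom, with critical values copied from standard t tables.

```python
import math
import random
import statistics

# Two-sided t critical values for 24 degrees of freedom (n = 25), from standard t tables.
T_CRIT_DF24 = {0.90: 1.711, 0.95: 2.064, 0.99: 2.797}


def sample_pop_danceability(rows, n=25, seed=12009):
    """Simple random sample (without replacement) of danceability for pop songs.

    `rows` is assumed to be a list of dicts with keys 'playlist_genre' and 'danceability'.
    """
    pop = [r["danceability"] for r in rows if r["playlist_genre"] == "pop"]
    return random.Random(seed).sample(pop, n)


def t_intervals(sample):
    """90%, 95% and 99% t-based confidence intervals for the mean (n = 25 only)."""
    n = len(sample)
    if n != 25:
        raise ValueError("the critical values above are for n = 25 only")
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    return {level: (xbar - t * se, xbar + t * se) for level, t in T_CRIT_DF24.items()}


# Synthetic rows for illustration only (not the real data).
demo_rows = [{"playlist_genre": "pop", "danceability": 0.5 + 0.01 * (i % 30)} for i in range(100)]
demo_rows += [{"playlist_genre": "rock", "danceability": 0.4} for _ in range(50)]
ci = t_intervals(sample_pop_danceability(demo_rows))
for level, (lo, hi) in ci.items():
    print(f"{level:.0%} CI: ({lo:.3f}, {hi:.3f})")
```

All three intervals share the same centre (the sample mean); only their width changes, growing with the confidence level because the critical value grows.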
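For question 4(c), with 𝜎 = 0.10 treated as known, a natural choice is a one-sided z-test of 𝐻0: 𝜇 ≥ 0.60 against 𝐻1: 𝜇 < 0.60, using z = (x̄ - 0.60)/(𝜎/√𝑛) and rejecting 𝐻0 when the p-value Φ(z) falls below 𝛼 = 0.05. The function below is a sketch of that calculation; `sample_mean` would be the mean of the sample drawn in 4(a), and 0.57 in the example call is an arbitrary illustrative value.

```python
import math
from statistics import NormalDist


def one_sided_z_test(sample_mean, mu0=0.60, sigma=0.10, n=25, alpha=0.05):
    """Left-tailed z-test of H0: mu >= mu0 vs H1: mu < mu0 with known sigma."""
    se = sigma / math.sqrt(n)             # standard error of the mean
    z = (sample_mean - mu0) / se          # test statistic
    p_value = NormalDist().cdf(z)         # P(Z <= z) at the boundary mu = mu0
    z_crit = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    return {"z": z, "p_value": p_value, "z_crit": z_crit, "reject_H0": p_value < alpha}


print(one_sided_z_test(0.57))  # illustrative sample mean, not a real result
```

Rejecting when the p-value is below 𝛼 is equivalent to rejecting when z is below the critical value `z_crit`.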
