Posted: February 9th, 2023

Homework Assignment 2:
1. [15 points] Prepare a line graph of Amtrak ridership from January 1991 through March 2004, with both axes labeled. Print your R command and the line graph. After observing the behavior of the ridership in your graph, answer the following questions using the relevant statistics:
In which year and month do the maximum and the minimum ridership occur?
What are the range and IQR of the ridership? Are there any outliers?
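
A minimal sketch of the Question 1 workflow, under stated assumptions: the Amtrak file is not part of this page, so R's built-in AirPassengers monthly series stands in for it, and month_of is a hypothetical helper introduced only for illustration. With the real data, a ts object starting at c(1991, 1) with frequency = 12 would take the place of ridership.

```r
# Stand-in data: AirPassengers (built into R) replaces the Amtrak series.
# Assumption: with the real data, something like
#   ts(amtrak$Ridership, start = c(1991, 1), end = c(2004, 3), frequency = 12)
# would be used instead; the column name Ridership is a guess.
ridership <- AirPassengers

# Line graph with labeled axes.
plot(ridership, xlab = "Time", ylab = "Ridership", main = "Monthly ridership")

# Hypothetical helper: format the year and month of observation i as "YYYY-MM".
month_of <- function(x, i) {
  year <- as.integer(floor(time(x)[i] + 1e-8))
  month <- as.integer(cycle(x)[i])
  sprintf("%d-%02d", year, month)
}

max_month <- month_of(ridership, which.max(ridership))
min_month <- month_of(ridership, which.min(ridership))

# Spread statistics.
value_range <- diff(range(ridership))
value_iqr <- IQR(ridership)

# Outliers by the usual boxplot rule (beyond 1.5 * IQR from the hinges).
outliers <- boxplot.stats(as.numeric(ridership))$out
```

Printing max_month, min_month, value_range, value_iqr, and outliers gives the numbers needed for the written answers; an empty outliers vector means the boxplot rule flags no points.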

2. [15 points] Prepare a boxplot and a histogram of the ridership. Print both plots along with the R commands that generate them. What can you say about outliers (are there any?) and about the distribution (is it symmetric or skewed and, if skewed, is it skewed to the right or to the left)?
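
One possible way to produce the two plots for Question 2 with base R graphics. This is a sketch only: the built-in AirPassengers series again stands in for the Amtrak ridership, and the mean-versus-median comparison is a rough skewness hint, not a formal test.

```r
# Stand-in data: AirPassengers replaces the Amtrak ridership values.
ridership <- as.numeric(AirPassengers)

# Boxplot: points drawn beyond the whiskers are flagged as outliers.
boxplot(ridership, ylab = "Ridership", main = "Boxplot of ridership")

# Histogram: shows the overall shape of the distribution.
hist(ridership, xlab = "Ridership", main = "Histogram of ridership")

# The same outlier rule the boxplot uses, returned as values.
outliers <- boxplot.stats(ridership)$out

# Rough skewness hint: a mean above the median suggests a longer right tail.
skew_hint <- mean(ridership) - median(ridership)
```

A positive skew_hint together with a histogram whose tail stretches to the right points to right skew; a negative value points to left skew.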

3. [20 points] Use the Excel data set Pollution (its variables are explained below). Prepare a heatmap of the correlations between every pair of numeric variables in Pollution, with the correlation coefficients printed in the cells. Print the heatmap and the R command(s) that produce it. Which variable pairs have the highest and lowest positive and negative correlations? As a data analyst, how do you interpret the highest positive and the highest negative correlations? Do these correlations make sense?

The Pollution.xlsx data set includes regional climate, pollution, and population demographic statistics from 1960 in the United States. Below are the descriptions for these variables.

Variable  Description
PREC      Average annual precipitation in inches
JANT      Average January temperature in degrees F
JULT      Average July temperature in degrees F
OVR65     Percent population aged 65 or older
POPN      Average household size
EDUC      Median school years completed by those over 22
HOUS      Percent housing units that are in good repair and have all facilities
DENS      Population per square mile in urbanized areas, 1960
NONW      Percent non-white population in urbanized areas, 1960
WWDRK     Percent employed in white collar occupations
POOR      Percent of families with income under $3,000
HC        Relative hydrocarbon pollution potential
NOX       Relative nitric oxides pollution potential
SO2       Relative sulfur dioxide pollution potential
HUMID     Annual average % relative humidity at 1:00 pm
MORT      Total age-adjusted mortality rate per 100,000
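
For the Question 3 heatmap, a base-graphics sketch is shown here under clear assumptions: Pollution.xlsx is not available on this page, so five numeric columns of the built-in mtcars data stand in for the Pollution variables, and the names dat, cm, and pair_cor are illustrative. Packages such as gplots or ggplot2 are common alternatives for drawing the same plot.

```r
# Stand-in data: five numeric columns of mtcars replace the Pollution variables.
dat <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
cm <- cor(dat)
n <- ncol(cm)

# Heatmap: blue for negative, white near zero, red for positive correlations.
image(1:n, 1:n, cm, zlim = c(-1, 1),
      col = colorRampPalette(c("blue", "white", "red"))(21),
      axes = FALSE, xlab = "", ylab = "", main = "Correlation heatmap")
axis(1, at = 1:n, labels = rownames(cm))
axis(2, at = 1:n, labels = colnames(cm), las = 1)

# Print each coefficient in its cell.
text(as.vector(row(cm)), as.vector(col(cm)), labels = sprintf("%.2f", cm))

# List each variable pair once and sort by correlation.
idx <- which(upper.tri(cm), arr.ind = TRUE)
pair_cor <- data.frame(var1 = rownames(cm)[idx[, 1]],
                       var2 = colnames(cm)[idx[, 2]],
                       r = cm[idx],
                       row.names = NULL, stringsAsFactors = FALSE)
pair_cor <- pair_cor[order(pair_cor$r), ]

most_negative <- head(pair_cor, 1)
most_positive <- tail(pair_cor, 1)
```

The sorted pair_cor table makes it easy to read off the strongest negative pair (first row) and the strongest positive pair (last row) without scanning the heatmap by eye.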

4. [15 points] Select any five variables of Pollution and prepare a matrix scatter plot with them. Print the matrix scatter plot and the R command(s) that produce it. Interpret the diagonal and off-diagonal elements of this matrix. Are the correlation coefficients shown in line with your answer to Question 3?
Hint: Install the GGally package first and then use the ggpairs command, as in your textbook.
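
A hedged sketch of the ggpairs call from the hint. Five mtcars columns stand in for five Pollution variables, the name five_vars is illustrative, and the base-R pairs() fallback is an addition for machines where GGally is not installed (install.packages("GGally") installs it).

```r
# Stand-in data: five numeric columns of mtcars replace five Pollution variables.
five_vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

if (requireNamespace("GGally", quietly = TRUE)) {
  # With default settings for numeric columns, ggpairs draws densities on the
  # diagonal, scatter plots below it, and correlation coefficients above it.
  print(GGally::ggpairs(five_vars))
} else {
  # Fallback if GGally is not installed: base-R scatter plot matrix.
  pairs(five_vars)
}

# Coefficients to compare against the correlation heatmap of Question 3.
cor_check <- round(cor(five_vars), 2)
```

The values in cor_check should match the coefficients printed in the upper panels of the ggpairs plot, which is the consistency check the question asks for.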

5. [15 points] Use the Pollution data set for PCA. How many components are required to explain at least 99.9% of the variance? Provide a table from the output to support your claim.
Hint: Refer to the related lecture notes and textbook section. You can run the PCA with an R command such as “pcs <- prcomp(data.frame(Pollution))”.
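
The component count for Question 5 can be read from the PCA summary table. The sketch below assumes mtcars as a stand-in for Pollution; cum_var and n_999 are illustrative names. Because summary() rounds its proportions, the cumulative share is also computed directly from the component variances.

```r
# Stand-in data: mtcars replaces Pollution.
pcs <- prcomp(data.frame(mtcars))

# Table of standard deviations, proportions, and cumulative proportions of variance.
print(summary(pcs))

# Exact cumulative proportion of variance from the component variances.
cum_var <- cumsum(pcs$sdev^2) / sum(pcs$sdev^2)

# Smallest number of components whose cumulative share reaches 99.9%.
n_999 <- which(cum_var >= 0.999)[1]
```

The "Cumulative Proportion" row of the printed summary is the table that supports the answer; n_999 is the first column in that row at or above 0.999.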

6. [20 points] Run the PCA one more time with scaling, and print its output. How many components are required to explain at least 80% of the variance?
Hint: Modify the R command you used in Q5 to a line like “pcs <- prcomp(data.frame(Pollution), scale. = TRUE)”.
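
A short illustration of why scaling matters and how the 80% threshold might be checked, again assuming mtcars as a stand-in for Pollution (dat, cum_var_scaled, and n_80 are illustrative names). Without scaling, the variables with the largest variances dominate the leading components; with scale. = TRUE every variable contributes unit variance.

```r
# Stand-in data: mtcars replaces Pollution.
dat <- data.frame(mtcars)

# Raw variances differ by orders of magnitude across columns.
print(round(sapply(dat, var), 2))

# PCA on standardized variables.
pcs_scaled <- prcomp(dat, scale. = TRUE)
print(summary(pcs_scaled))

# Smallest number of components whose cumulative share reaches 80%.
cum_var_scaled <- cumsum(pcs_scaled$sdev^2) / sum(pcs_scaled$sdev^2)
n_80 <- which(cum_var_scaled >= 0.80)[1]
```

After scaling, the component variances sum to the number of variables, so each component's share can be compared directly with the 1/p contributed by a single standardized variable.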

7. [10 points, BONUS] Express the observations of Pollution in terms of the principal components by storing the scores, i.e. use a command like “scores = pcs$x”. Print the first five observations of the data set in terms of its principal components, i.e. type an R command like “head(scores, 5)”. Compute the correlation between any two columns of scores to verify that the principal components are uncorrelated, i.e. type an R command like “cor(scores[ ,1], scores[ ,2])”. Keep in mind that a correlation coefficient such as “-5.360244e-17” in R is practically 0.
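
The commands quoted in Question 7 can be strung together as follows. This is a sketch with mtcars standing in for Pollution; score_cor and max_off_diag are illustrative additions that check every pair of components at once.

```r
# Stand-in data: mtcars replaces Pollution.
pcs <- prcomp(data.frame(mtcars), scale. = TRUE)

# Observations expressed in principal-component coordinates.
scores <- pcs$x
print(head(scores, 5))

# Correlation between the first two components: zero up to floating-point error.
r12 <- cor(scores[, 1], scores[, 2])

# Largest absolute correlation over all distinct pairs of components.
score_cor <- cor(scores)
max_off_diag <- max(abs(score_cor[upper.tri(score_cor)]))
```

Values on the order of 1e-16 or 1e-17 are rounding noise from floating-point arithmetic, which is why they are treated as exactly 0.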

