# Assignment 3

1
Assignment 3
Due: November 8th (Saturday) end of day
Total points: 100
Instructions
“SAMA provides reliable, up-to-date property assessment valuations – the pillar upon which governments set tax policy and property tax rates.”
You are a data analyst working for Saskatchewan Assessment Management Agency (SAMA). In order to properly assess property tax, municipalities must have an accurate idea of what the market price is for all homes in their respective municipalities. SAMA provides those property assessments.
SAMA uses regression analysis for determining the market prices of all houses. Your boss has asked you to make recommendations on what variables to include in the SAMA regression model used to determine property values for the next assessment period for all homes in Regina.
For this analysis look at how the below variables:
? Building Type
? Built in
? Bathrooms (Total)
? Cooling
? Floor Space
Impacts the housing prices. For your analysis report in a table the summary statistics (mean, median, mode, max, min, sd and the coefficient or variation) for each variables above in addition to the asking price variable.
You are provided with a raw data file from the MLS website.
You may assume:
? The data you have is a simple random sample of housing in Regina
? The asking price is the same as the sold price.
Structure your analysis the same way as below
ISSUE (5pts)
Short one or two sentence explanation of your research.
2
BACKGROUND (5pts)
A short description of the expected result and the economic theory behind it.
A short description of the data you are using (source, quality, sampling method, bias).
ANALYSIS
Access (20pts)
a) Make a query to pull your data from the database with the correct variables call it qryhousing(Lab2.7)
a. Make a new ‘Floor space’ variable that is a number (sqft removed) and call it ‘Floor space_Number’
Excel(20pts)
a) Link in your query data from access into an excel table (do not copy and paste data – use Lab 2.8 as a guide) (note do not make a pivot table make a table) call this sheet ‘cleaning data’
a. Make a new variable called ‘Asking_price_Bin’ that will be used to make histograms have the bin ranges increment by 100,000
b. Make a new variable for ‘Bathrooms(Total)’ with 1,2 and over 3 as data points and call it ‘Bathrooms(Total)_Categorical’
c. Make a categorical variable for the floor space variable
c) Create tables and graphs in an analysis sheet
a. Make a frequency distribution for the asking price
b. Create a summary statistics table for each variable including the (mean, median, mode, max, min, sd and the coefficient or variation)
c. Make a relative frequency distribution for asking price comparing the distributions for different values of (lab3 .2)
i. Building Type
ii. Bathrooms(Total)
iii. Cooling
iv. Floor Space
d. Report the conditional probabilities of
i. (Asking price between 200-300k | >3 Bathrooms)
ii. (Asking price between 200-300k | Air Conditioning)
iii. (Asking price between 300-400k | House)
e. Test the relationships below for dependence/independence
i. (Asking price between 100-200k | >3 Bathrooms)
ii. (Asking price between 200-300k | Air Conditioning)
3
iii. (Asking price between 0-100k | House)
Written(20pts)
Discuss you frequency distributions as well as you summary statistics. What variable would you recommend putting in the regression model.
CONCLUSION (10pts)