Tuesday, May 5, 2020

Hypothesis Testing free essay sample

Part A – Residential Property Prices in Toronto, San Francisco and Montreal

Introduction

The data for the first test conducted by our group consist of the prices of residential properties in three locations: Toronto, San Francisco and Montreal. All sample values are expressed in Canadian Dollars and are based on residential property prices on January 8th 2012. Our group will test whether there is a significant difference in the mean residential property prices for Toronto, San Francisco and Montreal. If the test concludes that the mean prices differ, our group will indicate where the prices are higher or lower.

Hypothesis Testing

For this data set, our group has chosen to conduct a one-way Analysis of Variance F-test (one-way ANOVA F-test). A one-way ANOVA F-test is appropriate here because it is a hypothesis-testing technique used to compare means from three or more populations. Since the data set covers residential property prices in Toronto, San Francisco and Montreal, a one-way ANOVA F-test is sufficient. Because the data contain three samples, our group ruled out tests designed for two samples, such as a “two sample T-test”, a “paired sample T-test” or a “two sample Z test.”

In order for a one-way Analysis of Variance F-test to be conducted, the following conditions must be met:

(1) Each sample must be selected from a normal, or approximately normal, population.
(2) The samples must be independent and randomly selected.
(3) Each population must have the same variance.

Looking at the conditions stated above, the samples provided by the Toronto Real Estate Board are randomly selected and independent of each other. That is, there is no correlation between the sample groups. Our group has constructed three box plots to check the normality of the sample values, one for each location.
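As a rough illustration of what each box plot summarises, the sketch below computes the five-number summary and flags points beyond the usual 1.5 × IQR fences. The function name `box_plot_summary` and the sample values are hypothetical, chosen only for illustration, and the quartile rule used here (Python's "inclusive" method) may differ slightly from the one SPSS applies.

```python
import statistics


def box_plot_summary(values):
    """Return the five numbers a box plot draws, plus points outside the 1.5*IQR fences."""
    data = sorted(values)
    # Quartiles by linear interpolation over the sorted sample.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Hypothetical sample, for illustration only (not taken from the report's data).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)
print(summary)
```

A roughly symmetric box with the median near its centre and few flagged points is what one would expect from an approximately normal sample; a long one-sided whisker or many flagged points suggests skewness.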
Similar to a t-test, the F-test is fairly insensitive to slight departures from normality. Since the box plots do not indicate extreme departures from a normal distribution, we can assume that the samples are selected from normal populations. The third condition states that the population variances are equal. Therefore, our group will conduct Levene’s Test for Homogeneity of Variance in SPSS to test whether the data set satisfies this third condition.

Results from Levene Test

Null Hypothesis: σ₁² = σ₂² = σ₃²
Alternative Hypothesis: σ₁², σ₂², σ₃² are not all equal
*Ho for this instance is the claim, since Ho is a statement of equality

σ₁² represents the variance for the population of residential properties in Toronto, σ₂² represents the variance for the population of residential properties in San Francisco and σ₃² represents the variance for the population of residential properties in Montreal. (α = 0.05)

Using the data from SPSS output, the P-value (represented by “Sig.” – Oneway DataSet 1\residential sales.sav) found on the first table – Test of Homogeneity of Variances is 0.549.

Since P-value > α (0.549 > 0.05); fail to reject Ho

Therefore, at the 5% level of significance, there is insufficient evidence to reject the claim that the population variances are all equal. All the conditions are therefore satisfied, and our group can proceed with the one-way analysis of variance F-test.

Since all the conditions for a one-way analysis of variance are satisfied, the sampling distribution of the test statistic can be approximated by the F distribution. Our group can now execute a one-way Analysis of Variance F-test, together with a Post-Hoc Comparison Procedure, to test the claim that “there is a significant difference in the mean residential property prices for Toronto, San Francisco and Montreal.”

Null Hypothesis: µ₁ = µ₂ = µ₃
Alternative Hypothesis: At least one mean is different.
*Ha for this instance is the claim, since Ha is a statement of inequality

Parameters

µ₁ represents the mean residential property price in Toronto, µ₂ represents the mean residential property price in San Francisco, and µ₃ represents the mean residential property price in Montreal. The null hypothesis states that there is no difference between the three population means, while the claim in the alternative hypothesis states that at least one mean is different.

Since no level of significance was given, we assume that: α = 0.05

Conclusion

Using the data from SPSS output, the P-value (represented by “Sig.” – Oneway DataSet 1\residential sales.sav) found on the second table – ANOVA is 0.140.

Since P-value > α (0.140 > 0.05); fail to reject Ho

Therefore, at the 5% level of significance, there is insufficient evidence to support the claim that there is a significant difference in the mean residential property prices for Toronto, San Francisco and Montreal.

*Full SPSS Output can be found in the appendix section of the report.

Part B – Difference in Lot Sizes for Residential Properties in Toronto and Vancouver

Introduction

The data for the second test conducted by our group consist of the lot sizes of residential properties that are up for sale in Toronto and Vancouver. The samples are expressed in m² (square metres; the area of the land on which the residential properties are built). The data are based on the properties that were up for sale as of January 8th 2012. Our group will test whether there is a significant difference in the lot sizes of the residential properties for sale in Toronto and Vancouver, as commissioned by the Toronto Real Estate Board.

Hypothesis Testing

For this data set, our group has chosen to conduct a two sample T-test. A two sample T-test is appropriate in this case because the aim is to determine the difference between two population means when the population standard deviations are unknown.
Furthermore, the data given reflect independent samples. That is, the sample selected from the population in Toronto is not related to the sample from the population in Vancouver.

In order for a two sample T-test for difference of means with small independent samples to be conducted, the following conditions must be met:

(1) The samples must be randomly selected.
(2) The samples must be independent.
(3) Each population must have a normal distribution with an unknown standard deviation.

Since there is no correlation between the sample groups (Toronto and Vancouver lot sizes), a paired T-test cannot be conducted for this data set. Also, since exactly two means are being compared in this case, and not the means of three or more populations, a one-way analysis of variance test (one-way ANOVA) is not needed.

Looking at the conditions stated above, the samples provided by the Toronto Real Estate Board are randomly selected and independent. To check the normality of each population, our group constructed two separate box plots, one for Toronto and one for Vancouver. Since neither box plot departs markedly from what a normal distribution would produce, there is no evidence that either population is not normally distributed.

With the conditions satisfied, our group can proceed to execute a two sample T-test for difference of means with small independent samples to test the claim that “there is a significant difference in the lot sizes for the residential properties for sale in Toronto and Vancouver,” as commissioned by the Toronto Real Estate Board.

Null Hypothesis: µ₁ = µ₂
Alternative Hypothesis: µ₁ ≠ µ₂
*Ha for this instance is the claim, since Ha is a statement of inequality

Parameters

µ₁ represents the mean lot size for properties for sale in Toronto. µ₂ represents the mean lot size for properties for sale in Vancouver.
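When the two population variances cannot be assumed equal, the two sample statistic is commonly computed in Welch's form, which is what the “Equal variances not assumed” row of an SPSS table reports. The following is a minimal sketch of that calculation using only the Python standard library; the function name `welch_t` and the sample values are hypothetical and are not the report's data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variances (n - 1 in the denominator), each divided by its sample size.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical lot sizes in square metres, for illustration only.
group_a = [100, 200, 300, 400, 500]
group_b = [200, 400, 600, 800, 1000]
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

For a two-tailed alternative such as µ₁ ≠ µ₂, the absolute value of t is compared with the critical value for α/2 at the computed degrees of freedom, or equivalently the two-tailed P-value is compared with α directly.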
The alternative hypothesis states that there is a significant difference between the mean lot sizes of the properties for sale in Toronto and Vancouver. Consequently, the null hypothesis is a statement of equality: the mean lot sizes of the properties in Toronto and Vancouver are equal.

Since no level of significance was given, we assume that: α = 0.05

To decide whether the variances are equal, which determines the form of the two sample T-test for difference of means for small independent samples, our group will use the SPSS output from Levene’s Test for Equality of Variances.

Results from Levene Test

Null Hypothesis: σ₁² = σ₂²
Alternative Hypothesis: σ₁² ≠ σ₂²
*Ho for this instance is the claim, since Ho is a statement of equality

σ₁² represents the variance for the population of lot sizes of properties for sale in Toronto, while σ₂² represents the variance for the population of lot sizes of properties for sale in Vancouver. (α = 0.05)

For the purpose of this test, the claim states that the variances of the two populations are equal, as represented by Ho.

Using the data from SPSS output, the P-value (represented by “Sig.” – T-Test DataSet 0\LotSizes.sav) found on the second table – Levene’s Test for Equality of Variances is 0.000.

Since P-value < α (0.000 < 0.05); reject Ho

Therefore, at the 5% level of significance, there is sufficient evidence to reject the claim that the variances of the two populations are equal. For the purpose of the two sample T-test, all results will be based on the assumption that the variances are not equal.

Conclusion

Using the data from SPSS output, the P-value (represented by “Sig.” – T-Test DataSet 0\LotSizes.sav) found on the third table – t-test for Equality of Means (Equal variances not assumed) is 0.091. This two-tailed value is used directly, since the alternative hypothesis µ₁ ≠ µ₂ makes this a two-tailed test; halving it would only be appropriate for a one-tailed test.

Since P-value > α (0.091 > 0.05); fail to reject Ho

Therefore, at the 5% level of significance, there is insufficient evidence to support the claim that there is a significant difference in the lot sizes for residential properties in Toronto and Vancouver.

*Full SPSS Output can be found in the appendix section of the report.

Part C – Difference in Incomes: New York homebuyers vs Toronto homebuyers

Introduction

The data for the third test conducted by our group consist of family incomes in Toronto and New York. The samples are paired by the value of the homes purchased. (For example, the first pair in the data set shows the income of a household in Toronto and of a household in New York that purchased homes of the same value, whether that value was $500 000, $200 000 or $750 000, etc. It is important to point out that for each pair, the home purchased in Toronto is of the same value as the home purchased in New York.) The data reflect only the incomes of the homebuyers, expressed in Canadian Dollars. Our group will test whether the incomes of families who purchased homes in New York were significantly higher than the incomes of families in Toronto who purchased homes of the same value.

Hypothesis Testing

For this data set, our group has chosen to conduct a paired T-test. A t-test is a statistical test that compares the means of two groups of observations. In this instance, the data are classified into two groups: family income in Toronto, and family income in New York.

In order for a paired T-test to be conducted, the following conditions must be met:

(1) Samples must be randomly selected.
(2) Samples must be dependent.
(3) Both populations must be normally distributed.
Unlike the two sample T-test for small independent samples conducted on the second data set (‘Part B’), the paired T-test accounts for the dependence between the groups: each Toronto income in the data set is matched with a New York income for a home of the same purchase value. A one-way ANOVA F-test is not needed for this data set since only two means are being evaluated. A different method of evaluation would be used if one of the conditions listed above were not met. For example, if the data set reflected values that are independent of each other, as in ‘Part B’, then a paired T-test could not be used.

However, in this case, the conditions stated above are met. The samples are randomly selected and, as stated before, dependent. Provided that the family incomes are normally distributed, a paired T-test can be used. To check the assumption of normality in each of the two populations, our group has created a box plot for each of the sample groups. For both samples, there appears to be only a slight departure from normality. Therefore, it is reasonable to assume that these populations are normally distributed.

With all three conditions satisfied, our group believes that a paired T-test is the best method to evaluate, as required by the Toronto Real Estate Board, the claim that “the incomes of families who purchased houses in New York are significantly higher than the incomes of families who purchased houses of similar value in Toronto.”

Null Hypothesis: µd ≤ 0
Alternative Hypothesis: µd > 0
*Ha for this instance is the claim, since Ha is a statement of inequality

Parameters

µd represents the mean difference. For each pair, the difference is calculated by subtracting the income of the homebuyer in Toronto from the income of the homebuyer in New York; µd is the population mean of these differences, and it is estimated by the mean of the differences in the sample.
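The paired test statistic can be sketched directly from the Part C incomes listed in Appendix A. This is a minimal illustration using only the Python standard library, not a reproduction of the SPSS procedure; the variable names are our own, and the critical value 1.711 is the tabulated upper 5% point of the t distribution with 24 degrees of freedom.

```python
import math
import statistics

# Family incomes from Appendix A, Part C, in pair order 1..25 (Canadian Dollars).
toronto = [72068, 70336, 106144, 66032, 68221, 68241, 72555, 107401, 107633,
           65647, 73041, 101180, 69264, 120293, 81531, 165996, 105039, 67512,
           97143, 71947, 77992, 90858, 142215, 101219, 92541]
new_york = [124174, 68999, 113291, 38411, 75876, 106390, 83540, 131762, 121399,
            60630, 100185, 158397, 77775, 127590, 99192, 179133, 123537, 80347,
            129711, 92019, 77580, 107446, 203356, 128540, 122134]

# Within-pair differences, New York minus Toronto, matching the definition of the mean difference.
differences = [ny - to for ny, to in zip(new_york, toronto)]
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)

# A paired t-test is a one-sample t-test on the differences.
t_stat = mean_diff / (sd_diff / math.sqrt(n))

T_CRITICAL = 1.711  # one-tailed, alpha = 0.05, df = n - 1 = 24 (from a t table)
print(f"n = {n}, mean difference = {mean_diff:.1f}, t = {t_stat:.3f}")
print("reject Ho" if t_stat > T_CRITICAL else "fail to reject Ho")
```

Working on the differences rather than on the two columns separately is what distinguishes this from the independent-samples test in Part B: variation in income that is shared within a pair (because both homes have the same value) cancels out.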
In the claim, as stated in the alternative hypothesis, the mean difference between the New York and Toronto incomes (New York minus Toronto) is greater than zero. Consequently, the null hypothesis states that the mean difference between the New York and Toronto incomes is less than or equal to zero.

Since no level of significance was given, we assume that: α = 0.05

Conclusion

Using the data from SPSS output, the P-value (represented by “Sig.” – T-Test DataSet 0\incomes.sav) found on the fifth table – Paired Samples Test is 0.00002 (calculated as 0.00004/2, since the test is one-tailed).

Since P-value < α (0.00002 < 0.05); reject Ho

Therefore, at the 5% level of significance, there is sufficient evidence to support the claim that the incomes of families who purchased houses in New York are significantly higher than the incomes of families who purchased houses of similar value in Toronto.

*Full SPSS Output can be found in the appendix section of the report.

Appendix A – Data Sets

Part A – Residential Property Prices, on January 8th 2012 (in Canadian Dollars)

Toronto    San Francisco    Montreal
720001     597114           260976
250025     350000           1141734
531968     693303           150024
391546     350140           157679
253440     397924           150000
251929     684874           1021251
1181788    350000           1123663
1040350    372008           158139
259516     432300           150000
250000     1194071          237032
1299055    350073           153574
279536     920792           150160
250001     935680           150044
257339     1394330          994698
253829     350000           150000
256582     350053           167171
250002     350748           968300
509638     755749           536302
375351     1083741          170861
268679     350056           1192483
250661     365259           303747
250000     399823           497641
1281119    350045           151925
721976     786536           150000
250002     409025           150359
255660     358312           150004
665974     352342           979221
1236283    1225525          150003
250003     790511           299575
609906     350000           159163

Part B – Lot Sizes of Properties for Sale, on January 8th 2012 (in m²)

Toronto Vancouver 114 117 262 129 329 120 104 118 285 128 101 159 194 212 112 222 187 114 98 129 100 116 251 197 99 123 333 116 235 137 148 115 211 175 299 118 107 117 106 127 108 104 130 102 147

Part C – Family Income Paired by Purchase Price (in Canadian Dollars)

Pair    Toronto    New York
1       72068      124174
2       70336      68999
3       106144     113291
4       66032      38411
5       68221      75876
6       68241      106390
7       72555      83540
8       107401     131762
9       107633     121399
10      65647      60630
11      73041      100185
12      101180     158397
13      69264      77775
14      120293     127590
15      81531      99192
16      165996     179133
17      105039     123537
18      67512      80347
19      97143      129711
20      71947      92019
21      77992      77580
22      90858      107446
23      142215     203356
24      101219     128540
25      92541      122134

Appendix B – SPSS Data Output

The following tables are printed from the SPSS program. They show the results of the tests conducted to determine the validity of three separate claims, as commissioned by the Toronto Real Estate Board. The tables are arranged in the order in which the tests were conducted:

(1) One-way ANOVA test, containing a Test of Homogeneity of Variances and Post Hoc Comparison Method – Part A (Residential Property Price – Multiple Comparisons; Toronto, San Francisco and Montreal)
(2) T-test for Equality of Means for Independent Samples, containing Levene’s Test for Equality of Variances – Part B (Lot Sizes of Properties for Sale in Toronto and Vancouver)
(3) T-test for Paired Samples – Part C (Family Incomes paired by Purchase Price in Toronto and New York)

*All SPSS Outputs are located in the pages following.
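For readers without SPSS, the statistics behind the Part A tables can be approximated with a short script. The sketch below computes the one-way ANOVA F statistic and the mean-based form of Levene's statistic, which is the same F statistic applied to absolute deviations from each group mean. The function names and the small data set are hypothetical and chosen only so the arithmetic can be checked by hand; whether the output matches SPSS to the last digit is not guaranteed.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def levene_w(groups):
    """Mean-based Levene statistic: ANOVA F on absolute deviations from each group mean."""
    abs_devs = []
    for g in groups:
        m = statistics.mean(g)
        abs_devs.append([abs(x - m) for x in g])
    return one_way_anova_f(abs_devs)


# Hypothetical data, for illustration only: three groups with equal spread and shifted means.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(toy)
w_stat, _, _ = levene_w(toy)
print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) df; Levene W = {w_stat:.3f}")
```

Applied to the three Part A columns of 30 prices each, the degrees of freedom would be (2, 87); Ho is rejected at α = 0.05 only if F exceeds the tabulated critical value for those degrees of freedom, which is roughly 3.1 in standard F tables.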
