Performance Prediction and Analysis using Decision Tree Algorithms

A Literature Review from 2011 to 2014 on Students’ Academic Performance Prediction and Analysis using Decision Tree Algorithms

Abstract— The success of any educational institute depends upon the success of its students. Predicting and analyzing students’ performance is essential for improving outcomes such as final grades and attendance. Such prediction helps teachers identify weak students and improve their scores. Various data mining techniques, such as classification and clustering, are used to perform this analysis. In this paper, implementations of the decision tree algorithms ID3, J48/C4.5, random tree and random forest, together with the Multilayer Perceptron and rule-based classifiers against which they are compared, are studied for students’ performance prediction and analysis. The WEKA tool is used for evaluation, with either the percentage split method or the cross validation method. The main objective of this analysis is to improve students’ performance. This review paper explores the use of various decision tree algorithms for predicting and analyzing students’ academic performance.

Keywords— EDM, Decision tree, J48, random tree, ID3, Multilayer Perceptron, CART, IB1.

I. INTRODUCTION

A. Data Mining and Educational Data Mining (EDM)

Data mining is the process of extracting useful information and patterns from large amounts of data. It is used to solve problems by analyzing data already present in databases. [1]

Educational Data Mining (EDM) is concerned with developing methods for exploring the different types of data that come from educational settings, and with using those methods to better understand students. The main uses of EDM include predicting student performance and studying how students learn in order to suggest improvements to current educational practice. [2]

B. Student Performance Prediction and Analysis

In student performance prediction, we predict the unknown value of a variable that describes the student. In the education sector, the values most often predicted are students’ performance, marks, knowledge or scores. Student performance prediction is a very popular application of data mining in the education sector. Different techniques and models are applied for predicting and analyzing student performance, such as decision trees, neural networks, rule-based systems and Bayesian networks. This analysis helps in predicting, for example, a student’s success in a course or a student’s final grade on the basis of features taken from logged data. [2][3]

This paper is organized as follows. Section II presents work related to student performance prediction and analysis. Section III presents a comparative study of the surveyed work. Section IV presents the conclusion, and Section V discusses future scope.

II. RELATED WORK

To identify where improvements in students’ grades or scores are needed, the literature on student performance prediction and analysis using decision tree algorithms has been surveyed.

Brijesh Kumar Baradwaj and Saurabh Pal [5] (2011) discussed how students’ performance is assessed through internal marks and final results. The study used a data set of 50 students taken from the MCA department of VBS Purvanchal University, Uttar Pradesh. Information such as previous semester marks, attendance, assignment marks and class test marks was collected from the existing student database. They used decision tree algorithms for student performance prediction and analysis. The study is intended to help faculty members improve students’ scores in future examinations.

R. R. Kabra and R. S. Bichkar [11] (Dec. 2011) collected data on 346 first-year engineering students from S.G.R. college of engineering and management, Maharashtra. Evaluation was performed using the J48 algorithm with 10 fold cross validation, giving an accuracy of 60.46%. The model was successful in identifying students who are likely to fail, so it can help in improving students’ performance.
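Most of the surveyed studies rely on 10 fold cross validation, so a minimal sketch of the procedure may help. It is written in plain Python rather than WEKA, it does not stratify the folds by class, and the majority-class learner and the toy labels are hypothetical stand-ins, not data from any of the papers.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the hits."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical stand-in learner: always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Toy data (not from any surveyed study): 15 passing and 5 failing students.
X = list(range(20))
y = ["Pass"] * 15 + ["Fail"] * 5
accuracy = cross_val_accuracy(fit_majority, predict_majority, X, y, k=10)
print(f"10-fold accuracy of the majority baseline: {accuracy:.2f}")
```

Every record is used for testing exactly once, which is why the method suits the small data sets reported in this survey.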

Surjeet Kumar Yadav and Saurabh Pal [6] (2012) analyzed data on 90 students of the engineering department (session 2010) of VBS Purvanchal University, Uttar Pradesh. The ID3, C4.5 and CART decision tree algorithms were evaluated using the 10 fold cross validation method. It was found that C4.5 has a higher accuracy (67.7778%) than the ID3 and CART algorithms. The model’s True Positive rate for the class Fail is high (0.786) for ID3 and C4.5, which means it successfully identifies most of the students who are likely to fail. This study will help identify students who need special attention from teachers.
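The True Positive rate for a class is the share of students who actually belong to that class and were also predicted to belong to it. The sketch below computes it for the class Fail. The label lists are illustrative only: the counts (11 of 14 failing students flagged) were chosen so that the result rounds to 0.786, and the source does not give the underlying confusion matrix.

```python
def true_positive_rate(actual, predicted, positive="Fail"):
    """Fraction of actual `positive` cases that were predicted as `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp + fn == 0:
        raise ValueError(f"no actual {positive!r} cases to evaluate")
    return tp / (tp + fn)


# Illustrative labels (assumed, not taken from the paper):
# 14 students actually failed, of whom the model flagged 11.
actual = ["Fail"] * 14 + ["Pass"] * 6
predicted = ["Fail"] * 11 + ["Pass"] * 3 + ["Pass"] * 5 + ["Fail"] * 1

tpr_fail = true_positive_rate(actual, predicted, positive="Fail")
print(f"TP rate for class Fail: {tpr_fail:.3f}")
```

A high value for the class Fail matters more than overall accuracy when the goal is to reach at-risk students before the examination.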

Manpreet Singh Bhullar and Amritpal Kaur [10] (2012) used a data set of 1892 students from various colleges for student performance prediction and evaluation. The J48 algorithm was evaluated using 10 fold cross validation and achieved a success rate of 77.74%. The model can help identify weak students so that teachers can assist them before they fail.

Mrinal Pandey and Vivek Kumar Sharma [4] (Jan. 2013) compared the J48, Simple Cart, Reptree and NB tree algorithms for predicting the performance of engineering students. They used data on 524 students for 10 fold cross validation and 178 students for the percentage split method. The J48 decision tree algorithm achieved the highest accuracy with both methods: 80.15% using 10 fold cross validation and 82.58% using the percentage split method. The comparison shows that J48 performs better than the other algorithms in both cases, so it can be useful to teachers in improving the performance of weak students.
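The percentage split method shuffles the records once, trains on a fixed share and tests on the remainder. The sketch below is a plain-Python illustration, not the authors' code. The 66% training share is an assumption: it is used only because 178 is about 34% of 524, which would be consistent with a 66:34 split, although the source does not state the ratio.

```python
import random


def percentage_split(records, train_percent, seed=0):
    """Shuffle once, then split into (train, test) by the given percentage."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# Student IDs 0..523 stand in for the 524 records; 66 is an assumed ratio.
train, test = percentage_split(range(524), train_percent=66)
print(len(train), len(test))
```

Unlike cross validation, each record is used either for training or for testing, never both, so the result depends more heavily on the particular shuffle.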

Anuja Priyam, Abhijeet, Rahul Gupta, Anju Rathee and Saurabh Srivastava [12] (June 2013) compared the ID3, C4.5 and CART decision tree algorithms on students’ data. Evaluation was performed using the 10 fold cross validation method. The CART algorithm showed the highest accuracy, 56.2500%. The model’s True Positive rate for the class Fail is high (0.786) for ID3 and C4.5, which means it successfully identifies most of the students who are likely to fail. This model can therefore help teachers reduce failure rates.
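The three algorithms compared here differ mainly in how they choose the attribute to split on: ID3 uses information gain, C4.5 uses gain ratio, and CART uses the Gini index. As a rough illustration of the ID3 criterion, the sketch below computes entropy and information gain. The four student records and their attribute names are invented for the example and do not come from any surveyed data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on `attribute`."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy records: attendance separates the classes, assignment does not.
rows = [
    {"attendance": "good", "assignment": "yes"},
    {"attendance": "good", "assignment": "no"},
    {"attendance": "poor", "assignment": "yes"},
    {"attendance": "poor", "assignment": "no"},
]
labels = ["Pass", "Pass", "Fail", "Fail"]

gains = {attr: information_gain(rows, labels, attr) for attr in rows[0]}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

ID3 would place the attribute with the largest gain at the root of the tree and repeat the calculation on each resulting subset.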

Ramanathan L, Saksham Dhanda and Suresh Kumar D [14] (June-July 2013) analyzed data on 50 students. They used Naive Bayes, J48 and a proposed algorithm, Weighted ID3 (WID3), for evaluation. WID3 showed a higher accuracy (93%) than J48 and Naive Bayes. In future, user-friendly software could be built using WID3, which would be very helpful for teachers.

Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao [7] (September 2013) analyzed a data set of 182 students using the ID3 and C4.5 decision tree algorithms. In bulk evaluation on a data set of 173 students, both algorithms had the same accuracy of 75.145%; in singular evaluation on a data set of 9 students, both had an accuracy of 77.778%. Over all 182 students, the accuracy was approximately 75.27%.
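The overall figure for the 182 students can be recovered by pooling the two evaluations, weighting each by its number of students. The sketch below does this arithmetic and comes out at roughly 75.3%. It assumes that each reported percentage corresponds to a whole number of correctly classified students, which is why the counts are rounded.

```python
def pooled_accuracy(results):
    """Combine (n_students, accuracy_percent) pairs into one overall accuracy."""
    # Recover whole-student counts of correct predictions from each percentage.
    correct = sum(round(n * acc / 100) for n, acc in results)
    total = sum(n for n, _ in results)
    return 100 * correct / total


# Bulk evaluation (173 students) and singular evaluation (9 students).
combined = pooled_accuracy([(173, 75.145), (9, 77.778)])
print(f"Pooled accuracy over 182 students: {combined:.2f}%")
```

The pooled value sits close to the bulk figure because the bulk set contributes 173 of the 182 students.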

Mrs. M.S. Mythili and Dr. A.R. Mohamed Shanavas [9] (Jan. 2014) compared J48, Random Forest, Multilayer Perceptron, IB1 and decision tree algorithms using a data set of 260 students from various schools. 10 fold cross validation was chosen for evaluation. Random Forest was found to have the highest accuracy (89.23%) and the lowest execution time of all the algorithms. This study will be helpful for educational institutions.

Jyoti Namdeo and Naveenkumar Jayakumar [13] (Feb. 2014) collected data on 51 students from the MCA 2007 batch. The classification algorithms used in the evaluation were Naive Bayes, Multilayer Perceptron, J48 and Random Forest. These algorithms were trained on the 2007 batch data and tested on the 2008 batch data. Evaluation was performed using the training set, cross validation, percentage split and a test on the 2008 data. When tested on the 2008 data, Naive Bayes had the highest accuracy among the algorithms (31.57%), but this accuracy falls short of what is required.

Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad [8] (September 2014) analyzed 399 student records using Naive Bayes, rule-based and J48 decision tree algorithms. They used the cross validation and percentage split methods for evaluation: 3, 5 and 10 fold cross validation, and training:testing percentage splits of 10:90, 20:80, 30:70, 40:60 and 50:50. Comparison of the three classification algorithms showed that the rule-based and J48 decision tree algorithms achieved the highest accuracy, 68.8%.

III. COMPARATIVE STUDY OF SURVEY

Table 1. Comparison of survey work based on different parameters

| Paper Name | Year of Publication | Size of Data Set (No. of students) | Algorithms Used | Test Options Used | Algorithm with Higher Accuracy | Accuracy (in %) of Algorithm |
|---|---|---|---|---|---|---|
| Performance Prediction of Engineering Students using Decision Trees | Dec. 2011 | 346 | J48 | Cross Validation | J48 | 60.46% |
| Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification | 2012 | 90 | ID3, C4.5, CART | Cross Validation | C4.5 | 67.7778% |
| Use of Data Mining in Education Sector | 2012 | 1892 | J48 | Cross Validation | J48 | 77.74% |
| A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction | Jan. 2013 | 524 | J48, Simple Cart, Reptree, NB tree | Cross Validation | J48 | 80.15% |
| | | 178 | J48, Simple Cart, Reptree, NB tree | Percentage Split | J48 | 82.58% |
| Comparative Analysis of Decision Tree Classification Algorithms | June 2013 | - | ID3, C4.5, CART | Cross Validation | CART | 56.2500% |
| Predicting Students’ Performance using Modified ID3 Algorithm | June-July 2013 | 50 | Naive Bayes, J48, Weighted ID3 | - | Weighted ID3 | 93% |
| Predicting Students Performance using ID3 and C4.5 Classification Algorithms | September 2013 | 173 | ID3, C4.5 (for bulk evaluation) | Cross Validation | ID3, C4.5 | 75.145% |
| | | 9 | ID3, C4.5 (for singular evaluation) | Cross Validation | ID3, C4.5 | 77.778% |
| An Analysis of students’ performance using classification algorithms | Jan. 2014 | 260 | J48, Random Forest, Multilayer Perceptron, IB1 | Cross Validation | Random Forest | 89.23% |
| Predicting Students Performance Using Data Mining Technique with Rough Set Theory Concepts | Feb. 2014 | 51 | J48, Random Forest, Multilayer Perceptron, Naive Bayes | Training, Cross Validation, Percentage Split, Test | Naive Bayes | 31.57% |
| First Semester Computer Science Students’ Academic Performances Analysis by Using Data Mining Classification Algorithms | September 2014 | 399 | Naive Bayes, J48, Rule Based | Cross Validation, Percentage Split | J48 | 68.8% |

IV. CONCLUSION

The importance of educational data mining (EDM) is increasing day by day, as demand grows for student performance prediction and analysis to improve students’ academic performance. As described above, various authors have implemented different classification algorithms on different data sets: the decision tree algorithms J48, random forest, Reptree, NB tree and CART, alongside Multilayer Perceptron, Naive Bayes, rule-based and IB1 classifiers. Some authors compared algorithms to find the best among them on the basis of accuracy. The survey in this paper shows that the J48/C4.5 decision tree algorithm is most often reported as the best algorithm in terms of accuracy across different data sets. In the surveyed studies it was the top performer on data sets ranging from 90 to 1892 students, which helps explain its wide use among decision tree algorithms.

The survey in Section II will be helpful to researchers working in the field of student performance prediction and analysis using decision tree algorithms.
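The observation that J48/C4.5 is most often the top performer can be checked by tallying the "Algorithm with Higher Accuracy" column of the comparison table in Section III. The sketch below transcribes that column by hand, counts each paper once even where it spans two table rows, and groups J48 with C4.5 because J48 is WEKA's implementation of C4.5. The winners come from studies on different data sets, so the tally counts studies rather than comparing algorithms head to head.

```python
from collections import Counter

# (publication date as listed in the table, best algorithm or algorithms)
best_by_study = [
    ("Dec. 2011", ["J48"]),
    ("2012, 90 students", ["C4.5"]),
    ("2012, 1892 students", ["J48"]),
    ("Jan. 2013", ["J48"]),
    ("June 2013", ["CART"]),
    ("June-July 2013", ["Weighted ID3"]),
    ("September 2013", ["ID3", "C4.5"]),  # reported as a tie
    ("Jan. 2014", ["Random Forest"]),
    ("Feb. 2014", ["Naive Bayes"]),
    ("September 2014", ["J48"]),
]


def family(name):
    """Group J48 with C4.5, since J48 is WEKA's implementation of C4.5."""
    return "J48/C4.5" if name in ("J48", "C4.5") else name


tally = Counter()
for _, winners in best_by_study:
    # A set ensures each algorithm family is counted at most once per study.
    for fam in {family(w) for w in winners}:
        tally[fam] += 1

for fam, count in tally.most_common():
    print(f"{fam}: best or tied best in {count} of {len(best_by_study)} studies")
```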

V. FUTURE WORK

Students’ academic performance is the main contributor to the growth of any educational institute. If students perform well academically, the institution’s growth rate rises. It is therefore necessary to focus on students’ results, and there is wide scope for work in this field. Student performance prediction and analysis is used to improve students’ performance, and decision tree algorithms are the main tools used for this purpose. Researchers have done a great deal of work in this field, evaluating a single algorithm or comparing three or four algorithms.

In future, researchers can extend this work by comparing a larger number of algorithms on larger data sets.

ACKNOWLEDGMENT

First of all I express my sincerest debt of gratitude to the Almighty God who always supports me in my endeavors.

I would like to thank Prof. Neena Madan for their encouragement and support. Then, I would like to thank my family and my friends. I am thankful to all those who helped me in one way or the other at every stage of my work.

REFERENCES
  1. Nikita Jain, Vishal Srivastava, “Data mining techniques: A survey paper”, IJRET: International Journal of Research in Engineering and Technology, Volume: 02 Issue: 11, Nov-2013.
  2. Mrs. M.S. Mythili, Dr. A.R.Mohamed Shanavas, “An Analysis of students’ performance using classification algorithms”, IOSR Journal of Computer Engineering, Volume 16, Issue 1, January 2014.
  3. Dr. Mohd Maqsood Ali, “Role of data mining in education sector”, International Journal of Computer Science and Mobile Computing Vol. 2, Issue. 4, April 2013.
  4. Mrinal Pandey, Vivek Kumar Sharma, “A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction”, International Journal of Computer Applications Volume 61, No.13, January 2013.
  5. Brijesh Kumar Baradwaj, Saurabh Pal, “Mining Educational Data to Analyze Students Performance”, International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.
  6. Surjeet Kumar Yadav, Saurabh Pal, “Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification”, World of Computer Science and Information Technology Journal Vol. 2, No. 2, 2012.
  7. Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao, “Predicting Students Performance using ID3 and C4.5 Classification Algorithms”, International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013.
  8. Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad, “First Semester Computer Science Students’ Academic Performances Analysis by Using Data Mining Classification Algorithms”, Proceeding of the International Conference on Artificial Intelligence and Computer Science (AICS 2014), September 2014.
  9. Mrs. M.S. Mythili, Dr. A.R.Mohamed Shanavas, “An Analysis of students’ performance using classification algorithms”, IOSR Journal of Computer Engineering (IOSR-JCE) Volume 16, Issue 1, Jan. 2014.
  10. Manpreet Singh Bhullar, Amritpal Kaur, “Use of Data Mining in Education Sector”, Proceedings of the World Congress on Engineering and Computer Science (WCECS), San Francisco, USA, October 2012.
  11. R. R. Kabra, R. S. Bichkar, “Performance Prediction of Engineering Students using Decision Trees”, International Journal of Computer Applications Volume 36, No.11, December 2011.
  12. Anuja Priyam, Abhijeet, Rahul Gupta, Anju Rathee, and Saurabh Srivastava, “Comparative Analysis of Decision Tree Classification Algorithms”, International Journal of Current Engineering and Technology, Volume 3, No. 2, June 2013.
  13. Jyoti Namdeo, Naveenkumar Jayakumar, “Predicting Students Performance Using Data Mining Technique with Rough Set Theory Concepts”, International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 2, February 2014.
  14. Ramanathan L, Saksham Dhanda, Suresh Kumar D, “Predicting Students’ Performance using Modified ID3 Algorithm”, International Journal of Engineering and Technology (IJET) Volume 5, No. 3, Jun-Jul 2013.