Performance Prediction and Analysis using Decision Tree Algorithms
A Literature Review from 2011 to 2014 on Student’s Academic Performance Prediction and Analysis using Decision Tree Algorithms
Abstract— The success of any educational institute depends on the success of its students. Predicting and analyzing student performance is essential for improving student attributes such as final grades and attendance. Such prediction helps teachers identify weak students and improve their scores. Various data mining techniques, such as classification and clustering, are used to perform the analysis. In this paper, implementations of the decision tree algorithms ID3, J48/C4.5, random tree and random forest, together with Multilayer Perceptron and rule-based classifiers, are studied for student performance prediction and analysis. The WEKA tool is used to perform the evaluation, with either the percentage split method or the cross validation method. The main objective of this analysis is to improve student performance. This review paper explores the use of various decision tree algorithms for predicting and analyzing students’ academic performance.
Keywords— EDM, Decision tree, J48, random tree, ID3, Multilayer Perceptron, CART, IB1.
I. Introduction
A. Data Mining and Educational Data Mining (EDM)
Data mining is the process of extracting useful information and patterns from large amounts of data. It is used to solve problems by analyzing data that is already present in databases. [1]
Educational Data Mining (EDM) is concerned with developing methods for exploring the different types of data that come from educational settings, and with using those methods to understand students better. The main uses of EDM include predicting student performance and studying how students learn in order to suggest improvements to current educational practice. [2]
B. Student Performance Prediction and Analysis
In student performance prediction, the unknown value of a variable that describes the student is predicted. In the educational sector, the values most commonly predicted are a student’s performance, marks, knowledge or score. Student performance prediction is a very popular application of data mining in the education sector. Different techniques and models are applied to the prediction and analysis of student performance, such as decision trees, neural networks, rule-based systems and Bayesian networks. This analysis helps in predicting a student’s performance, i.e. the student’s success in a course and the student’s final grade, on the basis of features taken from logged data. [2][3]
This paper is organized as follows: Section II presents work related to student performance prediction and analysis. Section III presents a comparative study of the surveyed work. Section IV presents the conclusion, and Section V discusses future scope.
II. RELATED WORK
With a view to the improvements required in students’ grades or scores, the literature on student performance prediction and analysis using decision tree algorithms has been surveyed.
Brijesh Kumar Baradwaj and Saurabh Pal [5] (2011) examined student performance using internal marks and final results. The study used a data set of 50 students from the MCA department of VBS Purvanchal University, Uttar Pradesh. Information such as previous semester marks, attendance, assignment marks and class test marks was taken from the existing student database. They used decision tree algorithms for student performance prediction and analysis. The study is intended to help faculty members improve students’ scores in future examinations.
R. R. Kabra and R. S. Bichkar [11] (Dec. 2011) collected data on 346 first-year engineering students from S.G.R. College of Engineering and Management, Maharashtra. Evaluation was performed using the J48 algorithm with 10-fold cross validation, and the accuracy of J48 was 60.46%. The model is reported to be successful in identifying students who are likely to fail, so it can help in improving student performance.
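The 10-fold cross validation used in this and several later studies can be sketched as follows. This is a minimal illustration in plain Python, not the WEKA implementation: the function names, the fixed seed and the `train_fn`/`predict_fn` callbacks are assumptions made for the example, and the folds are not stratified by class, which evaluation tools often do.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)          # fixed seed: repeatable folds
    folds = [indices[i::k] for i in range(k)]     # k folds of near-equal size
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(records, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(records), k, seed):
        model = train_fn([records[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, records[i]) == labels[i]
                       for i in test_idx)
    return correct / len(records)
```

Each record is used for testing exactly once, so the pooled accuracy is computed over the whole data set while no model is ever tested on a record it was trained on.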
Surjeet Kumar Yadav and Saurabh Pal [6] (2012) analyzed data on 90 engineering students (session 2010) from VBS Purvanchal University, Uttar Pradesh. The ID3, C4.5 and CART decision tree algorithms were evaluated using 10-fold cross validation. C4.5 was found to have a higher accuracy (67.7778%) than ID3 and CART. The model’s True Positive rate for the class Fail is high (0.786) for ID3 and C4.5, which means that these models identify most of the students who fail. The study is useful for identifying students who need special attention from teachers.
Manpreet Singh Bhullar and Amritpal Kaur [10] (2012) used a data set of 1892 students from various colleges for student performance prediction and evaluation. The J48 algorithm was evaluated using 10-fold cross validation, and its success rate was 77.74%. The results can help identify weak students so that teachers can help them before they fail.
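The True Positive rate for a class is the fraction of records that actually belong to the class and are also predicted as that class. The sketch below shows the calculation; the label lists are hypothetical and were chosen only so that the result rounds to 0.786, since the underlying counts are not given in this survey.

```python
def true_positive_rate(actual, predicted, positive_class):
    """TP / (TP + FN) for one class, e.g. the class 'fail'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive_class and p == positive_class for a, p in pairs)
    fn = sum(a == positive_class and p != positive_class for a, p in pairs)
    if tp + fn == 0:
        raise ValueError("no records of the positive class")
    return tp / (tp + fn)


# Hypothetical outcome: 14 students actually fail and 11 of them are caught.
actual = ["fail"] * 14 + ["pass"] * 6
predicted = ["fail"] * 11 + ["pass"] * 3 + ["pass"] * 5 + ["fail"] * 1
print(round(true_positive_rate(actual, predicted, "fail"), 3))  # 0.786
```

A high rate for the class Fail matters more than overall accuracy when the aim is to find the students who need help, because a missed failing student is the costly error.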
Mrinal Pandey and Vivek Kumar Sharma [4] (Jan. 2013) compared the J48, Simple Cart, REPTree and NB tree algorithms for predicting the performance of engineering students. They used data on 524 students for 10-fold cross validation and 178 students for the percentage split method. J48 achieved the highest accuracy with both methods: 80.15% using 10-fold cross validation and 82.58% using percentage split. The J48 decision tree algorithm will therefore be useful for teachers in improving the performance of weak students.
Anuja Priyam, Abhijeet, Rahul Gupta, Anju Rathee, and Saurabh Srivastava [12] (June 2013) compared the ID3, C4.5 and CART decision tree algorithms on student data. Evaluation was performed using 10-fold cross validation. The CART algorithm was found to have the highest accuracy (56.2500%). The model’s True Positive rate for the class Fail is high (0.786) for ID3 and C4.5, which means that these models identify most of the students who fail. The model can therefore help teachers reduce failure rates.
Ramanathan L, Saksham Dhanda and Suresh Kumar D [14] (June-July 2013) analyzed data on 50 students. They used naive Bayes, J48 and a proposed algorithm, Weighted ID3 (WID3), for evaluation. WID3 was found to have a higher accuracy (93%) than J48 and naive Bayes. In future, user-friendly software could be built on WID3, which would be very helpful for teachers.
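The percentage split method holds out a fixed share of the records for testing. The sketch below is an illustration only; the split ratio used in the study is not stated in this survey, and the 66% training share is an assumption (it is the usual default in the WEKA Explorer). Under that assumption, 524 records leave 178 for testing, which agrees with the test-set size reported above.

```python
import random


def percentage_split(records, train_percent=66.0, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


# 524 placeholder record ids stand in for the student records.
train, test = percentage_split(range(524), train_percent=66.0)
print(len(train), len(test))  # 346 178
```

Unlike cross validation, only the held-out share is ever used for testing, so the reported accuracy rests on fewer records.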
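The three algorithms compared here differ mainly in how they score a candidate split: ID3 uses information gain, C4.5 uses gain ratio, and CART uses the Gini index. The sketch below computes all three for a single split. The class counts are hypothetical and the function names are chosen for this example.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def gini(counts):
    """Gini impurity of a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_scores(parent, children):
    """parent: class counts before the split; children: counts per branch."""
    n = sum(parent)
    weights = [sum(child) / n for child in children]
    info_gain = entropy(parent) - sum(
        w * entropy(c) for w, c in zip(weights, children))
    # Split information penalizes attributes with many small branches.
    split_info = entropy([sum(child) for child in children])
    gain_ratio = info_gain / split_info if split_info else 0.0
    gini_decrease = gini(parent) - sum(
        w * gini(c) for w, c in zip(weights, children))
    return {"information_gain": info_gain,   # ID3
            "gain_ratio": gain_ratio,        # C4.5
            "gini_decrease": gini_decrease}  # CART


# Hypothetical split: 6 pass / 4 fail students divided by one attribute
# into a branch with 5 pass / 1 fail and a branch with 1 pass / 3 fail.
scores = split_scores(parent=[6, 4], children=[[5, 1], [1, 3]])
```

At each node the algorithm evaluates every available attribute in this way and splits on the one with the best score, which is why the three algorithms can build different trees from the same data.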
Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao [7] (September 2013) analyzed a data set of 182 students using the ID3 and C4.5 decision tree algorithms. In bulk evaluation on 173 students, both algorithms had the same accuracy of 75.145%; in singular evaluation on 9 students, both had an accuracy of 77.778%. Over all 182 students the accuracy was approximately 75.275%.
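The overall figure can be checked from the two reported accuracies. The sketch assumes that each percentage corresponds to a whole number of correctly classified students, which the rounding below recovers.

```python
def correct_count(n_students, accuracy_percent):
    """Number of correctly classified students implied by an accuracy."""
    return round(n_students * accuracy_percent / 100)


bulk_correct = correct_count(173, 75.145)      # bulk evaluation
singular_correct = correct_count(9, 77.778)    # singular evaluation
combined = 100 * (bulk_correct + singular_correct) / (173 + 9)
print(bulk_correct, singular_correct, f"{combined:.3f}%")  # 130 7 75.275%
```

The combined accuracy is a weighted average, so it stays close to the bulk result because 173 of the 182 students belong to that group.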
Mrs. M.S. Mythili and Dr. A.R. Mohamed Shanavas [9] (Jan. 2014) compared J48, Random Forest, Multilayer Perceptron, IB1 and decision tree algorithms using a data set of 260 students from various schools. 10-fold cross validation was chosen for evaluation. Random Forest was found to have the highest accuracy (89.23%) and the lowest execution time of the algorithms compared. The study will be helpful for educational institutions.
Jyoti Namdeo and Naveenkumar Jayakumar [13] (Feb. 2014) collected data on 51 students from the MCA 2007 batch. The algorithms evaluated were naive Bayes, Multilayer Perceptron, J48 and Random Forest. They were trained on the 2007 batch data and tested on the 2008 batch data. Evaluation was performed on the training set, with cross validation, with percentage split and on the 2008 test data. On the 2008 data, naive Bayes had the highest accuracy of the algorithms (31.57%), but this accuracy does not meet the requirement.
Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad [8] (September 2014) analyzed 399 student records using the naive Bayes, rule-based and J48 decision tree algorithms. They used cross validation and percentage split for evaluation: 3-, 5- and 10-fold cross validation, and percentage splits with training:testing ratios of 10:90, 20:80, 30:70, 40:60, 50:50, 40:60, 30:70, 20:80 and 10:90. Comparing the three classification algorithms, the rule-based and J48 decision tree algorithms were found to have the highest accuracy (68.8%).
III. COMPARATIVE STUDY OF SURVEY
- Comparison of survey work based on different parameters
| Paper Name | Year of Publication | Size of Data Set (No. of students) | Algorithms Used | Test Options Used | Algorithm with Higher Accuracy | Accuracy of Algorithm |
|---|---|---|---|---|---|---|
| Performance Prediction of Engineering Students using Decision Trees | Dec. 2011 | 346 | J48 | Cross Validation | J48 | 60.46% |
| Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification | 2012 | 90 | ID3, C4.5, CART | Cross Validation | C4.5 | 67.7778% |
| Use of Data Mining in Education Sector | 2012 | 1892 | J48 | Cross Validation | J48 | 77.74% |
| A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction | Jan. 2013 | 524 | J48, Simple Cart, REPTree, NB tree | Cross Validation | J48 | 80.15% |
| A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction | Jan. 2013 | 178 | J48, Simple Cart, REPTree, NB tree | Percentage Split | J48 | 82.58% |
| Comparative Analysis of Decision Tree Classification Algorithms | June 2013 | Not specified | ID3, C4.5, CART | Cross Validation | CART | 56.2500% |
| Predicting Students’ Performance using Modified ID3 Algorithm | June-July 2013 | 50 | Naive Bayes, J48, Weighted ID3 | Not specified | Weighted ID3 | 93% |
| Predicting Students Performance using ID3 and C4.5 Classification Algorithms | September 2013 | 173 | ID3, C4.5 (bulk evaluation) | Cross Validation | ID3, C4.5 | 75.145% |
| Predicting Students Performance using ID3 and C4.5 Classification Algorithms | September 2013 | 9 | ID3, C4.5 (singular evaluation) | Cross Validation | ID3, C4.5 | 77.778% |
| An Analysis of students’ performance using classification algorithms | Jan. 2014 | 260 | J48, Random Forest, Multilayer Perceptron, IB1 | Cross Validation | Random Forest | 89.23% |
| Predicting Students Performance Using Data Mining Technique with Rough Set Theory Concepts | Feb. 2014 | 51 | J48, Random Forest, Multilayer Perceptron, Naive Bayes | Training, Cross Validation, Percentage Split, Test | Naive Bayes | 31.57% |
| First Semester Computer Science Students’ Academic Performances Analysis by Using Data Mining Classification Algorithms | September 2014 | 399 | Naive Bayes, J48, Rule Based | Cross Validation, Percentage Split | J48 | 68.8% |
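The comparison can also be summarized by counting how often each algorithm appears as the most accurate one. The sketch below transcribes the rows of the comparison; treating J48 and C4.5 as one family (J48 is WEKA's implementation of C4.5) and counting ties as wins are choices made for this illustration.

```python
from collections import Counter

# (year, data set size or None if not specified, best algorithm, accuracy %)
RESULTS = [
    ("Dec. 2011", 346, "J48", 60.46),
    ("2012", 90, "C4.5", 67.7778),
    ("2012", 1892, "J48", 77.74),
    ("Jan. 2013", 524, "J48", 80.15),
    ("Jan. 2013", 178, "J48", 82.58),
    ("June 2013", None, "CART", 56.25),
    ("June-July 2013", 50, "Weighted ID3", 93.0),
    ("September 2013", 173, "ID3, C4.5", 75.145),
    ("September 2013", 9, "ID3, C4.5", 77.778),
    ("Jan. 2014", 260, "Random Forest", 89.23),
    ("Feb. 2014", 51, "Naive Bayes", 31.57),
    ("September 2014", 399, "J48", 68.8),
]

# Count every algorithm named as best in a row (ties count for each name).
wins = Counter(name.strip()
               for _, _, best, _ in RESULTS
               for name in best.split(","))
c45_family_wins = wins["J48"] + wins["C4.5"]
print(f"J48/C4.5 best or tied in {c45_family_wins} of {len(RESULTS)} rows")
```

The accuracies themselves come from different data sets and test options, so they are not directly comparable across rows; only the count of wins is summarized here.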
IV. CONCLUSION
The importance of educational data mining (EDM) is increasing day by day, as the need to predict and analyze student performance grows with the aim of improving students’ academic performance. As described above, various authors have applied different algorithms, including J48, random forest, Multilayer Perceptron, naive Bayes, rule-based classifiers, IB1, REPTree, NB tree and CART, to different data sets. Some authors compared algorithms to find the best among them on the basis of accuracy. The survey in this paper shows that the J48/C4.5 decision tree algorithm is most often reported as the best algorithm in terms of accuracy across different data sets. The survey therefore suggests that J48 performs well across the range of data set sizes in the surveyed studies. This is a likely reason for the wide use of J48 among decision tree algorithms.
The survey in Section II will be helpful to researchers working in the field of student performance prediction and analysis using decision tree algorithms.
V. FUTURE WORK
Students’ academic performance is the main contributor to the growth of any educational institute: if students perform well academically, the institution’s growth rate rises. It is therefore necessary to focus on students’ results, and there is wide scope in this field. Student performance prediction and analysis is used to improve student performance, mainly by means of decision tree algorithms. Various researchers have done considerable work in this field, either evaluating a single algorithm or comparing three or four algorithms.
In future, researchers can extend this work by comparing a larger number of algorithms on larger data sets, so wide scope remains for researchers in this field.
ACKNOWLEDGMENT
First of all I express my sincerest debt of gratitude to the Almighty God who always supports me in my endeavors.
I would like to thank Prof. Neena Madan for their encouragement and support. Then, I would like to thank my family and my friends. I am thankful to all those who helped me in one way or the other at every stage of my work.
REFERENCES
- [1] Nikita Jain, Vishal Srivastava, “Data mining techniques: A survey paper”, IJRET: International Journal of Research in Engineering and Technology, Volume 02, Issue 11, Nov. 2013.
- [2] Mrs. M.S. Mythili, Dr. A.R. Mohamed Shanavas, “An Analysis of students’ performance using classification algorithms”, IOSR Journal of Computer Engineering, Volume 16, Issue 1, January 2014.
- [3] Dr. Mohd Maqsood Ali, “Role of data mining in education sector”, International Journal of Computer Science and Mobile Computing, Vol. 2, Issue 4, April 2013.
- [4] Mrinal Pandey, Vivek Kumar Sharma, “A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction”, International Journal of Computer Applications, Volume 61, No. 13, January 2013.
- [5] Brijesh Kumar Baradwaj, Saurabh Pal, “Mining Educational Data to Analyze Students Performance”, International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.
- [6] Surjeet Kumar Yadav, Saurabh Pal, “Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification”, World of Computer Science and Information Technology Journal, Vol. 2, No. 2, 2012.
- [7] Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao, “Predicting Students Performance using ID3 and C4.5 Classification Algorithms”, International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 3, No. 5, September 2013.
- [8] Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad, “First Semester Computer Science Students’ Academic Performances Analysis by Using Data Mining Classification Algorithms”, Proceeding of the International Conference on Artificial Intelligence and Computer Science (AICS 2014), September 2014.
- [9] Mrs. M.S. Mythili, Dr. A.R. Mohamed Shanavas, “An Analysis of students’ performance using classification algorithms”, IOSR Journal of Computer Engineering (IOSR-JCE), Volume 16, Issue 1, Jan. 2014.
- [10] Manpreet Singh Bhullar, Amritpal Kaur, “Use of Data Mining in Education Sector”, Proceedings of the World Congress on Engineering and Computer Science (WCECS), San Francisco, USA, October 2012.
- [11] R. R. Kabra, R. S. Bichkar, “Performance Prediction of Engineering Students using Decision Trees”, International Journal of Computer Applications, Volume 36, No. 11, December 2011.
- [12] Anuja Priyam, Abhijeet, Rahul Gupta, Anju Rathee, and Saurabh Srivastava, “Comparative Analysis of Decision Tree Classification Algorithms”, International Journal of Current Engineering and Technology, Volume 3, No. 2, June 2013.
- [13] Jyoti Namdeo, Naveenkumar Jayakumar, “Predicting Students Performance Using Data Mining Technique with Rough Set Theory Concepts”, International Journal of Advance Research in Computer Science and Management Studies, Volume 2, Issue 2, February 2014.
- [14] Ramanathan L, Saksham Dhanda, Suresh Kumar D, “Predicting Students’ Performance using Modified ID3 Algorithm”, International Journal of Engineering and Technology (IJET), Volume 5, No. 3, Jun-Jul 2013.