OPTICAL CHARACTER RECOGNITION IMPLEMENTATION FOR ADMISSION SYSTEM IN UNIVERSITAS PERTAMINA

Meredita Susanty
Herminarto Nugroho

Abstract


Starting in 2019, prospective college students are required to take the Computer-Based Written Exam (UTBK) to register for state universities in Indonesia. Some private universities have also adopted this exam as an admission requirement, among them Universitas Pertamina. The UTBK comprises several exam groups, whose scores are printed on a digital certificate in image format (JPG). The university admission team must download the UTBK certificate uploaded by each applicant, read and record the score for each exam group, and then perform a calculation to decide whether the applicant is accepted into a particular school of the university. This research proposes replacing the manual process performed by the admission team with optical character recognition (OCR). The OCR engine extracts text from the certificate image, and selected values from the extracted text are used to calculate an acceptance decision. The research shows that OCR cannot accurately convert text from an image when the image has a grayscale background. However, image preprocessing can improve overall accuracy. Lastly, Tesseract performs better at converting black text on a white background than white text on a black background.
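The pipeline summarized above (extract text, pick out the scores, calculate a decision) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the certificate layout, the calculation rule, or any cutoff, so the `label : number` line format, the averaging rule, the `threshold` parameter, and the sample text are all hypothetical. In practice the input string would come from an OCR call such as `pytesseract.image_to_string`; here it is hard-coded so the sketch runs without Tesseract installed. The `invert_grayscale` helper reflects the finding that black text on a white background is recognized better, so a dark-background image could be inverted before OCR.

```python
import re
from statistics import mean

# Hypothetical OCR output; labels and numbers are invented for illustration only.
SAMPLE_OCR_TEXT = """
Group A : 600
Group B : 550
Group C : 650
"""

# Assumed line format "label : number"; the real certificate layout may differ.
SCORE_LINE = re.compile(
    r"^\s*(?P<label>[A-Za-z][A-Za-z ]*?)\s*:\s*(?P<score>\d+(?:\.\d+)?)\s*$"
)


def parse_scores(ocr_text):
    """Collect {label: score} from every line that matches the assumed format."""
    scores = {}
    for line in ocr_text.splitlines():
        match = SCORE_LINE.match(line)
        if match:
            scores[match.group("label")] = float(match.group("score"))
    return scores


def decide(scores, threshold):
    """Hypothetical rule: accept when the mean score reaches the threshold."""
    if not scores:
        raise ValueError("no scores found in OCR text")
    return mean(scores.values()) >= threshold


def invert_grayscale(pixels):
    """Invert 8-bit grayscale values so white-on-black text becomes black-on-white."""
    return [[255 - value for value in row] for row in pixels]


if __name__ == "__main__":
    scores = parse_scores(SAMPLE_OCR_TEXT)
    print(scores)
    print("accepted" if decide(scores, threshold=600) else "rejected")
```

Lines that do not match the assumed format are skipped silently, which tolerates OCR noise but also hides misread scores; a production system would likely flag certificates with missing exam groups for manual review.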

Keywords


artificial intelligence; computer vision; pattern recognition; machine learning; optical character recognition






DOI: https://doi.org/10.24176/simet.v11i1.3838


Creative Commons License
Simetris : Jurnal Teknik Mesin, Elektro dan Ilmu Komputer is licensed under a Creative Commons Attribution 4.0 International License.
