XI INTERNATIONAL CONFERENCE
“INFORMATION TECHNOLOGY AND IMPLEMENTATION” (IT&I-2024)
Towards the Information Technology for Online Citizen Services Detection and Assessment on E-Government National Portals
Andrii Kopp 1 and Oleksandr Chornenkyi 2
1 National Technical University “KhPI”
2 V.N. Karazin Kharkiv National University
Agenda and the main purpose
1. Motivation
2. Related Work
3. National Web Portals Data Preparation
The process starts by converting the data from an Excel spreadsheet to JSON format for further processing and storage, using the Python programming language. The “pandas” library is used to load the data from the Excel file. The required dataset is contained in the file “egov_data_2024.xlsx”.
The data preparation process can be formally described as a set of operations that transform the Excel spreadsheet into a JSON file:
where
Dxlsx – the spreadsheet represented as the set of records;
Ddf – the data frame;
Tjson – the function used to transform the data frame into the JSON set of objects Djson;
Ddict – the Python dictionary to which Djson is deserialized;
Wjson – the writing operation to the JSON file Fjson.
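The chain of operations defined above can be sketched in Python roughly as follows. This is a minimal illustration rather than the authors' component: the function names are hypothetical, and `orient="records"` is an assumption about how the JSON set of objects is laid out. With that layout, the deserialized result is a list of dictionaries, one per spreadsheet row.

```python
import json
from pathlib import Path

import pandas as pd


def load_spreadsheet(xlsx_path: str) -> pd.DataFrame:
    """Dxlsx -> Ddf: read the first worksheet into a data frame."""
    # pandas needs an Excel engine (for example openpyxl) to read .xlsx files.
    return pd.read_excel(xlsx_path)


def to_json_objects(df: pd.DataFrame) -> str:
    """Tjson: serialize the data frame as a JSON array of objects (Djson)."""
    # Assumption: one JSON object per spreadsheet row.
    return df.to_json(orient="records")


def to_python_records(json_text: str) -> list:
    """Djson -> Ddict: deserialize the JSON text into Python dictionaries."""
    return json.loads(json_text)


def write_json(records: list, json_path: str) -> None:
    """Wjson: write the records to the JSON file Fjson."""
    text = json.dumps(records, ensure_ascii=False, indent=2)
    Path(json_path).write_text(text, encoding="utf-8")


def convert(xlsx_path: str, json_path: str) -> list:
    """Run the whole chain from the Excel file to the JSON file."""
    df = load_spreadsheet(xlsx_path)
    records = to_python_records(to_json_objects(df))
    write_json(records, json_path)
    return records
```

Under these assumptions, `convert("egov_data_2024.xlsx", "egov_data_2024.json")` would produce the JSON document used in the processing step.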
4. National Web Portals Data Processing
Using the obtained data stored as the JSON document “egov_data_2024.json”, each country’s national web portal is accessed by making Hypertext Transfer Protocol (HTTP) requests.
All hyperlinks are extracted from the resulting Hypertext Markup Language (HTML) content, which is then analyzed for thematic keywords relevant to major government service areas and citizen services according to the Integrated Architecture Framework for E-Government (IAFEG).
Example of UK national portal data scraping
Services | Keywords |
Taxation | tax, finance, income, money, debt, credit |
Education | education, school, study, child, training, student |
Health | health, insurance, care, sick, medical, funeral |
Immigration | immigration, citizen, travel, visa, residence, international |
Employment | employment, work, job, business, license, certification |
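The link extraction and keyword matching described above could look roughly like the sketch below. It uses only the Python standard library for parsing, which is an assumption; the slides do not name the scraping library. The names `LinkExtractor`, `extract_links`, and `categorize_links` are hypothetical, and matching keywords as substrings of the link URL and anchor text is one possible reading of "analyzed for thematic keywords".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Keywords per service area, copied from the table above.
SERVICE_KEYWORDS = {
    "Taxation": ["tax", "finance", "income", "money", "debt", "credit"],
    "Education": ["education", "school", "study", "child", "training", "student"],
    "Health": ["health", "insurance", "care", "sick", "medical", "funeral"],
    "Immigration": ["immigration", "citizen", "travel", "visa", "residence", "international"],
    "Employment": ["employment", "work", "job", "business", "license", "certification"],
}


class LinkExtractor(HTMLParser):
    """Collects (absolute URL, anchor text) pairs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html, base_url):
    """Return all hyperlinks found in the HTML content of a portal page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def categorize_links(links):
    """Map each service area to the URLs whose address or text contains a keyword."""
    categorized = {service: [] for service in SERVICE_KEYWORDS}
    for url, text in links:
        haystack = f"{url} {text}".lower()
        for service, keywords in SERVICE_KEYWORDS.items():
            if any(keyword in haystack for keyword in keywords):
                categorized[service].append(url)
    return categorized
```

Plain substring matching is simple but over-matches: "care" also matches "career", and "work" matches "network". A stricter variant would compare whole words instead.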
5. National Web Portals Data Analysis
The general process of e-government national portal data processing can be formally represented as follows:
where:
Li – the set of hyperlinks extracted from the web portal page;
Pi – the set of thematic categories and hyperlinks that, as we assume, provide the access to corresponding citizen services;
Si – the set of detected citizen services based on the introduced thematic categories and keywords;
SRi – the service richness of the national portal with e-government services.
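As an illustration of the last two definitions, the sketch below derives Si and SRi from a mapping of thematic categories to hyperlinks (Pi). The exact formula for service richness is not reproduced here, so the code assumes, purely for illustration, that SRi is the share of service areas for which at least one matching hyperlink was found. The function names and the sample URLs are hypothetical.

```python
def detect_services(categorized_links):
    """Si: service areas with at least one matching hyperlink in Pi."""
    return {service for service, urls in categorized_links.items() if urls}


def service_richness(categorized_links):
    """SRi under the stated assumption: detected areas / all areas."""
    if not categorized_links:
        return 0.0
    return len(detect_services(categorized_links)) / len(categorized_links)


# Hypothetical Pi for one portal: three of the five areas have links.
sample_portal = {
    "Taxation": ["https://portal.example/pay-tax"],
    "Education": [],
    "Health": ["https://portal.example/health-insurance"],
    "Immigration": ["https://portal.example/visas"],
    "Employment": [],
}

print(sorted(detect_services(sample_portal)))
print(service_richness(sample_portal))
```

A ratio keeps the measure comparable across portals regardless of how many hyperlinks each page exposes, but other definitions (for example, counting matched links) are equally conceivable.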
Microsoft Power BI is used for further analysis of the obtained web scraping results.
Developed Python component for national portal scraping and analysis
6. Results and Discussion (1)
The stages of EGDI dataset discovery, preliminary check (to manually remove countries whose national portals are inaccessible or lack an English-language interface), and processing using the proposed technology:
Almost 55% of the country records were removed from the initial dataset because their national portals were inaccessible or had no English version.
The remaining 87 records were processed using the proposed solution. However, only 79% of the available national portals were successfully scraped.
Failed processing: Bulgaria, Spain, Iran (Islamic Republic of), Jordan, Lithuania, Eritrea, Ghana, Ireland, Israel, Kyrgyzstan, Malta, Namibia, Philippines, New Zealand, Morocco, Palau, Thailand, and Zimbabwe.
Stage | Countries | Remark |
Discovery | 193 | The initial EGDI [7] list consists of 193 countries |
Preliminary check | 87 | Removed 106 records describing countries whose national portals are either not accessible or do not provide English versions |
Processing | 69 | Failed to process national web portals of 18 countries |
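The percentages quoted for these stages follow directly from the counts in the table, as the short check below shows. The variable names are illustrative.

```python
# Country counts per stage, taken from the table above.
discovered = 193
after_check = 87
processed = 69

removed = discovered - after_check   # records dropped at the preliminary check
failed = after_check - processed     # portals that could not be processed

removed_share = removed / discovered
success_rate = processed / after_check

print(f"removed: {removed} ({removed_share:.1%})")  # removed: 106 (54.9%)
print(f"failed: {failed}")                          # failed: 18
print(f"scraped: {success_rate:.1%}")               # scraped: 79.3%
```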
6. Results and Discussion (2)
The Power BI dashboard consolidates information about the countries, their Online Service Index (OSI) measures, and the introduced measures.
7. Conclusions
Future work
THANK YOU FOR YOUR ATTENTION!