
[1] H. Döring and P. Manow, "Is Proportional Representation More Favourable to the Left? Electoral Rules and Their Impact on Elections, Parliaments and the Formation of Cabinets," British Journal of Political Science.

[2] S. Sugiyarto, J. Eliyanto, N. Irsalinda, Z. Putri and M. Fitrianawat, "A Fuzzy Logic in Election Sentiment Analysis: Comparison Between Fuzzy Naïve Bayes and Fuzzy Sentiment using CNN," JTAM.

[3] J. M. Carey and S. Hix, "The Electoral Sweet Spot: Low-Magnitude Proportional Electoral Systems," American Journal of Political Science.

[4] L. Aguiar-Conraria, P. C. Magalhães and M. J. Soares, "The nationalization of electoral cycles in the United States: a wavelet analysis," Public Choice.

[5] Zolghadr, S. A. A. Niaki and S. T. A. Niaki, "Modeling and forecasting US presidential election using learning algorithms," Journal of Industrial Engineering International.

[6] Singh, R. S. Sawhney and K. S. Kahlon, "Forecasting the 2016 US Presidential Elections Using Sentiment Analysis," Digital Nations – Smart Cities, Innovation, and Sustainability.

[7] M. M. Skoric, J. Liu and K. Jaidka, "Electoral and Public Opinion Forecasts with Social Media Data: A Meta-Analysis."

We considered using a neural network model built with the TensorFlow libraries but decided against it because of the poor interpretability of neural network models. The random forest model was the best fit for our application: it is more interpretable and also tells us which census demographics are most strongly related to the ideological leanings within a county.

Progressive-Conservative Tendencies of Electoral Systems in US localities

Project Advisor: Fabio Di Troia

Introduction

Machine learning has recently gained popularity due to its knack for finding details that are easily missed by humans. It has produced promising results when analyzing large quantities of data or data that varies on extremely minute scales. Electoral analysis using real-time data was one of the first mainstream applications of machine learning algorithms. We noticed that the experiments and studies conducted so far have been limited to electoral analyses. Virtually no analysis that we have come across deals with the movement of the Overton window within an electoral system or with the trajectories of political movement toward and away from ideologies.

Methodology

Analysis and Results

Summary/Conclusions

Key References

Acknowledgements

Our project successfully quantified the connection between census demographics and ideological leanings. This is important because it can help us make informed, collective decisions when casting our votes, and it can also help political factions reach out to and better serve neglected age groups. The models can be improved further by adding data from more census years, by including more political figures with their labels, and by establishing a weighting for positions on an ideological scale.

We extend our heartfelt gratitude to our professors, Prof. Fabio Di Troia and Dr. William T Armaline, for their unwavering guidance and support of our project. We also acknowledge the support of our institution for providing resources. We sincerely appreciate the combined efforts of our team members in the project. Each member's dedication, collaboration, and unique skills played a crucial role in our success. We are sincerely grateful for the opportunity to work alongside such exceptional individuals, and we look forward to future collaborations.

Our research focuses on how politics moves within a system whose age demographics are shifting. This is important to analyze because a political trajectory that is too extreme in one direction can destabilize a system and cause it to fall apart, or it can slowly creep into dangerous states such as totalitarianism or fascism. Using age-related data helps us observe and document changes in a system that may correlate with an aging population.

Baseline Approach

The project aims to create an election prediction model using logistic regression and natural language processing (NLP) methods. A logistic regression classifier trained on a subset of the data establishes a baseline for the model. The target variable was the election result, and the candidate's progressive and conservative tendencies were used as features. The model creates labels for the political opinions associated with a vote.

Label Generation model

This study implements an election results prediction model that combines several methodologies. Neural networks, machine learning algorithms loosely modeled on the workings of the human brain that learn associations within a dataset, were employed for the key functionality of the project: predicting election results based on ideological representation rather than statistical knowledge. For training the neural network, we use a correlated dataset that contains voter profile information, to understand who is voting; demographic details, to study their impact on voters; and the ideological affiliation of political parties. These data are linked to county and candidate information, respectively. To generate the ideological labels, which denote the political beliefs of a representative or a candidate, we ran ChatGPT, a large language model developed by OpenAI, from Google Colab.

Our model’s performance metrics were far from ideal, but the model succeeded in quantifying and highlighting the influence of age demographics on ideological leaning. The random forest classifier reached 30% accuracy, largely because of the limited labels and candidates captured in our dataset. 30% does not look like much, but it is promising when we consider that the model, as it was run, treated all ideological labels as independent and unrelated values. The results could be developed into something much more useful with techniques such as embeddings, or by establishing a weighting for each label along a political-ideological scale. Performance could also improve with data from more election cycles and with the incorporation of state-level election data.
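To illustrate why treating labels as unrelated values understates performance, the sketch below compares exact-match accuracy with an average distance on an ideological scale. It is a minimal sketch under stated assumptions: the label names, the numeric positions in `SCALE`, and the sample predictions are hypothetical and are not taken from the project's data.

```python
# Hypothetical positions of ideological labels on a left-to-right scale.
# Both the label names and the numeric positions are assumptions for illustration.
SCALE = {
    "progressive": -1.0,
    "center-left": -0.5,
    "centrist": 0.0,
    "center-right": 0.5,
    "conservative": 1.0,
}


def exact_accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label exactly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def mean_scale_distance(true_labels, predicted_labels):
    """Average absolute distance between true and predicted positions on SCALE."""
    distances = [abs(SCALE[t] - SCALE[p]) for t, p in zip(true_labels, predicted_labels)]
    return sum(distances) / len(distances)


# Made-up example: one near miss (progressive vs. center-left) and one far miss.
true_labels = ["progressive", "centrist", "conservative", "center-left"]
predicted_labels = ["center-left", "centrist", "progressive", "center-left"]

accuracy = exact_accuracy(true_labels, predicted_labels)
distance = mean_scale_distance(true_labels, predicted_labels)
print(f"exact accuracy: {accuracy:.3f}, mean scale distance: {distance:.3f}")
```

Exact accuracy counts the near miss and the far miss as equally wrong, while the scale distance penalizes the far miss four times as heavily. A weighting along these lines is one possible way to credit predictions that land close to the true label.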

NLP is used to extract pertinent traits from the textual information linked with each candidate, such as speeches, interviews, and social media posts. These traits, together with the candidate's progressive and conservative tendencies, would be used to train more complex classifiers, such as neural networks or ensemble models. The objective is to better anticipate election results by gathering more detailed information about each candidate's political views.

Methodology

Classification model

We focused on two approaches to classify our data. Our baseline implementation used a multi-class logistic regression model. To better quantify and highlight the correlations between our data points, we also used the random forest classifier from the scikit-learn library.

Once the LLM-generated labels were merged with the census statistics for the relevant time period, the dataset was split into train and test data. A validation dataset is good practice, but we did not use one because our dataset was too small to benefit from it. Both classification models were evaluated on the test split, which contained about 33% of the values from the main dataset. The features used for training were the age-percentage values for age groups 20 and above, as provided in the census data. We did not use the data for ages below 20, as extracting the percentage of 18-year-old individuals from the census data was beyond the scope of the project.
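The split and evaluation described above can be sketched with scikit-learn as follows. This is an illustrative sketch, not the project's actual code: the data are randomly generated stand-ins, and the age-group bins, label set, and hyperparameters (`max_iter`, `n_estimators`, `random_state`) are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data. The real features are census age-group
# percentages and the real labels are generated by the LLM.
rng = np.random.default_rng(0)
AGE_GROUPS = ["20-24", "25-34", "35-44", "45-54", "55-64", "65+"]  # assumed bins
LABELS = ["progressive", "moderate", "conservative"]  # assumed label set

n_rows = 300
X = rng.random((n_rows, len(AGE_GROUPS))) * 20  # fake percentages per age group
y = rng.choice(LABELS, size=n_rows)

# Hold out about 33% of the rows for testing; no validation split is used.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model on the training split and score it on the held-out test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Because the stand-in labels are random, the printed accuracies only demonstrate the workflow and say nothing about the project's results.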

It is essential to note that the project also examined the features fed into the random forest model. The algorithm generates an importance score for each input feature, and each score is associated with its respective column in the data frame. The feature importance represents the contribution of each individual feature to the overall output of the prediction model.
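As a rough illustration of how importance scores can be tied back to data frame columns, the sketch below uses the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`. The column names, labels, and data are hypothetical placeholders, not the project's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names for census age-group percentages.
columns = ["pct_20_24", "pct_25_34", "pct_35_44", "pct_45_54", "pct_55_64", "pct_65_plus"]

# Randomly generated stand-in features and labels.
rng = np.random.default_rng(1)
features = pd.DataFrame(rng.random((200, len(columns))) * 20, columns=columns)
labels = rng.choice(["progressive", "conservative"], size=200)

forest = RandomForestClassifier(n_estimators=200, random_state=1)
forest.fit(features, labels)

# feature_importances_ is ordered like the training columns, so indexing the
# scores by features.columns pairs each score with its data frame column.
importances = pd.Series(forest.feature_importances_, index=features.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The scores are impurity-based and normalized to sum to 1, so they rank the features relative to one another rather than measure an absolute effect.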

Computer Engineering Department

Shiva Prasad, Guru Karthik (MS Software Engineering)

Talluri, Hema Aishwarya (MS Computer Engineering)

Uppalapati, Prerana (MS Software Engineering)

Mansahia, Shahbaz (MS Software Engineering)

This project is meant to explore novel applications of statistics and machine learning technologies that were not possible before. Tools such as LLMs enabled us to expand on our expertise. We did not find this combination of an LLM with a classifier model implemented anywhere in our literature review. Such an application can potentially help us determine the health of a political system’s democratic apparatus. The project can also aid political candidates and parties with their campaigns by giving them insight into likely future results, which can be used to design strategies and make data-driven decisions to reach out to the public and serve their constituents better while reducing age-based demographic neglect.