SCRAPERS
1
Scrapers
Team Members:
Guankun Li : guankun@uchicago.edu
Jiazheng Li : jiazheng123@uchicago.edu
Tianyue Cong : tianyuec@uchicago.edu
Weiwu Yan : weiwuyan@uchicago.edu
GitHub: https://github.com/macs30122-winter24/final-project-scrapers.git
The Impact of National Science Foundation (NSF) Funding on Academic Output of Scholars
Background
Funding enables researchers to acquire the resources that research requires. This project examines how National Science Foundation (NSF) funding shapes the academic output of scholars in the Behavioral and Cognitive Sciences.
We focus on the Behavioral and Cognitive Sciences deliberately. The domain sits at the crossroads of the natural and social sciences, and its diversity, with a clear subdivision into subfields, makes it possible to compare the impact of funding across different areas of study.
Social Science Significance
In the contemporary landscape of scientific development, research funding has emerged as one of the most crucial public resources. By allocating financial resources to research projects, institutions, and researchers, research funding is instrumental in catalyzing innovation, facilitating growth within academic disciplines, advancing the careers of scientists, and contributing to broader socio-economic development (Lane, 2009). Its contribution to knowledge production is substantial.
For instance, between 2009 and 2010, out of 2,060,838 research papers indexed in the Science Citation Index (SCI), 1,165,276 papers (56.54%) received at least one form of research funding support (Mahesh, 2012). This underscores the importance of evaluating the outcomes and impact of research funding.
Research Questions
RQ1
Does NSF funding increase the quantity or quality of academic output of scholars?
RQ2
Which subfields within Behavioral and Cognitive Sciences are more likely to be awarded funding?
RQ3
Which subfields within Behavioral and Cognitive Sciences are more significantly affected by NSF funding?
Data Collection
Descriptive Statistics
Variable | Obs | Mean | SD | Min | Median | Max |
year | 20640 | 2015.5 | 2.872 | 2011 | 2015 | 2020 |
award amount | 20640 | 1.30e+05 | 2.70e+05 | 0 | 0 | 3.24e+06 |
ln(award amount) | 20640 | 5.280 | 5.974 | 0 | 0 | 14.990 |
award(binary) | 20640 | 0.444 | 0.497 | 0 | 0 | 1 |
total citations | 20640 | 8797.634 | 17447.221 | 1 | 3902.5 | 3.62e+05 |
h-index | 20640 | 33.613 | 22.731 | 1 | 29 | 192 |
citations | 20113 | 476.106 | 965.345 | 1 | 210 | 24631 |
publications | 20640 | 5.472 | 10.209 | 0 | 4 | 1129 |
citations of top 3 cited papers | 20004 | 84.654 | 305.650 | 0 | 33 | 22244 |
Table 1: Descriptive Statistics
Award Distribution by Amount
Figure 1 shows the distribution of NSF award amounts, which is highly right-skewed. We therefore log-transform the award amounts in the subsequent regressions to reduce this skewness.
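A minimal sketch of the transformation, assuming it is ln(1 + amount): Table 1 reports a minimum of 0 for both award amount and ln(award amount), which is consistent with adding 1 before taking the log so that scholars without an award stay at 0 instead of being undefined. The helper name `log_award` is hypothetical.

```python
import math


def log_award(amount: float) -> float:
    """Return ln(1 + amount), so a zero award maps to 0 rather than -inf.

    Assumption: the deck's ln(award amount) is computed as ln(1 + amount).
    """
    if amount < 0:
        raise ValueError("award amount cannot be negative")
    return math.log1p(amount)


# Illustrative values; 3.24e+06 is the maximum award amount in Table 1
amounts = [0, 150_000, 3_240_000]
print([round(log_award(a), 3) for a in amounts])
```

Applied to Table 1's maximum of 3.24e+06, this gives roughly 14.99, in line with the reported maximum of ln(award amount).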
Figure 1 Distribution of Award Amount
Average Award Amount by Year
Figure 2 presents the trend of cumulative NSF award amounts over time. Funding rises clearly as the years progress, which motivates including time (year) fixed effects in our model specification.
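Year fixed effects are usually implemented by adding one indicator per year and omitting a base year, so that the indicators are not collinear with the intercept. A minimal sketch of that encoding; the helper `year_dummies` is hypothetical, and the deck does not say which base year was used (the earliest year, 2011 in Table 1, is assumed by default here).

```python
def year_dummies(years, base_year=None):
    """One-hot encode years, dropping a base year to avoid collinearity with the intercept.

    Returns (kept_years, rows), where rows[i][j] is 1 if observation i is in kept_years[j].
    """
    levels = sorted(set(years))
    if base_year is None:
        base_year = levels[0]  # assumption: earliest year is the omitted category
    if base_year not in levels:
        raise ValueError(f"base year {base_year} does not occur in the data")
    kept = [y for y in levels if y != base_year]
    rows = [[1 if y == level else 0 for level in kept] for y in years]
    return kept, rows


kept, rows = year_dummies([2011, 2012, 2012, 2020])
print(kept)
print(rows)
```

Each year coefficient is then read relative to the omitted base year.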
Figure 2 Variation of Award Amount by Year
Scatter Plot
Figures 3 and 4 show scatter plots of NSF award amounts against the quantity and the quality of publications, respectively; both suggest a positive correlation. Whether this relationship holds once control variables are included must be tested through regression analysis.
Figure 3 Scatter Plot (funding and pub quantity)
Figure 4 Scatter Plot (funding and pub quality)
OLS Linear Regression
Dependent variables
Independent variables
Fixed effects
Control variables
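Reading the rows of Tables 2 and 3, the specification can be written as follows. This is a reconstruction, not the authors' stated equation; the ln(1 + ·) form and the same-year timing of award and output are assumptions. Here Y is publications (quantity) or citations of the top three cited papers (quality) for scholar i in year t, A is the award amount, H the h-index, C total citations, and λ_t the year fixed effects (absent in the Table 2 baseline).

```latex
\begin{aligned}
Y_{it} &= \alpha + \beta \ln(1 + A_{it}) + \gamma_1 H_{it} + \gamma_2 C_{it} + \lambda_t + \varepsilon_{it}
  && \text{(columns 1 and 2)} \\
Y_{it} &= \alpha + \delta D_{it} + \gamma_1 H_{it} + \gamma_2 C_{it} + \lambda_t + \varepsilon_{it},
  \quad D_{it} = \mathbf{1}[A_{it} > 0]
  && \text{(columns 3 and 4)}
\end{aligned}
```

The appendix tables (Tables 5 and 6) would add a subfield fixed effect \mu_s to both lines.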
OLS Linear Regression (baseline)
| (1) | (2) | (3) | (4) |
| quantity | quality | quantity | quality |
ln(award amount) | 0.0909*** | -1.689*** | | |
| (0.00997) | (0.413) | | |
h-index | 0.143*** | 0.258 | 0.144*** | 0.252 |
| (0.00565) | (1.106) | (0.00564) | (1.106) |
total citations | 0.000522*** | 0.104** | 0.000522*** | 0.104** |
| (0.000185) | (0.0405) | (0.000184) | (0.0405) |
award(binary) | | | 1.086*** | -20.18*** |
| | | (0.149) | (4.646) |
constant | -0.0455 | 36.09* | -0.0590 | 36.33* |
| (0.136) | (20.87) | (0.125) | (20.90) |
N | 20113 | 19511 | 20113 | 19511 |
R2 | 0.129 | 0.118 | 0.129 | 0.118 |
adj. R2 | 0.128 | 0.118 | 0.128 | 0.118 |
Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01
Table 2: Baseline Regression
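Tables 2 and 3 report OLS estimates with standard errors. A minimal sketch of how such estimates and classical standard errors can be computed with NumPy, on synthetic data whose coefficients are chosen for illustration (they loosely echo column (1) but are not the project's data). The deck does not say whether its standard errors are classical or robust; classical ones are assumed here, and the helper `ols` is hypothetical.

```python
import numpy as np


def ols(y, X):
    """Fit y = X b + e by least squares; X must already include a constant column.

    Returns coefficient estimates and classical (homoskedastic) standard errors.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(0)
n = 20_000
award = rng.random(n) < 0.44  # award share roughly as in Table 1 (0.444)
amount = np.where(award, rng.lognormal(11.5, 1.0, n), 0.0)
ln_award = np.log1p(amount)
h_index = rng.gamma(2.2, 15.0, n)
total_cit = h_index * 250 + rng.normal(0, 2_000, n)
# Synthetic outcome with known coefficients, for illustration only
pubs = 0.1 * ln_award + 0.14 * h_index + 0.0005 * total_cit + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), ln_award, h_index, total_cit])
beta, se = ols(pubs, X)
names = ["constant", "ln(award amount)", "h-index", "total citations"]
for name, b, s in zip(names, beta, se):
    print(f"{name:>18}: {b: .4f} ({s:.4f})")
```

Replacing the `ln_award` column with `award.astype(float)` mimics the award(binary) specification in columns (3) and (4).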
OLS Linear Regression (panel data)
| (1) | (2) | (3) | (4) |
| quantity | quality | quantity | quality |
ln(award amount) | 0.0961*** | -0.0108 | | |
| (0.0126) | (0.377) | | |
h-index | 0.142*** | -0.0970 | 0.142*** | -0.0951 |
| (0.00599) | (1.129) | (0.00599) | (1.129) |
total citations | 0.000558*** | 0.114*** | 0.000555*** | 0.114*** |
| (0.000187) | (0.0410) | (0.000187) | (0.0410) |
award(binary) | | | 1.135*** | -1.280 |
| | | (0.183) | (4.143) |
year | controlled | controlled | controlled | controlled |
constant | -0.113 | 83.94*** | -0.134 | 84.00*** |
| (0.183) | (27.70) | (0.181) | (27.70) |
N | 20113 | 19511 | 20113 | 19511 |
R2 | 0.130 | 0.132 | 0.129 | 0.132 |
adj. R2 | 0.129 | 0.131 | 0.129 | 0.131 |
Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01
Table 3: Regression with Year Fixed Effects Controlled
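By the Frisch-Waugh-Lovell theorem, adding a full set of year dummies gives the same funding coefficient as first demeaning the outcome and the regressor within each year. The sketch below checks this numerically on synthetic data; it is illustrative only, `demean_by_group` is a hypothetical helper, and a single regressor stands in for the full set of controls.

```python
import numpy as np


def demean_by_group(v, groups):
    """Subtract each group's mean from v (the 'within' transformation)."""
    out = v.astype(float)  # astype returns a copy, so v is not modified
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= out[mask].mean()
    return out


rng = np.random.default_rng(1)
n = 5_000
year = rng.integers(2011, 2021, n)  # 2011..2020, as in Table 1
x = rng.normal(size=n) + 0.3 * (year - 2011)  # regressor that trends with year
y = 0.1 * x + 0.5 * (year - 2011) + rng.normal(size=n)

# (a) Intercept plus one indicator per year, omitting the base year 2011
dummies = np.column_stack([(year == t).astype(float) for t in range(2012, 2021)])
X_dummy = np.column_stack([np.ones(n), x, dummies])
b_dummy = np.linalg.lstsq(X_dummy, y, rcond=None)[0][1]

# (b) Demean y and x within each year, then regress without a constant
y_w, x_w = demean_by_group(y, year), demean_by_group(x, year)
b_within = (x_w @ y_w) / (x_w @ x_w)

print(b_dummy, b_within)
```

Both estimates recover the slope of 0.1, whereas a regression of y on x alone would also pick up the common year trend.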
Explanation and Answer to RQ 1
K-means Clustering with TF-IDF
We applied K-means clustering with TF-IDF features to divide the field of Behavioral and Cognitive Sciences into eight subfields, as detailed in Figure 5: Environmental Studies, Psychology, Cognitive Neuroscience, Phonetics, Human Biology, Linguistics, Cultural Studies, and Archaeology.
Figure 5 Cluster Result
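A minimal, dependency-free sketch of K-means clustering on TF-IDF vectors. The deck does not show its code or say which text was vectorized, so the toy corpus, k = 3 (the project used eight clusters), the smoothed IDF formula ln((1 + n) / (1 + df)) + 1, and the farthest-point initialization are all assumptions. In practice, library tools such as scikit-learn's `TfidfVectorizer` and `KMeans` would normally be used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return L2-normalized TF-IDF vectors (dicts term -> weight) for tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def to_dense(vectors):
    """Convert sparse dict vectors to dense lists over a shared sorted vocabulary."""
    vocab = sorted({t for v in vectors for t in v})
    return [[v.get(t, 0.0) for t in vocab] for v in vectors]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid
        centroids.append(max(points, key=lambda p: min(sqdist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus of tokenized abstracts
docs = [
    "neural imaging brain cognition".split(),
    "brain neural cortex imaging".split(),
    "language syntax grammar speech".split(),
    "speech phonetics language sound".split(),
    "excavation artifacts ancient site".split(),
    "ancient site pottery excavation".split(),
]
labels = kmeans(to_dense(tfidf(docs)), k=3)
print(labels)
```

Documents that share weighted terms end up in the same cluster; the subfield names are then assigned by inspecting each cluster's top terms, as in the word clouds and bar graphs.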
Word Clouds by Cluster
Word Clouds by Cluster
Word Bar Graphs
Word Bar Graphs
Answer to RQ2
Using the clustering results, we calculate the average award amount for each subfield (cluster) and compare them. This answers RQ2: even within the same research division, Behavioral and Cognitive Sciences, funding differs clearly across subfields. Cognitive Neuroscience receives the most funding, Psychology and Environmental Studies receive moderate amounts, and Human Biology, Archaeology, Phonetics, and Cultural Studies receive the least.
Figure 7 Percentage of Different Subfields
Answer to RQ2
Figure 8 Average Funding of Different Subfields
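The subfield shares and per-cluster averages in Figures 7 and 8 amount to a group-by over cluster labels. A minimal sketch with made-up records; the helper name is hypothetical, and including zero awards in each average is an assumption about how the figure was computed.

```python
from collections import defaultdict


def summarize_by_cluster(records):
    """Return {cluster: (share of observations, average award amount)}.

    records: iterable of (cluster_label, award_amount) pairs. Zero awards are
    included in the average (an assumption).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cluster, amount in records:
        totals[cluster] += amount
        counts[cluster] += 1
    n = sum(counts.values())
    return {c: (counts[c] / n, totals[c] / counts[c]) for c in counts}


# Hypothetical records, not project data
records = [
    ("Cognitive Neuroscience", 300_000),
    ("Cognitive Neuroscience", 0),
    ("Psychology", 120_000),
    ("Archaeology", 0),
    ("Archaeology", 40_000),
]
summary = summarize_by_cluster(records)
# Print clusters from highest to lowest average award
for cluster, (share, avg) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{cluster:<24} {share:6.1%} {avg:12,.0f}")
```

Averaging only over scholars with a nonzero award would answer a different question (award size given funding) and could change the ranking.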
Regression Grouped by Subfield (RQ 3)
Based on the clustering results, we ran separate regressions for each subfield; Table 4 presents the grouped results. Funding has a significant positive effect on the number of publications in every group. The coefficients and significance levels are largest for Cognitive Neuroscience, Human Biology, and Environmental Studies, indicating that funding has a stronger effect in these three subfields.
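Grouped regressions of this kind fit the same OLS specification separately within each subfield and compare the funding coefficient across groups. A minimal sketch on synthetic data with three hypothetical groups whose true slopes are chosen for illustration; the helper `coef_by_group` is hypothetical and uses classical standard errors.

```python
import numpy as np


def coef_by_group(y, X, groups):
    """Fit OLS within each group; return {group: (coef, std. error)} for column 1 of X."""
    results = {}
    for g in np.unique(groups):
        m = groups == g
        Xg, yg = X[m], y[m]
        n, k = Xg.shape
        beta, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
        resid = yg - Xg @ beta
        cov = (resid @ resid / (n - k)) * np.linalg.inv(Xg.T @ Xg)
        results[g] = (beta[1], np.sqrt(cov[1, 1]))
    return results


rng = np.random.default_rng(2)
n = 6_000
groups = rng.choice(np.array(["A", "B", "C"]), n)
ln_award = np.log1p(np.where(rng.random(n) < 0.45, rng.lognormal(11.5, 1.0, n), 0.0))
h_index = rng.gamma(2.2, 15.0, n)
true_slope = {"A": 0.05, "B": 0.10, "C": 0.20}  # hypothetical group-specific effects
slope = np.array([true_slope[g] for g in groups])
pubs = slope * ln_award + 0.14 * h_index + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), ln_award, h_index])
results = coef_by_group(pubs, X, groups)
for g, (b, se) in results.items():
    print(f"group {g}: coef {b:.3f} (se {se:.3f})")
```

Comparing magnitudes across separately fitted groups does not by itself test whether the differences are significant; a pooled model with funding-by-subfield interaction terms would test that directly.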
Response to Comments
Q1 Is that a "real" effect or a measurement effect in RQ1?
A1 It is a measurement effect: our model is not structural, so the estimates should not be interpreted as causal.
Q2 Scholars may receive funding because of their previous performance.
A2 We control for scholars' ability (h-index) to address this concern.
Q3 Did you account for the size of the academic discipline when considering awards?
A3 Yes. In Tables 5 and 6 in the Appendix, we control for discipline (subfield) fixed effects.
Q4 You use the citations of the three most cited papers to represent quality. Would the results change with another number, such as 5?
A4 No. The results are fairly robust to this choice.
Q5 How do you measure quality? If it is measured by the number of references, it might be subject to a large time lag/bias: work published earlier has a higher chance of accumulating more references.
A5 We use the citations of the three most cited papers. To account for the large time lag/bias, we control for time fixed effects. Because the quality data for Author A in year X are all sourced from year X, and time fixed effects are included, the measures are comparable.
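The quality measure discussed in Q4 and Q5 can be sketched as the sum of citations of an author's k most cited papers, which makes the robustness check in A4 a matter of changing k. Two assumptions, neither confirmed by the deck: the measure is a sum rather than a mean, and "sourced from year X" is read as counting only papers published in year X. The helper name is hypothetical.

```python
def top_k_quality(papers, year, k=3):
    """Sum of citations of an author's k most cited papers published in `year`.

    papers: list of (publication_year, citation_count) tuples for one author.
    Assumptions: quality is a sum (not a mean), and only papers from `year` count.
    """
    cites = sorted((c for y, c in papers if y == year), reverse=True)
    return sum(cites[:k])


# Hypothetical author record
papers = [(2015, 40), (2015, 12), (2015, 7), (2015, 3), (2016, 90)]
print(top_k_quality(papers, 2015, k=3), top_k_quality(papers, 2015, k=5))
```

Authors with fewer than k papers in a year simply contribute the citations of the papers they have, and a year with no papers yields 0.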
Conclusion
References
Appendix
Figure 9 Comparison of Average Yearly Citation
Appendix
Figure 10 Research Interests Word Cloud
Appendix
| (1) | (2) | (3) | (4) |
| quantity | quality | quantity | quality |
ln(award amount) | 0.102*** | 0.176 | | |
| (0.0135) | (0.379) | | |
h-index | 0.146*** | 1.388*** | 0.146*** | 1.390*** |
| (0.00733) | (0.459) | (0.00737) | (0.458) |
total citations | 0.000170 | 0.0576*** | 0.000172 | 0.0576*** |
| (0.000222) | (0.0191) | (0.000223) | (0.0191) |
award(binary) | | | 1.206*** | 0.755 |
| | | (0.196) | (4.084) |
year | controlled | controlled | controlled | controlled |
subfield_1 | controlled | controlled | controlled | controlled |
constant | -0.451** | 39.85*** | -0.483** | 39.90*** |
| (0.194) | (11.42) | (0.192) | (11.38) |
N | 18972 | 18405 | 18972 | 18405 |
R2 | 0.118 | 0.088 | 0.118 | 0.088 |
adj. R2 | 0.117 | 0.087 | 0.117 | 0.087 |
Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01
Table 5: Panel with K-means Clustering Subfields
Appendix
| (1) | (2) | (3) | (4) |
| quantity | quality | quantity | quality |
ln(award amount) | 0.104*** | 0.112 | | |
| (0.0131) | (0.381) | | |
h-index | 0.153*** | 1.337*** | 0.153*** | 1.339*** |
| (0.00625) | (0.495) | (0.00627) | (0.493) |
total citations | 0.0000459 | 0.0581*** | 0.0000439 | 0.0581*** |
| (0.000218) | (0.0196) | (0.000218) | (0.0196) |
award(binary) | | | 1.212*** | 0.404 |
| | | (0.187) | (4.155) |
year | controlled | controlled | controlled | controlled |
subfield_2 | controlled | controlled | controlled | controlled |
constant | 0.0777 | 36.48*** | 0.0416 | 36.49*** |
| (0.423) | (12.54) | (0.417) | (12.48) |
N | 18972 | 18405 | 18972 | 18405 |
R2 | 0.117 | 0.087 | 0.117 | 0.087 |
adj. R2 | 0.117 | 0.087 | 0.116 | 0.087 |
Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01
Table 6: Panel with K-means Clustering Subfields