1 of 28

SCRAPERS

1

Scrapers

Team Members:

Guankun Li : guankun@uchicago.edu

Jiazheng Li : jiazheng123@uchicago.edu

Tianyue Cong : tianyuec@uchicago.edu

Weiwu Yan : weiwuyan@uchicago.edu

GitHub: https://github.com/macs30122-winter24/final-project-scrapers.git

The Impact of National Science Foundation (NSF) Funding on Academic Output of Scholars

Background

Funding facilitates the acquisition of resources necessary for research. This project examines the role of National Science Foundation (NSF) funding in shaping the academic output of scholars in the Behavioral and Cognitive Sciences.

The focus on Behavioral and Cognitive Sciences is deliberate. This domain stands at the crossroads of the natural sciences and the social sciences, and its clear subdivision into subfields provides an opportunity to analyze the impact of funding across different areas of study.

Social Science Significance

Research funding has emerged as one of the most crucial public resources in contemporary science. By allocating financial resources to research projects, institutions, and researchers, it is instrumental in catalyzing innovation, facilitating growth within academic disciplines, advancing the careers of scientists, and contributing to broader socio-economic development (Lane, 2009). Its contribution to knowledge production is substantial.

For instance, between 2009 and 2010, out of 2,060,838 research papers indexed in the Science Citation Index (SCI), 1,165,276 papers (56.54%) received at least one form of research funding support (Mahesh, 2012). This underscores the significance of evaluating the outcomes and impact of research funding.

Research Questions

RQ1: Does NSF funding increase the quantity or quality of academic output of scholars?

RQ2: Which subfields within Behavioral and Cognitive Sciences are more likely to be awarded funding?

RQ3: Which subfields within Behavioral and Cognitive Sciences are more significantly affected by NSF funding?

Data Collection

  • National Science Foundation: downloaded award records from 2011 to 2020 from the official website
  • Google Scholar: scraped academic backgrounds and achievements based on the names and emails of award recipients
  • Academic Journal Sites: dynamically scraped abstracts of papers published in the three years before and the three years after the funding, based on Google Scholar pages
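
The three-year window around each award can be expressed as a simple year filter. The sketch below is illustrative only: the function name, the input layout, and the choice to leave the award year itself out of both windows are assumptions, not details confirmed by the project.

```python
def split_by_award_window(pub_years, award_year, window=3):
    """Split publication years into pre-award and post-award windows.

    Assumption: the award year itself belongs to neither window.
    """
    before = [y for y in pub_years if award_year - window <= y < award_year]
    after = [y for y in pub_years if award_year < y <= award_year + window]
    return before, after


# Hypothetical scholar who received an award in 2015
before, after = split_by_award_window(
    [2010, 2012, 2014, 2015, 2016, 2018, 2019], award_year=2015
)
print(before)  # [2012, 2014]
print(after)   # [2016, 2018]
```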

Descriptive Statistics

Variable                          Obs     Mean        SD          Min    Median   Max
year                              20640   2015.5      2.872       2011   2015     2020
award amount                      20640   1.30e+05    2.70e+05    0      0        3.24e+06
ln(award amount)                  20640   5.280       5.974       0      0        14.990
award(binary)                     20640   0.444       0.497       0      0        1
total citations                   20640   8797.634    17447.221   1      3902.5   3.62e+05
h-index                           20640   33.613      22.731      1      29       192
citations                         20113   476.106     965.345     1      210      24631
publications                      20640   5.472       10.209      0      4        1129
citations of top 3 cited papers   20004   84.654      305.650     0      33       22244

Table 1: Descriptive Statistics

Award Distribution by Amount

Figure 1 displays the distribution of NSF funding award amounts, which is highly skewed. We therefore log-transform the award amounts in the subsequent regressions.

Figure 1 Distribution of Award Amount
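
Because unfunded observations have an award amount of 0, a plain logarithm is undefined for them. One common convention is ln(1 + x), which maps 0 to 0 and matches the minimum of 0 reported for ln(award amount) in Table 1. The sketch below assumes that convention; the source does not state which variant was actually used, and the sample amounts are illustrative.

```python
import math


def log_award(amount):
    """Return ln(1 + amount), so a zero award maps to 0.0 (assumed convention)."""
    if amount < 0:
        raise ValueError("award amount must be non-negative")
    return math.log1p(amount)


amounts = [0, 50_000, 130_000, 3_240_000]  # illustrative values only
for a in amounts:
    print(a, round(log_award(a), 3))
```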

Average Award Amount by Year

Figure 2 presents the trend of cumulative NSF funding award amounts over time. Funding shows a clear upward trend across years, which motivates controlling for time fixed effects in our model specification.

Figure 2 Variation of Award Amount by Year
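
Controlling for year fixed effects is equivalent to subtracting each year's mean from every variable before running OLS (the within transformation). The sketch below shows that demeaning step on made-up numbers; the function name and the data are assumptions for illustration, and the project may have used year dummies instead.

```python
from collections import defaultdict


def demean_by_year(years, values):
    """Subtract each year's mean from the observations of that year."""
    sums, counts = defaultdict(float), defaultdict(int)
    for y, v in zip(years, values):
        sums[y] += v
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in sums}
    return [v - means[y] for y, v in zip(years, values)]


years = [2011, 2011, 2012, 2012]
awards = [10.0, 12.0, 14.0, 18.0]  # hypothetical ln(award amount) values
print(demean_by_year(years, awards))  # [-1.0, 1.0, -2.0, 2.0]
```

After demeaning, the common upward shift between 2011 and 2012 no longer contributes to the estimated relationship.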

Scatter Plot

Figures 3 and 4 show scatter plots of NSF funding award amounts against the quantity and quality of publications, respectively; both indicate a positive correlation. Whether this relationship holds once control variables are included must be determined through regression analysis.

Figure 3 Scatter Plot (funding and pub quantity)

Figure 4 Scatter Plot (funding and pub quality)

OLS Linear Regression

Dependent variables

  • number of publications (quantity)
  • citations of top three most cited papers (quality)

Independent variables

  • natural log of award amount
  • award (binary encoding)

Fixed effects

  • year

Control variables

  • h-index & total citations
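
The variable list above can be summarized in one equation. The notation is ours and is only a sketch of the specification: Y is either the number of publications or the citations of the top three most cited papers, Funding is either ln(award amount) or the binary award indicator, and the year term is the year fixed effect. The subscripts i (scholar) and t (year) are an assumption about how the panel is indexed.

```latex
Y_{it} = \beta_0 + \beta_1\,\mathrm{Funding}_{it}
       + \beta_2\,\mathrm{hindex}_{it}
       + \beta_3\,\mathrm{totalcitations}_{it}
       + \lambda_t + \varepsilon_{it}
```

In the baseline regression the year term is omitted.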

OLS Linear Regression (baseline)

                    (1)           (2)           (3)           (4)
                    quantity      quality       quantity      quality
ln(award amount)    0.0909***     -1.689***
                    (0.00997)     (0.413)
h-index             0.143***      0.258         0.144***      0.252
                    (0.00565)     (1.106)       (0.00564)     (1.106)
total citations     0.000522***   0.104**       0.000522***   0.104**
                    (0.000185)    (0.0405)      (0.000184)    (0.0405)
award(binary)                                   1.086***      -20.18***
                                                (0.149)       (4.646)
constant            -0.0455       36.09*        -0.0590       36.33*
                    (0.136)       (20.87)       (0.125)       (20.90)
N                   20113         19511         20113         19511
R2                  0.129         0.118         0.129         0.118
adj. R2             0.128         0.118         0.128         0.118

Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01

Table 2: Baseline Regression

OLS Linear Regression (panel data)

                    (1)           (2)           (3)           (4)
                    quantity      quality       quantity      quality
ln(award amount)    0.0961***     -0.0108
                    (0.0126)      (0.377)
h-index             0.142***      -0.0970       0.142***      -0.0951
                    (0.00599)     (1.129)       (0.00599)     (1.129)
total citations     0.000558***   0.114***      0.000555***   0.114***
                    (0.000187)    (0.0410)      (0.000187)    (0.0410)
award(binary)                                   1.135***      -1.280
                                                (0.183)       (4.143)
year                controlled    controlled    controlled    controlled
constant            -0.113        83.94***      -0.134        84.00***
                    (0.183)       (27.70)       (0.181)       (27.70)
N                   20113         19511         20113         19511
R2                  0.130         0.132         0.129         0.132
adj. R2             0.129         0.131         0.129         0.131

Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01

Table 3: Regression with Time Fixed Effect Controlled

Explanation and Answer to RQ 1

  • Table 2 presents the results of the baseline regression, revealing that funding award amounts have a significant positive impact on the quantity of scientific output but exhibit a significant negative influence on the quality of scientific output.

  • Table 3 displays the results of the regression after controlling for time fixed effects. In models (1) and (2), funding award amounts have a significant positive impact on the quantity of scientific output but no significant effect on the quality of scientific output.

  • Thus, we are able to address Research Question 1: NSF funding significantly enhances the quantity of academic research output in the Behavioral and Cognitive Sciences. However, NSF funding does not have a significant impact on the quality of academic research output in the Behavioral and Cognitive Sciences.
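
The significance stars in Tables 2 and 3 can be checked by hand from each coefficient and its standard error. The sketch below uses a two-sided normal approximation, which is reasonable at N of roughly 20,000. The helper name is hypothetical, and the approximation is an assumption: the original estimates may rely on a t distribution or robust standard errors.

```python
from statistics import NormalDist


def stars(coef, se):
    """Map a coefficient and its standard error to the tables' star legend.

    Uses a two-sided p-value from the standard normal distribution (assumption).
    """
    z = abs(coef / se)
    p = 2 * (1 - NormalDist().cdf(z))
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# Table 3, ln(award amount): column (1) quantity and column (2) quality
print(repr(stars(0.0961, 0.0126)))  # '***'
print(repr(stars(-0.0108, 0.377)))  # ''
```

The quality coefficient is about 0.03 standard errors from zero, which is why it carries no star once year fixed effects are included.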

K-means Clustering with TF-IDF

We used cluster analysis to divide the field of Behavioral and Cognitive Sciences into eight subfields, as detailed in Figure 5: Environmental Studies, Psychology, Cognitive Neuroscience, Phonetics, Human Biology, Linguistics, Cultural Studies, and Archaeology.

Figure 5 Cluster Result
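
To make the technique concrete, the sketch below builds TF-IDF vectors and runs a plain K-means (Lloyd's algorithm) on four toy texts with two clusters. It is a minimal, dependency-free illustration, not the project's pipeline: the tokenization, the IDF formula, the fixed initial centroids, and the toy texts are all assumptions, and the actual analysis used eight clusters.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) where each vector is an L2-normalised TF-IDF row."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenised for w in set(toks))
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}  # assumed IDF variant
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        row = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        vectors.append([x / norm for x in row])
    return vocab, vectors


def kmeans(vectors, init_centroids, n_iter=10):
    """Lloyd's algorithm with squared Euclidean distance; returns cluster labels."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])),
            )
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "neural imaging of memory",
    "memory and neural activity",
    "phonetics of vowel sounds",
    "vowel sounds in speech",
]
_, X = tfidf_vectors(docs)
labels = kmeans(X, init_centroids=[X[0], X[2]])
print(labels)  # [0, 0, 1, 1]
```

Fixed initial centroids keep the toy example deterministic; in practice the initialization is usually randomized and repeated.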

Word Clouds by Cluster

Word Bar Graphs

Answer to RQ2

Based on the clustering results, we compute the average award amount for each subfield and compare the subfields. This allows us to answer RQ 2: even within the same research division of Behavioral and Cognitive Sciences, there are clear disparities in the funding received by different subfields. Cognitive Neuroscience obtains the most funding, followed by Psychology and Environmental Studies with moderate amounts, while Human Biology, Archaeology, Phonetics, and Cultural Studies receive the least.

Figure 7 Percentage of Different Subfields

Figure 8 Average Funding of Different Subfields
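
The per-subfield comparison amounts to a grouped mean. The sketch below shows the computation on invented records; the function name, the record layout, and the values are assumptions, and whether unfunded observations (amount 0) enter the average is a design choice the source does not spell out.

```python
from collections import defaultdict


def mean_award_by_subfield(records):
    """records: iterable of (subfield, award_amount) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for subfield, amount in records:
        totals[subfield] += amount
        counts[subfield] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical values, not project data
records = [
    ("Psychology", 100_000),
    ("Psychology", 200_000),
    ("Linguistics", 90_000),
]
means = mean_award_by_subfield(records)
for subfield, avg in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{subfield}: {avg:,.0f}")
```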

Regression Grouped by Subfield (RQ 3)

Based on the clustering results, we ran a separate regression for each subfield; Table 4 presents the grouped results. Funding has a significant positive impact on the number of publications across all groups. The coefficients and significance levels are largest for Cognitive Neuroscience, Human Biology, and Environmental Studies, indicating that funding has a stronger effect within these three subfields.
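
A grouped regression fits the same model separately within each subfield and then compares the coefficients. The sketch below is deliberately simplified to a single regressor with an intercept, whereas the project's models also include controls and year fixed effects. The function names, group labels, and data are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with intercept (x must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def slopes_by_group(rows):
    """rows: iterable of (group, x, y); returns {group: slope}."""
    groups = defaultdict(lambda: ([], []))
    for g, x, y in rows:
        groups[g][0].append(x)
        groups[g][1].append(y)
    return {g: ols_slope(xs, ys) for g, (xs, ys) in groups.items()}


rows = [
    ("A", 0.0, 1.0), ("A", 1.0, 3.0), ("A", 2.0, 5.0),  # y = 1 + 2x
    ("B", 0.0, 2.0), ("B", 2.0, 3.0), ("B", 4.0, 4.0),  # y = 2 + 0.5x
]
print(slopes_by_group(rows))  # {'A': 2.0, 'B': 0.5}
```

A larger slope in one group than another is the kind of difference the grouped results compare across subfields.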

Response to Comments

Q1 Is the effect in RQ1 a "real" effect or a measurement effect?

A1 It is a measurement effect, since our model is not a structural model.

Q2 The reason for getting funding might be previous performance.

A2 We control for scholars' ability (h-index) to address this problem.

Q3 Did you account for the size of the academic discipline when considering awards?

A3 Yes. In Tables 5 and 6 in the appendix, we control for the discipline fixed effect.

Q4 You chose the citations of the top three most cited papers to represent quality. Would the result change if we chose another number, such as 5?

A4 No. The result is fairly robust.

Q5 How do you measure quality? If it is measured by the number of references, it might be subject to a large time lag/bias: work that was published earlier has a higher chance of getting more references.

A5 We used the citations of the top three most cited papers, and we control for time fixed effects to account for the large time lag/bias. As long as the quality data for Author A in year X are all sourced from year X and time fixed effects are included, the observations are comparable.

Conclusion

  • NSF funding significantly enhances the quantity of academic research output in the Behavioral and Cognitive Sciences. However, NSF funding does not have a significant impact on the quality of academic research output in the Behavioral and Cognitive Sciences.

  • Even within the same research division of Behavioral and Cognitive Sciences, there are clear disparities in the funding received by different subfields. Linguistics obtains the most funding, followed by Psychology, Cognitive Neuroscience, Phonetics, and Cultural Studies with moderate amounts, and Human Biology, Archaeology, and Environmental Studies receive the least.

  • NSF funding has a stronger effect within Cognitive Neuroscience, Human Biology, and Environmental Studies than within Archaeology, Psychology, Linguistics, Phonetics, and Cultural Studies.

References

Appendix

Figure 9 Comparison of Average Yearly Citation

Figure 10 Research Interests Word Cloud

                    (1)           (2)           (3)           (4)
                    quantity      quality       quantity      quality
ln(award)           0.102***      0.176
                    (0.0135)      (0.379)
h-index             0.146***      1.388***      0.146***      1.390***
                    (0.00733)     (0.459)       (0.00737)     (0.458)
total citations     0.000170      0.0576***     0.000172      0.0576***
                    (0.000222)    (0.0191)      (0.000223)    (0.0191)
award(binary)                                   1.206***      0.755
                                                (0.196)       (4.084)
year                controlled    controlled    controlled    controlled
subfield_1          controlled    controlled    controlled    controlled
constant            -0.451**      39.85***      -0.483**      39.90***
                    (0.194)       (11.42)       (0.192)       (11.38)
N                   18972         18405         18972         18405
R2                  0.118         0.088         0.118         0.088
adj. R2             0.117         0.087         0.117         0.087

Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01

Table 5: Panel with K-means Clustering Subfields

                    (1)           (2)           (3)           (4)
                    quantity      quality       quantity      quality
ln(award amount)    0.104***      0.112
                    (0.0131)      (0.381)
h-index             0.153***      1.337***      0.153***      1.339***
                    (0.00625)     (0.495)       (0.00627)     (0.493)
total citations     0.0000459     0.0581***     0.0000439     0.0581***
                    (0.000218)    (0.0196)      (0.000218)    (0.0196)
award(binary)                                   1.212***      0.404
                                                (0.187)       (4.155)
year                controlled    controlled    controlled    controlled
subfield_2          controlled    controlled    controlled    controlled
constant            0.0777        36.48***      0.0416        36.49***
                    (0.423)       (12.54)       (0.417)       (12.48)
N                   18972         18405         18972         18405
R2                  0.117         0.087         0.117         0.087
adj. R2             0.117         0.087         0.116         0.087

Standard errors in parentheses
* p < 0.1, ** p < 0.05, *** p < 0.01

Table 6: Panel with K-means Clustering Subfields