BrainBench: An LLM Evaluation Framework
Daniella Seum and Orion Powers
Faculty Advisor: Dr. Khaled Slhoub, Dept. of Electrical Engineering and Computer Science,
Florida Institute of Technology
MOTIVATION
As large language models (LLMs) become more widely used, there is a growing need for reliable, consistent methods of evaluating their performance. Existing benchmarks often lack transparency, consistency across runs, or the ability to handle varied answer formats. BrainBench was developed to address these gaps by providing an automated, reproducible evaluation framework that enables fair comparison of models and deeper insight into their strengths and limitations.
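Handling varied answer formats typically means normalizing free-form model output before scoring. The sketch below is a hypothetical illustration only (the function name, regex, and A-D multiple-choice format are assumptions, not BrainBench's actual code):

```python
import re

# Hypothetical helper (not BrainBench's actual implementation): reduce
# free-form model output to a canonical multiple-choice label so that
# "Answer: b", "(c)", and a bare "B." all score the same way.
def extract_choice(response: str) -> str | None:
    """Pull a single choice letter (A-D) out of free-form text."""
    match = re.search(
        r"(?:answer\s*[:\-]?\s*|\(|\b)([A-Da-d])(?=[).\s]|$)",
        response.strip(),
        re.IGNORECASE,
    )
    return match.group(1).upper() if match else None

# Varied formats normalize to the same label:
assert extract_choice("The answer is (c) because ...") == "C"
assert extract_choice("Answer: b") == "B"
```

Normalizing before comparison keeps scoring deterministic across runs, which supports the reproducibility goal described above.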
SYSTEM ARCHITECTURE

METHODS

LOCAL LLMS

METRICS
| Metric | Description |
|---|---|
| Correctness | Whether the extracted answer matches the expected solution |
| Latency | Time breakdown from request to response |
| Token Usage | Input/output token counts, generation and processing speed |
| Hardware | CPU/GPU utilization, power draw, and RAM/VRAM usage |
| Run Summary | Aggregate accuracy, question counts, and total and average processing time |
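As a minimal sketch of how these categories might map to a per-question record and a run summary (all field and function names here are assumptions for illustration, not the actual BrainBench schema):

```python
from dataclasses import dataclass

# Hypothetical record layout mirroring the metric categories above.
@dataclass
class QuestionResult:
    question_id: str
    correct: bool        # Correctness: extracted answer vs. expected solution
    latency_s: float     # Latency: request-to-response wall time, in seconds
    input_tokens: int    # Token Usage
    output_tokens: int   # Token Usage
    gpu_util_pct: float  # Hardware: utilization sampled during generation

def summarize(results: list[QuestionResult]) -> dict:
    """Aggregate per-question records into a Run Summary."""
    n = len(results)
    total_time = sum(r.latency_s for r in results)
    return {
        "questions": n,
        "accuracy": (sum(r.correct for r in results) / n) if n else 0.0,
        "total_time_s": total_time,
        "avg_time_s": (total_time / n) if n else 0.0,
    }
```

Keeping one record per question makes every aggregate in the Run Summary recomputable, so repeated runs of the same model can be compared directly.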
RESULTS

DISCUSSION