Abstract: This presentation explores the application of Large Language Models (LLMs) to theoretical physics and computational cosmology across three frontiers: evaluation, test-time inference, and algorithmic discovery. First, we introduce the Theoretical Physics Benchmark (TPBench) to assess LLM reasoning, and show that although foundation models are advancing, research-level physics problems remain a critical bottleneck. To address these reasoning limits, we investigate test-time scaling techniques. We propose a novel symbolic weak verifier that exploits the intrinsic mathematical structure of physics problems and significantly outperforms standard test-time scaling methods. Finally, we introduce MadEvolve, an evolutionary optimization framework that moves LLMs from solving established problems to discovering novel scientific methods. By autonomously generating and refining code, MadEvolve yields improved algorithms for complex cosmological tasks such as initial condition reconstruction and 21 cm foreground mitigation. Together, these works outline a concrete pathway for using LLMs to accelerate autonomous discovery in physics.