Bone conduction headphones have gained popularity due to their comfort and versatility, as they transmit sound via vibrations through the skull rather than through the eardrum. However, the transmitted audio may contain sensitive information, posing serious privacy risks if intercepted by unauthorized parties. While prior research has explored acoustic eavesdropping attacks via side channels such as millimeter wave (mmWave) radar, existing methods remain constrained by limited generalizability and degraded reconstruction quality. In this paper, we propose EchoLLM, the first mmWave-based eavesdropping attack specifically targeting the semantic content of audio transmitted through a victim's bone conduction headphones. EchoLLM exploits the fact that bone conduction headphones deliver audio as mechanical vibrations, which leak a side channel that mmWave radar can capture remotely. The attack is designed to be context-aware, target-aware, low-cost, and robust. To improve automatic speech recognition (ASR) under weak mmWave signals, EchoLLM introduces a novel multi-modal speech recognition model that leverages the victim's own speech as contextual input. To provide higher-quality input to this multi-modal model, EchoLLM incorporates signal enhancement techniques, including target identification and background reflection reduction. Our experiments show that EchoLLM achieves a word error rate (WER) as low as 5.23%, confirming that it can effectively reconstruct the semantic content of audio transmitted through a targeted bone conduction headphone under realistic scenarios.