The image you use should be a high-quality JPG or PNG at one of these resolutions: 3840×2160, 2560×1440, 1920×1080, 1280×720, 854×480, 640×360, or 426×240.
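If you want to check an image against that list before encoding, here is a minimal Python sketch. The function names are made up for illustration, and you have to supply the width and height yourself (from an image viewer, for example). It only encodes the list above, nothing official.

```python
# Resolutions from the list above, as (width, height) pairs.
LISTED_RESOLUTIONS = [
    (3840, 2160), (2560, 1440), (1920, 1080), (1280, 720),
    (854, 480), (640, 360), (426, 240),
]


def is_listed_resolution(width, height):
    """Return True if the image is exactly one of the listed sizes."""
    return (width, height) in LISTED_RESOLUTIONS


def largest_listed_within(width, height):
    """Return the largest listed size that fits inside the image.

    Returns None if the image is smaller than every listed size.
    Hypothetical helper: it only suggests a size to scale or crop down to.
    """
    fits = [(w, h) for (w, h) in LISTED_RESOLUTIONS
            if w <= width and h <= height]
    return max(fits, default=None)
```

For example, `largest_listed_within(2000, 1500)` gives `(1920, 1080)`, so a 2000×1500 image would be cropped or scaled down to 1920×1080 rather than scaled up.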
I know that, after YouTube re-encodes your upload, the audio will be 192 kb/s at 1080p and 720p, and will drop to 64 kb/s below that. I don’t know whether resolutions above 1080p get a higher audio bitrate.
Use a standard audio bitrate above 192 kb/s (if that’s what you’re aiming for). I suggest 384 kb/s. With every encode, the source should have a higher bitrate than the result, because re-encoding can’t restore quality that is already lost.
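To make the rule of thumb above concrete, here is a small Python sketch. The numbers are only the ones I reported above (192 kb/s at 720p and 1080p, 64 kb/s below), not official YouTube specifications. The function names are hypothetical, and treating heights between 720 and 1080 the same as 720p is my assumption.

```python
def reported_audio_kbps(video_height):
    """Audio bitrate after YouTube's re-encode, per the figures above.

    Returns None above 1080, because that case is unknown here.
    Assumption: any height from 720 up to 1080 is treated as 192 kb/s.
    """
    if video_height > 1080:
        return None
    if video_height >= 720:
        return 192
    return 64


def source_bitrate_is_enough(source_kbps, video_height):
    """Check that the uploaded audio bitrate is above the expected result.

    Returns None when the expected result is unknown.
    """
    target = reported_audio_kbps(video_height)
    if target is None:
        return None
    return source_kbps > target
```

With the suggested 384 kb/s source, `source_bitrate_is_enough(384, 1080)` is `True`, while a 192 kb/s source at 1080p gives `False` because it is not larger than the result.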
As far as I know, the only program that allows you to use a single image for the video stream is TMPGEnc. You can also use a program like Vegas or Premiere to create and merge audio and video streams.
Programs that I know won’t help are HandBrake, XMedia Recode, Avidemux, and Freemake, because they only accept video files as video input.