Personal breakthrough: “speaking jpg”

IMPORTANT: Sadly this site ‘processes’ my solution into an ordinary movie with all its drawbacks 🙁 You can try a high quality setting like 4K, but I fear the benefit of what I did is not really visible. My own ‘processing’ was the best possible, so I cannot really show here what it would look like 🙁
I will still leave it here together with the description of how to do it yourself…

Those who have known me for a while are aware that this has been a goal for more than 10 years: creating a jpg with some spoken text, without needing a full movie and the file size that comes with it. My original idea was to provide a text file only, but not all computers are equipped to read text aloud, so I abandoned that.

I tried several things and have now found a good solution. It is a movie (mp4), but it only uses the things really needed: a single picture and the sound file.

I had experimented with movies for a while, but no matter what I tried with all kinds of software, converting to a movie format (usually with a low fps) always produced a huge file and a flickering, low-quality picture.

What I use now is a basic tool that is already present on many machines (if not, it can be installed). Much of the fancy software uses the same tool underneath, which makes things easier but so far did not offer the function I needed. Because of that you need to use the terminal to access it. The command is called ffmpeg (you can find much more info on the net, which is also where I got it from).

What it does with my parameter settings is use only one picture for the full length of the movie, which looks far better than any attempt with low fps, and the file is also much smaller.

The current example uses my scene OfficeAmbiguity92. I chose a tame scene so I can use it a bit more universally.

It is a picture with 3500×2250 pixels (usually far too big for a movie) and a 7 minute sound file with spoken text, producing a ‘speaking jpg’ with only 6.3 MB!!! This was the crucial point: avoiding overhead and only providing the data absolutely needed, to get the file size of a (larger) picture.
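A rough check of what that size means as an average data rate can be done right in the terminal. This is only a sketch: it assumes 1 MB = 1,000,000 bytes and a length of exactly 420 seconds (both are simplifications), and the function name avg_kbit_per_s is made up for this example.

```shell
#!/bin/sh
# Average data rate of a file in kbit/s (integer arithmetic, rounded down).
# $1 = file size in bytes, $2 = length in seconds
avg_kbit_per_s() {
    echo $(( $1 * 8 / $2 / 1000 ))
}

# The 'speaking jpg': about 6.3 MB for about 7 minutes (assumed values)
avg_kbit_per_s 6300000 420
```

With these assumed values it prints 120, so picture and sound together stay in a range that is common for compressed audio on its own.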

This solved my problem completely, as I can now provide a picture with text spoken in the background at a reasonable file size. That way it is now possible to tell a story while someone is looking at the picture! Something I always wanted to do.

While it is a huge personal breakthrough for me, it does not solve all the problems I have with the format.

First, the quality of synthetic speech is not too good unless you use some really expensive solution. I looked at some and found one that really creates a result that cannot be distinguished from a human (maybe not a good actor, but an ordinary person). These systems work with artificial intelligence and seem to partially ‘understand’ what they read, at least enough to interpret things correctly. The drawbacks are the price and that it is subscription only, and of course the text must be uploaded somewhere to be analyzed. Sadly that is not really an option for what I do.

I even tried to speak it myself, but quickly learned that voice-over is hard work and needs knowledge and training. I estimated it would take me too long to get to an acceptable standard (and I now have a high opinion of those who can do it, as it looks much easier than it is).

So I still have to stick to a rather simple solution, which is cheap and offers good quality for the price, but it does not sound good enough for listening to be a joy. You get the meaning, but nothing more.

The second problem is the length of my scenes; we have discussed that problem elsewhere already. For OA92 the spoken text is about 7 minutes! This is much too long! Nobody wants to listen that long while staring at a single picture.
A solution would be “pan and zoom”, which can give very nice results when creating a movie from stills (especially if done well and in a creative way), but this needs some time and in the end creates a full movie, which for OA92 would have 200 or 300 MB.
I tried this as well but do not see it as a solution.

So I can only present the solution ‘as is’. It is still a huge step, it just does not finally solve my presentation problem. I got it by combining some info from the net. Many thanks for all that input; I am kind of standing on the shoulders of giants here 😉

Because of that I also want to share my findings so you can use them as well. The command I used is as follows (typed into the terminal while in the folder that contains the files needed):

ffmpeg -r 0.01 -loop 1 -i PICTURE.jpg -i SOUND.m4a -c:v libx264 -tune stillimage -pix_fmt yuv420p -crf 22 -shortest -t 190 -f mp4 RESULT.mp4

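with PICTURE.jpg being the picture filename, SOUND.m4a being the sound filename and RESULT.mp4 being the desired filename of the result. Note that -t 190 limits the result to 190 seconds, so adjust that number to the length of your sound file or leave it out.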
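For those who do not want to retype the command every time, here is a small sketch of a shell function that wraps it. The name speaking_jpg, the DRY_RUN switch and the optional fourth argument are made up for this illustration and are not part of ffmpeg. The ffmpeg options are the ones from the command above, with each input file given its own -i, and -t is only added when a length in seconds is passed.

```shell
#!/bin/sh
# Hypothetical helper: build (and run) the ffmpeg call for a 'speaking jpg'.
# Usage: speaking_jpg PICTURE SOUND RESULT [SECONDS]
speaking_jpg() {
    picture=$1        # still image, e.g. PICTURE.jpg
    sound=$2          # spoken text, e.g. SOUND.m4a
    result=$3         # output file, e.g. RESULT.mp4
    seconds=${4:-}    # optional maximum length in seconds

    # -loop 1 repeats the single picture, -shortest ends the movie
    # when the sound file ends (the looped picture alone never would).
    set -- ffmpeg -r 0.01 -loop 1 -i "$picture" -i "$sound" \
        -c:v libx264 -tune stillimage -pix_fmt yuv420p -crf 22 -shortest

    # Only cap the length if a number of seconds was given.
    if [ -n "$seconds" ]; then
        set -- "$@" -t "$seconds"
    fi
    set -- "$@" -f mp4 "$result"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"     # just show the command, do not run it
    else
        "$@"          # run ffmpeg (must be installed)
    fi
}
```

Calling it as DRY_RUN=1 speaking_jpg PICTURE.jpg SOUND.m4a RESULT.mp4 190 only prints the command, which is a safe way to check it before letting ffmpeg run for real.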
