Talk is Cheap
Show me Your Code
TL;DR: Follow these steps to extract the binary metadata from tmap, then parse it. HEIF and AVIF share the same container format, so the method works for both.
- Environment: get MP4Box ready.
- Locate tmap and extract it.
- Parse according to ISO 21496-1.
# 1. Install MP4Box via brew or other package manager
brew install gpac
# 2. Locate `tmap`
MP4Box -info example.heic
# 3. Find the ID of `tmap` in the output above
MP4Box -dump-item [ID] example.heic
# 4. Parse with Python or another method
uv run parse_binary_payload.py
This script can be obtained from GitHub; you’ll need to replace the path in main.
The following examples were tested:
- HEIC shot on iPhone
- HDR HEIC screenshot from iPhone
- AVIF Gainmap produced by libavif
- AVIF Gainmap exported from Lightroom (with maximum compatibility selected on export)
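For readers who cannot install MP4Box, step 2 (locating tmap) can be approximated in plain Python. This is a minimal sketch, not the script from GitHub: it walks the ISOBMFF boxes and reads the item_type of each infe entry, assuming the usual layout where a top-level meta box contains iinf. The function names iter_boxes and find_items are made up for illustration, and the sketch only finds the item ID; extracting the payload still needs MP4Box or an iloc parser.

```python
import struct

def iter_boxes(buf, start, end):
    """Yield (type, payload_start, payload_end) for each box in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:  # a 64-bit largesize follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", buf, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of its container
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box: stop rather than guess
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size

def find_items(buf):
    """Return {item_id: item_type} from the first top-level meta/iinf box."""
    items = {}
    for box_type, start, end in iter_boxes(buf, 0, len(buf)):
        if box_type != "meta":
            continue
        # meta is a FullBox: skip 1 byte of version and 3 bytes of flags
        for child_type, c_start, c_end in iter_boxes(buf, start + 4, end):
            if child_type != "iinf":
                continue
            # entry_count is 16-bit in iinf version 0, 32-bit otherwise
            count_size = 2 if buf[c_start] == 0 else 4
            entries = c_start + 4 + count_size
            for infe_type, i_start, _ in iter_boxes(buf, entries, c_end):
                if infe_type != "infe":
                    continue
                version = buf[i_start]
                if version == 2:    # 16-bit item_ID
                    item_id = struct.unpack_from(">H", buf, i_start + 4)[0]
                    type_off = i_start + 4 + 2 + 2
                elif version == 3:  # 32-bit item_ID
                    item_id = struct.unpack_from(">I", buf, i_start + 4)[0]
                    type_off = i_start + 4 + 4 + 2
                else:
                    continue        # versions 0 and 1 carry no item_type
                items[item_id] = buf[type_off:type_off + 4].decode("latin-1")
        break  # only the first meta box is inspected
    return items
```

Calling find_items on the bytes of example.heic and looking for the entry whose value is "tmap" should give the ID to pass to MP4Box -dump-item in step 3.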
Show me Your Prompt
The idea came from the fourth of the six questions in C.3 of ISO 21496-1:
(d) How is the gain map metadata binary payload stored?
In other words, regardless of image format, the gainmap metadata must be stored as a binary payload.
So the question posed to the AI was as follows. (It doesn't feel appropriate to reproduce the standard's text here, so readers are encouraged to look it up themselves.)
[ISO 21496-1 Annex C.2, C.3]
Investigate how the HEIC and AVIF image formats store the information described in C.3.
Dancing with AI
At this point, you’ve finished reading the part related to the title. What follows is the part about collaborating with AI—just some notes on thought processes that might become obsolete at any moment.
Research and Search
If, like me, you know almost nothing about the internal structure of HEIF and AVIF, the first step is to have AI gather some information, essentially doing the job that search engines used to do. The following models were used here.
- Gemini 3.1 Pro (web conversation)
- Grok 4.2 Beta (API)
- Qwen DeepResearch (web conversation)
- GPT 5.4 Pro (web conversation)
At this step I habitually deploy multiple AIs: the search results are more comprehensive that way, and it would be a waste not to use them. Among these models, Gemini 3.1 Pro has limited multi-turn capability and relies mainly on its own world knowledge; the rest all support search and multi-step execution. Grok's API seems to have issues: a simple "Hello" cost me 160k tokens. GPT 5.4 Pro is extremely slow, and its tone is rather unpleasant. Qwen is mainly useful for the references it returns after searching, though it doesn't find much on very recent topics.
Planning
Once the information above came back, the problem was largely solved: several models returned similar answers, all pointing at tmap. I handed everything to Gemini and had it produce a checklist covering how to parse what we wanted step by step, which terminal commands to run at each step, and so on.
Execution
Once the plan was set, I found a Codex-style CLI tool that could execute commands and had it run through the checklist from the previous step; ideally, that yields the final result. This step used GLM-5.
There was a small side story here: the tmap payload doesn't contain only the binary metadata. Apart from Hasselblad's HEIF implementation, every other implementation puts an extra one-byte version number at the very beginning. Moreover, in the sample images I provided, gainmapmax was numerically equal to alt headroom and gainmapmin to base headroom, so identical numbers showed up during parsing. GLM-5 failed to recognise either point and kept trying different slicing and decoding orders. By the time I noticed, it had already consumed 3 million tokens. I should have used a smarter model.
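To make the two pitfalls concrete, here is a minimal sketch of the slicing, not the script from GitHub. The field order below is an assumption based on how open-source readers such as libavif appear to lay out the payload (version byte, two 16-bit versions, a flags byte, two headroom fractions, then five fractions per channel); it is not copied from the standard, so check it against ISO 21496-1 before relying on it. The function name and the has_version_byte switch are made up for illustration.

```python
import struct
from fractions import Fraction

def _ratio(n, d):
    """Numerator/denominator pair as a Fraction, or None if the denominator is 0."""
    return Fraction(n, d) if d else None

def parse_gain_map_metadata(payload, has_version_byte=True):
    """Parse a tmap payload. ASSUMED layout, big-endian; verify against the standard."""
    pos = 0
    out = {}
    if has_version_byte:
        # The extra leading byte described above; absent in the Hasselblad samples.
        out["version"] = payload[pos]
        pos += 1
    min_ver, writer_ver, flags = struct.unpack_from(">HHB", payload, pos)
    pos += 5
    out["minimum_version"] = min_ver
    out["writer_version"] = writer_ver
    out["is_multichannel"] = bool(flags & 0x80)        # assumed: top bit
    out["use_base_colour_space"] = bool(flags & 0x40)  # assumed: next bit
    bn, bd, an, ad = struct.unpack_from(">IIII", payload, pos)
    pos += 16
    out["base_hdr_headroom"] = _ratio(bn, bd)
    out["alternate_hdr_headroom"] = _ratio(an, ad)
    out["channels"] = []
    for _ in range(3 if out["is_multichannel"] else 1):
        # Assumed: min, max and the two offsets have signed numerators; gamma is unsigned.
        f = struct.unpack_from(">iIiIIIiIiI", payload, pos)
        pos += 40
        out["channels"].append({
            "gain_map_min": _ratio(f[0], f[1]),
            "gain_map_max": _ratio(f[2], f[3]),
            "gamma": _ratio(f[4], f[5]),
            "base_offset": _ratio(f[6], f[7]),
            "alternate_offset": _ratio(f[8], f[9]),
        })
    return out
```

With a file like the samples above, gain_map_max and alternate_hdr_headroom come out as the same value, so seeing a number twice is not by itself a sign of a wrong offset. If the first byte is missing, as in the Hasselblad case, pass has_version_byte=False; being off by one byte shifts every later field, which is the kind of error that repeated values make hard to spot.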
Review and Second Search
Since GLM-5’s execution in the previous step wasn’t entirely successful, some of its output was fed to other models for a second round of search. This time Grok 4.2 Beta (web conversation) was used. Grok discovered this extra version byte issue from a PR in libheif.
Tidying Up the Code
At this point the work was done. I had Copilot tidy up the code and tested with several more images to see if parsing went smoothly. I used Copilot because it’s free, and since they recently removed model selection it only uses auto-selected models, so it’s basically only good for this kind of miscellaneous work.
Brute Force and Simplicity
After all that rambling, who knows: the next time some internet-connected AI product is asked a similar question, it might just search its way here and give an answer.
Looking at the data provided by Cloudflare, the bots visiting this blog every day far outnumber humans, with most being crawlers from various large model companies—more than ten times the number of search engine crawlers.
To dream a little further: in another six months to a year, once all this stuff gets fed into training data, base models should be able to solve problems like this directly.