
identify does not correctly detect iptc encoding

Posted: 2010-07-08T02:06:25-07:00
by Eric B
Hi,
Although the following topic deals with a similar issue, this case is a little different.
viewtopic.php?f=3&t=16083

The problem is that ImageMagick seems to ignore the IPTC CodedCharacterSet.

I am using exiftool via GeoSetter to set the IPTC fields in UTF-8 instead of Latin-1. I will use a simple French phrase as the example: "voilà l'été", meaning "here comes summer".
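For context, IPTC stores the declared encoding in record 1:90 (CodedCharacterSet) as an ISO 2022 escape sequence, and ESC % G is the sequence for UTF-8. Below is a minimal Python sketch of how a reader could honor that marker. `decode_iptc_text` is a hypothetical helper, and the Latin-1 fallback is an assumption, not something taken from ImageMagick or exiftool.

```python
from typing import Optional

# ISO 2022 escape sequence (ESC % G) that IPTC record 1:90 uses to declare UTF-8.
UTF8_ESCAPE = b"\x1b%G"


def decode_iptc_text(raw: bytes, coded_character_set: Optional[bytes]) -> str:
    """Decode the raw bytes of an IPTC text field (hypothetical helper)."""
    if coded_character_set == UTF8_ESCAPE:
        return raw.decode("utf-8")
    # Assumption: treat the field as Latin-1 when no UTF-8 marker is present.
    return raw.decode("latin-1")


headline_bytes = "voilà l'été".encode("utf-8")
print(decode_iptc_text(headline_bytes, UTF8_ESCAPE))  # decoded as UTF-8
print(decode_iptc_text(headline_bytes, None))         # marker ignored: garbled
```

Ignoring the marker and decoding the same bytes as Latin-1 gives the kind of garbled text this topic is about.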

Let's generate a picture to show the issue:

Code: Select all

convert -size 300x50 xc:black -gravity east -draw "fill white  text 0,0  'iptc utf8 voilà l\'été'" -format JPEG -trim +repage iptc.jpg
and set the IPTC fields with exiftool. Here is the important part of the command generated by GeoSetter (Win7 x64 US):

Code: Select all

exiftool.exe  -overwrite_original -IPTC:Headline="voilà l'été" -IPTC:CodedCharacterSet=UTF8 iptc.jpg
I've also uploaded the resulting picture here: Image (ExifViewer plugin has the same issue)

Note that the characters already look strange. That is correct, though: UTF-8-aware software reads the field properly.

Now I want to read it with identify:

Code: Select all

identify -format "%[IPTC:2:105]" iptc.jpg
I get garbled text (such as "voilÃ") back. How can I get "voilà l'été"?
I tried converting the output afterwards, but that does not work.
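For readers unfamiliar with the symptom, here is a small Python sketch (not part of the original post) that reproduces this kind of garbling by decoding UTF-8 bytes with code page 1252, and then reverses it. `misread` and `repair` are hypothetical helper names. The reversal only works when every garbled character survives the round trip.

```python
headline = "voilà l'été"
utf8_bytes = headline.encode("utf-8")


def misread(data: bytes, codepage: str = "cp1252") -> str:
    """Decode UTF-8 bytes with the wrong single-byte code page."""
    return data.decode(codepage)


def repair(garbled: str, codepage: str = "cp1252") -> str:
    """Undo misread(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(codepage).decode("utf-8")


garbled = misread(utf8_bytes)
print(garbled)          # each accented letter becomes two characters
print(repair(garbled))  # the original phrase again
```

In this sketch, the second byte of "à" decodes to a no-break space under code page 1252, which would explain why only "voilÃ" appears to be wrong at first glance.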

PS: if I force

Code: Select all

exiftool.exe  -overwrite_original -IPTC:Headline="voilà l'été" -IPTC:CodedCharacterSet=UTF8 iptc.jpg
identify reads that properly, but the result is wrong because the field is not UTF-8 encoded (at least on my machine), and UTF-8-aware software doesn't recognize the field.
Even if I force Latin-1 encoding with exiftool (-L), I get the same bad output from identify!

Code: Select all

exiftool.exe  -overwrite_original -IPTC:Headline="voilà l'été" -L iptc.jpg

Re: identify does not correctly detect iptc encoding

Posted: 2010-07-09T07:03:42-07:00
by boardhead
Just to confirm, what is the output of identify on your system for this image?

On my system with ImageMagick 6.2.8 04/17/08 Q16, the headline displays properly.

- Phil

Re: identify does not correctly detect iptc encoding

Posted: 2010-07-09T15:32:41-07:00
by Eric B
Thanks for the feedback.
I got the same output: "voilà l'été", with code pages 1252 and 850 and "voilÃ" with code page 65001.
identify -version gives me:
Version: ImageMagick 6.6.1-10 2010-05-15 Q16 http://www.imagemagick.org
Edit: I just upgraded to ImageMagick 6.6.3-0 2010-07-01 Q16 x64 dll and get the same problem.

So is this a bug only in newer versions? Do I have to roll back to an older version? Which one?

Re: identify does not correctly detect iptc encoding

Posted: 2010-07-10T03:07:17-07:00
by boardhead
I think the problem is that your console isn't displaying UTF-8 properly. The difference is that I'm running on Mac OS X and the Terminal is natively UTF-8. The ExifTool FAQ number 18 addresses this problem. I'm thinking that you may need to change your font.

- Phil

Re: identify does not correctly detect iptc encoding

Posted: 2010-07-10T04:29:45-07:00
by Drarakel
I tried it with iptc.jpg, with test.jpg and with a newly created JPG. (I couldn't copy Eric's exiftool command with the UTF8 encoded headline though, as one character was missing then and exiftool displayed "Malformed UTF-8 characters". I had to use the latin1 encoding.)
I always got the same result as Eric (apart from additional quotation marks in test.jpg):
Eric B wrote: "voilà l'été", with code pages 1252 and 850 and "voilÃ" with code page 65001.
I'm using ImageMagick 6.6.3-0 Q16, on Windows XP.

With ExifToolGUI for Windows, the text is always displayed properly. If I set the Windows console to UTF-8, ExifTool also shows the correct text there ("voilà l'été"). But in the output of ImageMagick, it still gets displayed as "voilÃ". :?

@boardhead:
Do you really use IM v6.2.8? I thought that such an old version doesn't even recognize the IPTC fields?

Re: identify does not correctly detect iptc encoding

Posted: 2010-07-10T12:23:23-07:00
by Eric B
Thanks for the feedback. So the bug is confirmed in the current version, isn't it? What are the next steps?
boardhead wrote:I think the problem is that your console isn't displaying UTF-8 properly. The difference is that I'm running on Mac OS X and the Terminal is natively UTF-8. The ExifTool FAQ number 18 addresses this problem. I'm thinking that you may need to change your font.
- Phil
I've set the font to "Lucida Console". I've tried both cmd.exe and PowerShell (version 2 on Win7), and both the x86 and x64 executables from 6.6.3.
I've also tried exiv2, which also handles IPTC.
So the only code page that works properly with it, and makes sense, is 65001 (UTF-8).

As you are using an older version, I downloaded some older versions of ImageMagick. I found only source code, so I recompiled them with Visual Studio (x86 only).
Results: bug in 6.6.3 (current) and 6.5.9, but NOT in 6.4.9 and NOT in 6.5.0!!
(cmd.exe with Lucida Console, chcp 65001)

I don't want to test all versions, but something went wrong in the source code in the 6.5 branch: 6.5.0 is still OK, 6.5.9 is not!
Reading the changelog, I saw a change to identify -format in 6.5.5, so as a last attempt I tested 6.5.4-10. This one DOES have the bug.
So it is something else, between 6.5.0 and 6.5.4-10.
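The manual search described above is effectively a bisection over releases. Here is a hedged Python sketch of that procedure. `first_broken` and the `is_broken` callback are hypothetical names, and the caller has to supply the actual check, for example by running each build's identify on the test image.

```python
from typing import Callable, List


def first_broken(versions: List[str], is_broken: Callable[[str], bool]) -> str:
    """Return the earliest broken version.

    Assumes versions are sorted oldest to newest, the first one is known
    good, the last one is known broken, and the bug never disappears again
    once it has been introduced.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

With n candidate releases this needs roughly log2(n) builds instead of n, which matters when each version has to be compiled by hand.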

My current workaround is therefore to use 6.5.0 for identify. I left 6.6.3 x64 installed on my Win7 machine, put my compiled 6.5.0 x86 build into another directory, and explicitly set this path in my transformation script!

In the meantime, I informed the author of Exif Viewer (Firefox plugin), who has already fixed it: http://araskin.webs.com/exif/exif.html#downloads

Re: identify does not correctly detect iptc encoding

Posted: 2010-07-12T06:01:19-07:00
by boardhead
Drarakel wrote: @boardhead:
Do you really use IM v6.2.8? I thought that such an old version doesn't even recognize the IPTC fields?
I haven't checked the ImageMagick change logs, but I do know that I get this at the console:

Code: Select all

> uname -s -r
Linux 2.6.18-194.3.1.el5PAE

> identify -version
Version: ImageMagick 6.2.8 04/17/08 Q16 file:/usr/share/ImageMagick-6.2.8/doc/index.html
Copyright: Copyright (C) 1999-2006 ImageMagick Studio LLC

> identify -format "%[IPTC:2:105]" tmp/test.jpg
"voilà l'été"
(cut-n-pasted from an OS X Terminal window via ssh to the Linux system)

- Phil

Re: identify does not correctly detect iptc encoding

Posted: 2010-07-12T11:14:03-07:00
by Drarakel
OK. It must have been changed a few more times between the versions. (I had tried IM v6.3.2-9 - and there, 'identify -format "%[IPTC:2:105]"' showed nothing.)

Now, after Eric's observations, I tried a few more versions. The last version that works without problems with the UTF-8 characters seems to be IM v6.5.4-2. With IM v6.5.4-3, there's "voilÃ" again.

But I found a workaround for the current versions - don't let it output to the console, but write it to a file:

Code: Select all

identify -format "%[IPTC:2:105]" iptc.jpg >iptc.txt
type iptc.txt
Et voilà:

Code: Select all

voilà l'été
:D

It's strange - because if I redirect it to file in versions 6.5.4-2 and 6.6.3-0, the result is always a correct UTF-8 text format (completely identical). If I let it output directly to the console (set to UTF-8), the result differs.

Maybe it has something to do with this entry from the 6.5.4-3 changelog (?):
Support breaks in Chinese characters which traditionally do not include spaces.
Anyway, something must have been changed in the UTF-8 output in version 6.5.4-3 - at least for Windows.
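The same idea as the file redirect can be sketched in Python: capture identify's stdout as raw bytes and decode them yourself, so the console is not involved. This is a sketch that assumes the field was written as UTF-8 and that identify emits those bytes unchanged when its output is not a console, as the redirect test suggests. `read_iptc_field` is a hypothetical helper, and its `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess


def read_iptc_field(image_path, field="2:105", run=subprocess.run):
    """Return one IPTC field as text, read from identify's raw stdout bytes."""
    command = ["identify", "-format", f"%[IPTC:{field}]", image_path]
    # capture_output=True yields bytes, so no console code page is applied.
    completed = run(command, capture_output=True, check=True)
    # Assumption: the field was stored as UTF-8 (CodedCharacterSet=UTF8).
    return completed.stdout.decode("utf-8")
```

Calling `read_iptc_field("iptc.jpg")` would return the headline as a Python string. Whether printing that string then displays correctly still depends on the console.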

Re: identify does not correctly detect iptc encoding

Posted: 2011-10-10T14:15:53-07:00
by Eric B
More than one year later, I've upgraded my main IM install to version 6.7.3-0 2011-10-08 x64: it still contains the bug!
So I'll continue to work with 2 versions: the old 6.5.0 for this dedicated command, and the 6.7.3 for the rest...

Re: identify does not correctly detect iptc encoding

Posted: 2011-10-10T17:36:41-07:00
by Jason S
I'm curious as to how it could have worked in old versions (but not curious enough to spend the time to figure it out).

The real solution to this, and lots of similar problems, is straightforward. Maybe it's time to start pushing for it to happen.

1. Put stdout in Unicode mode.

2. Convert all human-readable text to UTF-8 when it is read in, and use UTF-8 everywhere internally.

3. Convert all text to UTF-16 before outputting it.

How to do this?

1. Call
_setmode(_fileno(stdout),_O_U8TEXT);
or
_setmode(_fileno(stdout),_O_U16TEXT);
before writing any output. Which one? Probably _O_U8TEXT. This determines the encoding used when stdout is redirected to a file. Either way, you still have to encode your text in UTF-16 before outputting it.

2. This will take some work to do completely. No changes are needed to fix this particular issue, as the encodings will match by chance.

3. Update the FormatLocaleFileList function in magick/identify.c, to convert the text from UTF-8 to UTF-16, and call a "w" function like fputws() or _vfwprintf_l().

(Warning: I haven't tried this, so I could be missing some steps.)
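To make step 3 concrete, here is a small Python sketch of the UTF-8 to UTF-16 conversion itself. It only illustrates the data transformation. An actual fix would live in ImageMagick's C code, and `utf8_to_utf16_units` is a hypothetical name.

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units."""
    text = data.decode("utf-8")
    raw = text.encode("utf-16-le")  # two bytes per code unit, no BOM
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


units = utf8_to_utf16_units("voilà l'été".encode("utf-8"))
print(len(units), [hex(u) for u in units])
```

Characters outside the Basic Multilingual Plane come out as two code units (a surrogate pair), so the number of code units can differ from the number of characters.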

Re: identify does not correctly detect iptc encoding

Posted: 2014-06-03T07:29:20-07:00
by Mark_Reiser
Dear Community,

Unfortunately, this problem still exists in newer ImageMagick versions.

I have a JPG and I want to read the IPTC caption. It contains German umlauts.

My command line to test it:
"C:\Program Files\ImageMagick-6.8.9-Q16\identify" -format "%[IPTC:2:120]" 8.jpg

The output:
ü ä ÃY Ão Ã" Ã- ö /

What it should be (properly readable with IrfanView 4.37):
ü ä ß Ü Ä Ö ö /

I have a German Windows 7 and used the version "ImageMagick-6.8.9-2-Q16-x64-dll.exe".

If someone wants to test it, the file is available here (you will need to unzip it):
http://www.file-upload.net/download-9000029/8.zip.html

Thank you very much in advance
Mark Reiser

Re: identify does not correctly detect iptc encoding

Posted: 2014-06-03T09:36:56-07:00
by snibgo
I confirm the behaviour with IM v6.8.9-0 under Windows 8.1.

When using UTF-8, I normally run:

Code: Select all

chcp 65001
This doesn't cure the problem. The output can be redirected to a file, "type file.txt", and that works. Or, more simply:

Code: Select all

identify -format "%[IPTC:2:120]" 8.jpg |inout
... where inout.cpp is the following trivial program, compiled with gcc:

Code: Select all

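#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>

/* Copy text from stdin to stdout, up to LineLen-1 characters at a time. */
int main (int argc, char *argv [])
{
#define LineLen 1000

  char sLine [LineLen];

  FILE * fin = stdin;
  /* fgets returns NULL at end of input, which ends the loop. */
  while (fgets (sLine, LineLen, fin)) {
    printf ("%s", sLine);
  }

  return (0);
}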
I don't understand why this cures the problem.

Even so, to get this to work needs "chcp 65001".

Re: identify does not correctly detect iptc encoding

Posted: 2014-06-03T23:16:42-07:00
by Mark_Reiser
Thank you for your reply!

You're right: redirecting the output to a file leads to a correct result.
This might be a (bad) workaround.
By the way, writing it to a file can be done easily with the following command:

C:\identify -format "%[IPTC:2:120]" 8.jpg > 8.txt

However, unfortunately we use the "Visual Basic API" of ImageMagick, with the following call from LotusScript.
The goal is to get multiple image parameters in one call, including the IPTC caption:

Set imageObj = CreateObject("ImageMagickObject.MagickImage")
strResult = imageObj.identify("-format", "%b" & strSep & "%d" & strSep & "%e" & strSep & "%f" & strSep & "%h" & strSep & "%w" & strSep & "%x" & strSep & "%[colorspace]" & strSep & "%[IPTC:2:105]" & strSep & "%[IPTC:2:120]", strSourceFullPath)

As you can see, the results end up in the strResult string variable, and the umlaut characters in it are garbled.

So I think the "bug" might be specifically in the identify function of the Visual Basic API?
Any ideas?

Best regards
Mark Reiser
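The multi-field call above can also be sketched against the command-line tool. This Python sketch builds the same -format string and splits the captured output back into fields. It assumes the raw output bytes are UTF-8 and that the separator never occurs inside a value. `build_format` and `parse_identify_output` are hypothetical helpers, not part of any ImageMagick API.

```python
from typing import Dict, List

# The format escapes used in the call above.
FIELDS = ["%b", "%d", "%e", "%f", "%h", "%w", "%x",
          "%[colorspace]", "%[IPTC:2:105]", "%[IPTC:2:120]"]


def build_format(fields: List[str], sep: str) -> str:
    """Join the format escapes into one -format argument."""
    return sep.join(fields)


def parse_identify_output(raw: bytes, fields: List[str], sep: str) -> Dict[str, str]:
    """Decode the captured bytes as UTF-8 and map each escape to its value."""
    values = raw.decode("utf-8").split(sep)
    if len(values) != len(fields):
        raise ValueError("separator occurred inside a value or output was truncated")
    return dict(zip(fields, values))
```

Decoding the bytes explicitly keeps the encoding decision in the calling script and does not depend on how an intermediate layer converts the string.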

Re: identify does not correctly detect iptc encoding

Posted: 2016-12-18T15:25:53-07:00
by Eric B
I don't really understand: is the command-line program "identify.exe" using the Visual Basic API?

I just tried again, six years later, using IM 7.0.4: the identify command is still not able to output the characters properly when chcp 65001 is active.
I am using this command in a PowerShell script on Windows 10 x64.