Enabling regular expressions in XML files
Thread poster: Don Alejandro
Don Alejandro
Russian Federation
Nov 19, 2019

Hello,

I'm working in Studio 2015 and occasionally receive an XML file to translate. I loaded one and it was processed quite well with the standard rules: all tags are hidden and so on. But the translatable text itself contains some "garbage" that I'd like to exclude from the actual translation. With Excel files it's quite simple: I go to Project Settings -> File Types -> Microsoft Excel 2007-2013 -> Embedded Content. There I add the document structure information sdl:cell and then add whatever regular expressions I see fit.

But since I'm not that tech-savvy, I can't figure out on my own whether I can do the same with XML files. I tried the same path, Project Settings -> File Types -> XML: Any XML -> Embedded Content, and then got stuck. Which structure information should be selected for the regexes to work? I tried sdl:paragraph and sdl:code, but neither had any effect. Or is it not that simple with XML files, and will I need to define the structure information manually?


 
Rossana Triaca  Identity Verified
Uruguay
Local time: 14:35
English to Spanish
Useful article by Paul Filkin Nov 19, 2019

Depending on the file, you may need a custom filter. There's a handy article here on how to do this (check the comments too for easier tag catch-alls):

https://multifarious.filkin.com/2014/06/01/custom-xml/

Warning: it's long.


 
Don Alejandro
Russian Federation
TOPIC STARTER
OK Nov 19, 2019

Yes, I've read all those topics of course =] but creating the filter / scheme from scratch seems a bit complex to me, because the XML file does not have proper containers, so I would have to add every single rule that is already enabled in the standard settings. What I actually need is a small adjustment to the existing standard rules, namely two small regular expressions to exclude \{.*?\} and \(.*?=.*?\). I found a way to do this in MemoQ just 15 minutes ago, by using "Import with Options" and then a Cascade filter of the default XML filter and Regex. But my main tool is Trados, so I was wondering whether I can do it without creating a new file type.
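
To make the two expressions concrete, here is a minimal Python sketch of what they pick out of a segment. It only illustrates the patterns themselves and says nothing about how Trados or MemoQ apply them internally; the sample sentence and the function name are invented for the example.

```python
import re

# The two expressions quoted in the post, combined with alternation.
PATTERNS = [r"\{.*?\}", r"\(.*?=.*?\)"]
COMBINED = re.compile("|".join(PATTERNS))


def find_non_translatable(text):
    """Return every substring matched by either expression, in order."""
    return [m.group(0) for m in COMBINED.finditer(text)]


# Hypothetical segment, not taken from the client's file.
sample = "Press {0} to open (menu=settings) now"
print(find_non_translatable(sample))
```

Ordinary parentheses without an equals sign, such as "(optional)", are left alone by the second expression because it requires a "=" between the brackets.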

[Edited at 2019-11-19 16:49 GMT]


 
Rossana Triaca  Identity Verified
Uruguay
Local time: 14:35
English to Spanish
Sorry… Nov 19, 2019

I've only ever needed to tweak more complex XML filetypes, or for really easy things I've fiddled with the regex delimited text files (the Text with inline tags filter option), but I'm not sure there's a middle ground (if your file is simple, perhaps the Text filter is actually enough with catch-alls for angle brackets).

Just another idea -- couldn't you create your own unique attribute for just these two cases, apply that attribute to the content you want with ye olde Notepad++ regex, and then create an AnyXML-based filetype with just that one extra attribute to be handled as you need? Quick and dirty, and definitely not elegant, but it should work.

Another idea, depending on how/where this content appears and what you want to do with it -- maybe you could filter it after importing and tweak it then? (again, sorry to be vague, but it's hard without a sample).


 
Don Alejandro
Russian Federation
TOPIC STARTER
OK Nov 21, 2019

Thank you for the potential solutions. They could actually work, but there is one problem: I need the original XML file to stay in the same state as I received it from the client. If I were to tweak it directly in Notepad++, for example, I wouldn't be able to get it back 100% pristine. A sample of the XML file (I left only the first line) is available here: https://dropmefiles.com/I5KrH There are quite a lot of tags, but the default Trados Any XML file type handled them perfectly. Still, there are certain elements inside the translatable area that I'd like to exclude. Here is how the file looks in Trados: https://take.ms/xkCiK What I usually do with Excel files is go here and add a regex, so that everything that does not need to be translated is converted into Trados tags (see https://take.ms/YZxVw ). With XML, however, this does not seem to work; frankly, I was so desperate that I tried every structure information option (see https://take.ms/KTkCC ), but nothing worked. I still see all those {0} and (phrase=phrase) constructions.
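
The idea of converting such constructions into tags while keeping the source recoverable can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions, not how Trados handles embedded content: the `<1/>`-style tag format and the `protect` / `restore` names are made up for the example, and it assumes the text does not already contain anything that looks like those tags.

```python
import re

# Same two expressions as discussed in the thread.
PLACEHOLDER = re.compile(r"\{.*?\}|\(.*?=.*?\)")
# Hypothetical numbered tag format used only in this sketch.
TAG = re.compile(r"<(\d+)/>")


def protect(text):
    """Swap each match for a numbered tag; return the new text and the originals."""
    originals = []

    def to_tag(match):
        originals.append(match.group(0))
        return f"<{len(originals)}/>"

    return PLACEHOLDER.sub(to_tag, text), originals


def restore(text, originals):
    """Put the stored originals back in place of the numbered tags."""
    return TAG.sub(lambda m: originals[int(m.group(1)) - 1], text)


source = "Open {0} (phrase=phrase) now"
protected, originals = protect(source)
print(protected)
print(restore(protected, originals) == source)
```

Because the original strings are stored and reinserted verbatim, the round trip reproduces the source text exactly, which is the property that matters when the client's file must come back unchanged.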

MemoQ worked perfectly with "Import with options" and Cascade filters (see https://take.ms/T9wUJ ). Perhaps this is indeed not possible in Trados 2015, although the software settings look quite simple, so I thought all I needed was to define the correct structure information.

[Edited at 2019-11-21 10:21 GMT]


 
xxRGProz (X)  Identity Verified
Spain
Local time: 18:35
Prepare the file in memoQ and translate the exported mqxliff/mqxlz file in Trados Studio Nov 21, 2019

Hi,

If I understood you correctly, you don't want to "remove" those "tags" from the XML file—you just want to convert them into Trados Studio tags. And you know how to use memoQ Regex Tagger to do the job, but your main tool is Trados Studio.

Instead of trying to figure out how to use the embedded content processor in Trados Studio (which isn't really user-friendly), why don't you prepare the XML file in memoQ and translate the exported mqxliff/mqxlz file in Trados Studio? Once you complete the translation, you just need to reimport the exported file into memoQ and generate the target XML.

As a workaround, there's a Trados Studio app which is similar to memoQ Regex Tagger, but I haven't tried it (I'm a memoQ user): https://appstore.sdl.com/language/app/cleanup-tasks/963/

Finally, instead of "\{.*?\}" and "\(.*?=.*?\)", I'd use "\{[^}]+\}" and "\([^=]+=[^)]+\)". Both options should work, but negated character classes are usually preferred over lazy matching.
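
A small Python sketch of where the two styles agree and where they differ. The test strings are invented for illustration, and Python's `re` module stands in for whichever regex engine the CAT tool actually uses, which may behave differently in details.

```python
import re

LAZY_BRACE = re.compile(r"\{.*?\}")
NEGATED_BRACE = re.compile(r"\{[^}]+\}")
LAZY_PAIR = re.compile(r"\(.*?=.*?\)")
NEGATED_PAIR = re.compile(r"\([^=]+=[^)]+\)")

# Typical case: both styles find the same placeholders.
typical = "Hello {0}, see (key=value) for details"
print(LAZY_BRACE.findall(typical), NEGATED_BRACE.findall(typical))
print(LAZY_PAIR.findall(typical), NEGATED_PAIR.findall(typical))

# Empty constructs: "*" accepts zero characters, "+" requires at least one.
print(LAZY_BRACE.findall("{}"), NEGATED_BRACE.findall("{}"))
print(LAZY_PAIR.findall("(=)"), NEGATED_PAIR.findall("(=)"))

# Line breaks: by default "." does not match a newline, a negated class does.
multiline = "{a\nb}"
print(LAZY_BRACE.findall(multiline), NEGATED_BRACE.findall(multiline))
```

So the variants are interchangeable for constructions like {0} and (phrase=phrase), but they are not strictly equivalent on empty or multi-line input.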

Hope this helps.

[Edited at 2019-11-21 16:15 GMT]


 
Don Alejandro
Russian Federation
TOPIC STARTER
I'VE FINALLY MADE IT Apr 15, 2021

Rossana Triaca wrote:

Depending on the file, you may need a custom filter. There's a handy article here on how to do this (check the comments too for easier tag catch-alls):

https://multifarious.filkin.com/2014/06/01/custom-xml/

Warning: it's long.


This is probably a long-forgotten thread, but I just wanted to say BIG THANKS to Rossana for the link provided in the first message. I've now run into the same issue in another project, and it was rather critical to remove tags and other garbage, so I sat down and read the whole article thoroughly, as many times as needed, until something finally clicked in my head. I managed to create a custom XML template file, picked the correct document structure information (or rather, found where it's actually stated in Trados) and enabled those damn regular expressions in XML files. Again, much appreciated, thank you!


 

