Enabling regular expressions in XML files
Thread poster: Don Alejandro

Don Alejandro
Russia
Nov 19, 2019

Hello,

I'm working in Studio 2015 and occasionally receive XML files to translate. I uploaded one and it was processed quite well according to the standard rules: all tags are hidden and so on. But the text for translation itself contains some "garbage" that I'd like to exclude from the actual translation. With Excel files it's quite simple: I go to Project Settings -> File Types -> Microsoft Excel 2007-2013 -> Embedded Content. There I add the document structure information sdl:cell and then add whatever regular expressions I see fit.

But since I'm not that tech-savvy, I can't figure out on my own whether I can do the same with XML files. I tried the same path, Project Settings -> File Types -> XML: Any XML -> Embedded Content, and then I got stuck. Which structure information should be selected for the regexes to work? I tried sdl:paragraph and sdl:code, but neither had any effect. Or is it not that simple with XML files, and will I need to add the structure information manually?

Rossana Triaca  Identity Verified
Uruguay
English to Spanish
Useful article by Paul Filkin Nov 19, 2019

Depending on the file, you may need a custom filter. There's a handy article on how to do this (check the comments too for easier tag catch-alls):

https://multifarious.filkin.com/2014/06/01/custom-xml/

Warning: it's long.

Don Alejandro
Russia
Topic starter
OK Nov 19, 2019

Yes, of course I've read all those topics =] but creating the filter/scheme from scratch seems a bit complex to me, because the XML file does not have proper containers, so I would have to add every single rule that is already enabled in the standard settings. What I actually need is a small adjustment to the existing standard rules, namely two small regular expressions to exclude \{.*?\} and \(.*?=.*?\). Just 15 minutes ago I found a way to do this in memoQ, using "Import with Options" and then a cascade filter of the default XML filter and Regex. But my main tool is Trados, so I was wondering whether I can do it without creating a new file type.

[Edited at 2019-11-19 16:49 GMT]
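For readers unfamiliar with the memoQ approach described above: a "cascade" means the XML filter runs first and extracts the translatable text, and a second, regex-based pass then turns anything matching the two patterns into protected tags. The Python sketch below only illustrates that two-pass idea; it is not how memoQ or Trados implement it, and the sample markup and the function names `tag_segments` and `cascade` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The two patterns from the post, joined with | so one pass finds both kinds.
PLACEHOLDER = re.compile(r"\{.*?\}|\(.*?=.*?\)")

def tag_segments(text):
    """Second pass: split text into ("text", ...) and ("tag", ...) pieces."""
    pieces, pos = [], 0
    for m in PLACEHOLDER.finditer(text):
        if m.start() > pos:
            pieces.append(("text", text[pos:m.start()]))
        pieces.append(("tag", m.group()))  # would become a protected tag
        pos = m.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces

def cascade(xml_source):
    """First pass: let the XML parser extract element text, then tag it.

    Simplification: only each element's leading .text is used; text that
    follows a child element (.tail) is ignored in this sketch.
    """
    root = ET.fromstring(xml_source)
    return [tag_segments(el.text) for el in root.iter()
            if el.text and el.text.strip()]

# Hypothetical sample: the client's actual file is not reproduced here.
sample = '<strings><s id="1">Hello {0}, see (phrase=phrase) now</s></strings>'
for segment in cascade(sample):
    print(segment)
```

Only the "text" pieces would reach the translator; the "tag" pieces would be carried over unchanged into the target file.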

Rossana Triaca  Identity Verified
Uruguay
English to Spanish
Sorry… Nov 19, 2019

I've only ever needed to tweak more complex XML file types, or, for really easy things, I've fiddled with the regex delimited text files (the Text with inline tags filter option), but I'm not sure there's a middle ground (if your file is simple, perhaps the Text filter with catch-alls for angle brackets is actually enough).
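A "catch-all" for angle brackets is simply a pattern that treats everything between `<` and the next `>` as an inline tag, so that only the text between the markup stays translatable. A minimal Python sketch, assuming the pattern `<[^>]+>` (chosen here for illustration, not a setting quoted in the thread). Because the pattern knows nothing about XML, a literal `>` inside an attribute value breaks it, as the last line shows.

```python
import re

# Hypothetical catch-all: everything from "<" to the next ">" counts as a tag.
ANGLE_TAG = re.compile(r"<[^>]+>")

def translatable_runs(line):
    """Return the non-blank text runs left between angle-bracket tags."""
    return [run for run in ANGLE_TAG.split(line) if run.strip()]

print(translatable_runs('<s id="1">Hello <b>world</b></s>'))  # ['Hello ', 'world']

# Limitation: a ">" inside an attribute value leaks markup into the text.
print(translatable_runs('<a title="x>y">z</a>'))  # ['y">z']
```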

Just another idea -- couldn't you create your own unique attribute for just these two cases, apply that style to the content you want with ye olde Notepad++ regex, and then create an Any XML-based file type with just that one extra attribute handled the way you need? Quick and dirty, and definitely not elegant, but it should work.

Another idea, depending on how/where this content appears and what you want to do with it -- maybe you could filter it after importing and tweak it then? (again, sorry to be vague, but it's hard without a sample).
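Whether a file pre-processed with the wrap-in-a-custom-element idea can be restored exactly (the concern raised in the next reply) can be checked mechanically. The Python sketch below wraps every placeholder in a marker element, then strips the markers again and asserts that the result matches the original. The element name `x-skip` is hypothetical, and the patterns are the negated-class variants suggested later in the thread. The round trip only holds if the original file never contains `<x-skip>` itself and no match falls inside an attribute value or spans other markup; the translated target file would need the same `unwrap` step.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker element; any name absent from the client's file works.
MARK_OPEN, MARK_CLOSE = "<x-skip>", "</x-skip>"
PLACEHOLDER = re.compile(r"\{[^}]+\}|\([^=]+=[^)]+\)")
MARKED = re.compile(re.escape(MARK_OPEN) + r"(.*?)" + re.escape(MARK_CLOSE), re.DOTALL)

def wrap(xml_text):
    """Pre-process: put each placeholder inside a marker element."""
    return PLACEHOLDER.sub(lambda m: MARK_OPEN + m.group() + MARK_CLOSE, xml_text)

def unwrap(xml_text):
    """Post-process: remove the markers, keeping what they enclosed."""
    return MARKED.sub(r"\1", xml_text)

original = '<s id="1">Hello {0}, see (phrase=phrase) now</s>'
wrapped = wrap(original)
ET.fromstring(wrapped)              # raises ParseError if no longer well-formed
assert unwrap(wrapped) == original  # exact round trip for this sample
print(wrapped)
```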

Don Alejandro
Russia
Topic starter
OK Nov 21, 2019

Thank you for the potential solutions. They could actually work, but there is one problem: I need the original XML file to stay in the same state as I received it from the client. So if I were to tweak it directly in Notepad++, for example, I wouldn't be able to restore it to a 100% pristine state. A sample of the XML file (I left only the first line) is available here: https://dropmefiles.com/I5KrH There are quite a lot of tags, but the default Trados Any XML file type handled them perfectly. But then again, there are certain elements inside the translatable area that I'd like to exclude. Here is how the file looks in Trados: https://take.ms/xkCiK What I usually do with Excel files is go here and add regexes so that everything that does not need to be translated is converted into Trados tags (see https://take.ms/YZxVw ). With XML, however, this does not seem to work; frankly, I was so desperate that I tried every structure information option (see https://take.ms/KTkCC ), but nothing worked. I still see all those {0} and (phrase=phrase) constructions.

memoQ worked perfectly with "Import with options" and cascade filters (see https://take.ms/T9wUJ ). Perhaps this is indeed not possible with Trados 2015, although the settings look quite simple, so I thought all I needed was to define the correct structure information.

[Edited at 2019-11-21 10:21 GMT]

Rafa Gómez  Identity Verified
Spain
Member since 2009
English to Spanish
Prepare the file in memoQ and translate the exported mqxliff/mqxlz file in Trados Studio Nov 21, 2019

Hi,

If I understood you correctly, you don't want to "remove" those "tags" from the XML file—you just want to convert them into Trados Studio tags. And you know how to use memoQ Regex Tagger to do the job, but your main tool is Trados Studio.

Instead of trying to figure out how to use the embedded content processor in Trados Studio (which isn't really user-friendly), why don't you prepare the XML file in memoQ and translate the exported mqxliff/mqxlz file in Trados Studio? Once you complete the translation, you just need to reimport the exported file into memoQ and generate the target XML.

As a workaround, there's a Trados Studio app which is similar to memoQ Regex Tagger, but I haven't tried it (I'm a memoQ user): https://appstore.sdl.com/language/app/cleanup-tasks/963/

Finally, instead of "\{.*?\}" and "\(.*?=.*?\)", I'd use "\{[^}]+\}" and "\([^=]+=[^)]+\)". Both options should work, but negated character classes are usually preferred over lazy matching.

Hope this helps.

[Edited at 2019-11-21 16:15 GMT]
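To see what the two pairs of patterns actually do, the Python sketch below runs both on the placeholder shapes from the thread and on a few edge cases. Python's `re` module is used as a stand-in; the regex engines inside Trados and memoQ may differ in details, so treat this as an illustration and test the final patterns in the tool itself. On ordinary placeholders both pairs agree. They differ at the edges: the negated classes require at least one character between the delimiters and, unlike `.`, also match across line breaks. Neither version stops the parenthesis pattern from starting at an earlier, unrelated `(`; the tighter variant on the last line is our own hypothetical suggestion, not one from the thread.

```python
import re

# Patterns from the original post (lazy) and from the reply (negated classes).
LAZY = (r"\{.*?\}", r"\(.*?=.*?\)")
NEGATED = (r"\{[^}]+\}", r"\([^=]+=[^)]+\)")

def matches(patterns, text):
    """Return every match of each pattern, listed pattern by pattern."""
    return [m.group() for p in patterns for m in re.finditer(p, text)]

sample = "Hello {0}, see (phrase=phrase) now"
print(matches(LAZY, sample))              # ['{0}', '(phrase=phrase)']
print(matches(NEGATED, sample))           # ['{0}', '(phrase=phrase)']

# Empty delimiters: only the lazy versions match.
print(matches(LAZY, "{} (=)"))            # ['{}', '(=)']
print(matches(NEGATED, "{} (=)"))         # []

# A line break inside braces: only the negated class matches.
print(matches(LAZY, "{a\nb}"))            # []
print(matches(NEGATED, "{a\nb}"))         # ['{a\nb}']

# Both over-match from an earlier, unrelated "(".
print(matches(NEGATED, "(x) and (y=z)"))  # ['(x) and (y=z)']

# Hypothetical tighter variant that refuses parentheses inside the match.
print(matches((r"\([^=()]+=[^()]+\)",), "(x) and (y=z)"))  # ['(y=z)']
```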