How to parse XML data?

This topic contains 11 replies, has 4 voices, and was last updated by  Kenji Sugimoto 1 day, 9 hours ago.

  • October 12, 2019 at 2:49 AM #37071

    Kenji Sugimoto
    Participant

    Hello, I’m an older Tap Forms beginner.
    I mainly use Tap Forms to manage my books.
    I am now trying to write a script that gets book information from the National Diet Library, Japan. Its output is in XML format.
    So I need to parse XML, but I have no idea how to do it. It would be useful if Tap Forms had something like Utils.getXMLFromUrl(), similar to Utils.getJsonFromUrl() or Utils.getTextFromUrl().
    For now, I am fetching the data from the site with Utils.getTextFromUrl() and trying to parse the XML. I know there is an XML parser in the DOM, but I don’t know how to use it. Pioneers of Tap Forms scripting, would you please advise me on a good solution?

    Regards. Kenji.

    October 12, 2019 at 4:58 PM #37084

    Daniel Leu
    Participant

    It might be helpful if you provided the URL you use to get the XML data. That way, someone might be able to provide you with a working solution.

    October 12, 2019 at 5:33 PM #37085

    Brendan
    Keymaster

    XML parsing is really complicated, unfortunately. There are so many ways to do it. That’s why Tap Forms has just JSON parsing built in. It’s easy, and iOS and macOS have built-in support for returning a dictionary from JSON-structured data.

    But yes, it would be helpful to get the URL as Daniel suggested. If we could see what the structure of the XML is, it might help to figure out a solution.

    But often a web API can return data in different formats, which you select with a parameter on the URL. Is there anything like that with the web service you’re using, Kenji?

    October 12, 2019 at 6:48 PM #37093

    Sam Moffatt
    Participant

    XML is hard because its structure doesn’t map easily to a JavaScript object, unlike JSON, which is technically JavaScript. I looked for a JS-native XML option, but most suggestions were to use the browser-based DOM handlers, which don’t work with Tap Forms and Apple’s JavaScriptCore.

    What makes XML a challenge is that a single node can have attributes as well as a value, and can also be a nested structure with child nodes whose ordering matters and whose values may be duplicated. Part of this is heritage from SGML, similar to the challenge with HTML.
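
To make that mismatch concrete, here is a minimal sketch of one way a single XML element could be modeled as a plain JavaScript object. The field names (name, attributes, text, children) and the sample element are assumptions made up for illustration; they are not part of Tap Forms or of any XML library. A naive mapping such as { creator: 'A' } would lose both the id attribute and the second creator, which is why a list of children is used instead.

```javascript
// Hypothetical model of one XML element as a plain object.
// XML being modeled (made-up sample):
//   <record id="1"><creator>A</creator><creator>B</creator></record>
var record = {
	name: 'record',
	attributes: { id: '1' },   // attributes live beside the element's own text
	text: '',
	children: [                // an array keeps document order and repeated names
		{ name: 'creator', attributes: {}, text: 'A', children: [] },
		{ name: 'creator', attributes: {}, text: 'B', children: [] }
	]
};

// Collect the text of every direct child with the given name, in document order.
function childTexts(node, childName) {
	return node.children
		.filter(function (child) { return child.name === childName; })
		.map(function (child) { return child.text; });
}

console.log(childTexts(record, 'creator').join(', ')); // prints "A, B"
```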

    For myself, I have a set of PHP scripts that I use as a bridge to give Tap Forms a JSON interface to work with. I use these for scraping web sites and converting values, so that Tap Forms works with a relatively clean JSON interface and the PHP scripts do the heavy lifting. This does mean you need to run a web server or similar service somewhere, but it could help convert the XML into domain-specific JSON for Tap Forms to consume.

    October 12, 2019 at 6:54 PM #37094

    Kenji Sugimoto
    Participant

    Daniel-san and Brendan-san,
    Thank you for your advice.
    The URL is https://iss.ndl.go.jp/information/api/riyou/.
    It is the API guide page, but it is in Japanese.
    However, it links to an English version of the API document (PDF).
    The actual URL for a search is
    https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&query=isbn=(isbn)
    A sample is as follows:
    https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&query=isbn=9784334779146
    The result includes Japanese kanji.
    I want to get the information by ISBN and extract specific fields such as title, creator, and publisher.
    Thank you.
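
The URL pattern quoted in this post can be wrapped in a small helper. This is only a sketch: buildSearchUrl and NDL_SRU_BASE are hypothetical names, and stripping hyphens and spaces is an assumption based on the sample URL, which uses bare digits.

```javascript
// Base of the SRU search URL quoted in the post above.
var NDL_SRU_BASE = 'https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&query=isbn=';

// Build the search URL for one ISBN (hypothetical helper).
// Assumption: the API expects the ISBN without hyphens or spaces, as in the sample.
function buildSearchUrl(isbn) {
	var digits = String(isbn).replace(/[\s-]/g, '');
	return NDL_SRU_BASE + encodeURIComponent(digits);
}

console.log(buildSearchUrl('978-4-334-77914-6'));
// prints "https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&query=isbn=9784334779146"
```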

    October 12, 2019 at 7:13 PM #37095

    Kenji Sugimoto
    Participant

    Sam-san,
    I understand that it is very hard.
    I think your solution is a good approach for me, but I don’t want to run a web server right now. I would like to keep it as a last resort. Thank you.

    October 12, 2019 at 8:45 PM #37096

    Sam Moffatt
    Participant

    If you’re running on a Mac, a web server with PHP is pre-installed by default on Macs up to at least Mojave (I haven’t upgraded to Catalina yet to check, but it’s likely there). There are a number of instructions on the web on how to enable the local web server.

    October 12, 2019 at 10:45 PM #37098

    Kenji Sugimoto
    Participant

    Hi, Sam-san.
    I mainly use my iPhone when I get book information. My scenario is as follows:
    1. Scan the QR code of the book at the bookstore to check whether it is already registered in my book library, because I sometimes buy the same book twice.
    2. If there is no entry, buy it. Then get the book information from the site and add it to my book library.
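
The duplicate check in step 1 could be sketched as follows. The function names and the sample ISBNs are made up for illustration, and how the stored ISBNs are read out of the Tap Forms library is deliberately left out; the sketch only assumes they are available as an array of strings.

```javascript
// Normalize an ISBN for comparison: drop hyphens and spaces, and
// uppercase the check character (ISBN-10 may end in "X").
function normalizeIsbn(isbn) {
	return String(isbn).replace(/[\s-]/g, '').toUpperCase();
}

// Return true when scannedIsbn is already present in storedIsbns.
// storedIsbns is assumed to be an array of ISBN strings from the book library.
function isAlreadyRegistered(scannedIsbn, storedIsbns) {
	var wanted = normalizeIsbn(scannedIsbn);
	return storedIsbns.some(function (stored) {
		return normalizeIsbn(stored) === wanted;
	});
}

// Made-up sample data for illustration only.
var library = ['978-4-334-77914-6', '4-00-310101-x'];
console.log(isAlreadyRegistered('9784334779146', library)); // prints true
console.log(isAlreadyRegistered('9784101010014', library)); // prints false
```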
    I already run a version of this that uses another site’s JSON output, but that site does not have enough book information.
    So I am trying this XML site.

    Anyway, I have activated the local web server and PHP on my Catalina Mac.
    So I am ready to try your solution. Would you please help me?
    Thank you.

    October 12, 2019 at 10:53 PM #37099

    Daniel Leu
    Participant

    The XML file is rather simple so I just tried a little hack to parse it and extract the data you are looking for:

    // Return the text between the first <tag> and its </tag>, or '' if the tag is missing.
    function getText(str, tag){
    	var open = '<' + tag + '>';
    	var start = str.indexOf(open);
    	var end = str.indexOf('</' + tag + '>', start);
    	if (start === -1 || end === -1) return '';
    	return str.substring(start + open.length, end);
    }
    
    function parseIsbnRecord(url){
    	var xml = Utils.getTextFromUrl(url);
    
    	// cleanup xml: drop the dc: prefix and unescape the embedded record
    	xml = xml.replace(/dc:/g,'').replace(/&lt;/g,'<').replace(/&gt;/g,'>');
    
    	// extract recordData section
    	xml = xml.substring(xml.indexOf('<recordData>')+12, xml.indexOf('</recordData>'));
    
    	// plain object used as a key/value map
    	var data = {};
    	data['title'] = getText(xml, 'title');
    	data['creator'] = getText(xml, 'creator');
    	data['language'] = getText(xml, 'language');
    	data['publisher'] = getText(xml, 'publisher');
    
    	// collect every <description> element; match() returns null when there are none
    	var descriptions = xml.match(/<description>([^<]+?)<\/description>/g) || [];
    	for (var n = 0; n < descriptions.length; n++){
    		descriptions[n] = descriptions[n].replace(/<\/?description>/g, '');
    	}
    	data['description'] = descriptions;
    	
    	return data;
    }
    
    var url = 'https://iss.ndl.go.jp/api/sru?operation=searchRetrieve&query=isbn=9784334779146';
    var isbn_record = parseIsbnRecord(url);
    
    console.log("Title: " + isbn_record['title']);
    console.log("Creator: " + isbn_record['creator']);
    console.log("Publisher: " + isbn_record['publisher']);
    console.log("Language: " + isbn_record['language']);
    console.log("Description: " + isbn_record['description']);

    This generates the following output:

    Title: まよい道 : 新・吉原裏同心抄(一)
    Creator: 佐伯泰英 著・文・その他
    Publisher: 光文社
    Language: jpn
    Description: 判型 : 文庫,販売対象 : 一般,発行形態 : 文庫,内容 : 日本文学小説・物語,Cコード : 0193,ジャンル : 文庫

    This code is very specific to the example ISBN you provided. Most likely, other entries are very similar.
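
If other records turn out to repeat a tag or put attributes on it, a slightly more general helper might look like the sketch below. It is not part of the script above: getAllText is a hypothetical name, the sample string is made up, and the sketch assumes the XML has already been unescaped and that tag is a plain element name with no regex special characters. It is still pattern matching, not a real XML parser.

```javascript
// Return the text of every <tag>...</tag> element in xml, in document order.
// Accepts an optional namespace prefix (dc:title) and attributes on the opening tag.
// Assumption: element content is plain text with no nested elements.
function getAllText(xml, tag) {
	var pattern = new RegExp(
		'<(?:[\\w.-]+:)?' + tag + '(?:\\s[^>]*)?>' +   // opening tag
		'([^<]*)' +                                    // text content
		'</(?:[\\w.-]+:)?' + tag + '>',                // closing tag
		'g'
	);
	var results = [];
	var match;
	while ((match = pattern.exec(xml)) !== null) {
		results.push(match[1]);
	}
	return results;
}

// Made-up sample for illustration only.
var sample = '<dc:title>T1</dc:title>' +
	'<dc:creator>A</dc:creator>' +
	'<dc:creator xml:lang="ja">B</dc:creator>' +
	'<dc:subtitle>S1</dc:subtitle>';

console.log(getAllText(sample, 'creator').join(' | ')); // prints "A | B"
console.log(getAllText(sample, 'title').join(' | '));   // prints "T1"
```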

    • This reply was modified 1 day, 10 hours ago by  Daniel Leu.
    October 12, 2019 at 11:14 PM #37101

    Kenji Sugimoto
    Participant

    Daniel-san,
    Thank you for your code.
    It is just what I wanted.
    I will include the code in my scripts.
    Thank you again.

    October 13, 2019 at 12:09 AM #37103

    Kenji Sugimoto
    Participant

    Daniel-san,
    Your advice fits my needs exactly.
    I learned parsing techniques from your code.
    Thank you.

    October 13, 2019 at 12:15 AM #37105

    Kenji Sugimoto
    Participant

    Sam-san,
    My original problem is solved, but only for specific keywords.
    I think your idea is good for general-purpose use.
    I would like to learn your technique if possible.
    Regards.
