Using scripts to fill in data and categorize from external files?

Tap Forms – Organizer Database App for Mac, iPhone, and iPad Forums Script Talk Using scripts to fill in data and categorize from external files?

Viewing 20 reply threads
  • Author
    Posts
  • November 9, 2020 at 2:55 PM #42535

    Brent S
    Participant

    Hi Everyone. I am new to scripting, so want to see if this is possible. Imagine the following situation. You have 50,000 food recipes all in PDF form input into the Tap Forms Database, each PDF with a unique file name and in a field where it is a “File Attachment”. So, there are then 50,000 records in the db. Many fields have been added to the database, but are empty until you manually go in and populate them. Is it possible to use scripts to help populate the data?

    For example, lets say outside TapForms you use Spotlight on the iMac to identify 60 PDF’s (recipes) that contained the words “Gummy Bear”. You make an Excel File (.csv) with those 60 unique file names. Can you now write a script that would reference those file names in a .csv file, find/look up the specific records in tapforms where those files are attached and then do two things:

    a) choose a picklist field that already exists, and choose one of the presets. For example, if this was a list of meal choices like “Breakfast, Lunch, Dinner, Dessert” the script would make it choose “Dessert” for each of those 60 records, but not any other records in the database

    b) If you had a textfile field named KEYWORDS then have it add the term “Gummy Bear” to the list of keywords again just for those 60 records

    If this is possible then I can see how scripting could help me and others categorize a very large database that would take a very long time to categorize manually. Any direction on what a script might look like would be extremely appreciated.

    Thanks,
    Brent

    November 9, 2020 at 8:27 PM #42537

    Daniel Leu
    Participant

    You can import a csv file into a separate form and then use a script to process all records and update your reference records accordingly. Then you can assign values of meal choices accordingly and update your keywords field as well.

    But this seems to be a rather tedious job if you want to update all 50k food recipes in your Tap Forms database this way. I would start with looking into software that converts your PDFs into text files. Then you can add the content of your text files into fields inside Tap Forms. These fields are searchable, which makes it much easier to categorize your recipes. But yeah, it might be a challenge to convert your PDFs, specially if they were scanned and the quality is poor.

    But if you want to go the CSV route, I would create a spreadsheet with columns for PDF, Category, and Keyword. Here is my example:

    function addKeyword(rec, kw_id, kw){
    	var keywords = rec.getFieldValue(kw_id);
    	
    	if (keywords == undefined || keywords == ""){
    		rec.setFieldValue(kw_id, kw);
    	} else {
    		rec.setFieldValue(kw_id, keywords + ',' + kw);
    	}
    }
    
    function Update_Records() {
    
    	var cvsForm = document.getFormNamed("csv");
    	var recipesForm = document.getFormNamed("Recipes");
    	
    	// recipes
    	var category_id = 'fld-9828d73d445a4746a8dba4c0dac89716';
    	var keyword_id = 'fld-073ee12d29174c2eb64f7cf91f5bb383';
    	var pdf_id = 'fld-e870153bd0a647c9823ca998c213d1fd';
    
    	
    	// get csv field Ids
    	var csvPdf_id = cvsForm.getFieldNamed('PDF').getId();
    	var csvCategory_id = cvsForm.getFieldNamed('Category').getId();	var csvKeyword_id = cvsForm.getFieldNamed('Keyword').getId();	
    	// Loop over all csv entries
    	for (entry of cvsForm.getRecords()){
    		// get pdf file name
    		var pdfName = entry.getFieldValue(csvPdf_id);
    		// replace spaces with underscores
    		pdfName = pdfName.replace('/ /g', '_');
    		console.log("Processing: " + pdfName);
    		
    		// Loop over all recipes records
    		for (recipe of recipesForm.getRecords()){
    			// this assumes that there is only one PDF per recipe!
    			var filename = recipe.getFieldValue(pdf_id)[0].filename;
    			
    			if (pdfName == filename){
    				console.log("Found match: " + filename);
    				
    				// Update recipe record
    				addKeyword(recipe, keyword_id, entry.getFieldValue(csvKeyword_id));
    				
    				// Set category
    				recipe.setFieldValue(category_id, entry.getFieldValue(csvCategory_id))
    			
    				break;
    			}
    		
    		}
    		document.saveAllChanges();
    	
    	}
    	
    	return 'done';
    }
    
    Update_Records();

    The first row of my spreadsheet are the field labels: PDF, Category, Keyword. They are used to create the field names when creating a new form called csv upon importing your csv. The script is part of your Recipes form.

    This script assumes that the keywords are comma separated. Keywords are added to your record if new ones are found. The category field is overwritten!

    In order for the script to work, you need to update category_id, keyword_id, and pdf_id according to your recipes form.

    Hope this helps! Happy cooking!

    November 9, 2020 at 9:56 PM #42539

    Brent S
    Participant

    Thank you Daniel! This is great. I am glad to know this is possible. It is going to take me a little bit of time to figure out scripting, but you have given me a great start.

    My database is actually not a recipe database. That was just a simple example to illustrate the problem I have which is a lot more complex. Your answer will help me generate a solution.

    The documents (readable PDF’s) I am dealing with are unstructured contracts and each are about 30 pages long, and yes each has its own unique file name. I do have them all in HTML too, but I think that might be a bit much to have that much text in Tap Forms?

    So, just circling back to the one part of the solution, where I have a pick list with four choices, in my example, “Breakfast, Lunch, Dinner, Dessert” your code is as follows:

    // Set category
    recipe.setFieldValue(category_id, entry.getFieldValue(csvCategory_id))

    If I understand you correctly, I would put the word “Dessert” as one of the entries in the CSV file under “Category” and then this will overwrite it in the Pick List?

    Anyhow. Thanks again!

    November 10, 2020 at 12:48 AM #42540

    Daniel Leu
    Participant

    // Set category
    recipe.setFieldValue(category_id, entry.getFieldValue(csvCategory_id))

    If I understand you correctly, I would put the word “Dessert” as one of the entries in the CSV file under “Category” and then this will overwrite it in the Pick List?

    Yes, but it would not overwrite the pick list, but just the value assigned to the field.

    Pick list is just a list of predefined values for a field. You still can enter a different value to that field.

    November 14, 2020 at 5:21 PM #42586

    Brent S
    Participant

    Hi Daniel,

    A few more questions if I may. I have tried to figure things out by watching some of Sam’s videos etc., but I am a little stuck.

    So first, from my reading it looks like I need to use a “Form Level Script” as opposed to a “Field Level Script” as I am updating existing records.

    I only have two forms, my main database and the cvs data I want to input. Is it correct that I must place the Form Level Script in my main database and not in the cvs form? (That seems to be the only way I can get the correct fld-hash numbers that I need into the script).

    Next, much of the script makes sense to me, and other parts seem somewhat circular. Could you look what I have done and see where I messed up? …because I think I did a good job on that : )

    I need the script to look up each record that aligns with the PDF file name in the cvs file, and then update the records in the main database form called “Contracts”

    I have attached a few screen shots in case that helps with context.

    Thanks again.
    Brent

    ——————-

    function addKeyword(rec, kw_id, kw) {

    var keywords = rec.getFieldValue(kw_id);

    if (keywords == undefined || keywords == “”){
    rec.setFieldValue(kw_id, kw);
    } else {
    rec.setFieldValue(kw_id, keywords + ‘,’ + kw);
    }
    }

    function Update_Records() {

    var csvForm = document.getFormNamed(“csv”);
    var ContractsForm = document.getFormNamed(“Contracts”);

    // Contracts
    var pdffilepath_7_id = ‘fld-3ea77ccc18604174b722ece105965c44’;
    var license_category_id = ‘fld-d63ab1856e554efebe62b9f1826c388d’;
    var license_subcategory_id = ‘fld-f0c47c7e6ad244fe955e2c24cb448590’;
    var agreement_keywords_id = ‘fld-a2126cbe512646e9ba144fb5a90b06dc’;

    // get csv field Ids
    var pdffilepath_7 = cvsForm.getFieldNamed(‘PDF’).getId();
    var license_category = cvsForm.getFieldNamed(‘Category’).getId(); var license_subcategory = cvsForm.getFieldNamed(‘Subcategory’).getId();
    var agreement_keywords = cvsForm.getFieldNamed(‘Keyword’).getId();

    // Loop over all csv entries
    // So this tells my Contracts database to look at the cvs form?
    for (entry of cvsForm.getRecords()){

    // get pdf file name
    // is it looking for another fld-hash here?
    var pdfName = entry.getFieldValue(csvPdf_id);

    // replace spaces with underscores
    pdfName = pdfName.replace(‘/ /g’, ‘_’);
    console.log(“Processing: ” + pdfName);

    // Loop over all LICENSEdb records
    for (Contracts of Contracts.getRecords()){

    // this assumes that there is only one PDF per Contract
    // where have I defined (pdf_id) before? Do I need to ?
    var filename = Contract.getFieldValue(pdf_id)[0].filename;

    if (pdfName == filename){
    console.log(“Found match: ” + filename);

    // Update Contracts record
    addsubcategory (Contracts, subcategory_id, entry.getFieldValue(csvKeyword_id));

    // Update Contracts record
    addKeyword(Contracts, keyword_id, entry.getFieldValue(csvSubcategory_id));

    // Set category
    Contracts .setFieldValue(category_id, entry.getFieldValue(csvCategory_id))

    break;
    }

    }
    document.saveAllChanges();

    return ‘done’;
    }

    Update_Records();

    Attachments:
    You must be logged in to view attached files.
    November 15, 2020 at 11:27 AM #42602

    Sam Moffatt
    Participant

    A few more questions if I may. I have tried to figure things out by watching some of Sam’s videos etc., but I am a little stuck.

    Curious for feedback, were any of them helpful? What was not helpful and what would improve it? What was helpful and how was it helpful? I had thought about doing another simple intro one on the next few snippets which includes functions but haven’t done it. I tried another video on some other topics but wasn’t happy with how it turned out.

    So first, from my reading it looks like I need to use a “Form Level Script” as opposed to a “Field Level Script” as I am updating existing records.

    I think my rule of thumb is that field scripts are meant to act on the contents of the record as those contents change whilst form scripts are more one off or bulk action changes. You can use them somewhat interchangeably and the question is do you want this thing to run all of the time or is it something you want to control when it runs?

    I only have two forms, my main database and the cvs data I want to input. Is it correct that I must place the Form Level Script in my main database and not in the cvs form? (That seems to be the only way I can get the correct fld-hash numbers that I need into the script).

    While it might not be intuitive you can access any form and any field from scripts (fields or forms) in any form within a document. The editor won’t show you the fields that aren’t directly linked to the form that you’re in but you can use the editor to go to other forms and get their field ID’s. You can also get the field ID from below the description box in the field editor UI.

    You can use the code button to wrap your code with backticks (e.g. `code here`) to make it a little easier to read (also ran it through a formatter since the post didn’t have any indentation):

    function addKeyword(rec, kw_id, kw) {
    
    	var keywords = rec.getFieldValue(kw_id);
    
    	if (keywords == undefined || keywords == "") {
    		rec.setFieldValue(kw_id, kw);
    	} else {
    		rec.setFieldValue(kw_id, keywords + ',' + kw);
    	}
    }
    
    function Update_Records() {
    
    	var csvForm = document.getFormNamed("csv");
    	var ContractsForm = document.getFormNamed("Contracts");
    
    	// Contracts
    	var pdffilepath_7_id = 'fld-3ea77ccc18604174b722ece105965c44';
    	var license_category_id = 'fld-d63ab1856e554efebe62b9f1826c388d';
    	var license_subcategory_id = 'fld-f0c47c7e6ad244fe955e2c24cb448590';
    	var agreement_keywords_id = 'fld-a2126cbe512646e9ba144fb5a90b06dc';
    
    	// get csv field Ids
    	var pdffilepath_7 = cvsForm.getFieldNamed('PDF').getId();
    	var license_category = cvsForm.getFieldNamed('Category').getId(); var license_subcategory = cvsForm.getFieldNamed('Subcategory').getId();
    	var agreement_keywords = cvsForm.getFieldNamed('Keyword').getId();
    
    	// Loop over all csv entries
    	// So this tells my Contracts database to look at the cvs form?
    	for (entry of cvsForm.getRecords()) {
    
    		// get pdf file name
    		// is it looking for another fld-hash here?
    		var pdfName = entry.getFieldValue(csvPdf_id);
    
    		// replace spaces with underscores
    		pdfName = pdfName.replace('/ /g', '_');
    		console.log("Processing: " + pdfName);
    
    		// Loop over all LICENSEdb records
    		for (Contracts of Contracts.getRecords()) {
    
    			// this assumes that there is only one PDF per Contract
    			// where have I defined (pdf_id) before? Do I need to ?
    			var filename = Contract.getFieldValue(pdf_id)[0].filename;
    
    			if (pdfName == filename) {
    				console.log("Found match: " + filename);
    
    				// Update Contracts record
    				addsubcategory(Contracts, subcategory_id, entry.getFieldValue(csvKeyword_id));
    
    				// Update Contracts record
    				addKeyword(Contracts, keyword_id, entry.getFieldValue(csvSubcategory_id));
    
    				// Set category
    				Contracts.setFieldValue(category_id, entry.getFieldValue(csvCategory_id))
    
    				break;
    			}
    		}
    	}
    	document.saveAllChanges();
    
    	return 'done';
    }
    
    Update_Records();
    

    First thing I noticed when I put it through a formatter was that there seemed to be a missing bracket, I’ve added an extra one before the document.saveAllChanges() line and everything balanced out. I used Visual Studio Code to reformat it but I also found an online JavaScript beautifier as well.

    Next, much of the script makes sense to me, and other parts seem somewhat circular. Could you look what I have done and see where I messed up? …because I think I did a good job on that : )

    I need the script to look up each record that aligns with the PDF file name in the cvs file, and then update the records in the main database form called “Contracts”

    Now I see you added an extra call:

    addsubcategory (Contracts, subcategory_id, entry.getFieldValue(csvKeyword_id));
    

    What this change is doing is trying to call a function here called addsubcategory but that function doesn’t exist. I think you’re trying to copy the addKeyword line for a new field and if you look at the very top of the script there is a function addKeyword which defines or creates the function. If what you’re trying to do is create a comma separated list of sub categories but in a different field, then I don’t think you need to do a new function, I think you can just call addKeyword with the sub category field ID:

    addKeyword (Contracts, subcategory_id, entry.getFieldValue(csvKeyword_id));
    

    Try fixing the curly braces and changing addsubcategory over to addkeyword as above and see if that gets you a little closer.

    November 15, 2020 at 7:02 PM #42612

    Brent S
    Participant

    Hi Sam,

    Thank you for the help and feedback. I am getting close to getting it to work, but still a bit shaky in terms of my understanding of things. But first things first, let me answer your questions:

    Curious for feedback, were any of them helpful?

    Yes. Absolutely! Thank you for making them. The original “Creating a Licensing Database” was great. One of the other videos on Scripting (Deep Dive) helped me make sense of the fld-hash codes, as I had no idea how Daniel had come up with that until I watched your video.

    What was not helpful and what would improve it?
    I like when videos can compress content in 5-10 minutes. Better to break up videos into smaller digestible chunks and not to try to cover too many topics (IMHO)

    What was helpful and how was it helpful?

    I really like how you went through the Javascript Code line by line to explain how the computer is interpreting the code.

    I had thought about doing another simple intro one on the next few snippets which includes functions but haven’t done it. I tried another video on some other topics but wasn’t happy with how it turned out.

    This would be very helpful. For me, an example where you had two very simple forms and you wanted to use a particular Snippet like “Set Field Value”. Ideally the video would explain what you are trying to do, why you are trying to do it, write the code and run the Snippet, show the effect that it had. Ideally the length of the video would be approximate 5 min. For a beginner like me I need to see a very basic version of something working so that I can alter it for my own use. Once I get the simple version working, hopefully then I can build on that for things that are more complex.

    There is your feedback : ). And thanks again for making the videos.

    Now, a small request for additional help…..

    I have not been able to debug the script that I have to get it to work. I am wondering if you can help me vastly simplify the script just to doing one or two things instead of about 4 things at the same time. Basically I would like to start over. So, as I said, I only have two forms. One is my main database called “Contracts” which has thousands of unique PDF’s attached in a file attachment field. The second form (which I named csv) is a sub-list of those PDFs (about 1200 of 36000) that I want to tag with the the word “vaccine”.

    Therefore, I need a script that will take my small list of 1200 pdfs, find the corresponding/matching file in the larger Contracts database, and insert an entry “vaccine” into the field “Agreement Keywords”. Seems like this should be simple.

    If I can just accomplish that then I can change the code later to take action in other fields.

    Key information

    The location of the field I need to access in the contracts database:
    var pdffilepath_7_id = ‘fld-3ea77ccc18604174b722ece105965c44’;
    var agreement_keywords_id = ‘fld-a2126cbe512646e9ba144fb5a90b06dc’;

    The location of the data in the csv database:
    var keyword_id = ‘fld-53de1c4c633a4da6a4d0213158b5ef0a’;
    var pdf_id = ‘fld-0cbd22a473cd4c58aa1dc47341a7157b’;

    I think this ability to have scripts auto-populate fields is critically important to anyone like me that has a massive database of documents that are unstructured and need to be tagged and categorized within Tapforms.

    Thanks,
    Brent

    November 16, 2020 at 11:57 AM #42618

    Daniel Leu
    Participant

    In hindsight, it would have been better if you had described your problem as is and not made up this hypothetical recipe case….

    As you might have noticed, all CSV form related fields use the a csv prefix to make it clear where this field resided. And yes, my script is a from script that is part of the contracts form. My assumption was that the CSV form is deleted after use and recreated with a new tagging set. So it makes sense that the script has a more permanent location.

    Here is the updated script. Hope this works.

    function addKeyword(rec, kw_id, kw){
    	var keywords = rec.getFieldValue(kw_id);
    	
    	if (keywords == undefined || keywords == ""){
    		rec.setFieldValue(kw_id, kw);
    	} else {
    		rec.setFieldValue(kw_id, keywords + ',' + kw);
    	}
    }
    
    function Update_Records() {
    
    	var csvForm = document.getFormNamed("csv");
    	var contractsForm = document.getFormNamed("Contracts");
    	
    	// recipes
    	var keyword_id = 'fld-a2126cbe512646e9ba144fb5a90b06dc';
    	var pdf_id = 'fld-3ea77ccc18604174b722ece105965c44';
    
    	// get csv field Ids
    //	var csvPdf_id = csvForm.getFieldNamed('PDF').getId();
    //	var csvKeyword_id = cvsForm.getFieldNamed('Keyword').getId();
    
    	// get use fixed field Ids!!!!!!!!!!!!
    	var csvPdf_id = 'fld-0cbd22a473cd4c58aa1dc47341a7157b';
    	var csvKeyword_id = 'fld-53de1c4c633a4da6a4d0213158b5ef0a';
    
    	// Loop over all csv entries
    	for (entry of cvsForm.getRecords()){
    		// get pdf file name of CVS record
    		var cvsPdfName = entry.getFieldValue(csvPdf_id);
    
    		// replace spaces with underscores
    		cvsPdfName = cvsPdfName.replace('/ /g', '_');
    		console.log("Processing: " + cvsPdfName);
    		
    		// Loop over all contract records
    		for (contract of contractsForm.getRecords()){
    			// this assumes that there is only one PDF per recipe!
    			var pdfName = contract.getFieldValue(pdf_id)[0].filename;
    			
    			if (cvsPdfName == pdfName){
    				console.log("Found match: " + pdfName);
    				
    				// Update contract record
    				addKeyword(contract, keyword_id, entry.getFieldValue(csvKeyword_id));
    			
    				break;
    			}
    		
    		}
    		document.saveAllChanges();
    	
    	}
    	
    	return 'done';
    }
    
    Update_Records();
    
    November 16, 2020 at 3:04 PM #42621

    Brent S
    Participant

    Hi Daniel,

    You are 100% correct. I realized once I saw the complexity of the answer that a hypothetical example was not the way to go. Noob mistake.

    The code is almost working. Its at least now starting to process files, but then encounters the following error. (See attached).

    I went through the code and replaced some entries that had “cvs” instead of “csv” and replaced the word “contract” with “Contracts” just to be consistent (not sure if the last change mattered or had any effect), but still have the same error.

    Thanks again. I hope this helps many people on the forum in addition to myself.

    Cheers,
    Brent

    Attachments:
    You must be logged in to view attached files.
    November 16, 2020 at 5:33 PM #42623

    Daniel Leu
    Participant

    I thought I caught all cvs typos…

    There is a loop that goes over all contracts. So contract points to one entry. It makes more sense to use the singular form in this case.

    The error points to var pdffilepath_7_id = ‘fld-3ea77ccc18604174b722ece105965c44’; It was my understanding that this would be the PDF file attachment field.

    Could you share your form template? (file -> export -> form template). Thanks.

    November 16, 2020 at 8:06 PM #42624

    Brent S
    Participant

    Hi Daniel,

    I am not comfortable posting my entire db structure online. I would send it privately.

    The location of the PDF files in the main contracts database is indeed:
    var pdffilepath_7_id = ‘fld-3ea77ccc18604174b722ece105965c44’;

    I just checked it again.

    There are 35,747 entries (PDF’s), so it is only one field, but there are many entries.

    I can share the csv form publicly with you if that helps to see the other end.

    I have included a slice of the db that shows PDFFILEPATH 7 so you can indeed see that is where the different PDF Files are being held.

    Thanks,
    Brent

    Attachments:
    You must be logged in to view attached files.
    November 16, 2020 at 8:35 PM #42627

    Sam Moffatt
    Participant

    I’d also to check that there is a file attachment and skip ones that don’t:

    		// Loop over all contract records
    		for (contract of contractsForm.getRecords()){
    			// this assumes that there is only one PDF per recipe!
    			var pdfName = contract.getFieldValue(pdf_id)[0].filename;
    

    to:

    
    		// Loop over all contract records
    		for (contract of contractsForm.getRecords()){
    			if (!contract.getFieldValue(pdf_id) || contract.getFieldValue(pdf_id).length == 0) {
    				console.log('Record is missing PDF: ' + contract.getUrl());
    				continue;
    			}
    			// this assumes that there is only one PDF per recipe!
    			var pdfName = contract.getFieldValue(pdf_id)[0].filename;
    

    That should check there is an attachment and validate it’s not set plus log a message with a clickable link in the console. If that ends up being everything then that’s not great but it wouldn’t bomb part way through.

    There are 35,747 entries (PDF’s), so it is only one field, but there are many entries.

    Many records or a single record with multiple PDFs?

    Re feedback: The last few videos hit half an hour and I need to work on getting the content back in the 10 minute range. I ended up throwing a bunch of stuff in one of the videos which when I went back should have probably ended up broken out into a few videos (ended up putting in chapter links for at least one of them). I’m trying to keep them as continuous takes to make sure I don’t miss any steps along the way though sometimes that leads to mild confusion as it’s also not scripted/practiced. Sometimes seeing something small and seemingly unimportant is the hook you need to understand. Thanks for the feedback!

    November 16, 2020 at 10:32 PM #42629

    Brent S
    Participant

    Hi Sam,

    Thanks. Almost there. So close I can taste it. See error message attached.

    I thought perhaps I am missing one line that starts with something like:

    var csvPdfName = csv.getFieldValue(csvPdf_id)[0].filename;

    However, I tried this and it did not work and just gave me another message “Category Script: ReferenceError: Can’t find variable: csv, line:(null)

    With respect to your question, there are 35,747 records. Each has one PDF associated with it.

    Cheers,
    Brent

    Attachments:
    You must be logged in to view attached files.
    November 17, 2020 at 1:28 AM #42632

    Daniel Leu
    Participant

    Your code is a bit messed up with two contract loops…

    unction addKeyword(rec, kw_id, kw){
    	var keywords = rec.getFieldValue(kw_id);
    	
    	if (keywords == undefined || keywords == ""){
    		rec.setFieldValue(kw_id, kw);
    	} else {
    		rec.setFieldValue(kw_id, keywords + ',' + kw);
    	}
    }
    
    function Update_Records() {
    
    	var csvForm = document.getFormNamed("csv");
    	var contractsForm = document.getFormNamed("Contracts");
    	
    	// recipes
    	var keyword_id = 'fld-a2126cbe512646e9ba144fb5a90b06dc';
    	var pdf_id = 'fld-3ea77ccc18604174b722ece105965c44';
    
    	// get csv field Ids
    //	var csvPdf_id = csvForm.getFieldNamed('PDF').getId();
    //	var csvKeyword_id = csvForm.getFieldNamed('Keyword').getId();
    
    	// get use fixed field Ids!!!!!!!!!!!!
    	var csvPdf_id = 'fld-0cbd22a473cd4c58aa1dc47341a7157b';
    	var csvKeyword_id = 'fld-53de1c4c633a4da6a4d0213158b5ef0a';
    
    	// Loop over all csv entries
    	for (entry of csvForm.getRecords()){
    		// get pdf file name of CSV record
    		var csvPdfName = entry.getFieldValue(csvPdf_id);
    
    		// replace spaces with underscores
    		csvPdfName = csvPdfName.replace('/ /g', '_');
    		console.log("Processing: " + csvPdfName);
    		
    		// Loop over all contract records
    		for (contract of contractsForm.getRecords()){
    			if (!contract.getFieldValue(pdf_id) || contract.getFieldValue(pdf_id).length == 0) {
    				console.log('Record is missing PDF: ' + contract.getUrl());
    				continue;
    			}
    
    			// this assumes that there is only one PDF per contract!
    			var pdfName = contract.getFieldValue(pdf_id)[0].filename;
    			
    			if (csvPdfName == pdfName){
    				console.log("Found match: " + pdfName);
    				
    				// Update contract record
    				addKeyword(contract, keyword_id, entry.getFieldValue(csvKeyword_id));
    			
    				break;
    			}
    		
    		}
    		document.saveAllChanges();
    	
    	}
    	
    	return 'done';
    }
    
    Update_Records();
    November 17, 2020 at 4:56 PM #42636

    Brent S
    Participant

    Daniel and Sam,

    Thank you! Thank you! Thank you!

    Success!!

    Like Daniel mentioned in his original response, the process is a bit tedious. Indeed this is true, took my very fast computer (new iMac with 64G RAM) about 6 hours to complete this task. I was expecting about 6 minutes. So, I am going to need to find a more efficient way to do this. It might involve putting the full text of the PDF documents into a text field so that Tap Forms can just use its own search capability to pluck out keywords associated with a document within a record. (I also have full versions of all the contracts in HTML format which may be easier).

    It may also be there is a more efficient way the code could be written because I think this went though all 35,747 files in the main database rather than looking at the approximately 1200 files I wanted to tag. This was because I noticed I was getting many “empty record” notices in the console when I know for a fact each entry in the database has a PDF file in it.

    In any case, I am LOVING Tap Forms and looking forward to learning how to use these built in snippets along with customized scripts. Thank you Tap Forms community.

    November 17, 2020 at 5:38 PM #42637

    Daniel Leu
    Participant

    Brent, great to hear that it worked, finally!

    Is there a way you can select the 1200 file records with the advanced search? If that’s the case you can select the saved search and just run the script with a little change:

    Before:
    for (contract of contractsForm.getRecords()){

    After:
    for (contract of search.getRecords()){

    It is strange to get the ’empty records’ notice although there should be a file associated with every record. That’s something I would look into.

    Fetching the file record might be what takes a bit of time. If you run this script frequently, it might be worth a try to save the PDF filename in a field and then just use this field in the comparison.

    Oh, don’t forget to backup your database, specially with so many entries!

    I might add a little sanity check as well to verify that a record was really found:

    		var success = 0;
    		// Loop over all contract records
    		for (contract of contractsForm.getRecords()){
    			if (!contract.getFieldValue(pdf_id) || contract.getFieldValue(pdf_id).length == 0) {
    				console.log('Record is missing PDF: ' + contract.getUrl());
    				continue;
    			}
    
    			// this assumes that there is only one PDF per contract!
    			var pdfName = contract.getFieldValue(pdf_id)[0].filename;
    			
    			if (csvPdfName == pdfName){
    				console.log("Found match: " + pdfName);
    				success = 1;
    				
    				// Update contract record
    				addKeyword(contract, keyword_id, entry.getFieldValue(csvKeyword_id));
    			
    				break;
    			}
    		
    		}
    		if (!success) {
    			console.log("No match found: " + csvPdfName, "#FF0000");
    		}
    November 18, 2020 at 1:09 AM #42640

    Sam Moffatt
    Participant

    Some of the slow down is that you’re doing two loops and if you can control it a little more you might be able to create a unique key for one of the records and get rid of one of the loops. That would require making sure you imported your documents properly and maybe some sort of hash collision system. Obviously if you can reduce your candidate set to a search then that would reduce your loops down as well.

    December 12, 2020 at 7:54 PM #42832

    Brent S
    Participant

    Hi Sam and Daniel,

    In order to get rid of one of the loops I could move everything to the script. That is, instead of having multiple columns with different data in my CSV file that need to be looked up and replaced,

      I could reduce the CSV file to a single column that only has the PDF name

    . That might solve a number of problems (not only speed, but the ’empty records’)

    Then, all the script needs to do is a) find the matching record from the CSV form in the database that has the same PDF name, b) run the rest of the script to populate the fields in that record, and c) repeat for other records in the CSV file until finished.

    I know that if I did it this way the end of the script should look something like this if there were two fields to populate.

    function Update_Records() {

    var csvForm = document.getFormNamed(“csv”);
    var contractsForm = document.getFormNamed(“Contracts”);

    // specifies type of agreement
    var type_of_agreement_id = ‘fld-6af499ed439143b297828643341d939b’
    record.setFieldValue (type_of_agreement, ‘Licensing Agreement’)

    // This should “replace” keyword or keywords in a selected record
    var agreement_keywords_id = ‘fld-a2126cbe512646e9ba144fb5a90b06dc’;
    record.setFieldValue (agreement_keywords_id, ‘Vaccine, Animal, Bacterial’)

    Could you show me how you would take the original code we had above and simplify the beginning of the script and modify it so it works with the code I have here?

    Thanks very much for any insight you can provide.

    Cheers,
    Brent

    December 12, 2020 at 9:44 PM #42833

    Brent S
    Participant

    This is how I tried to fix, but still have the two loops:

    function Update_Records() {

    var csvForm = document.getFormNamed(“csv”);
    var contractsForm = document.getFormNamed(“Contracts”);

    // Contracts
    var keyword_id = ‘fld-a2126cbe512646e9ba144fb5a90b06dc’;
    var pdf_id = ‘fld-3ea77ccc18604174b722ece105965c44’;

    // get use fixed field Id
    var csvPdf_id = ‘fld-4bcd0f1fba5c4178bb8d10a112b17489’;

    // Loop over all csv entries
    for (entry of csvForm.getRecords()){
    // get pdf file name of CSV record
    var csvPdfName = entry.getFieldValue(csvPdf_id);

    // replace spaces with underscores
    csvPdfName = csvPdfName.replace(‘/ /g’, ‘_’);
    console.log(“Processing: ” + csvPdfName);

    // Loop over all contract records
    for (contract of contractsForm.getRecords()){
    if (!contract.getFieldValue(pdf_id) || contract.getFieldValue(pdf_id).length == 0) {
    console.log(‘Record is missing PDF: ‘ + contract.getUrl());
    continue;
    }

    // this assumes that there is only one PDF per contract!
    var pdfName = contract.getFieldValue(pdf_id)[0].filename;

    if (csvPdfName == pdfName){
    console.log(“Found match: ” + pdfName);

    // Update contract record
    var type_of_agreement_id = ‘fld-6af499ed439143b297828643341d939b’
    record.setFieldValue (type_of_agreement, ‘Licensing Agreement’)

    // This should “replace” keyword or keywords in a selected record
    var agreement_keywords_id = ‘fld-a2126cbe512646e9ba144fb5a90b06dc’;
    record.setFieldValue (agreement_keywords_id, ‘Vaccine, Animal, Bacterial’)

    break;
    }

    }
    document.saveAllChanges();

    }

    return ‘done’;
    }

    Update_Records();

    December 13, 2020 at 8:54 PM #42841

    Sam Moffatt
    Participant

    If you were to aim to optimise the script, I’d probably want to have a map of PDF name to Tap Forms record ID. You can do a retrieve of a record by key/record ID via the script so if you precomputed a map of these keys or used the record ID import mode for Tap Forms to import the records then it’ll be a single retrieve for retrieving the record (form_record_id is the field in the CSV file which if present will be prepended with a rec- prefix and allow you to use form.getRecordWithId). The horse has likely already bolted for this one but something to consider for future projects, so let’s go ahead with the other suggestion: PDF attachment name in a script field.

    If you have the PDF attachment name in a script field, you could use the getRecordFromFormWithKey function in my script manager. The method signature looks like this:

    getRecordFromFormWithKey(formName, keyFieldId, keyValue, createIfMissing = true, earlyTermination = true, alwaysScanOnMiss = false)
    

    It creates in Javascript an index on a first pass and then does looks up from there (set earlyTermination to false and alwaysScanOnMiss to true). When you’re in a tight loop this should improve performance by doing the record loop once to build the index and then you can do a similar sort of roughly constant time retrieval for matching records. It’s still two loops but you remove the nesting and replace it with a more optimised data structure.

    First step is to create a new script field in your Contracts form and in it you want to add in the code we use to grab the filename from the PDF:

    record.getFieldValue('fld-e870153bd0a647c9823ca998c213d1fd')[0].filename;
    

    You’ll need to tick the “Update records when saving” option when creating that to populate the new field in all of your record. Essentially we’re just copying the filename as a new field. At this point you might also want to setup a search to look for records that have this field empty because that seems likely a bug. You’ll also need the field ID from the form panel (underneath the “Description” field) which I refer to later as “fld-scriptfieldidhere” that you’ll need to change.

    Using the code you provided as a basis, something like this should get you some of the way there:

    document.getFormNamed('Script Manager').runScriptNamed('getRecordFromFormWithKey');
    
    function Update_Records() {
    
        var csvForm = document.getFormNamed("csv");
    
        // Contracts
        var keyword_id = 'fld-a2126cbe512646e9ba144fb5a90b06dc';
        var pdf_id = 'fld-3ea77ccc18604174b722ece105965c44';
    
        // get use fixed field Id
        var csvPdf_id = 'fld-4bcd0f1fba5c4178bb8d10a112b17489';
        var csvKeyword_id = cvsForm.getFieldNamed('Keyword').getId();
    
        // Loop over all csv entries
        for (entry of csvForm.getRecords()) {
            // get pdf file name of CSV record
            var csvPdfName = entry.getFieldValue(csvPdf_id);
    
            // replace spaces with underscores
            csvPdfName = csvPdfName.replace('/ /g', '_');
            console.log("Processing: " + csvPdfName);
    
            contract = getRecordFromFormWithKey("Contracts", "fld-scriptfieldidhere", csvPdfName, false, false, true);
    
            if (!contract) {
                console.log('Unable to find matching PDF record: ' + csvPdfName);
                continue;
            }
    
            console.log("Found match: " + csvPdfName);
    
            // Update contract record
            var type_of_agreement_id = 'fld-6af499ed439143b297828643341d939b'
            contract.setFieldValue(type_of_agreement, 'Licensing Agreement')
    
            // This should “replace” keyword or keywords in a selected record
            var agreement_keywords_id = 'fld-a2126cbe512646e9ba144fb5a90b06dc';
            contract.setFieldValue(agreement_keywords_id, entry.getFieldValue(csvKeyword_id));
        }
        document.saveAllChanges();
    
        return 'done';
    }
    
    Update_Records();
    

    We’ve still got the outer loop that iterates over the incoming CSV form, the “Contracts” loop is removed and replaced with the call to getRecordsFromFormWithKey which internally does a loop for you but also builds an index in memory that should make subsequent accesses quicker. Watch your memory usage in Activity Monitor but you’ve got 64GB of RAM so we should be fine. The script is finding the record for you so the code that lived inside your inner for loop moves up a level though there is a check to see if we got a record back. I did some minor changes to set the keyword based on the CSV form as an example of how I think that to work.

    One thing I’ve done in scripts like this is put in a field for when it was processed to be able to skip it and also a counter to limit how many records are processed. This is useful for debugging scripts and being able to progressively execute it to get an idea of it:

    let minTime = new Date().getTime() - 86400000; // one day
    let processed = 0;
    let skipped = 0;
    
    for (...) {
    
        let previousDate = entry.getFieldValue('fld-previousdateidhere');
        if (previousDate && previousDate.getTime() > minTime) {
            skipped++;
            continue;
        }
    
        entry.setFieldValue('fld-previousdateidhere', new Date());
    
        processed++;
    
        if (processed > 100) {
            break;
        }
    
        // your existing for loop logic here
    }
    

    This uses an extra field and updates when the script is run to set it that way the next time you run the script, it skips past records it’s already “processed” so to speak and moves onto the next batch of 100. Instead of handling the date logic in code, you could also tie this into a saved search as well so that Tap Forms only gives you the candidate records.

    December 14, 2020 at 1:07 PM #42847

    Daniel Leu
    Participant

    Loading the PDF attachments into memory should provide a nice performance boost. It would be interesting to see how the run times compares between the two versions.

    December 18, 2020 at 9:34 AM #42898

    Brent S
    Participant

    Thanks guys. Once I get it working I will report back on performance and then close this thread. Appreciate all the support.

Viewing 20 reply threads

You must be logged in to reply to this topic.