May 16, 2016 (6.5 hours) – Last Day
Since it was my last day doing web-archiving at FARL, I tried to wrap up the three sites I have been working on: The William Blake Archive, The Gilded Age, and The Frick Collection. The latter will not be truly completed, as FARL will continue to crawl and capture it every month. However, I at least wanted to leave it in a place where it could easily be picked up again in the future. I left the sites in the following state:
William Blake Archive – The vast majority of this site has been crawled and captured successfully. I have found one final host page that should be crawled in order to capture the entire journal. That page can be found here. Overall, the site should not need to be re-crawled often; I would say annually would be best to make sure that all updates are captured. However, I would suggest capturing the aforementioned page twice a year, as it is regularly updated with new journal entries.
Gilded Age – The double layering issue continues. None of what I patch crawled last week seemed to fix it. I had been trying to crawl this site in the hope that it would capture all of the images, as that is where they all have to be initially uploaded (Victoria, the site’s creator, previously explained this to me), but I have been unable to do so successfully. Since it is my last day, and I would rather no one else waste as much time on this issue as I have, I sent in a support ticket to Archive-It asking for recommendations (#4447). Hopefully they are able to figure something out on the back end.
I continued to try to patch crawl what I could from the remainder of the site but really only found the same missing content as before, so I re-crawled what I had crawled last week in the hope that it actually captures in this patch. The site will need more work until the issue of the double layered images can be remedied. However, since this is an online exhibition, it will not be updated and shouldn’t need to be crawled again in the future.
The Frick Collection – Since capturing and QA’ing this site is a continuous task, today I focused on making sure all the rotating exhibitions and programs have been captured in this month’s crawl. There were no major issues, and patch crawling seems to have run smoothly. This site will need to be crawled next month (as scheduled) to ensure any changes made to the site in May are recorded.
The main problem continues to be dynamic content: virtual tours added to the website beginning in June 2015, which Archive-It’s crawler cannot capture, and most embedded video, which may be captured but always fails to play back.
Finally, I took one more look at the Frick’s Websta (web Instagram viewer) page, and it appears that the patch crawling I performed a few weeks ago helped. The formatting issues and the quadrupling of images are gone.
Outstanding Support Tickets
#3557 – Victorian Web, missing URLs not registering for patch crawls
#3947 – Library of William Morris, missing URLs not loading
#4163 – William Morris/Victorian Web, shown as one seed in the crawl report
#4447 – Gilded Age, double layered images
May 9, 2016 (7.5 hours)
Today, I came in to some bad news. As is par for the web archiving course, the solution I thought I had found for the double layering on the Gilded Age site didn’t patch as I expected. I am now trying one alternative route before I send in a support ticket, and I spent the rest of the day trying to patch crawl this issue.
May 2, 2016 (7.5 hours)
QA’ing of the Gilded Age and Frick sites continues. I think I finally figured out how to solve the issue caused by the double layering of photographs on the Gilded Age site. It requires me to click on the link to the actual Frick page (located under the image credits) to crawl that page, therefore allowing the original image to appear in large format without the issues of only capturing the
Today, Megan and I gave our presentation to the FARL staff describing what we have done since August. Since the audience has heard about web archiving several times before from previous interns, we decided to talk only briefly about the QA process and instead focus on the issues we have personally come across while archiving. In addition, we included several slides about other projects we completed during our time here outside of FARL, tying those experiences in with the ones we have had here.
After our presentation we got a tour of the Digital Lab led by Luciano Johnson and the digitization team.
April 25, 2016 (7.5 hours)
Megan and I were extremely busy today. On top of QA’ing the Gilded Age site this morning, we conducted an interview with Lousia to hear about her background and work in Digital Art History. Afterwards, we met with Sumitra to discuss our upcoming presentation to FARL; she asked us to talk less about QA and common web archiving topics, as this audience hears a similar presentation every year. Finally, this afternoon we attended NYARC’s 10th anniversary party at MoMA.
April 18, 2016 (7.5 hours)
Today I continued QA’ing the William Blake Archive and discovered a WordPress blog linked to the account, which illustrates the process followed to create the site. I also worked on the Gilded Age site, which is proving extremely challenging due to its dynamic structure. I have been able to pull a few images and PDFs, but the resolution of some of these visuals does not match the original live site. I will continue working on it today in the hope of patching some of these issues; if that fails, I will send in a support ticket.
This afternoon Megan and I were lucky enough to get a tour of the archives led by Sally Brazil and Julie Ludwig. It was wonderful to see the full scope of their collection and talk to them about their experiences with various user groups and how that relationship has changed over time.
April 11, 2016 (7.5 hours)
Archive-It seems to be running better this week. I am still performing QA on the William Blake Archive, Gilded Age, and Frick sites. Today I expanded from the Frick’s main site to its web-based Instagram feed. As you can see from the image below, Archive-It had a difficult time crawling and capturing the social media feed, requiring me to perform a number of patch crawls in order to render the archived version as it appears on the live site.
April 4, 2016 (6.5 hours)
QA procedures continue for the William Blake Archive and the Gilded Age sites. However, Archive-It is running extremely slowly today, making things very difficult. Audrey, Megan and I also submitted our proposal for the upcoming ARLIS webinar. Later in the afternoon, Megan and I conducted another informational interview for our project, with Ralph Baylor, the Assistant Librarian for Public Services.
March 28, 2016 (7.5 hours)
This morning I began QA on the site “Documenting Art Collections in Gilded Age New York”, a microsite of the Frick created by Victoria Pilato. Because the site is highly dynamic, the vast majority of it was not captured in the initial crawl, including all of the site’s images, videos, and interactive features, which in turn affected its overall visual structure. Luckily, a large portion of this lost material showed up during QA, and I was able to patch crawl it. I then continued QA on the William Blake Archive.
In the afternoon we (Megan, Audrey, Sumitra and I) met in Debbie’s office to watch part 4 of the webinar series we have recently been attending entitled “So you want to build a web archive”.
March 21, 2016 (7.5 hours)
Today, I continued to perform QA on the Frick’s website.
In addition, Megan and I met up with Audrey (another NYARC web archiving intern) to discuss submitting a proposal on web archiving to ARLIS for a virtual conference this May. This afternoon we attended a SIG + Archive-It users group meeting held at the Metropolitan New York Library Council (METRO). Several speakers discussed various aspects of web archiving, including:
- Jefferson Bailey & Karl-Rainer Blumenthal – Updates from the Internet Archive
- Alex Thurman – IIPC collaborative collection
- Web Archiving Panel
  - Lorena Ramirez-Lopez – NYU
  - Stephen Klein – CUNY
  - Morgan McKeehan – Rhizome/NDSR-NY
  - Martha Tenney & Charlotte Kostelic – Barnard College
March 14, 2016 (7 hours)
This morning, after reading and responding to various support tickets I had filed last week, I was able to check the crawl report for the William Morris Library and the Victorian Web. They were both fairly successful, though the Library of William Morris will require additional crawls in the future, as it seems to be undergoing constant updates.
We were then lucky enough to attend a staff tour of the new Van Dyck exhibition. Afterwards Megan and I met to discuss our Practicum project.
Throughout the rest of the day I worked on QA’ing the Frick’s new crawl, running a few patch crawls to pick up missed URLs.
Archive-It began to act up after lunch, slowing down our QA procedures by returning the message “The Resource you have requested is temporarily unavailable. Please try again later”. Since Megan and I were both receiving the same error, we can only assume it is a problem on Archive-It’s side. It did eventually resolve itself, but we were unable to QA for almost an hour. It is these kinds of issues that plague web archiving: those that cannot be explained and often fix themselves with a little patience and luck.
March 7, 2016 (7.5 hours)
Today, when I arrived at the Frick, I first set out to complete the tasks that had been emailed to me since my shift last week. These included adding more potential websites to the list of possible web archived material and sending in a support ticket for the crawl Sumitra had previously run on the Library of William Morris and the Victorian Web. As you can see on the ticket below (#4163), the crawl results were only rendering seed URLs from one website, even though the report indicates they belong to both sites, which were crawled together.
For the rest of the day I continued to QA the Frick’s website and add more updates to the spreadsheet I worked on last week of various auction house websites that have been put forth as possible sites to archive. I was unable to perform any patch crawls, as there is apparently an issue with the Archive-It server, but I have sent in a support ticket, so hopefully it will be taken care of quickly (#4166).
Sumitra also forwarded me the link to Rhizome’s beta web recorder tool which allows the user to archive as they progress through any given website.
February 29, 2016 (7.5 hours)
This morning, before conducting our interview with Mark Bresnan, I registered for a web archiving users group meeting hosted by METRO. I am especially excited, as I very rarely get to attend workshops outside of my scheduled shift due to a hectic work and class schedule. I am very much looking forward to hearing from other members of the community whom I have not yet had the pleasure of meeting.
I was also asked to type up a spreadsheet for the department, organizing information related to various auction house websites in order to decide whether each site should be catalogued or web archived by FARL.
Megan and I were invited this afternoon to watch the METRO webinar hosted by Sumitra, which revolved around the description and access of web archived materials. It was exciting not only to hear Sumitra but also to listen to Lily Pregil, of NYARC, and Alex Thurman, of Columbia University, speak about the current status of and issues surrounding web archiving.
February 22, 2016 (7.5 hours)
Cataloging websites into OCLC continues, along with slight alterations to previously entered records to reflect changed standards. Afterwards I began QA’ing the Frick’s website, focusing primarily on the digital content, including virtual tours, images, and embedded videos, and patch crawling those that were not captured initially.
Sumitra and I also discussed re-crawling several websites I had previously QA’d in order to capture their updated information: The Victorian Web and The Library of William Morris.
February 8, 2016 (7.5 hours)
Discussions with Karl regarding the William Morris Library website continue. Having received additional screenshots from me last week, he is now able to talk to the engineers about finding a possible solution. I did, however, continue to QA the site, making sure there were no visible issues before setting it aside until the aforementioned issue has been rectified.
I then began working on cataloging more websites into OCLC and QA’ing the William Blake Archive.
Finally, I was able to watch the webinar on Archive-It 5.0, which discussed the updates and changes made to the web-archiving platform:
- User Interface
  - Easier and more efficient to manage collections
- Information Architecture
  - Back-end, more powerful, quicker communication
- Seed Types
  - Adjusted existing (standard type)
  - New (standard +)
    - Captures everything in standard plus one extra layer of external links
- Seed Crawl Histories
  - Able to see the list of crawls a specific seed has been in
- Seed Level Scoping
  - Previously available only at the collection level
  - Easier to add expansions or restrictions to specific seeds; these apply only to URLs from those seeds
- Revamped Help Center
  - Easier access
Features in 4.9 that have been superseded by others in 5.0:
- QA Report
  - Introduced before Umbra, no longer necessary
  - Superseded by Actionable Host Report
- Scope-It Crawler Explorer
  - Superseded by Actionable Host Report
- Document Budgets
  - No longer focusing on document budgets, only data budgets
February 1, 2016 (7.5 hours)
I heard back from Karl regarding the ticket I sent in for the William Morris Library. The issue persisted into today, but I was assured I would receive word when it was resolved or at least fully looked into. Since I was unable to work on that site, I began working on the Frick’s own website, which had recently been passed to me from another intern.
We then had a web archiving team meeting to discuss what we have been working on, the issues we have been encountering, and new developments in the field.
Subsequently I edited my previous records from last week and catalogued several more, adding the finalized ones to NYARC’s overarching database.
January 25, 2016 (7.5 hours)
Today Archive-It, the web archiving software employed by NYARC, was not working very well most of the day. While issues are typical, especially in the morning before the LA office begins its daily work, the problems experienced today made it difficult to QA any of the captures from the William Morris Library. The number of missing URLs increased rapidly upon refreshing (ranging from 9 to 130). I sent in a support ticket to Karl at Archive-It.
Additionally, I was instructed on cataloguing procedures and began creating new MARC records in OCLC Connexion for various art gallery websites.