Sessions covered in these notes:
- PostgreSQL administration tutorial (Josh Berkus)
- Upgrading Evergreen: A Case Study
- Managing a Software Upgrade
- Discoveries in Digital Humanities: How Evergreen Contributes
PostgreSQL Administration Tutorial
Josh Berkus (with help from Anoop Atre)
Slides are here: http://pgexperts.com/tutorials.html
- Presenter: PostgreSQL expert, with some familiarity with Evergreen.
- Covering common administration tasks (installation, configuration, security, backups, etc.) -- core topics.
- Due to time, the talk only applies to newer versions of Postgres. No promises on older versions.
- Will not cover indexes or application-level tuning, also due to time.
- This presentation is for people who have to work on servers and are then expected to also be database experts. Presentation slides available here: http://pgexperts.com/tutorials.html
- Database use has grown, but the supply of DB admins has not.
- Hence, people outside of the technical area are also being asked to handle databases.
- PostgreSQL documentation is extensive. The goal is to teach you what you need to know and what it is called, so that you can look it up in the very long technical docs.
Key Terms/Buzzwords/Slang
- DevOps/Clouds, Efficiency = No we’re not going to hire a DBA.
Changing Playing Field
- Dedicated DBAs are not something organizations are willing to fund. As budget cuts come in, technical staff job descriptions begin absorbing other job titles.
- Philosophy: Don’t PANIC!
- Many use PostgreSQL without full training. The total body of knowledge is incredibly large, but what you need to run Postgres under Evergreen is much smaller. There is hope.
- Postgres does not need to be a big dangerous elephant. This elephant can be small and friendly =)
- Most sites will get Postgres through a hosted service (Catalyst, SITKA, whatever).
- If you are not getting this as a service, you will need to install it yourself.
- In that situation, you want to use packages.
- There are plenty of good ones for major Linux distributions: Red Hat, CentOS, ...
- If you are not concerned with having the most up-to-date version, your distribution's stock packages may be enough.
- If you do need the most recent versions, there are repositories for the most popular GNU/Linux variants:
- Red Hat: yum.postgresql.org
- Ubuntu: apt.postgresql.org
- SuSE: build service
- Debian: apt.postgresql.org
- For Windows and OS X (not recommended server platforms, but you may not have a choice) there is a graphical installer.
- The graphical installer is not suggested for Linux -- it may get the paths wrong, etc. For Windows and OS X, however, it is the best option.
- A nice thing about the graphical installer is that it bundles pgAdmin, PostGIS, and other bells and whistles. However, it probably will not install all the components that Evergreen needs.
- Packages also exist for other platforms (longer list in slides):
- Solaris 10/11
- OpenSolaris/Illumos
- Solaris 9
- “Home” Windows
- You do not always want the data directory to be in the default install location; you may want it on a specific drive. Use initdb to create the data directory where you want it.
- Demo: creating the skeleton directories with initdb.
- initdb creates a set of default folders; do not mess with these. Deleting or modifying them can crash Postgres.
- Will cover the .conf files later in presentation
- By default, activity logs can end up in the data directory. Put them somewhere else, like wherever you keep logs for your other systems.
- These directories can be anywhere. Postgres doesn't care, as long as it has read/write access and knows which directory to look in.
- If the data is on a SAN or anywhere else, you just need to point Postgres at the correct location.
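The setup above boils down to one command (the path is hypothetical; your distro's packages may choose a different default):

```shell
# Create a fresh data directory on the drive you chose (run as the postgres user).
# /data/postgres/9.2/main is an illustrative path -- use your own.
initdb -D /data/postgres/9.2/main
```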
Major interfaces -- Command Line
- The presenter prefers the command line: it is what he is familiar with, and you can grow and shrink the text, which you cannot do in the graphical interface.
- You can use pgAdmin if you like graphical tools for Postgres. This can be handy because there are a lot of tables.
- However, pgAdmin is a client/server GUI, so you can't use it over a plain remote SSH session.
- The command line has a good number of commands; \? gives you the command list.
- Examples: you can list all the tables and the parameters for each table. All of these can be brought up using the backslash commands.
- psql is the most complete interface, and we know this because it is what the Postgres hackers use.
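A quick sketch of the backslash commands mentioned above (the database, user, and table names are typical Evergreen ones, but treat them as examples):

```shell
psql -U evergreen evergreen     # open the command-line client
# Inside psql:
#   \?                       list all backslash commands
#   \dt biblio.*             list tables in the biblio schema
#   \d biblio.record_entry   describe one table's columns and indexes
#   \q                       quit
```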
- Postgres is not just a database, it is a development platform. You can load code at run time to extend the functionality of the database. For Perl users, extensions are like CPAN.
- The important thing from an admin standpoint: you need to install extensions separately.
- Evergreen requires 4 extensions: plpgsql (standard issue now); hstore (key/value storage for extensible attributes on new fields); xml2 (XML transformation functions); tablefunc (database linking and crosstabs).
- Postgres exposes all of this info as pseudo-tables, so you can query them, and versions are tracked in that view. Extensions have binary files and C libraries, AND they have headers and functions/views that get installed in each DB where you will use the extension.
- Prior to 9.1 you needed to install the files first. See below.
- 1. install the binary files
- using packages, PGXN, or source
- installs to postgres “share” directory
- a few extensions don't have binaries
- 2. load the SQL file in each database where it's used
psql -f hstore.sql
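On 9.1 and later, the two steps collapse into one SQL statement per extension, run in each database that needs it (a sketch; plpgsql is usually preinstalled):

```sql
CREATE EXTENSION hstore;
CREATE EXTENSION xml2;
CREATE EXTENSION tablefunc;
```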
Major update announcements -- important security update happened last week.
- You need to update ASAP
- Postgres can break backwards compatibility in major upgrades.
- Minor version upgrades, like 9.1.9, patch bugs and security holes. If you are updating from 9.1.8, you are just updating the binaries; you do not need to do anything else.
- These minor updates come out every 2-3 months and contain only bug fixes.
- New features and API changes go only into major releases; minor releases change behavior only in exceptional circumstances, such as a security patch (and we will warn you).
- Minor upgrades do not change functionality, so you should apply them as soon as you can schedule a downtime.
- 5 minutes of downtime is all you need: download the new packages, shut down PostgreSQL, install the packages, restart Postgres, restart the application. Done!
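The five-minute minor upgrade might look like this on a Debian/Ubuntu system using the PGDG packages (package and service names vary by distro; a sketch, not a recipe):

```shell
sudo service evergreen stop          # stop the application (service name is illustrative)
sudo service postgresql stop
sudo apt-get update
sudo apt-get install postgresql-9.1  # pulls in the new minor-release binaries
sudo service postgresql start
sudo service evergreen start
```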
Major Upgrades -- once a year
- Usually in September; major versions can break backward compatibility.
- Wait until Evergreen has been tested against the new version before you go ahead.
- The traditional method is dump and reload (pg_dump and pg_restore); this is the most reliable method, and it cleans up the DB as a side benefit. See slides.
- Larger DBs can take a long time.
- Hence pg_upgrade, which converts the database files in place. This makes upgrades way faster -- an upgrade that took 36 hours as a dump/reload can be done in 15-20 minutes, because it just rewrites the system tables and moves files over.
- There are cons -- it does not clean up the DB.
- It does not always work, but it will tell you if it fails, and you can revert to the previous version. Not all extensions support pg_upgrade -- all the ones Evergreen uses by default do, but extensions you add later may not.
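A minimal pg_upgrade invocation, assuming both versions' packages are installed (paths follow Debian conventions and are illustrative):

```shell
# Run as the postgres user, with both clusters shut down.
pg_upgrade \
  -b /usr/lib/postgresql/9.1/bin  -B /usr/lib/postgresql/9.2/bin \
  -d /var/lib/postgresql/9.1/main -D /var/lib/postgresql/9.2/main
# On failure it reports the problem and the old cluster is left untouched.
```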
- After 5 years we stop patching versions of postgres -- check the webpage www.postgresql.org/support/versioning
- Note: see http://pgxn.org for other extensions.
- You also want to upgrade your extensions. On 9.1 and later this is easier: upgrade the binary, go into the DB, and run ALTER EXTENSION ... UPDATE. If the person maintaining the extension has added the correct metadata, it will just upgrade.
- Before extension packaging (pre-9.1) there is no easy way to upgrade extensions.
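On 9.1+, the per-database extension upgrade step is one statement, run in each database that uses the extension (the version number is illustrative):

```sql
ALTER EXTENSION hstore UPDATE;
-- or pin a specific version:
ALTER EXTENSION hstore UPDATE TO '1.1';
```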
- A DB uses all the hardware resources. E.g. web servers use CPU and RAM; a file server uses RAM and I/O but little CPU; a DB uses it ALL, and can overwhelm any part of the hardware.
- Balance your hardware budget. Especially if your DB will be larger than RAM, you need to buy decent storage that is relatively speedy; the DB will never be faster than your hardware.
- Biggest rule about hardware: if you have performance concerns (regional-library scale, lots of traffic), move Postgres to its own server so that it doesn't have to compete with unrelated server software.
- This is extra important on virtual servers.
- If you are running your ILS on a public cloud, remember the virtual machine is slower and more resource-poor than a real machine with the same specs. On several cloud providers, I/O is really slow and unreliable; any time the database has to hit storage, you will see a slowdown.
- File Systems (primarily *NIX)
- If you want high performance, you need to tweak Linux and other I/O settings (see slides). You had to do less of this in the past, but the defaults are not as good as they used to be.
- If you are allowed to use a modern file system (XFS, Ext4, etc.), do so. They perform better than the old ones.
- See slides for notes on optimization. Windows/OS X file systems cannot be optimized much.
- XLOG (transaction log) and WAL (write-ahead log) refer to the same thing. This is where new transactions are written synchronously to avoid data loss. It has a distinct write pattern (almost no reads), so you can get a huge jump in performance (30% or more) simply by moving the write-ahead log to its own disk resources.
- Parameters you don’t care about: some 206-214 of them.
- Parameters you care about: the remaining 10-18 settings are the ones people actually change. Demo time!
- The stock postgresql.conf lists nearly all of the settings, which is not that helpful. Delete that auto-generated file and start a brand-new file with only the settings you actually want to change.
- shared_buffers is the amount of dedicated memory. You want this to be about 1/4 of RAM; if it is not a dedicated server, figure out what is actually available on your server (beyond 8 GB there is questionable benefit).
- Be extra cautious with shared memory. You also need to tune the per-operation memory settings: work_mem, maintenance_work_mem, wal_buffers.
- There is a file among the downloads for this talk that has these settings listed.
- A key setting is effective_cache_size, which tells the query planner how much cache is available. And if you are going to be doing big writes, like bulk loads, you want a larger transaction log.
- Con of a big transaction log: after a crash, it takes longer to come back up (more files to read).
- checkpoint_completion_target recommendations differ; see the PostgreSQL demo.
- Other key settings: the logging options. Postgres offers lots of them; use logging to see what Postgres did while you were away. Again, see the demo files.
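Pulling the settings above together, a minimal hand-written postgresql.conf for a hypothetical dedicated 16 GB server might contain only lines like these (values are illustrative starting points, not recommendations):

```
shared_buffers = 4GB                # ~1/4 of RAM on a dedicated box
work_mem = 16MB                     # per sort/hash operation, per connection
maintenance_work_mem = 512MB        # vacuums and index builds
wal_buffers = 16MB
effective_cache_size = 12GB         # planner hint: shared_buffers + OS cache
checkpoint_segments = 32            # bigger transaction log for bulk loads
checkpoint_completion_target = 0.8
log_min_duration_statement = 1000   # log queries slower than 1 second (ms)
log_checkpoints = on
```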
- You can use UNIX sockets locally to connect to Postgres and if Evergreen is on the same server as Postgres, this is best.
- If you are going to another server, use Port 5432 and keep it open on firewall, or else you won’t be able to get to Postgres.
- Postgres supports SSL connections -- use them if you need to run Evergreen over an untrusted network (like a public cloud). We cannot go into the details here, but it is important when you cannot trust the network.
- max_connections settings
- A few connection slots are reserved for superuser connections, so you can still get in and kill sessions when regular connections are exhausted.
- If you run out, the error log will show a message telling you there are too many connections.
- If you have to keep increasing the connections, that is a sign of a larger problem, and eventually performance suffers. In the Postgres world, each connection is a process and each process needs resources.
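To see how close you are to the limit, you can query the activity view and check the relevant settings (sketch; run from psql):

```sql
SELECT count(*) FROM pg_stat_activity;   -- total connections in use
SHOW max_connections;                    -- the configured ceiling
SHOW superuser_reserved_connections;     -- slots held back for emergencies
```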
- The common fix for this problem is PgBouncer, an event-based connection pooler. It combines connections: hundreds of mostly-idle client connections can share a handful of real database connections, thanks to PgBouncer.
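A minimal pgbouncer.ini for the scenario above might look like this (names, addresses, and pool sizes are illustrative):

```
[databases]
evergreen = host=127.0.0.1 port=5432 dbname=evergreen

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500    ; many application connections...
default_pool_size = 20   ; ...funneled into a few real ones
```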
Access control: pg_hba.conf
- The rules support IPv6 (but no IPv6 examples are included here).
- pg_hba.conf is an ordered file: Postgres checks the rules top to bottom, and the first rule that matches a connection determines how (or whether) it is authenticated. If no rule matches the connection, Postgres denies it.
- For local connections, you can require a password.
- You can set up SSL connections between servers to protect passwords from snooping.
- Internal users might not use SSL, but be authenticated by LDAP instead.
- Another option: anyone in the Admins group can connect remotely, provided they are using Kerberos instead of SSL.
- You can make rules for replication connections. These are handled separately for security reasons.
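The rules described above could be expressed in pg_hba.conf roughly like this (addresses, the group name, and the LDAP server are invented for illustration):

```
# TYPE    DATABASE     USER      ADDRESS          METHOD
local     all          all                        md5
hostssl   all          all       10.0.0.0/8       ldap ldapserver=ldap.example.org
host      all          +admins   192.168.1.0/24   krb5
host      replication  replica   10.0.0.0/8       md5
# No matching rule => connection denied.
```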
- Don’t expose the postgres server/port to the internet
- Don’t allow users to connect as the superuser -- the superuser can do anything including full deletion! Give them special user logins.
- Use the strongest authentication possible for your setup.
- The dangers of not having a backup: until a disaster strikes, most orgs do not have a decent disaster-recovery method.
- Logical backup (pg_dump): Postgres version-agnostic
- Small-database friendly
- Lets you extract objects into alternate databases
- PITR (Point-In-Time-Recovery)
- binary/continuous backup
- Helps you meet mandates for maximum data loss (like "no more than 15 minutes")
- pg_dump cannot do this
- PITR takes a snapshot of the database, saves the transaction logs as they accumulate in an archive, then periodically you take another snapshot, and carry on bravely.
- And if someone makes a mistake, you take the latest good snapshot and apply the transaction logs up to just before the problematic point.
- Managing PITR can be painful with lots of config options. So, use the open-source tools to manage it:
- favorite is Barman: a Python-based GPL tool designed for PITR disaster recovery
- there is a Perl one called OmniPITR
- WAL-E, for Amazon (AWS) only
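The archiving side of PITR boils down to a few settings on the master (the archive path is illustrative; the tools above mostly generate this for you):

```
wal_level = archive                           # or hot_standby if replicas will connect
archive_mode = on
archive_command = 'cp %p /mnt/archive/wal/%f' # %p = WAL file path, %f = file name
```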
Have a DR plan!
- Map out the ways you can lose data, and have a way to recover from each.
- Replication is valuable for scenarios where doing backups are not feasible
- Binary replication
- Use load balancing
- Security -- replicas are iron-clad read only; if you need to give certain users read only access, giving them access to the replica can be beneficial if you fear mistakes.
- This is like DRBD for PostgreSQL
- Replicating binary blocks
- Streaming replication -- the ability to connect to postgres and get new data
- hot standby -- replicas running and accepting connections
- Replicas need to replicate whole servers; certain items aren’t replicated like unlogged tables
- LISTEN/NOTIFY is not replicated
- Query cancel issues (will discuss later on)
- Built on top of PITR: take a snapshot, then the 2nd DB server opens a streaming connection to the master, and the master sends changes to the replica as they happen.
- You can add replicas on top of this, because there is little overhead on the master.
- “Recovery” -- because we build replication on top of PITR; we say recovery when we often mean PITR
Example demo on replication
- Using pg_basebackup we create snapshots for replicas. The -x switch says to include the transaction-log data the copy needs. This is generally for people with higher needs, but can be used in many situations.
- You can also specify a progress meter and which directory to write to.
- And after running it, the target directory has the new files!
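The snapshot command from the demo looks roughly like this (host, user, and directory are illustrative):

```shell
# Copy a base backup from the master into the replica's empty data directory;
# -x includes the WAL needed for consistency, -P shows a progress meter.
pg_basebackup -h master.example.org -U replica -D /var/lib/postgresql/9.2/main -x -P
```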
- For the replica you need replica configuration file (confusingly called recovery.conf)
- This is for streaming replication only -- you need to have a setting for where to connect to the master.
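A minimal recovery.conf for a streaming replica might contain just these lines (connection details invented for illustration):

```
standby_mode = 'on'
primary_conninfo = 'host=master.example.org port=5432 user=replica'
# Optional safety net if streaming falls behind: replay from the WAL archive.
restore_command = 'cp /mnt/archive/wal/%f %p'
```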
- Did a demo, but will conclude it during the Q&A.
- Replication supports other cloning methods (e.g. filesystem snapshots), useful for very large DBs; use whatever lets you create a consistent snapshot.
Considerations for real-world implementations
- For whatever reason, the replica falls behind and is missing data.
- It reconnects to the master and asks for data from, say, 16 hours ago.
- Best case: the master re-reads old log files, takes a performance hit, and drags up the old data.
- Worst case: the master has already recycled those log files and is no longer tracking them; the only way to bring the replica up again is to take a whole new DB snapshot -- a problem for big DBs.
- In addition to streaming, or instead of it, if you need long-distance replication you can use the archiving method: copy the transaction logs to storage the replica can access (e.g. a SAN). When the replica falls behind, it reads the older data it missed from that directory. If you have a ton of replicas on a cloud, they can share a single storage location; you will still need a way for data to expire so the archive does not get too unwieldy.
Failover vs. load-balancing
- Queries on the replica can conflict with write activity replayed from the master; Postgres must decide whether to let the replica fall behind or cancel the query -- you decide how it handles these conflicts in configuration.
- You cannot use the same replica both for rapid failover and for load balancing.
- If you set it up for rapid failover, it does you no good for reporting: a 3-hour report canceled every 15 minutes will never complete.
- So set each replica up either to fall behind (reporting) or for rapid failover.
There are more complicated replication methods
- New abilities like synchronous replication, filtered replication, etc.
- All the open-source-friendly monitoring systems have some built-in support for Postgres (Ganglia, collectd, Hyperic, OpenNMS, OpenView, etc.)
- If your system does not support Postgres (it probably does), you can use Nagios's check_postgres.pl
- check_postgres.pl has such a complete set of probes that you may want to use Nagios anyway.
OS Check -- things to monitor and trend
- Monitoring tells you when the server is about to go down; trending gives you hints well before the crash.
- Disk space (Postgres hates running out of disk space and is hard to shrink -- it simply won't run if it runs out)
- System load (if load goes up to 42 in system terms, you have a problem)
- Memory use (how much do you need for the cache?)
- I/O activity (if you are hitting 100% capacity, you have issues)
- Network (as storage gets faster, more people find the DB bottlenecked at the network level)
- Connection states: active, idle, idle in transaction (see slides) -- "idle in transaction" means a connection issued BEGIN and then went nowhere; such transactions hold locks and block other connections.
- Blocked queries (queries waiting on those locks)
- Query times (you can get these from the logs)
- Table size/growth (more about trending -- track the top 10 tables and how they are growing, both table and index; that tells you how much disk space you will need in the future. If disk usage is getting close to 80%, you have a problem.)
See slides for others
- Tools: pgFouine, pgBadger -- parse the logs and graph activity over the last few days.
- Postgres uses non-overwriting (append-only, MVCC) storage: updates write new row versions rather than overwriting in place.
- Old row versions are constantly being tagged as dead; if a row updates a lot, the dead versions take up space and make the table bigger (and slower).
- VACUUM cleans this up: row versions that no transaction references anymore are removed and the space made reusable. It helps Postgres re-use space more efficiently.
- We used to run VACUUM manually once a day; this is not effective.
- Hooray for autovacuum and multi-threaded autovacuum. It is on by default; leave it on for Evergreen. It is a daemon that looks for tables with dead rows that need attention and maintenance, and vacuums them.
- The defaults are not always good enough and autovacuum can fall behind the rate of change; when it falls behind there is bloat. A table may only have 0.5 million rows but have the performance of a 6-million-row table due to dead space.
You can check pg_stat_user_tables for signs of table bloat.
- The Nagios bloat query can be used; or you can trend database size -- if you see a table growing in size but not in rows, you have a problem.
- Lengthening query times can also mean bloat.
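A quick dead-row check against pg_stat_user_tables (autovacuum should keep n_dead_tup small relative to n_live_tup):

```sql
SELECT relname, n_live_tup, n_dead_tup,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```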
Fight that uncomfortable bloating
- Manual VACUUMs.
- VACUUM needs extra space while it runs; so never run out of disk space, or else you won't be able to fix the problems causing bloat.
- VACUUM FULL rewrites the table and its indexes
- pg_reorg (an add-on) rewrites tables in the background (con: complicated)
- You can add more autovacuum workers if you have cores available.
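The relevant postgresql.conf knobs (values are illustrative):

```
autovacuum = on                       # leave on for Evergreen
autovacuum_max_workers = 6            # default is 3; raise if you have spare cores
autovacuum_vacuum_cost_delay = 10ms   # lower = vacuum works harder
```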
- Transaction IDs wrap around at about 2 billion. They are stored in the DB tables, so as we get close to the wraparound point, old IDs need to be cleaned out. Once old enough, they are tagged for freezing.
- Doing this requires frequent scanning and re-writing of large tables.
- Autovacuum does this too, but it may kick in in the middle of the day -- which makes a perverse kind of sense, because peak times are when lots of transactions happen.
- So at 5pm closing time, DB load goes through the roof as the wraparound autovacuum kicks in.
- This makes wraparound a big problem! More of a time bomb than a clock! DOOMSDAY
- To help: add XID age to monitoring; watch how close you are to the wraparound limit.
- When the limit gets close, set up an automated VACUUM FREEZE in the middle of the night.
- Tune vacuum_freeze_min_age -- the default is conservative; make freezing more aggressive by making it smaller.
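Monitoring XID age is one query, and the nightly freeze is one command (the schedule and database name are illustrative):

```sql
-- How far each database is from wraparound (alarm well before ~2 billion):
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;
-- Then, during a quiet window (e.g. from cron):
--   psql -d evergreen -c "VACUUM FREEZE;"
```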
- ANALYZE (statistics collection) uses sampling, is non-intrusive, and can run concurrently with other queries.
- Autovacuum takes care of this too; even if you turn autovacuum off and do manual vacuums, keep auto-analyze on.
- If you turn auto-analyze off, the planner can miss a lot.
- Run a manual ANALYZE right after creating a new table, bulk-loading, or deleting a ton of records -- you don't want your stats to lag. So: bulk-load the table, then analyze it.
- Also run it when the existing statistics are visibly wrong.
No time for the Bonus Round; straight to questions:
Q: When you talk about replication that can be used for failover or for load balancing, could you have two replica servers?
A: For users with high requirements, yes. The reporting server can also be used for failover, just not rapid failover: when you fail over, it will notice that it is 3 hours behind and has to catch up before going ahead. I didn't explain synchronous replication -- that is for high-needs users.
People using synchronous replication will also have asynchronous replicas.
Q: Which features in 9.2 are key?
A: Vastly better performance (especially in synchronous-replication environments), and a limited set of JSON features.
Q: When was auto vacuum added?
Q: What do you do about a DB with max_connections set to 1000?
A: Well, you are wasting memory tracking them. Depending on the OS, if you actually used more than half the connections, you would see an order-of-magnitude drop in efficiency; there is an issue of process accounting. Back in the Linux 2.x kernel series, when you hit a lot of processes, process switching got expensive and efficiency would drop a ton.
Q: Which Linux I/O scheduler do you recommend?
A: This used to be a big deal, but they changed it. On the 3.x kernel series, leave the default -- unless you have good storage, in which case use noop. They changed the algorithm and it now works fine with Postgres.
Upgrading Evergreen: A Case Study
- Topic: Sitka’s upgrade from Evergreen 2.0 -> 2.2
- How to upgrade Evergreen [simplistic diagram]
- More complex diagram [in PP]
- Preparing the code
- the big stuff: [takes the most time]
- server code (application layer)
- database upgrade scripts [require modifications due to customizations]
- OPAC skins [if each library has its own skin...then takes a while]
- config files
- building the staff client [branding & customization require | Windows & Mac staff client version]
- Server code - local vs. upstream
- our own local git repository
- nice thing about git: you can treat different git repositories as branches of the same repository.
- includes: [diagram that letters refer to in PP]
- Sitka-specific customizations (B, E)
- branding that is local
- hiding a feature
- fixes/features shared with upstream (C)
- backports from upstream (D)
- Goal: Single stream of development; pull 2.0 & 2.2 together
- Having a similar upgrade this summer; hope to identify more upstream issues.
- a series of commands that allows you to consolidate ‘streams’
- git interactive rebase
- create new sitka_2_2 branch from sitka_2_0
- git rebase -i rel_2_2
- Q: do you keep separate branches for customizations/fixes, etc.?
- resolving merge conflicts
- git mergetool | command that allows you to open up a text editor & figure out issues
- PROTIP: keep a log! | make a note of what error & what you did.
- Server code - interactive rebase [screenshot of code]
- recommend prefixing commit messages; Sitka used [sitka]; makes it easier to do the upgrade in parts.
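The rebase workflow above, as commands (branch names are Sitka's, from the slides; the conflict steps apply only when a commit fails to apply cleanly):

```shell
git checkout -b sitka_2_2 sitka_2_0   # new branch from the old customized branch
git rebase -i rel_2_2                 # replay local commits onto upstream 2.2
# When a commit conflicts:
git mergetool                         # resolve in your editor -- and keep a log!
git rebase --continue
```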
- server code
- database upgrade scripts
- config.upgrade_log is your friend
- pg_dump & pg_restore for snapshots [creating copy of database]
- assigning permissions [dependent on local permission groups; need to apply them beforehand.]
- other stuff (OPAC skins, config files, building the staff client...)
- who are your testers?
- what is being tested?
- multiple test servers (VMs)
- evolving codebase - iterative process
- testing procedures (checklists)
- [had checklists, trying to think of all different interactions of a situation]
- issue tracking and fixing bugs
- prioritization [testers were good at prioritizing issues]
- weekly summary [important since team was distributed; worked virtually, so meeting together very important.]
- [used a separate bug reporting program for this]
- [had several test servers. At the time, the beta version of 2.2 was released and uploaded to test server. Then eventually consolidated all the versions. Virtual environments were very helpful in doing this. Different test environments helped as well as some would work, others wouldn’t.]
- May 2: test server 1
- May 15: upstream 2.2rc1 released
- May 29 (Jun 4): test server 2
- Jun 3-11: bib record cleanup
- June 13: upstream 2.2.0 released
- June 13 (Jun 15): test server 3
- June 27: Sitka 2.2 code freeze
- June 30-July 2: upgrade production
- pre-upgrade upgrades (PG 9.1)
- scheduling downtime
- spare brick
- maintenance page [notice to sitka users]
- …[missing; need PP]
- [Production environment diagram]
- Two bricks in production; a third brick held in reserve. The 3rd brick was upgraded first.
- had 7 different servers to be upgraded; had to have 7 different windows open; would like to automate that in the future
- What worked:
- division of labor when preparing code
- internal wiki (milestones, tasks, procedures)
- support team testing (did fantastic job)
- ticket prioritization + weekly summary (recommended!)
- multiple test servers (helped with quality control)
- 2-day launch window [ex: started saturday @ closing of libraries, then had holiday to upload & troubleshoot]
- MARC cleanup too close to upgrade
- (make sure everyone knows you’re doing upgrades)
- Q: Were the same bib records etc. used on each server?
- A: Yes, different snapshots thereof.
Managing a Software Upgrade
- Evergreen Indiana - very geographically distributed & taking on new members all the time.
- Managing a software upgrade
- necessary disruption in service to staff and patrons
- have to really sell/convince why upgrading
- balancing end-user expectations w/ new version of software
- individuals may be technology- or change-resistant
- best time to address that: when forming committees, especially with people you know have strong leadership skills in the field, who can help you communicate and manage during difficult times.
- Launch version: 1.2
- September 4, 2010: Evergreen 1.4
- : Evergreen 2.2
- August 9, 2013: Evergreen 2.3
- Establishing the groundwork
- Upgrade windows established by the Executive Committee
- December or August [best times to start; related to buy-in]
- Channels of communication [take advantage of all of them!]
- committee meetings
- blogs [weekly update blog]
- Webinars [did these after the upgrade, to help staff adjust to the new software]
- Set up test server w/ future version
- set up admin & permissions exactly as in the live version
- Create test accounts for each permission group to use
- Have an exercise sheet w/ daily activities for circulations staff, catalogers, admin, reference, etc.
- Ask participants to work through exercise sheets and take notes if there are problems or major differences
- [great way to get back commentary: inconsistencies, etc.]
- Use staff feedback to create an introductory webinar/video for consortium staff members.
- Making the announcement
- Detailed timeline w/ instructions
- include dates along w/ time
- Sample announcements available here:
- Do not assume anything! [example: access to software during upgrade on Sunday; ppl asking what time on Sunday will it be available again.]
- Monday, August 6 - Friday, September 14: inform local library staff & patrons
- Provide customizable patron fliers which explain the limitations during that period of time
- Monday, August 6 - Friday, September 14 : Train staff on using offline mode
- Provide links to offline mode tutorials
- Ask them to reach out to the staff at Indiana State Library to help them practice.
- Friday, September 14 (8 PM) - Sunday, September 16 (8 PM): Run in offline mode
- SIP2 connections will also be down. [don't forget your dependent third-party vendors!]
- Sunday, September 16 (Evening): Listserv message
- email to the listservs indicating that the upgrade is complete.
- Monday, September 17 (before business day begins): Accessing 2.2 auto-update staff client and completing offline mode transactions
- processing offline transactions before creating new transactions
- Monday, September 17 (Beginning of business day): Backdate checkins.
- Fail Fair [what went wrong]
- Provided the test server but no exercises or facilitation for completing them.
- Results: few libraries explored the test server before the upgrade and were therefore unfamiliar w/ new features
- Results: angry customers!
- An ebook vendor agreed to use their own server during the time our SIP2 connection was down.
- Results: Vendor dropped ball on their end, confusing staff & patrons. [still reflects on Indiana Evergreen]
- Establishing policy for new features ahead of time.
- Listing permission assigned to each group before upgrade.
- “I could do that with my circ1 username before the upgrade & now I can’t.”
- Providing webinars/videos before the upgrade (and after).
- Better & higher quality communication.
- Upgrade horror stories?
- Upgrade success stories?
- what types of training opportunities have worked best?
- What types of communication do you use?
- Q: Do you have copies of your test worksheets?
- A: Yes, send an e-mail to me & a copy will be sent.
- Comment: Don’t do upgrade while you’re tired!
Discoveries in Digital Humanities: How Evergreen Contributes
Vyacheslav Tykhonov / Mieke Stroo - Evergreen tools at IISH
- IISH conducts advanced research on the global history of work, workers, and labor relations. To this end, it gathers data, which is made available to everybody.
- Institution holds over 3,000 archives, more than 1 million printed volumes and equivalent audiovisual forms...
- Evergreen migration steps
- Export of bibliographic data, serials and authorities from Advance ILS
- 1st migration, to Evergreen 2.0.3, was done in Sept 2011
- Last migration to Evergreen 2.2.2 was finished in Jan 2013
- Unit testing was developed for every Evergreen module
- Evaluation of results for every unit test by Collection Department Team (CODI)
- Final migration to Evergreen 2.2.2 w/ completely tested functionality.
- Q: Was the Unit Testing results released to the community & shared?
A: No, not yet, but a lot of work has been done that is yet to be released.
Suggestion (from audience): the Evergreen community should send someone to work with IISH for an extended period of time.
- Linked Data - Authority Linking
- Users can link authority in Evergreen Staff Client:
- Open authorities list
- Choose appropriate authority
- Link it with bibliographic record
- Automatic authority linking by using searching and matching techniques
- automatically linked for 1.3 million records during migration.
- Data management: Excel file [screenshot]
- Vu-Find as OPAC
- Evergreen is not used by our patrons; instead use Vu-Find (search.socialhistory.org)
- MARC21 records, EAD and other databases - all in Vu-Find
- Data is exported through an OAI protocol
- (github address to be added)
- API Evergreen -> OAI protocol -> API Vu-Find
- [Screenshot of current view]
- Tool to import book titles: the ISBN reader
- evergreen.iisg.nl/cgi-bin/offline/isbnreader.pl [doesn’t seem to work]
- Automatic update of publication data into the 041 (language) and 044 (country) fields, year-of-publication codes, ...
- More information: technical details on all tools can be obtained by emailing email@example.com
- Presentation: http://www.slideshare.net/vty/bridging-research-and-collections [shows tools developed that uses Evergreen metadata, etc.]
- Q: What more does your institution need from Evergreen that would help your efforts?
A: Need a claiming module for serials. Browse searching for call numbers (currently a binary search).