I’ve just posted a new Greatest Hits article on the ILM forum about how ILM (or the FIM Sync Service) can be used to clean up the mess of existing accounts before you get on to the more interesting tasks of provisioning and updating. Because FIM codeless sync needs an existing attribute to match on, and only allows simple matching rules, it will be more important than ever to start from tidy directories with correctly identified existing accounts. Here’s the article…
Phase One Joins and Data Matching
The tedious truth is that most IdM projects must begin with a phase of data matching and cleaning. Before you can start to automate management of identities, you need a predictable data set around which to base your rules. Many organizations today, whether due to changes in IT personnel, company mergers, variable naming conventions or lack of guidance on handling resignations, have existing user account bases that can only be described as a mess.
This document covers some of the methods you can use with ILM to get through that first project phase. Unfortunately there are no magic bullets here – eye-straining, brain-numbing trawls through long lists of unmatched accounts cannot usually be avoided. What you can do is try to extract quality data, and construct your lists as helpfully as possible, so they can be targeted at the people whose eyeballs and brains are most likely to give the best response.
It must be added that this is a very big topic, and the methods you choose will depend on the data you are faced with, and the aims of your project.
Joins
Really, it’s all about joins.
Once existing accounts are joined to their correct data source (such as matching an HR record to an AD account) you can begin to flow updates.
Once you have a clear idea who does and does not have an account in a target directory, you can begin to make provisioning and deprovisioning decisions.
Your eventual aim is a simple Join Rule
When you first start data matching, you will use a lot of different join rules, and you will make joins manually and with CSV files (more on that below). But keep this in mind:
Any join made manually, or with extra effort, should be considered temporary.
There are various situations in ILM which can only be reliably rectified by a clear-out and re-import of a connector space.
Always plan for this.
Ideally, when you re-import a connector space, you will have a single, direct join rule which effortlessly re-joins all your objects. And to achieve this we use…
Breadcrumbing
Once the join is verified you should export a uniquely identifying attribute, such as an employee number, to the target directory.
After that a simple “employeeID = employeeID” type join rule is all you will need.
Sometimes you are faced with an import-only system. For either technical or political reasons you are not able to export the breadcrumb attribute.
There are a couple of things you can do:
- Import the DN or other identifying attribute from the target directory into an attribute on the Metaverse object. But be warned, you could still lose these joins if you had to repopulate your Metaverse.
- To be on the safe side you should also “save” these joins by exporting an identifying pair somewhere else – for example using a database or text file MA, as pictured below.
Understanding Join Rules
There are some key points to note about Join Rules:
- Join rules operate on the CS object. A rule takes one CS object and attempts to find a match among all Metaverse objects of the correct type, even if they are already joined.
- Components within one rule are AND-ed together – they must all match.
- Multiple rules are evaluated top down, so put your strongest rules at the top.
- While you can offer variations on a CS object’s attribute using an Advanced Join Rule, you have to find an exact match with an attribute already on a Metaverse object. There are no “StartsWith” or “Contains” comparisons.
Resolve Join
At the bottom of the Join Rule configuration you will see a check box “Use rules extension to resolve”.
Here you can link to code you write under the ResolveJoinSearch function in the MA extension.
The Resolve rule is called when ILM finds one or more possible matches in the Metaverse (including already-joined objects).
All the possible matches are passed in the collection rgmventry and your job, in the code, is to check through them, looking for the best possible match.
If your code finds an ideal match, you set imventry to the index of that object in rgmventry and return True from ResolveJoinSearch.
The following example only joins if a single, unjoined Metaverse object was found.
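Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch
    Select Case joinCriteriaName
        Case "Resolve_NotYetJoined"
            'Join only when there is exactly one candidate and it has no connector in the "AD" MA yet
            If rgmventry.Length = 1 AndAlso _
               rgmventry(0).ConnectedMAs("AD").Connectors.Count = 0 Then
                imventry = 0 'index of the chosen object in rgmventry
                Return True
            Else
                Return False
            End If
        Case Else
            'Unknown rule name: fail loudly rather than silently returning False
            Throw New EntryPointNotImplementedException()
    End Select
End Function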
Advanced Join Rules
A simple join rule directly matches a connector space attribute to a Metaverse attribute:
Connector Space | Metaverse
---|---
givenName = Kathryn | FirstName = Kathryn
sn = Bigalow | Lastname = Bigalow
With an Advanced Join Rule you construct a list of possible values with which to find an exact match in the Metaverse:
Connector Space | Metaverse
---|---
givenName = Kathryn, givenName = Kate, givenName = Kathy | FirstName = Kate
sn = Bigalow | Lastname = Bigalow
Sometimes you can use code rules to make your list of possible matches (e.g., presenting a phone number in different formats); other times you have to use long look-up lists of possible variations (try genealogy websites for name-variation lists).
The following example uses a lookup file of aliases, where each line has the possible variations on a name.
If a match to the first name is found on the connector space object, the whole line is added to the possible values to search for in the Metaverse.
Elizabeth,Liz,Beth,Betty
David,Dave,Davey
Jerome,Jérôme
…
Imports System
Imports Microsoft.MetadirectoryServices
Public Class MAExtensionObject
    Implements IMASynchronization
    'One entry per line of the alias file, e.g. "Elizabeth,Liz,Beth,Betty"
    Dim arrAliases As String()
    Public Sub Initialize() Implements IMASynchronization.Initialize
        'Load the alias file once, when the extension is loaded
        arrAliases = System.IO.File.ReadAllLines("C:\aliases.txt", System.Text.Encoding.Default)
    End Sub
    Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin
        Select Case FlowRuleName
            Case "Join_aliases"
                'Nothing to look up if the connector space object has no first name
                If Not csentry("givenName").IsPresent Then Exit Sub
                Dim givenName As String = csentry("givenName").Value
                Dim aliasList As String
                Dim value As String
                For Each aliasList In arrAliases
                    Dim names As String() = aliasList.Split(","c)
                    'Compare whole names, ignoring case, so that "Ann" does not match "Joanna"
                    Dim found As Boolean = False
                    For Each value In names
                        If String.Equals(value.Trim(), givenName, StringComparison.OrdinalIgnoreCase) Then found = True
                    Next
                    'Offer every variation on the matching line as a possible join value
                    If found Then
                        For Each value In names
                            values.Add(value.Trim())
                        Next
                    End If
                Next
        End Select
    End Sub
End Class
Using CSV Files
A lot of the difficult account matching will probably be done outside ILM.
It is therefore useful to be able to “export” lists of possible matches, and later, “import” the joins from a CSV file.
Be careful when writing to files from extension code – the DLL doesn’t unload for five minutes after the MA run completes, which means you may have to wait for it to finish writing to the file. If you’re in a hurry, recompiling the code will force the DLL to finish writing to the file.
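One way to shorten that wait may be to have the writer flush each line as soon as it is written. The following is a minimal sketch, assuming a stand-alone console harness rather than an MA extension; the module name, output path and sample row are made up for illustration. StreamWriter.AutoFlush is a standard .NET property, but whether it removes the delay completely in your environment is something to test.

```vbnet
Imports System.IO
Imports System.Text

Module AutoFlushSketch
    Sub Main()
        'Hypothetical output file, for illustration only
        Dim outputFile As String = Path.Combine(Path.GetTempPath(), "possible matches.txt")
        Dim writer As New StreamWriter(outputFile, False, Encoding.Default)

        'Flush after every write instead of holding rows in the writer's buffer
        writer.AutoFlush = True

        'This row is handed to the file system straight away,
        'not only when Close() is eventually called
        writer.WriteLine("Bigalow;Kathryn;Bigalow;Kate")

        writer.Close()
    End Sub
End Module
```

In an extension, the equivalent would be setting AutoFlush on the writer straight after creating it in Initialize. The trade-off is more, smaller writes to disk, which should not matter for a one-off matching run.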
Export Possible Matches to a CSV file
You can use the Resolve rule to export possible matches to a CSV file. The following example resolves the join rule “sn | Direct | lastname”. As a match on the last name alone is too weak for an immediate join, we just write the possible matches to the text file.
Imports Microsoft.MetadirectoryServices
Public Class MAExtensionObject
    Implements IMASynchronization
    Dim fileMatches As System.IO.StreamWriter
    Public Sub Initialize() Implements IMASynchronization.Initialize
        'False = overwrite any file left over from an earlier run
        fileMatches = New System.IO.StreamWriter("C:\possible matches.txt", False, System.Text.Encoding.Default)
    End Sub
    Public Sub Terminate() Implements IMASynchronization.Terminate
        fileMatches.Close()
    End Sub
    Public Function ResolveJoinSearch(ByVal joinCriteriaName As String, ByVal csentry As CSEntry, ByVal rgmventry() As MVEntry, ByRef imventry As Integer, ByRef MVObjectType As String) As Boolean Implements IMASynchronization.ResolveJoinSearch
        Select Case joinCriteriaName
            Case "Resolve_Lastname"
                Dim MAName As String = csentry.MA.Name
                Dim mvobject As MVEntry
                Dim cFirstname, mFirstname As String
                If csentry("givenName").IsPresent Then
                    cFirstname = csentry("givenName").StringValue
                Else
                    cFirstname = "UNKNOWN"
                End If
                For Each mvobject In rgmventry
                    'Only report Metaverse objects not yet joined to this MA
                    If mvobject.ConnectedMAs(MAName).Connectors.Count = 0 Then
                        'Read the Metaverse first name inside the loop, once per candidate
                        If mvobject("firstname").IsPresent Then
                            mFirstname = mvobject("firstname").StringValue
                        Else
                            mFirstname = "UNKNOWN"
                        End If
                        fileMatches.WriteLine(csentry("sn").StringValue & ";" _
                            & cFirstname & ";" _
                            & mvobject("lastname").StringValue & ";" _
                            & mFirstname)
                    End If
                Next
                'Never join on last name alone; the candidates are only written to the file
                Return False
            Case Else
                Throw New EntryPointNotImplementedException()
        End Select
    End Function
End Class
You will need to adapt this code of course. Firstly, you probably want to export a lot more identifying information in your CSV file – department, email address, dn … whatever helps. Next, it can really help to supplement your possible matches with a probability score. This is where you do a series of tests and add points – the more points, the higher the chance of the match. For example:
- Names similar* +1
- Department the same +1
- City the same +1
*Some tips for testing if names are similar:
- Strip out all spaces, dashes and hyphens then compare;
- Check if one string is contained in the other (so that “Sally-Anne” gets a point for “Sally”);
- Use a function which compares string similarity (search “Soundex” and “Levenshtein distance”).
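The scoring idea above could be sketched roughly as follows. This is an illustration only, written as a stand-alone console module: the function names, the three tests and the 25% edit-distance threshold are all assumptions chosen for the example, not part of ILM. It uses only syntax and framework calls available to a .NET 2.0 era rules extension.

```vbnet
Imports System

Module MatchScoreSketch

    'Strip spaces, hyphens and apostrophes, and lower-case the name before comparing
    Function NormalizeName(ByVal name As String) As String
        If name Is Nothing Then Return ""
        Return name.Replace(" ", "").Replace("-", "").Replace("'", "").ToLowerInvariant()
    End Function

    'Classic dynamic-programming Levenshtein edit distance
    Function Levenshtein(ByVal a As String, ByVal b As String) As Integer
        Dim d(a.Length, b.Length) As Integer
        For i As Integer = 0 To a.Length
            d(i, 0) = i
        Next
        For j As Integer = 0 To b.Length
            d(0, j) = j
        Next
        For i As Integer = 1 To a.Length
            For j As Integer = 1 To b.Length
                Dim cost As Integer = 1
                If a(i - 1) = b(j - 1) Then cost = 0
                d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
            Next
        Next
        Return d(a.Length, b.Length)
    End Function

    Function NamesSimilar(ByVal first As String, ByVal second As String) As Boolean
        Dim x As String = NormalizeName(first)
        Dim y As String = NormalizeName(second)
        If x = "" OrElse y = "" Then Return False
        If x = y Then Return True
        'One name contained in the other, e.g. "Sally" in "Sally-Anne"
        If x.Contains(y) OrElse y.Contains(x) Then Return True
        'Arbitrary threshold: at most a quarter of the longer name may differ
        Return Levenshtein(x, y) * 4 <= Math.Max(x.Length, y.Length)
    End Function

    'True only when both values are present and equal, ignoring case and padding
    Function SameValue(ByVal a As String, ByVal b As String) As Boolean
        If String.IsNullOrEmpty(a) OrElse String.IsNullOrEmpty(b) Then Return False
        Return String.Equals(a.Trim(), b.Trim(), StringComparison.OrdinalIgnoreCase)
    End Function

    'One point per test passed; the more points, the stronger the candidate
    Function MatchScore(ByVal csName As String, ByVal mvName As String, _
                        ByVal csDepartment As String, ByVal mvDepartment As String, _
                        ByVal csCity As String, ByVal mvCity As String) As Integer
        Dim score As Integer = 0
        If NamesSimilar(csName, mvName) Then score += 1
        If SameValue(csDepartment, mvDepartment) Then score += 1
        If SameValue(csCity, mvCity) Then score += 1
        Return score
    End Function

    Sub Main()
        'Name and department match, city does not: prints 2
        Console.WriteLine(MatchScore("Sally-Anne", "Sally", "Finance", "finance", "Leeds", "York"))
    End Sub
End Module
```

A score like this could be written as an extra column in the possible-matches file, so that whoever reviews the list can sort the strongest candidates to the top.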
Join from CSV
You can use an Advanced Join Rule to “import” joins from a CSV file.
Firstly, our CSV file must be constructed like this:
CS_identifier;MV_identifier
For example, if we are trying to match an AD account against Metaverse objects imported from the HR system, we populate the CSV with the AD DN and the employeeID:
CN=Fred Bloggs,OU=User,OU=MyOrg,DC=mydomain,DC=com;0012988
Next we create the Advanced Join Rule which will look up the csobject’s DN in the text file, but use the employeeID to search the Metaverse.
Note that you can’t actually use the DN in the join rule – but that’s ok, just use any attribute that definitely exists. E.g.,
sAMAccountName | Rules extension – Join_CSV | employeeID
And now for the code:
Imports System
Imports System.Collections.Generic
Imports Microsoft.MetadirectoryServices
Public Class MAExtensionObject
    Implements IMASynchronization
    'Maps each connector space DN to its paired Metaverse identifier (case-insensitive on the DN)
    Dim joins As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
    Public Sub Initialize() Implements IMASynchronization.Initialize
        'Open the csv file and read each "CS_identifier;MV_identifier" line into the lookup
        For Each strLine As String In System.IO.File.ReadAllLines("C:\joins.csv", System.Text.Encoding.Default)
            'Split on the last semicolon, in case the DN itself contains one
            Dim pos As Integer = strLine.LastIndexOf(";"c)
            If pos > 0 Then joins(strLine.Substring(0, pos)) = strLine.Substring(pos + 1).Trim()
        Next
    End Sub
    Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin
        Select Case FlowRuleName
            Case "Join_CSV"
                'If the csentry DN is found in the lookup, then
                'use the paired employeeID to search the Metaverse.
                Dim employeeID As String = Nothing
                If joins.TryGetValue(csentry.DN.ToString(), employeeID) Then
                    values.Add(employeeID)
                End If
        End Select
    End Sub
End Class
Reporting
People will ask you questions like “How many people have you joined in system X but not in Y?”, “How sure are you that the joins are correct?”, “Which department has the most unidentified accounts?” It’s best to be prepared for these sorts of questions.
Querying the Metaverse
Once data is in the Metaverse it is a simple matter to access it for reporting, either by exporting it into a reporting table, or by directly querying the underlying tables (as long as you’re careful to do it when ILM is idle, or else use NOLOCK).
So consider this: During the data cleaning phase, import all identifying attributes into the Metaverse from all sources.
For example: You’ve made a join between a user in AD and an HR record.
Under normal operations you would consider HR as the master source for the name, and you would only flow it from there.
You wouldn’t bother importing the name attributes from AD – in fact you’re more likely to be overwriting them with export flow rules.
HR | | Metaverse | | AD
---|---|---|---|---
Lastname = Powells-Brown | -> | lastname = Powells-Brown | -> | sn = Powells-Brown
Firstname = Joanna | -> | firstname = Joanna | -> | givenName = Joanna
However, during the data matching phase, you’re probably not ready to start overwriting attributes, and the information about current values can be very important in your verification and reporting.
HR | | Metaverse | | AD
---|---|---|---|---
Lastname = Powells-Brown | -> | HR_lastname = Powells-Brown | |
Firstname = Joanna | -> | HR_firstname = Joanna | |
 | | AD_lastname = Powells | <- | sn = Powells
 | | AD_firstname = Jo | <- | givenName = Jo
Now, if you have a look at the mms_metaverse table in the ILM database, you will see how simple it is to query the progress of your joins, and also to judge on what criteria the joins were made.
Some example queries…
/* HR person with no join to AD */
select HR_lastname, HR_firstname, HR_employeeid from mms_metaverse
where AD_dn is null
/* HR person with join to AD */
select HR_lastname, HR_firstname, HR_employeeid, AD_lastname, AD_firstname, AD_dn from mms_metaverse
where AD_dn is not null
Caution |
---|
As mentioned above you need to be careful when directly querying the Metaverse tables. If your system is already in production, and you happen to be adding in a new data source, then you may be better off employing a SQL MA to export the data you’re interested in to another table, where you can query it as much as you like without risk of locking errors. |
Querying the Connector Space
Unfortunately it is not so simple to query the connector space to, for example, report on the state of your disconnected objects.
It’s a great pity that you can’t just save results from the Joins page in the Identity Manager GUI, so your options are:
SQL query – The CS table holds data differently to the Metaverse tables.
It is possible to query for disconnectors in this way, however you will only be able to retrieve the CN of objects – which may not be sufficient to identify them.
select cs.rdn from dbo.mms_connectorspace cs
join dbo.mms_management_agent ma
on cs.ma_id = ma.ma_id
left outer join dbo.mms_csmv_link mv
on mv.cs_object_id = cs.object_id
where ma.ma_name = 'My MA'
and mv.mv_object_id is null
and cs.connector_state = 0
CSExport – The command-line utility csexport.exe, found in the <ILM program>\bin folder, will allow you to dump connector space objects to an XML file.
Report directly from the data source – Once an object in the data source has been correctly identified you will ideally export a unique attribute out to it. It may then be possible to identify the non-joined objects as those which don’t possess this attribute.
Project to a different object type – ILM processes joins before projections, so it is fairly simple to project all non-joined objects to a different Metaverse object type – for example, one called “disconnectors”. This may help in reporting on an overall status direct from the Metaverse tables.
Note however that to make these objects once again available for joins they will have to be disconnected from the MVExtension provisioning code.
Advanced Tips and Tricks
Join to Multi-value Attribute
Sometimes you may need to join to a value in a multi-value attribute. An example is searching through all the proxyAddresses for a match against a single email address.
When the multi-valued attribute exists in the connector space, and the single valued attribute is in the Metaverse, this is very easily accomplished. Just use an Advanced Join Rule to break the multi-valued attribute down into the values list used by the join rule.
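Public Sub MapAttributesForJoin(ByVal FlowRuleName As String, ByVal csentry As CSEntry, ByRef values As ValueCollection) Implements IMASynchronization.MapAttributesForJoin
    Select Case FlowRuleName
        Case "Join_proxyAddresses"
            'Offer every value of the multi-valued attribute as a possible join value
            Dim proxy As Value
            For Each proxy In csentry("proxyAddresses").Values
                Dim address As String = proxy.ToString()
                'proxyAddresses values carry a type prefix such as "smtp:". Strip it if the
                'Metaverse attribute holds the bare address (an assumption about your data).
                If address.StartsWith("smtp:", StringComparison.OrdinalIgnoreCase) Then
                    address = address.Substring(5)
                End If
                values.Add(address)
            Next
    End Select
End Sub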
However, when the multi-valued attribute has already been imported into the Metaverse you will have a problem. While you can join on a multi-valued attribute, you have to join on the whole thing. There is no way that you can match one value out of a multi-valued Metaverse attribute against a single-valued connector space attribute.
Some possible options:
- Import the multi-valued attribute into a series of single-valued attributes in the Metaverse, e.g., proxy1, proxy2, proxy3 …
- Use a different Metaverse object type to do the project and join the other way around.
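For the first option, the splitting logic might look something like the sketch below. It is a stand-alone illustration, not ILM code: the helper name, the proxy1..proxyN attribute names and the slot limit are assumptions for the example, and in practice the result would be applied to the Metaverse object from an import flow rule in the MA extension.

```vbnet
Imports System
Imports System.Collections.Generic

Module ProxySlotsSketch

    'Map each address to a numbered single-valued attribute name (proxy1, proxy2, ...).
    'Addresses beyond maxSlots are dropped, so pick a limit that fits your data.
    Function ToNumberedAttributes(ByVal addresses As IList(Of String), ByVal maxSlots As Integer) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)

        'Sort first so that an unchanged set of addresses fills the same slots on every run
        Dim sorted As New List(Of String)(addresses)
        sorted.Sort(StringComparer.OrdinalIgnoreCase)

        For i As Integer = 0 To Math.Min(sorted.Count, maxSlots) - 1
            result.Add("proxy" & (i + 1).ToString(), sorted(i))
        Next
        Return result
    End Function

    Sub Main()
        'Made-up sample addresses
        Dim addresses As New List(Of String)
        addresses.Add("smtp:kate@example.com")
        addresses.Add("SMTP:kathryn.bigalow@example.com")

        For Each pair As KeyValuePair(Of String, String) In ToNumberedAttributes(addresses, 3)
            Console.WriteLine(pair.Key & " = " & pair.Value)
        Next
    End Sub
End Module
```

Each numbered attribute then needs its own join rule, and slots shift whenever an address is added or removed, which is one reason this suits a one-off matching exercise better than day-to-day operation.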
Multiple Possibilities in the Connector Space
All the joining techniques so far have been based around a single connector space object, with one or more possible matches in the Metaverse. But what do you do if you want to work the other way around, where there are multiple possible matches in the connector space for a single Metaverse object, and you want to pick the best one?
Unfortunately this is not straightforward. There is no way to offer a selection of connector space objects in a join rule, as joins always work on a single connector space object at a time. You can judge the merits of the current CS object, but you can’t tell if there’s a better one coming up. One way around this is to do everything with CSV files.
- Use the ResolveJoinSearch to write possible joins to a CSV, but don’t actually join anything,
- Do your matching outside ILM, then
- Use a CSV with an Advanced Join Rule to make the joins.
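The middle step, done outside ILM, could be as simple as keeping the highest-scoring connector space candidate for each Metaverse object. The sketch below assumes a hypothetical input layout of CS_identifier;MV_identifier;score (the score column is not produced by any code above) and emits lines in the CS_identifier;MV_identifier layout used by the CSV join rule. Ties are left out for a person to review.

```vbnet
Imports System
Imports System.Collections.Generic

Module BestCandidateSketch

    'Input lines are assumed to look like: CS_identifier;MV_identifier;score
    'Returns join lines in the form:       CS_identifier;MV_identifier
    Function PickBestCandidates(ByVal candidateLines As IEnumerable(Of String)) As List(Of String)
        Dim bestCs As New Dictionary(Of String, String)      'MV identifier -> best CS identifier
        Dim bestScore As New Dictionary(Of String, Integer)  'MV identifier -> best score so far
        Dim tied As New Dictionary(Of String, Boolean)       'MV identifier -> True if the top score is shared

        For Each line As String In candidateLines
            Dim parts As String() = line.Split(";"c)
            If parts.Length <> 3 Then Continue For            'skip malformed lines
            Dim score As Integer
            If Not Integer.TryParse(parts(2), score) Then Continue For
            Dim mvId As String = parts(1)

            If Not bestScore.ContainsKey(mvId) OrElse score > bestScore(mvId) Then
                bestCs(mvId) = parts(0)
                bestScore(mvId) = score
                tied(mvId) = False
            ElseIf score = bestScore(mvId) Then
                tied(mvId) = True
            End If
        Next

        Dim joins As New List(Of String)
        For Each mvId As String In bestCs.Keys
            'Leave ties out so that a person can decide between them
            If Not tied(mvId) Then joins.Add(bestCs(mvId) & ";" & mvId)
        Next
        Return joins
    End Function

    Sub Main()
        'Two made-up AD candidates for the same HR record
        Dim candidates As New List(Of String)
        candidates.Add("CN=Fred Bloggs,OU=User,OU=MyOrg,DC=mydomain,DC=com;0012988;3")
        candidates.Add("CN=F Bloggs,OU=Old,OU=MyOrg,DC=mydomain,DC=com;0012988;1")

        'Prints the higher-scoring pair: CN=Fred Bloggs,OU=User,OU=MyOrg,DC=mydomain,DC=com;0012988
        For Each joinLine As String In PickBestCandidates(candidates)
            Console.WriteLine(joinLine)
        Next
    End Sub
End Module
```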
Too much data
If you have an enormous number of accounts, and different data sources to trawl through, you may be best off doing your data matching outside of ILM, and then just using the CSV join rule above to make the joins. Check out the Fuzzy Lookup Transformation from the Enterprise edition of SQL Server Integration Services (SSIS) for help here.
Take-Home Thoughts
- Complex join rules are a means to an end – not the end itself.
- Breadcrumbing is essential for automation.
- When matching on weak rules (e.g., surname only), verify the match another way.
- You can’t do it all in ILM. CSV, Excel and fuzzy lookup algorithms will also help, but an element of by-hand matching is inevitable.
- Get the matching and breadcrumbing sorted out before you start flowing and provisioning. This will make for a happier project, stakeholders, users and YOU!
About the Author
Carol Wapshere is an ILM MVP and contributor of a few of these documents now. She has always believed that putting the work in up-front will save you lots of headaches in the long run. It’s a self-defense strategy really – there’s nothing worse than having to go over and over (and over) the same ground again and again, especially when you’d already moved on to something new and far more interesting. Keep it neat, and keep it simple, and things should work just fine.
Thanks to Markus Vilcinskas and Paul Loonen for their help with this document.
Carol – tonight I’ve been wondering about what constitutes best practice for implementing complex (or even semi-complex) join rules in FIM. What I am trying to do is understand how the “breadcrumbing” idea (Markus calls it a “correlation id”) might work in a FIM model where the join rule has to be a simple match on one or more metaverse attributes. The more I think about it the more I am leaning towards the traditional approach – but how will this work if I still want to implement my flow rules in the FIM portal? The question would probably be this … if I implement a combination of join rules and then implement the flow rules in the FIM portal, including writing back a “breadcrumb”, would that mean that the attribute I specify on the join rules tab should be the match on my breadcrumb? Any direction on this would be much appreciated … clients have been sold on the codeless concept, and I am not keen on muddying the water by implementing rules in 2 places if I can avoid it.
Hi Bob,
as far as I can tell the FIM Sync rules only support the simplest type of join rule, so all of the gymnastics covered in this post would still have to be done the old ways – hopefully just initially as part of sorting out the legacy data mess until you could get to a point of exporting your breadcrumb, after which FIM Sync rules would be sufficient. On one particularly complex project I’ve been using one FIM server as the “joins server” and a separate one as the “production server” doing the provisioning and updating.
I just realised something about the Sync Rules the other day – I had thought the match criteria was a kind of “soft join” re-evaluated each time, so if the matching attribute changed in the target directory it could join to a different metaverse object the next time around. But it actually does work exactly the same as a join always has done. I guess this at least means you won’t get any weird situations where multiple metaverse objects try to flow to one cs object – when it’s joined it’s joined, just like it always was.
Thanks Carol – just read your “FIM Newbies” post and there is a consistency developing here in what you say. I was wondering about your first paragraph in this post where I stopped on your words “doing away with the notion of permanent joins”. I guess your reply explains that now – I was sort of hoping that the joins are still permanent, cos that would have changed the playing field for me! 🙂
Yikes! I’ll change that. Thanks for pointing that out.
P.S. what are your thoughts on this post having worked with FIM for a good while now: http://forums.novell.com/novell-product-support-forums/identity-manager/im-engine-drivers/415131-idm-vs-fim-2.html … do you hear much of this lately? What would be your response based on your experience so far? I think some guidelines on the appropriate use of “codeless” sync rules will be needed pretty soon to ensure that performance doesn’t come into question …
Interesting. I might just have to jump into that discussion. I had a bad experience with DirXML years ago and haven’t been near it since, but I’ve been reading up on IDM 4 and I’d like to learn more. I think the Microsoft sales guys are kidding themselves if they think FIM can do anything near what a product like IDM can do – but on the other hand I do believe you have more flexibility with FIM due to the DIY nature of the product. Of course that also means you need someone who understands all this DIY… I think a lot of it depends on the client and the project. Despite them having given me an MVP, I don’t actually think FIM is the choice for every occasion, BUT when I manage to do something like make it run BPOS powershell cmdlets I think to myself “that’s pretty cool!”
In our provisioning processes, I’ve been managing proxyAddresses directly based on values in a database table, so it’s been straightforward. Now our Exchange team wants to reverse that flow, and instead of me setting proxyAddress values, let Exchange do that and flow them back. No problem mapping proxyAddress back from ADMA into the MV objects. However, now they want me to create distribution lists where user A may request that his mail be redirected to user B, so the spec calls for attempting to join the stated address in the datasource to all proxyAddress values in the MV.
So my question is, do you still think that my best approach is to map the proxyAddress values in the ADMA CS to a set of attributes in the MV so I can join against them, or have you had any better ideas?
Maybe I’m obtuse, but I’m not getting the “Use a different Metaverse object type to do the project and join the other way around” suggestion at all.
Hi Bill. This post was written very much in mind of the initial sort of gymnastics you do to get accounts matched and cleaned up – I wouldn’t recommend these steps as operational options. With the “join the other way” I was talking about a temporary projection into metaverse objects, just so you could join against CS objects in another connector space (where the multivalue attribute came from), after which you would swiftly export an identifying attribute out to your newly joined objects, and then clear them out of the metaverse.
With these groups of yours – could you use SQL to generate them and link them up to their members? Complex logic is almost always best done outside the sync service.