Better Know APOC #3 : apoc.date.parse & format

 

Neo4j.Version 3.0.0
APOC Version 3.3.0.1

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.


Dates! For some reason people keep on wanting to keep track of dates (and indeed times) in their systems. Neo4j for a long time was leading the charge in rejecting your modern concepts of date and time, unfortunately – many people still want to use them, and much like the ‘paperless office’ a ‘DateTimeless office’ doesn’t really seem like it’ll be on the cards any time soon.

For a lot of people this meant hand rolling code to cope with the progress Neo4j had foisted upon them. I for example, have taken to storing DateTimes in ‘Tick’ form, because nothing says readable like ‘636812928000000000‘ (that’s Christmas day, 2018 btw). Let’s imagine code wise I want to display that – in C# it’s a doddle (and I assume Java) I just create a new DateTime and output it to the screen, but what if I wanted to do this using Cypher? Or indeed want to add stuff to my database without having to write an application to do that?

Introducing apoc.date.parse and apoc.date.format

I’m covering both of these as they are complimentary, and typically if you want to use one, you’ll probably want to use the other.

What do they do?

parse‘ parses a given date time string (something like ‘2018/12/25 01:02:03‘), ‘format‘ takes a ‘parsed’ value and converts it to a string.

Setup – Neo4j.conf

Nothing! These are safe and require no conf changes to use.

apoc.date.parse

We’ll look at parse first, as typically this is where most people do – gotta get that data in!

Ins and Outs

Tapping apoc.help('apoc.date.parse')

Inputs (time :: STRING?, unit = ms :: STRING?, format = yyyy-MM-dd HH:mm:ss :: STRING?, timezone = :: STRING?)
Outputs (INTEGER?)

Inputs

There’s always an input, and parse is no different!

Time

This is our date and time string (badly named), you can pass in just a date:

'2018/3/20'

Using ‘/’, or ‘-‘ separators, it’s all good:

'2018-03-20'

You can add a time:

'2018-03-20 13:34.12'

All you need to do is ensure the pattern you use is reflected in the ‘format’. The default format is listed below.

Unit

This is your output unit, default is millisecond (ms), the values you can convert to are:

  • Millisecond (ms/milli/millis/milliseconds)
  • Second (s/second/seconds)
  • Minutes (m/minute/minutes)
  • Hours (h/hour/hours)
  • Days (d/day/days)

So you can pass in ‘ms’ or ‘milli’ and get the same output.

Format

Default wise – we’re looking at: yyyy-MM-dd HH:mm:ss which is:

  • Full year – all the digits, ’18’ will be treated as the year ’18’, not 2018!
  • Month – 1 or 2 digits, i.e. 1 = January or 01 = January, fancy.
  • Day – Again, 1 or 2 digits
  • Hours – in 24 hour format, so if you want 1pm, that’s 13,
  • Minutes – 1 or 2 digits – I don’t think I need tell you the number of minutes in an hour (right??)
  • Seconds – 1 or 2 digits – and again, the traditional number of seconds in a minute.

But wait! That’s not all – do you want to provide your own format? Not interested in time? Only interested in time? Of course! Just put in your own format string (in the Java format) –

  • Just Date: 'yyyy-MM-dd'
  • Just Time: 'HH:mm:ss'

Just for clarification – the capitalisation of the ‘M’ is important, lowercase = minutes, upper case = months.

Timezone

You’ve got 3 options here,  the full name of the timezone, the abbreviation, or something depicting the hours difference:

  • Full name:
    Europe/London (I can only assume we’ll need to get this to be renamed to something like ‘Brigreatain/London‘ or similar – I’ve put the ‘great’ back into Britain) – A full list of these are available on the great wikipedia.
  • Abbreviation
    PST, UTC, GMT obviously these are generally more broad strokes than a specific country.
  • Custom
    GMT +8:00
    GMT -8:00

Generally it’s recommended to use the full name. If you choose to not pass in a timezone, the default is "", now, you might well ask yourself –

OK does that mean we’re looking at the timezone of my machine? The machine Neo4j is running on? Actually – what does it mean?!

Well – from the code we can see that it’s UTC, so that’s that cleared up.

Output

Your converted value – or an error (Ha!).

apoc.date.format

We’re gonna jump straight into ‘format’ – as examples wise we may as well put the two together in a date field sandwich.

Ins and Outs

So as to not break with tradition, let’s hit ‘help’: apoc.help('apoc.date.format')

Inputs (time :: INTEGER?, unit = ms :: STRING?, format = yyyy-MM-dd HH:mm:ss :: STRING?, timezone = :: STRING?)
Outputs (STRING?)

Inputs

Despite looking like I just copy/pasted from above, you’ll note a key difference, ‘time’ is now an integer. Exciting!

Time

This is our date and time in Unit format. By that I mean if you’re wanting to convert from milliseconds to readable, you can do that, or even seconds to readable, all you have to do is set the…

Unit

These are the same units as above, so I won’t go over them again. Default wise we’re looking at ‘ms‘.

Format

Your output format this time. So even if you stored right down to the millisecond and want to see only the year, you can do that. The default is the same as with Parse – ‘yyyy-MM-dd HH:mm:ss

Timezone

Did you store as UTC but want to see this in PST? Go for your life! As before the default is UTC.

Output

Your input in a nice readable string format. This is one of those functions we know Michael didn’t write – as he understands ticks natively. Seriously – when you next meet him, ask the time – you’ll get a LONG in response, it is a thing of wonder.

Examples

The obligatory examples section

The Basics (look Mum! No (optional) params!)

We always need to provide at least a date/time to be able to parse, so:

RETURN apoc.date.parse('2018-03-31 13:14:15')

Gets you:

1522502055000

So let’s pass that back into our format function:

WITH apoc.date.parse('2018-03-31 13:14:15') AS inny
 WITH apoc.date.format(inny) AS outy, inny
 RETURN *
inny outy
1522502055000 “2018-03-31 13:14:15”

This is the way we’ll proceed with the examples from here on in..

The I want to only see dates example

WITH apoc.date.parse('2018-03-31', 'ms', 'yyyy-MM-dd') AS inny
 WITH apoc.date.format(inny) AS outy, inny
 RETURN *
inny outy
1522454400000 “2018-03-31 00:00:00”

Note, I parsed only the date, but returned the date and time, this is just to prove that the time isn’t parsed, or rather is but set to 00:00:00.

I’m not going to go into just time, I think we can all work that out.

The Timezone Fun Example

(Loosest sense of the word ‘fun’ here) – we pass in a full date with time, using the default of UTC, then convert is back to a PST time, KERRRRAZY.

WITH apoc.date.parse('2018-03-31 13:14:15') AS inny
 WITH apoc.date.format(inny, 'ms', 'yyyy-MM-dd HH:mm:ss', 'PST') AS outy, inny
 RETURN *
inny outy
1522502055000 “2018-03-31 06:14:15”

Summing up the experience

The DateTime conversion stuff is something that is useful with queries, you can pair it with ‘timestamp()’ if you want to add things like a ‘last logged in’ date:

MATCH (u:User {Id: '123'}) 
SET 
    u.LastLoggedIn = timestamp(), 
    u.LastLoggedInReadable = apoc.date.format(timestamp())

Storing date and times as integers makes querying for them easy, and apoc.date.parse makes the query readable:

//Find users who logged in this year
MATCH (u:User) 
WHERE u.LastLoggedIn > apoc.date.parse('2018-01-01', 'ms', 'yyyy-MM-dd)
RETURN u

I personally think that’s better than:

MATCH (u:User)
WHERE u.LastLoggedIn > 1514764800000
RETURN u

Anyhews, enough!

Better Know APOC #2: apoc.graph.fromDB

A rubbish picture that really doesn't represent anything - but you could say 'apoc.graph.fromDB' - honestly - you're not missing anything here

Neo4j.Version 3.0.0
APOC Version 3.3.0.1

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.

Back at the beginning of this series (if you can remember that far!) I talked about using apoc.export.csv.* – and I showed that an example of using apoc.export.csv.graph that took in a graph – and to get that graph – I used apoc.graph.fromDB. I also said I wasn’t going to cover it in that post – and I didn’t. Time to rectify that lack of knowledge!

What does it do?

apoc.graph.fromDB takes your existing DB and creates a whole new virtual graph for your use later on – we’ve seen it in use in episode 1 – the phantom men… sorry – apoc.export.csv.graph, but a virtual graph can be used in other procedures . This particular instance is a hefty ‘catch all’ version – maybe overkill for most needs – but equally – maybe exactly what you’re after (if you’re after dumping your DB).

Setup – Neo4j.conf

dbms.security.procedures.unrestricted=apoc.graph.fromDB

Ins and Outs

Calling apoc.help(‘apoc.graph.fromDB’) get’s us:

Inputs (name :: STRING?, properties :: MAP?) ::
Outputs (graph :: MAP?)

Inputs

Only two this time, and I reckon you can pretty much ignore them, so that’s a win?!

Name

This is as simple as it seems – just the name – I’m going to be honest here – I really am not sure what this is for – you can access it later on though. I’m pretty sure this is a hangover from the other apoc.graph.from* methods – where it makes more sense as a distinguisher – but for this procedure – as we’re just exporting the whole db, go for whatever you like.

Properties

Just a collection of key/values – accessible after the procedure has executed – but otherwise not used by the procedure.

Outputs

Just the one! Amazeballs!

Graph

This is what you need to YIELD to use the procedure (the examples will cover this) – to access the name you use:

RETURN graph.name

To get your properties it’s:

RETURN graph.properties.<your-property-here>

Examples

Assuming as always that you have the Movies DB setup and ready to roll, just call:

CALL apoc.graph.fromDB('Movies', null) YIELD graph

That will pop the whole DB into your browser – now if you do this with a MONSTER database, you’ll only see the first 300 nodes – otherwise no matter your browser you could expect epic failures.

Typically we want to RETURN something rather than just put it on the screen so:

CALL apoc.graph.fromDB('A1Graphs', null) YIELD graph
RETURN *

Oh look – exactly the same – HANDY.

Let’s (for the sake of something) pretend that we have 2 of these and we’re wanting to check the name:

CALL apoc.graph.fromDB('A1Graphs', null) YIELD graph
RETURN graph.name

That’ll get us:

image

(and the award for the dullest blog post picture goes to..)

Let’s set and get some properties:

CALL apoc.graph.fromDB('A1Graphs', {Hello: 'World'}) YIELD graph
RETURN graph.properties

Which returns

image

But if we just want one property:

CALL apoc.graph.fromDB('A1Graphs', {Hello: 'World'}) YIELD graph
RETURN graph.properties.Hello

Note – I’ve used an upper case property name, so I have to use the same case when pulling them out – (I refuse to be cowed into Java conventions). Anyhews, that returns:

image

Notes

You always  need to YIELD unless you literally want to dump your DB to the screen – doing something like:

CALL apoc.graph.fromDB('A1Graphs', null) AS myGraph

Will lead to exceptions – as Neo4j is expecting you to YIELD, you can do:

CALL apoc.graph.fromDB('A1Graphs', null) YIELD graph AS myGraph

and use myGraph throughout the rest of your code no worries.

Better Know APOC #1: apoc.export.csv.*

Export CSV from Neo4j with APOC

Neo4j Version 3.3.0
APOC Version 3.3.0.1

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.

We’re going to start with the Export functions – in particular exporting to CSV, mainly as people have asked about it – and well – you’ve got to start somewhere.

apoc.export.csv.*

There are 4 functions documented on the GitHub.IO page:

  • apoc.export.csv.query(query, file, config)
  • apoc.export.csv.all(file, config)
  • apoc.export.csv.data(nodes, rels, file, config)
  • apoc.export.csv.graph(graph, file, config)

All of them export a given input to a CSV file, specified by the file parameter.

Setup

There’s a couple of things we need to have in place to use these methods.

Neo4j.conf

We need to let Neo4j know that you allow it to export, and also run the export csv procedures:

apoc.export.file.enabled=true
 dbms.security.procedures.unrestricted=apoc.export.csv.*

 

apoc.export.csv.query

In no particular order (aside from the docs order) we’ll look at the .query version of export.csv.  This procedure takes a given query and exports is to a csv file – the format of the RETURN statement in the query directly affects the output, so if you return nodes, you get full node detail.

From the help procedure we get the following for the signature:

Inputs (query :: STRING?, file :: STRING?, config :: MAP?)
Outputs (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?)

Inputs

Query

I hope this is obvious – but – it’s the query you want to use to get your CSV columns – personally I write the query first, make sure it’s working then simple copy/paste into my apoc call, so let’s say I want to get all the Movies a given Person (Tom Hanks) has ACTED_IN, I would do:

MATCH (p:Person {name: ‘Tom Hanks’})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released

File

This is the filename to export to – I always go fully qualified, but should you want to go relative, it’s relative to the Neo4j home directory

Config

There is only one config setting that affects this procedure:


Parameter Description
d Sets the delimiter for the export, this can only be one character, so something like ‘-‘ is ok, or ‘\t’ (for tabs)

Outputs

File

This is what you passed in, It doesn’t give you the fully qualified location, just what you passed in.

Source

This will say something like ‘statement: cols(2)’ to indicate the query returned 2 columns, I don’t think I need to explain that the number will change depending on what you return.

Format

Always gonna be csv

Nodes

If you’re returning nodes instead of just properties, this will give you the count of nodes you are exporting.

Relationships

This returns the count of the relationships being returned.

Properties

Will be 0 if you’re just returning nodes, otherwise will be a total count of all the properties returned. So if you’re returning 2 properties, but from 12 nodes, you get 24 properties.

Time

How long the export took in milliseconds – bear in mind – this will be less than the total query time you’ll see in something like the browser, due to rendering etc

Rows

The number of rows returned by the query and put into the CSV file – directly matches the number of lines you’ll have in your CSV file.

Examples

We want to export all the Movie titles and release years for files Tom Hanks has ACTED_IN – we’ve already got that query in place (up above) so lets put it into export.csv:

CALL apoc.export.csv.query(
     "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released", 
     "c:/temp/exportedGraph.csv", 
     null
)

I like to put the parameters onto different lines, you can whack it all in one line if that’s your fancy! I’m passing null for the config at the moment, as there’s no change I want to make.If I run this, I will get:

image

We can change that and return just the ‘m’:

CALL apoc.export.csv.query(
    "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m", 
    "c:/temp/exportedGraph.csv", 
     null
 )

Which gives a lot more detail – about all of the node:

image

OK, let’s play with the parameter, now I’m a big fan of the tab delimited format, so let’s make that happen:

CALL apoc.export.csv.query(
     "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN  m.title, m.released", 
     "c:/temp/exportedGraph.csv", 
     {d:'\t'}
)

That gets us:

image

apoc.export.csv.all(file, config)

A lot of the following detail is the same as for the above procedure, so this will be short (in fact the next couple as well).

Config is the same – just one parameter to play with – ‘d’ or delimiter. What we don’t have is the ‘query’ parameter anymore – now we get the entire content of the database in one go – boom!

//Tab delimiting the 'all' query
CALL apoc.export.csv.all(
   "c:/temp/exportedGraph.csv", 
   {d:'\t'}
)

What you’ll find is that as you have no control over the format of the result, the authors of APOC have kindly pulled all the properties out, so if you run it against our Movies database, you get:

"_id","_labels","name","born","released","title","tagline","_start","_end","_type","roles","rating","summary"

as the header, and obviously all the rows are all the nodes!

But hang on.. Movie doesn’t have a born property – (at least in our DB). If we look at our DB we actually have a Movie node and a Person node, and all is dumping out everything – when you scroll down the file you’ll see the different rows have their _labels property.

(To be said in a Columbo voice) “Just one last thing”… we also get the relationship details – if you scroll down the file you’ll see rows like:

,,,,,,,"322","325","ACTED_IN","[""Neo""]","",""

These are the relationships – so you really are getting everything.

apoc.export.csv.data(nodes, rels, file, config)

OK, in this one we’re looking to pass a collection of Nodes and Relationships to be exported, but how do we get those nodes and relationships? With a query!

MATCH (m:Movie)<-[r:ACTED_IN]-(p:Person {name:'Tom Hanks'})
WITH COLLECT(m) AS movies, COLLECT(r) AS actedIn
CALL apoc.export.csv.data(
 movies, 
 actedIn, 
 'c:/temp/exportedGraph.csv', 
 null
) YIELD file, nodes, relationships
RETURN file, nodes, relationships

A couple of things to note, in line 2, we COLLECT the m and r values so we can pass them to the APOC procedure.

You may ask “Hey Chris, why exactly are you YIELDing? We didn’t need to do that before”, and you’re right of course. But because we have a MATCH query at the top the Cypher parser won’t let us end with a CALL clause, so we need to YIELD the result from the CALL and then RETURN those results as well (we’ll have to do this with the next procedure as well).

Now we’ve got the nodes and relationships exported we’ll find when we look at our CSV that it’s taken the same approach as the all procedure and picked out the properties so we’ll end up with a header like:

"_id","_labels","title","tagline","released","_start","_end","_type","roles"

Which makes a LOAD CSV call super easy later on – should that be your dream goal (cypher dreams are reserved for Michael Hunger only 🙂 ).

Config wise – it’s the same as the others, only the ‘d’ option for the delimiter.

apoc.export.csv.graph(graph, file, config)

This procedure exports a graph – simple really. But what is a graph when it’s a home? Well that leads to another APOC procedure – apoc.graph.fromDB which we’ll not cover here, needless to say Imma going to use it:

CALL apoc.graph.fromDB('movies', {}) YIELD graph
CALL apoc.export.csv.graph(
 graph,
 'c:/temp/exportedGraph.csv',
 null
 ) YIELD file, nodes, relationships
RETURN file, nodes, relationships

Say WHAT??!

OK, simples – the first line exports the whole database (I’m not doing any filtering here) into a graph identifier which I then pass into the call to export.

This exports in exactly the same way as the all version – so if you pass an entire DB – you get an entire DB. In fact – the code above does exactly the same as the call to all just in a more complicated way, and in keeping with that theme, the old faithful ‘d’ for delimiter config parameter is there for your use.

Summary

OK, we’ve looked at all the export.csv procedures now, (unless in the time I’ve taken to write this they’ve put another one in) and hopefully I’ve cleared up a few questions about how it works and what exactly the configuration options are. If not, let me know and I’ll augment.

Better Know APOC #0: Introduction

APOC

As all developers in good languages know – arrays and blog series start at 0, not 1 (sorry VB developers) so this is the introductory post to the upcoming series about APOC and it’s procedures etc.

What is APOC?

APOC stands for Awesome Procedures On Cypher – and is a large collection of community functions and procedures for use in Neo4j. You can read more about it on the page linked to above. Also – just so you know – I’m going to refer to them as procedures to save writing ‘functions and procedures’ all the time – because I’m lazy.

Setup for these posts

A lot of this stuff is listed on the APOC page, but use this as a quick get up to speed guide for when you’re reading the subsequent posts on here.

Setup – Configuration

You’ll need to setup your DB to be able to run the procs. We do this by adding a configuration property to your neo4j.conf file. The simplest (but most insecure) would be to add:

dbms.security.procedures.unrestricted=apoc.*

which will allow you to run all the APOC procedures. In the posts I’ll give a more specific version so you can execute just the ones we’re looking at, but if you have the above configuration property in your config – you can ignore that part of the posts.

Some of the procedures (export/import generally) require an additional configuration property, but that’ll be covered in the posts.

Setup – Data

90% of the time, if I can get away with it – I’m going to use the ‘Movies’ example database, it has many benefits – it’s a well-known type of data (most people know Movies have Actors etc) and it’s available to everyone who has Neo4j running.

To get the data you run:

:play movies

In the Neo4j Browser. Press the ‘next’ arrow and run the first query you see. You don’t need to do the rest.

Common stuff

The APOC procedures generally follow a common pattern, and there are some things you can do to help yourself

Get help

This is the first and most basic thing you can do – APOC has a built in help procedure, which you can run to get the signature of the procedure you’re interested in. In the posts the signatures will come from this method:

CALL apoc.help(‘NAME OF PROC’)

This will return a table with 5 columns:

  • Type: Whether it’s a procedure or function – this is important so you know how to call the thing
  • Name: If you’ve put in a non-complete name for help, this will tell you which proc you’re actually looking at
  • Text: Erm
  • Signature: This is the signature of the procedure (the most important bit from our POV)
  • Roles: If you have security setup – this is which roles can see/use this procedure
  • Writes: Ahhh

The signature is the main thing and that can be broken down into 2 parts, innies and outies, initially it can look a bit daunting as it’s one big lump of text, but break it down and it becomes easier:

image

Type wise – they’re all Java types, so easy to understand (Map = Dictionary .Netters).

Now you have the info – you have to pass all the parameters in – i.e. if there are 3 parameters, you can’t just pass in 2 and default the 3rd, you have to pass in all 3.

The Config Parameter

Pretty much all of the procedures will have this parameter:

config :: MAP?

This is a general config dictionary that allows the procedures to take in a large number of parameters without making the parameters list 1/2 a mile long (that’s 0.8KM my European friends).

It’s a Map<String, Object> and to set it when calling in Cypher you use the curly brace technique, so let’s say I want to set a parameter called ‘verbose’ to ‘true’ I would put:

{ verbose:true }

Easy – adding another parameter like (for example) ‘format’ and ‘json’ I would do:

{ verbose: true, format: ‘json’ }

One thing to bear in mind if you decide to look through the source code (and you should to see what these things are doing) is that the Config isn’t always just for the method you’re looking at.

For example ExportConfig is used by:

  • apoc.export.csv.*
  • apoc.export.cypher.*
  • apoc.export.graphml.*
  • etc

So whilst ExportConfig might well list 7 properties – the method you’re looking at may only actually use one of them – and setting the others will have no effect.