NBA Machine Learning Chapter 3

Intro

The following command will sync your repo with mine if you’re having issues:

$ git checkout startChapter3 -f

I’ll be honest, I restructured my data at least four times throughout this project. I’m going to show you the final version of my data structure. The structure of the data in the database will be the following:

// Final Structure
[
    {
        "Player" : "LeBron James",
        "Pos" : "SF",
        "Seasons" : [
            {
                "totals" : {
                    ......
                    ......
                },
                "advanced" : {
                    ......
                    ......
                }
            },
            ......
        ]
    },
    ....
]

Let’s move into our buildData() function. All of the following work will be done in .on("data", fun... Since MongoDB deals with objects, we are going to build one giant object in our program called DB_OBJ. Let’s keep DB_OBJ with our modules, so add the following line to your Modules Section:

var DB_OBJ = {};

Next, jump inside of our .on("data") method inside of buildData(). Around this time of development, I was using a debugger like node-inspector. It would be too much of a hassle to bring the debugging aspect into this tutorial, so I’m just going to show you what the data looks like at this point of the program.

This data is missing some very important information. We don’t know what season this data belongs to, or what stat folder it came from (totals or advanced). So we need to parse the stat type and season out of the file path. Time for some good old-fashioned regex! I took it upon myself to work out the regex we are going to use.

Inside of buildData(), make .transform() look like this:

.transform(function(data){
    var parsedPath = path.match(/(advanced|totals)|(\d{4})/g);
    data.statType = parsedPath[0];
    data.Season = parsedPath[1];
    return data;
})
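
To see what the regex actually hands back, here is a small sketch you can run on its own. The path below is a made-up example that follows the folder layout our pathList() function builds; nothing is read from disk.

```javascript
// Hypothetical path, shaped like the ones pathList() generates (assumption).
var samplePath = './data/output/advanced/leagues_NBA_2007_advanced.csv';

// With the g flag, match() returns every match in left-to-right order.
var parsedPath = samplePath.match(/(advanced|totals)|(\d{4})/g);

console.log(parsedPath);    // [ 'advanced', '2007', 'advanced' ]
console.log(parsedPath[0]); // the stat folder
console.log(parsedPath[1]); // the season year
```

The indexes only line up because the stat folder comes before the year in the path. If your project sits in a directory whose name contains four digits in a row (or the word totals), the matches would shift, so keep an eye on that.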

Let’s check the debugger to make sure we added statType and Season to our data variable. Don’t worry about running it right now.

Now since Season and statType have been added to data, we can now start playing with DB_OBJ.

In this object, the player’s name will be the key and their seasons/stats will be the value. Let me show you what I mean:

{
    "<PLAYER NAME>": { // LeBron James
        "<SEASON YEAR>" : { // 2007
            "advanced": {
                ... // Stats from advanced CSV
            },
            "totals": {
                ... // Stats from totals CSV
            },
        }
    }
}

I know this data structure does not match our final data structure. Don’t worry, we will fix that a bit later.

Since we are going to use the player names as keys in DB_OBJ, we need to check whether a name already exists in the object. Put the following code in your .on("data") inside of buildData():

console.log(data);
// Does Player exist?
if (!DB_OBJ.hasOwnProperty(data.Player)) {
    // Player doesn't exist
} else {
    // Player does exist
}

Let’s test out the =BUILD, so run the following command:

$ node buildNBA_Data.js =BUILD TEST

Nothing crazy going on here, just checking if the player exists. Place the following lines inside of the if(!DB_OBJ.hasOwnProperty(data.Player)) statement:

// Player doesn't exist
DB_OBJ[data.Player] = {}; // 1
DB_OBJ[data.Player].Seasons = {}; // 2
DB_OBJ[data.Player].Seasons[data.Season] = {}; // 3
DB_OBJ[data.Player].Seasons[data.Season][data.statType] = {}; // 4
DB_OBJ[data.Player].Seasons[data.Season][data.statType] = data; // 5

This is a data example and how the above code would interact with it:

// Data Example, DONT PASTE THIS!!
data = {
    Player : "LeBron James",
    Season : "2007",
    statType: "advanced",
    G : "78",
    Age: "22"
};

Part 1 - Create a new player in DB_OBJ. Example:

// DB_OBJ[data.Player] = {};
{
    "LeBron James": {}
}

Part 2 - Create a Seasons property in Player’s Object. Example:

// DB_OBJ[data.Player].Seasons = {};
{
    "LeBron James": {
        "Seasons" : {}
    }
}

Part 3 - Add the year to Seasons object. Example:

// DB_OBJ[data.Player].Seasons[data.Season] = {};
{
    "LeBron James": {
        "Seasons": {
            "2007" : {}
        }
    }
}

Part 4 - Add stats to Year object. Example:

// DB_OBJ[data.Player].Seasons[data.Season][data.statType] = {};
{
    "LeBron James": {
        "Seasons":{
            "2007" : {
                "advanced": {

                }
            }
        }
    }
}

Part 5 - Add all of the data to the statType object. Example:

// DB_OBJ[data.Player].Seasons[data.Season][data.statType] = data;
{
    "LeBron James": {
        "Seasons":{
            "2007" : {
                "advanced": {
                    // ALL DATA HERE!!
                }
            }
        }
    }
}

If you’re thinking to yourself, “I’m sure this section irritates Fabian a lot, we all know he hates long lines and those are some long ass lines!” Yea, you’re right and we haven’t even added the else portion! Let’s get rid of all these data.<something> to help shorten the lines. Inside of .on("data"), erase everything and replace it with this:

// 1
var name = data.Player,
    yr = data.Season,
    stat = data.statType;

// Does Player exist?
if (!DB_OBJ.hasOwnProperty(name)) {
    // Player doesn't exist

    // 2
    var tmpPlayer = {
        Player: name,
        Pos: data.Pos,
        Seasons: {}
    };
    DB_OBJ[name] = tmpPlayer;
    DB_OBJ[name].Seasons[yr] = {};
} else if (!DB_OBJ[name].Seasons.hasOwnProperty(yr)) {
    // Player exists, Season doesn't exist
    DB_OBJ[name].Seasons[yr] = {};
}
// 3
DB_OBJ[name].Seasons[yr][stat] = {};
DB_OBJ[name].Seasons[yr][stat] = data;

Part 1 - We are storing the values in short variable names to reduce the length of each line.

Part 2 - tmpPlayer gives each new player the Player, Pos, and Seasons properties that the final data structure needs.

Part 3 - We are adding the stats to the player object.
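
If you want to watch that logic work without reading any CSV files, here is a minimal sketch. addRow and sampleDb are names I made up for this example, and the two rows are hand-written samples, not real output from fast-csv.

```javascript
// Hypothetical helper: the same branching as our .on("data") handler,
// but working on whatever object you pass in instead of DB_OBJ.
function addRow(db, row) {
    var name = row.Player,
        yr = row.Season,
        stat = row.statType;

    if (!db.hasOwnProperty(name)) {
        // Player doesn't exist yet
        db[name] = { Player: name, Pos: row.Pos, Seasons: {} };
    }
    if (!db[name].Seasons.hasOwnProperty(yr)) {
        // Season doesn't exist yet
        db[name].Seasons[yr] = {};
    }
    db[name].Seasons[yr][stat] = row;
    return db;
}

// Two sample rows for the same player and season, one per stat folder
var sampleDb = {};
addRow(sampleDb, { Player: 'LeBron James', Pos: 'SF', Season: '2007', statType: 'advanced', G: '78' });
addRow(sampleDb, { Player: 'LeBron James', Pos: 'SF', Season: '2007', statType: 'totals', G: '78' });

console.log(Object.keys(sampleDb));                                  // [ 'LeBron James' ]
console.log(Object.keys(sampleDb['LeBron James'].Seasons['2007'])); // [ 'advanced', 'totals' ]
```

The second row lands next to the first one instead of overwriting it, which is the whole point of checking for the player and the season before writing.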

OK now in the .on("end") portion, let’s print out our DB_OBJ:

.on("end", function(){
    console.log("done");
    console.log(DB_OBJ);
    _aCallback();
});

Let’s run it:

$ node buildNBA_Data.js =BUILD TEST

You should’ve gotten something similar to this:

Yeah, I know, I know. This data is getting really hard to examine. I think it’s time we put it in a file.

In your Modules Section, add the following line:

var jsonfile = require('jsonfile');

and replace the async.each() callback with this (look at the next step if you’re confused):

function (err) {
    console.log('***DONE BUILDING DATA***');
    jsonfile.writeFile('./data/outputFile.json', DB_OBJ, { spaces: 4}, function(err) {
        // Only report a problem if the write actually failed
        if (err) console.error(err);
    });
});
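
Why does the write go in that final callback? Because it only fires after every file has called its own _aCallback(). Here is a stripped-down stand-in to show the idea. eachParallel is a name I invented, and it is a simplified sketch, not the real async.each() implementation.

```javascript
// Hypothetical, simplified stand-in for async.each():
// run worker on every item, call done once after all of them finish.
function eachParallel(items, worker, done) {
    var remaining = items.length;
    var finished = false;
    if (remaining === 0) { return done(null); }

    items.forEach(function (item) {
        worker(item, function (err) {
            if (finished) { return; }
            if (err) { finished = true; return done(err); }
            remaining -= 1;
            if (remaining === 0) { finished = true; done(null); }
        });
    });
}

// Made-up file names; the worker just records what it would do
var log = [];
eachParallel(['a.csv', 'b.csv'], function (file, cb) {
    log.push('read ' + file);
    cb();
}, function (err) {
    log.push(err ? 'failed' : 'write outputFile.json');
});

console.log(log); // [ 'read a.csv', 'read b.csv', 'write outputFile.json' ]
```

If we wrote the file inside .on("end") instead, we would write it once per CSV, most of the time with only part of the data.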

Your entire buildData() function should look like this:

var buildData = function(paths) {
    async.each(paths, function (path, _aCallback) {

        // Create File Stream
        var inputStream = fs.createReadStream(path);

        // Read in CSV file
        fast_csv.fromStream(inputStream,{
            headers: true,
            ignoreEmpty: true
        })
        .transform(function(data){
            var parsedPath = path.match(/(advanced|totals)|(\d{4})/g);
            data.statType = parsedPath[0];
            data.Season = parsedPath[1];
            return data;
        })
        .on("data", function(data){

            var name = data.Player,
                yr = data.Season,
                stat = data.statType;

            // Does Player exist?
            if (!DB_OBJ.hasOwnProperty(name)) {
                // Player doesn't exist

                // helps us for the final data structure.
                var tmpPlayer = {
                    Player: name,
                    Pos: data.Pos,
                    Seasons: {}
                };
                DB_OBJ[name] = tmpPlayer;
                DB_OBJ[name].Seasons[yr] = {};
            } else if (!DB_OBJ[name].Seasons.hasOwnProperty(yr)) {
                // Player exists, Season doesn't exist
                DB_OBJ[name].Seasons[yr] = {};
            }
            // add the stats to the player object.
            DB_OBJ[name].Seasons[yr][stat] = {};
            DB_OBJ[name].Seasons[yr][stat] = data;
        })
        .on("end", function(){
            console.log("done");
            console.log(DB_OBJ);
            _aCallback();
        });

    }, function (err) {
        console.log('*****DONE BUILDING DATA*****');
        jsonfile.writeFile('./data/outputFile.json', DB_OBJ, { spaces: 4}, function(err) {
            // Only report a problem if the write actually failed
            if (err) console.error(err);
        });
    });
};

Let’s run it:

$ node buildNBA_Data.js =BUILD TEST

It should create a file called outputFile.json that should look like this (if not, don’t worry, a code checkup is coming soon):

OK, let’s remove our console.log(DB_OBJ) and add a few more years to our =TEST. Inside of your Main Function, change endYr from 1982 to 1990:

// If Test is set, only get a few years
var endYr = (isTest) ? 1990 : 2016;
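
Keep in mind that _.range() stops one short of its end value, so an endYr of 1990 gives us the 1981 through 1989 files. The sketch below rebuilds the path list in plain JavaScript so you can check the count. range and this two-argument pathList are stand-ins I wrote for the example, not the functions from our script.

```javascript
// Hypothetical stand-in for _.range(start, end): end is excluded
function range(start, end) {
    var out = [];
    for (var y = start; y < end; y++) { out.push(y); }
    return out;
}

// Same path format as the tutorial's pathList(), with endYr passed in
function pathList(begPath, endYr) {
    var paths = [];
    ['totals', 'advanced'].forEach(function (stat) {
        range(1981, endYr).forEach(function (year) {
            paths.push([begPath, stat, '/leagues_NBA_', year, '_', stat, '.csv'].join(''));
        });
    });
    return paths;
}

var testPaths = pathList('./data/output/', 1990);
console.log(testPaths.length);                // 18 (9 seasons x 2 stat folders)
console.log(testPaths[0]);                    // ./data/output/totals/leagues_NBA_1981_totals.csv
console.log(testPaths[testPaths.length - 1]); // ./data/output/advanced/leagues_NBA_1989_advanced.csv
```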

Let’s run it:

$ node buildNBA_Data.js =BUILD TEST

Hopefully your outputFile.json looks like this:

OK, so we’re not done cleaning our data yet. I know we’re about to get to the Mongo portion of the tutorial, but it’ll be fast, I promise. Here’s our code checkup for the second section:

Code Checkup

/* Modules Section
============================================= */

var _ = require("underscore");
var fs = require('fs');
var fast_csv = require("fast-csv");
var async = require("async");
var jsonfile = require('jsonfile');
var DB_OBJ = {};
// End of Modules

/* Helper Section
============================================= */
var getHeader = function (path) {
    var statObj = {
        'totals': 'Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS',
        'advanced': 'Rk,Player,Pos,Age,Tm,G,MP,PER,TS%,3PAr,FTr,ORB%,DRB%,TRB%,AST%,STL%,BLK%,TOV%,USG%,0,OWS,DWS,WS,WS/48,0,OBPM,DBPM,BPM,VORP'
    };
    return _.compact(_.map(statObj, function (val, key) {
        // key = totals || advanced
        // If key exist inside of path
        if (path.indexOf(key) > -1) {
            // Returns the value at statObj[statType]
            return val;
        }
    }))[0];
};
// End of Helper

/* Functions Section
============================================= */

/**
*Function Name: cleanData
*Parameters: array of file paths
*RUN: node buildNBA_Data.js =CLEAN
*/
var cleanData = function(paths) {

    async.each(paths, function(filePath, callback) {
        // get data from file
        fs.readFile(filePath, function(err, data) {
            if(err) throw err;
            var data_Str = data.toString();
            // Remove double commas
            data_Str = data_Str.replace(/,,/g,',0,');
            var data_Arr = data_Str.split("\n");

            // Check if first row is empty
            if (!data_Arr[0].length) {
                data_Arr.shift();
            }

            data_Arr = _.filter(data_Arr, function (_str) {
                // Keep every row except the repeated header rows, which start with 'Rk,Player'
                return _str.indexOf('Rk,Player') !== 0;
            });

            var finalHeader = getHeader(filePath);
            console.log(finalHeader);

            data_Arr.unshift(finalHeader);
            var outputPath = filePath.replace(/csv/,'output');

            fs.writeFileSync(outputPath, data_Arr.join('\n'));
            callback();
        });
    },function (err) {
        console.log('*****DONE CLEANING DATA*****');
    });
};


/**
*Function Name: buildData
*Parameters: array of file paths
*RUN: node buildNBA_Data.js =BUILD
*/
var buildData = function(paths) {
    async.each(paths, function (path, _aCallback) {

        // Create File Stream
        var inputStream = fs.createReadStream(path);

        // Read in CSV file
        fast_csv.fromStream(inputStream,{
            headers: true,
            ignoreEmpty: true
        })
        .transform(function(data){
            var parsedPath = path.match(/(advanced|totals)|(\d{4})/g);
            data.statType = parsedPath[0];
            data.Season = parsedPath[1];
            return data;
        })
        .on("data", function(data){

            var name = data.Player,
                yr = data.Season,
                stat = data.statType;

            // Does Player exist?
            if (!DB_OBJ.hasOwnProperty(name)) {
                // Player doesn't exist

                // helps us for the final data structure.
                var tmpPlayer = {
                    Player: name,
                    Pos: data.Pos,
                    Seasons: {}
                };
                DB_OBJ[name] = tmpPlayer;
                DB_OBJ[name].Seasons[yr] = {};
            } else if (!DB_OBJ[name].Seasons.hasOwnProperty(yr)) {
                // Player exists, Season doesn't exist
                DB_OBJ[name].Seasons[yr] = {};
            }
            DB_OBJ[name].Seasons[yr][stat] = {};
            DB_OBJ[name].Seasons[yr][stat] = data;
        })
        .on("end", function(){
            console.log("done");
            _aCallback();
        });

    }, function (err) {
        console.log('*****DONE BUILDING DATA*****');
        jsonfile.writeFile('./data/outputFile.json', DB_OBJ, { spaces: 4}, function(err) {
            // Only report a problem if the write actually failed
            if (err) console.error(err);
        });
    });
};

/**
- Main Function
*/
(function (task, isTest) {


    // Allowed Tasks
    if (['=BUILD','=CLEAN'].indexOf(task) === -1) {
        console.log('You did not pick an available task.');
        return ;
    }

    // Type of stats
    var STATS = ['totals', 'advanced'];

    // If Test is set, only get a few years
    var endYr = (isTest) ? 1990 : 2016;


    // Get generate our list of file paths
    var pathList = function(begPath) {
        return _.flatten(_.map(STATS, function (stat) {

            return _.map(_.range(1981,endYr), function (year) {
                var finalStr = [begPath,stat,'/leagues_NBA_',
                                year,'_',stat,'.csv'].join('');

                return finalStr;
            });
        }));
    };


    // Divider ===================================

    // What task did user choose
    if (task === '=BUILD') {
        console.log('*****BUILDING DATA*****');
        buildData(pathList('./data/output/'));
    }
    else if(task === '=CLEAN'){
        console.log('*****CLEANING DATA*****');
        cleanData(pathList('./data/csv/'));
    } else {
        console.log('*scratching head* how you got here?');
    }

})(process.argv[2], process.argv[3]);

Mongo

We’re going to pretend that all of us know what issues are going to arise in the future from how our data is currently structured. For one, it’s not even in the final structure that we wanted. I’m going to go ahead and list the issues that we have with our data:

  1. Not in the final structure.
  2. * at the end of names (HoF players).
  3. All values are strings.

The first is self-explanatory. The second causes problems when we want to look up certain players in the database. As for the third, I didn’t pay much attention to it at first, until I had to apply math in the Machine Learning portion of this project. I tried converting the values to floats in Python. After 10 minutes of that, I figured it would be much easier for them to already be floats in the database.

So for my final trick.

We’re gonna use modules, inspired by Ben Cherry’s amazing article on JavaScript modules.

This module will have a few properties that will help keep our code together. The first property, filter, will convert all of our stats (strings) into floats and remove unwanted properties like Rk and 0. The next property, clean, will change our structure from:

{
    "<PLAYER NAME>": { // LeBron James
        "Player" : "LeBron James",
        "Pos" : "SF",
        "Seasons" : {
            "<SEASON YEAR>" : { // 2007
                "advanced": {
                    ... // Stats from advanced CSV
                },
                "totals": {
                    ... // Stats from totals CSV
                }
            }
        }
    }
}

to the desired structure:

// Final Structure
[
    {
        "Player" : "LeBron James",
        "Pos" : "SF",
        "Seasons" : [
            {
                "totals" : {
                    ......
                    ......
                },
                "advanced" : {
                    ......
                    ......
                }
            },
            ......
        ]
    },
    ....
]

So replace our var DB_OBJ = {}; in our Modules Section with:

var DB_OBJ = (function(data) {

    data.clean = function() {

    };

    data.filter = function(obj, reg) {

    };

    return data;
})({});

If you rarely work with modules, I recommend reading that article I mentioned above, even if you just need a quick refresher. The reason why we are making DB_OBJ a module is so that we have all the data and data manipulation in one location. As of right now, we are filtering the data in the .transform() and we would be cleaning the data right before our jsonfile.writeFile() line.
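
One thing to notice about this pattern: the functions are attached to the very same object that will hold our players. The sketch below uses a throwaway object called STORE (a name I picked for the example, not part of our script) to show what that looks like from the outside.

```javascript
// Hypothetical module built the same way as DB_OBJ:
// an IIFE receives an empty object, attaches methods, and returns it.
var STORE = (function (data) {
    data.clean = function () {};
    data.filter = function () {};
    return data;
})({});

// Adding a player puts it right next to the methods
STORE['LeBron James'] = { Player: 'LeBron James' };

console.log(Object.keys(STORE)); // [ 'clean', 'filter', 'LeBron James' ]
```

So anything that loops over the object will see clean and filter alongside the players. That is worth remembering when it is time to turn the object into an array.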

Let’s start off with filter since we have the majority of the code for it already. We will be calling our filtering method in our .transform() method. Inside of buildData(), replace .transform() with:

.transform(function(data){
    var regex = path.match(/(advanced|totals)|(\d{4})/g);
    return DB_OBJ.filter(data, regex);
})

Since filter has no access to path, we can’t run the regex inside of it. Instead, we run the regex in .transform() and pass the matches in as the second argument. Let’s hop back to our data.filter:

data.filter = function(obj, reg) {
    obj.statType = reg[0];
    obj.Season = reg[1];
    obj.Player = obj.Player.replace(/\*/g,'');
    delete obj["Rk"];
    delete obj["0"];
    return _.mapObject(obj, function (val, key) {

        if (!isNaN(val)) {
            val = +val;
        }
        return val;
    });
};

Those first two lines should look familiar, but what about the rest? The next three are simple: if the player has an * in his name, we remove it, then we delete the Rk and 0 properties. The last section may look a bit confusing, but it’s not. We go through each property and check whether the string could be a number; if it can, the unary + converts it. This Stack Overflow answer does a great job explaining it.
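
Here is the same idea as a standalone sketch in plain JavaScript, so you can poke at it without underscore. filterRow is a name I made up, it skips the statType/Season step, and the row is a hand-written sample, not real stats.

```javascript
// Hypothetical plain-JS version of the filtering step
function filterRow(row) {
    var out = {};
    Object.keys(row).forEach(function (key) {
        // Skip the columns we don't want to keep
        if (key === 'Rk' || key === '0') { return; }
        var val = row[key];
        // isNaN() coerces the string first, so "76" and ".5" count as numbers
        out[key] = isNaN(val) ? val : +val;
    });
    // Strip the Hall of Fame asterisk from the name
    out.Player = out.Player.replace(/\*/g, '');
    return out;
}

// Made-up sample row (values are for illustration only)
var sampleRow = { Rk: '1', Player: 'Kareem Abdul-Jabbar*', Tm: 'LAL', G: '76', 'FG%': '.5', '0': '0', Blank: '' };
var filtered = filterRow(sampleRow);

console.log(filtered);
// { Player: 'Kareem Abdul-Jabbar', Tm: 'LAL', G: 76, 'FG%': 0.5, Blank: 0 }
```

Notice the Blank column: an empty string passes the isNaN() check and becomes 0, because Number('') is 0. That is fine for our stats, but it is good to know in case an empty cell should mean "missing" in your data.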

Let’s run it and see if the stats are no longer strings and we’ve removed the * from Kareem Abdul-Jabbar:

$ node buildNBA_Data.js =BUILD TEST

Did your stats turn into numbers like mine? If not, it’s OK, a code checkup is coming very soon:

Now let’s get to cleaning. Some of you might have already figured out that clean and filter live on the same object as our players, so they could end up in outputFile.json too (depending on your jsonfile version). Does that mean we’re going to be printing those out too? No. _.map() would also treat them as players, so we delete them before it gets to that point. Make your data.clean look like this:

data.clean = function() {
    delete data["clean"];
    delete data["filter"];
    // 1
    return _.map(data, function (plyr) {
        // 2
        plyr.Seasons = _.values(plyr.Seasons);
        return plyr;
    });
};

Part 1 - Since we are using the _.map() function, it’s going to return all the values in our data object as an array, so the “Player Name” keys go away.

[
    {
        "Player": "Kareem Abdul-Jabbar",
        "Pos": "C",
        ....
    },
    ....
]

Part 2 - If you remember correctly, our seasons aren’t in the structure that we want, this is where we fix that issue. We set Seasons equal to the _.values() of itself. _.values() will return the data in an array format, which is what we want.

[
    {
        "Player": "Kareem Abdul-Jabbar",
        "Pos": "C",
        "Seasons": [ // Heres the array that we wanted :)
            {
                "totals": {
                    ......
                },
                "advanced": {
                    ......
                }
            },
            {...
            }
        ]
    },
    ....
]
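
If you’d like to try the reshaping on its own, here is a plain JavaScript sketch of both parts. toFinalStructure and the keyed sample are my own stand-ins with placeholder stats. Unlike our clean method, this version builds new player objects instead of changing the originals.

```javascript
// Hypothetical plain-JS version of the clean step
function toFinalStructure(playersByName) {
    // Part 1: drop the player-name keys by mapping over them
    return Object.keys(playersByName).map(function (name) {
        var plyr = playersByName[name];
        return {
            Player: plyr.Player,
            Pos: plyr.Pos,
            // Part 2: turn the Seasons object into an array of its values
            Seasons: Object.keys(plyr.Seasons).map(function (yr) {
                return plyr.Seasons[yr];
            })
        };
    });
}

// Sample data with placeholder stats (not real numbers)
var keyed = {
    'Kareem Abdul-Jabbar': {
        Player: 'Kareem Abdul-Jabbar',
        Pos: 'C',
        Seasons: {
            '1981': { totals: { G: 1 }, advanced: { G: 1 } },
            '1982': { totals: { G: 2 }, advanced: { G: 2 } }
        }
    }
};

var finalArr = toFinalStructure(keyed);
console.log(JSON.stringify(finalArr, null, 2));
```

In current JavaScript engines, integer-like keys such as "1981" are listed in ascending order, so the seasons should come out sorted by year.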

Now we just have to call DB_OBJ.clean(). Go to the jsonfile.writeFile() line and replace it with:

jsonfile.writeFile('./data/outputFile.json', DB_OBJ.clean(), { spaces: 4}, function(err) {
    // Only report a problem if the write actually failed
    if (err) console.error(err);
});

Now let’s run it! :)

$ node buildNBA_Data.js =BUILD TEST

Did you get something like this:

Before we do a code checkup, let’s minify outputFile.json. A smaller file should upload to MongoDB faster. Change {spaces: 4} to {spaces: 0} on the jsonfile.writeFile() line:

jsonfile.writeFile('./data/outputFile.json', DB_OBJ.clean(), { spaces: 0}, function(err) {
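
To get a feel for what the indentation costs, here is a quick sketch using JSON.stringify() directly. I’m assuming jsonfile passes its spaces option along to JSON.stringify(), and the sample object is a placeholder, so treat the numbers as an illustration only.

```javascript
// Placeholder data in the final structure (not real stats)
var sample = [
    { Player: 'Sample Player', Pos: 'C', Seasons: [{ totals: { G: 1 }, advanced: { G: 1 } }] }
];

var pretty = JSON.stringify(sample, null, 4); // indented with 4 spaces
var compact = JSON.stringify(sample);         // no extra whitespace

console.log(pretty.length, compact.length);

// Same data either way: parsing the pretty version gives back the compact one
console.log(JSON.stringify(JSON.parse(pretty)) === compact); // true
```

The only thing we lose is readability, which is why the checkup listing keeps {spaces: 4}.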

Now let’s run without =TEST set:

$ node buildNBA_Data.js =BUILD

Your outputFile.json should look like the following (it may take a bit to load):

Code Checkup

buildNBA_Data.js with {spaces: 4}

/* Modules Section
============================================= */

var _ = require("underscore");
var fs = require('fs');
var fast_csv = require("fast-csv");
var async = require("async");
var jsonfile = require('jsonfile');
var DB_OBJ = (function(data) {

    data.clean = function() {
        delete data["clean"];
        delete data["filter"];
        // 1
        return _.map(data, function (plyr) {
            // 2
            plyr.Seasons = _.values(plyr.Seasons);
            return plyr;
        });
    };

    data.filter = function(obj, reg) {
        obj.statType = reg[0];
        obj.Season = reg[1];
        obj.Player = obj.Player.replace(/\*/g,'');
        delete obj["Rk"];
        delete obj["0"];
        return _.mapObject(obj, function (val, key) {

            if (!isNaN(val)) {
                val = +val;
            }
            return val;
        });
    };

    return data;
})({});
// End of Modules

/* Helper Section
============================================= */
var getHeader = function (path) {
    var statObj = {
        'totals': 'Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS',
        'advanced': 'Rk,Player,Pos,Age,Tm,G,MP,PER,TS%,3PAr,FTr,ORB%,DRB%,TRB%,AST%,STL%,BLK%,TOV%,USG%,0,OWS,DWS,WS,WS/48,0,OBPM,DBPM,BPM,VORP'
    };
    return _.compact(_.map(statObj, function (val, key) {
        // key = totals || advanced
        // If key exist inside of path
        if (path.indexOf(key) > -1) {
            // Returns the value at statObj[statType]
            return val;
        }
    }))[0];
};
// End of Helper

/* Functions Section
============================================= */

/**
*Function Name: cleanData
*Parameters: array of file paths
*RUN: node buildNBA_Data.js =CLEAN
*/
var cleanData = function(paths) {

    async.each(paths, function(filePath, callback) {
        // get data from file
        fs.readFile(filePath, function(err, data) {
            if(err) throw err;
            var data_Str = data.toString();
            // Remove double commas
            data_Str = data_Str.replace(/,,/g,',0,');
            var data_Arr = data_Str.split("\n");

            // Check if first row is empty
            if (!data_Arr[0].length) {
                data_Arr.shift();
            }

            data_Arr = _.filter(data_Arr, function (_str) {
                // Keep every row except the repeated header rows, which start with 'Rk,Player'
                return _str.indexOf('Rk,Player') !== 0;
            });

            var finalHeader = getHeader(filePath);
            console.log(finalHeader);

            data_Arr.unshift(finalHeader);
            var outputPath = filePath.replace(/csv/,'output');

            fs.writeFileSync(outputPath, data_Arr.join('\n'));
            callback();
        });
    },function (err) {
        console.log('*****DONE CLEANING DATA*****');
    });
};

/**
*Function Name: buildData
*Parameters: array of file paths
*RUN: node buildNBA_Data.js =BUILD
*/
var buildData = function(paths) {
    async.each(paths, function (path, _aCallback) {

        // Create File Stream
        var inputStream = fs.createReadStream(path);

        // Read in CSV file
        fast_csv.fromStream(inputStream,{
            headers: true,
            ignoreEmpty: true
        })
        .transform(function(data){
            var regex = path.match(/(advanced|totals)|(\d{4})/g);
            return DB_OBJ.filter(data, regex);
        })
        .on("data", function(data){

            var name = data.Player,
                yr = data.Season,
                stat = data.statType;

            // Does Player exist?
            if (!DB_OBJ.hasOwnProperty(name)) {
                // Player doesn't exist

                // helps us for the final data structure.
                var tmpPlayer = {
                    Player: name,
                    Pos: data.Pos,
                    Seasons: {}
                };
                DB_OBJ[name] = tmpPlayer;
                DB_OBJ[name].Seasons[yr] = {};
            } else if (!DB_OBJ[name].Seasons.hasOwnProperty(yr)) {
                // Player exists, Season doesn't exist
                DB_OBJ[name].Seasons[yr] = {};
            }
            DB_OBJ[name].Seasons[yr][stat] = {};
            DB_OBJ[name].Seasons[yr][stat] = data;
        })
        .on("end", function(){
            console.log("done");
            _aCallback();
        });

    }, function (err) {
        console.log('*****DONE BUILDING DATA*****');
        jsonfile.writeFile('./data/outputFile.json', DB_OBJ.clean(), { spaces: 4}, function(err) {
            // Only report a problem if the write actually failed
            if (err) console.error(err);
        });
    });
};

/**
- Main Function
*/
(function (task, isTest) {


    // Allowed Tasks
    if (['=BUILD','=CLEAN'].indexOf(task) === -1) {
        console.log('You did not pick an available task.');
        return ;
    }

    // Type of stats
    var STATS = ['totals', 'advanced'];

    // If Test is set, only get a few years
    var endYr = (isTest) ? 1990 : 2016;


    // Get generate our list of file paths
    var pathList = function(begPath) {
        return _.flatten(_.map(STATS, function (stat) {

            return _.map(_.range(1981,endYr), function (year) {
                var finalStr = [begPath,stat,'/leagues_NBA_',
                                year,'_',stat,'.csv'].join('');

                return finalStr;
            });
        }));
    };


    // Divider ===================================

    // What task did user choose
    if (task === '=BUILD') {
        console.log('*****BUILDING DATA*****');
        buildData(pathList('./data/output/'));
    }
    else if(task === '=CLEAN'){
        console.log('*****CLEANING DATA*****');
        cleanData(pathList('./data/csv/'));
    } else {
        console.log('*scratching head* how you got here?');
    }

})(process.argv[2], process.argv[3]);

Export to MongoDB!

If you do not have MongoDB installed, follow their Installation Guide. If you’re a Mac user, I recommend following their Homebrew Guide, it’s really simple. Once you have it installed, the following section will change depending on how much experience you have with MongoDB. Choose one of the following: Experienced with MongoDB or MongoDB Noob.

Experienced with MongoDB

Go ahead and fire up Mongo with mongod in a separate terminal. Then come back to your initial terminal and input the following:

$ mongoimport --db NBA_Tutorial --collection players --file ./data/outputFile.json --jsonArray

If you have your outputFile.json in a different directory, simply replace the above path with yours. This short article clarified a lot of questions I had when using mongoimport.
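
Since --jsonArray expects the file to be one JSON array of documents, it can save you a failed import to check that first. Here is a small sketch. isJsonArrayOfObjects is a helper name I made up, and the strings below are tiny samples standing in for the real file.

```javascript
// Hypothetical pre-import check: is this text a JSON array of objects?
function isJsonArrayOfObjects(jsonText) {
    var parsed;
    try {
        parsed = JSON.parse(jsonText);
    } catch (e) {
        return false; // not valid JSON at all
    }
    return Array.isArray(parsed) && parsed.every(function (doc) {
        return doc !== null && typeof doc === 'object' && !Array.isArray(doc);
    });
}

// The final structure (an array of players) passes
console.log(isJsonArrayOfObjects('[{"Player":"Kareem Abdul-Jabbar"}]')); // true

// The name-keyed structure we had before clean() does not
console.log(isJsonArrayOfObjects('{"Kareem Abdul-Jabbar":{}}'));         // false

// To check the real file, read it first, for example:
// isJsonArrayOfObjects(require('fs').readFileSync('./data/outputFile.json', 'utf8'));
```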

MongoDB Noob

We’re just gonna set you up with a FREE MongoLab account. Once you’re signed in, you should be brought to a screen like this. Click the Create new button:

For the free tier, select these options. Don’t forget to put a database name at the bottom.

We need to create a database user to access the data. Do not forget your username and password; we’re going to need them to upload our data.

On that same page, take note of your URI. We’re going to be using it in the next step.

In the root directory of this project, paste the following command with your credentials. Quick tips:

-h = host, -d = database, -c = collection, -u = username, -p = password.

$ mongoimport -h <URI> -d mongolab_nba -c players -u <USERNAME> -p <PASSWORD> --file ./data/outputFile.json --jsonArray 

I use Robomongo to view my database. If you have trouble setting it up, follow this super fast scotch tutorial. Your database should look like this:

We’re about to start our final chapter! I know you’re excited, but I don’t want people jumping off the stage yet!