[Introduction] Data Science Knowledge (DSK)

Hello everyone!

As part of the idea to exchange knowledge and discuss Data Science more, I’m starting a series of posts about what I think are the most important and relevant concepts that any data scientist should know. I hope you will like it!

These posts will be tagged under “Data Science > General Knowledge”.

See you around then!

[TUTORIAL 01] Exploring French employment, salaries, population per town – INTRODUCTION

Hello hello!

This is the “kick-off” post for a series of posts that I intend to write about Data Science (Exploratory Data Analysis, Data Visualization and Machine Learning).

I will try to index these “tutorials” with a number at the beginning of the post name. For example, this one is “TUTORIAL 01”. This will make it easier to follow the next posts that cover the same topic/dataset.

Explanations aside, let’s talk about what this introductory post will cover:

  1. Which dataset?
  2. How to get/download the dataset?
  3. Brief explanation of the file(s) that you will find in the dataset

So… Let’s start!

First of all, the dataset that we will work with in this first tutorial comes from INSEE (Institut National de la Statistique et des Etudes Economiques).

INSEE was created in 1946 and it is a “Directorate-General of the Ministries for the Economy and for Finances” (INSEE.fr, n.d.)

The dataset is available to download on Kaggle.com (Kaggle.com, 2017); the link is in the references at the end of this post.

Basically, this dataset contains 6 files, 4 “.csv” and 2 “.geojson” (P.S.: these descriptions come from Kaggle.com):

  • (.csv) base_etablissement_par_tranche_effectif: information on the number of firms in every French town, categorized by size.
  • (.csv) name_geographic_information: geographic data on French towns (mainly latitude and longitude, but also region / department codes and names).
  • (.csv) net_salary_per_town_per_category: salaries in French towns per job category, age and sex.
  • (.csv) population: demographic information in France per town, age, sex and living mode.
  • (.geojson) communes: geographic data structure for “communes” (equivalent to civil townships).
  • (.geojson) departements: geographic data structure for “departements” (administrative districts in France).
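If you want a quick first look at the four “.csv” files before the next post, here is a minimal sketch using only the Python standard library. The file names come from the list above; the `data` folder, the UTF-8 encoding and the `preview_csv` helper are my own assumptions for illustration, so adjust them to wherever (and however) you saved the files.

```python
import csv
from itertools import islice
from pathlib import Path

# Hypothetical folder where the Kaggle files were unzipped (assumption).
DATA_DIR = Path("data")

# File names taken from the dataset description above.
CSV_FILES = [
    "base_etablissement_par_tranche_effectif.csv",
    "name_geographic_information.csv",
    "net_salary_per_town_per_category.csv",
    "population.csv",
]


def preview_csv(path, n_rows=5):
    """Return the header and the first n_rows data rows of a CSV file."""
    # UTF-8 is an assumption; change the encoding if the file fails to decode.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file is empty
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    for name in CSV_FILES:
        path = DATA_DIR / name
        if not path.exists():
            print(f"{name}: not found in {DATA_DIR}")
            continue
        header, rows = preview_csv(path)
        print(f"{name}: {len(header)} columns, showing {len(rows)} rows")
        print("  first columns:", header[:5])
```

This only reads a few lines per file, so it stays fast even on the larger files. A library such as pandas would be the more usual choice for the actual exploration.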

You can learn more about GeoJSON here and here.
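Since a GeoJSON file is plain JSON, you can already peek inside one with Python’s built-in `json` module. The sketch below assumes the file is a GeoJSON “FeatureCollection” (an object with a `features` list, as the GeoJSON format defines it); the `summarize_geojson` helper and the `data/departements.geojson` path are hypothetical names for illustration only.

```python
import json


def summarize_geojson(path):
    """Return the top-level type, feature count and geometry types of a GeoJSON file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    # A FeatureCollection stores its items under "features"; other
    # GeoJSON objects have no such key, so default to an empty list.
    features = data.get("features", [])
    geometry_types = sorted(
        {f["geometry"]["type"] for f in features if f.get("geometry")}
    )
    return {
        "type": data.get("type"),
        "n_features": len(features),
        "geometry_types": geometry_types,
    }


if __name__ == "__main__":
    import os

    path = os.path.join("data", "departements.geojson")  # hypothetical location
    if os.path.exists(path):
        print(summarize_geojson(path))
    else:
        print(f"{path} not found")
```

Note that this loads the whole file into memory, which may be slow for a large file.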

In the next post, we will start to play a bit with these files.
I will use Python + Jupyter Notebooks as my main tools to “explore the data”!

See you soon!

References:

INSEE.fr, n.d. Getting to know INSEE [ONLINE]. Available at: https://www.insee.fr/en/information/2381925 (Accessed 02 February 2018).

Kaggle.com, 2017. French employment, salaries, population per town [ONLINE]. Available at: https://www.kaggle.com/etiennelq/french-employment-by-town/data (Accessed 02 February 2018).