Novustack project

Novustack is an ed-tech startup passionate about building the innovation ecosystem. We want to train quality
applicants who can challenge and contribute to that ecosystem. To do this, we first have to attract those quality applicants.
This project helps us identify the kinds of talent that are first interested in our projects and those who ultimately get into the fellowship.


Context

This project was conducted for Novustack to understand who a fellow is. I carried it out as a data analyst together with two other data analysts. We cleaned, compiled and analysed the data, and I built the machine learning model that would enable us to predict a possible fellow from the application stage.

Problem statement

Novustack had successfully conducted three application cycles for the Innovate for Africa fellowship. The application process was rigorous, with applicants from different demographics, skills, experiences, etc. Novustack wanted an easy view of the demographics, skill sets, referrals, etc. of applicants in the last year.

Aside from a general overview of all the cycles collectively, an individual overview of each cohort was also needed.

Other noteworthy information they wanted included:

1. Which universities do we get the most applications from, and which ones are successful in the fellowship?

2. What other kinds of training have our applicants usually received?

3. What groups or communities do they belong to, e.g. TIIDELAB?

4. How did most applicants hear about us? How did the successful applicants hear about us?

In summary, they wanted to identify who becomes a fellow and that fellow's characteristics.

Solutions

My team and I decided to tackle the problem by creating charts and data tables that highlight who a model fellow is and their characteristics, such as their referrals. Furthermore, a machine learning model was built to predict who would become a fellow from the application stage.

Skills

Collaboration
Coding skills with Python
Data analysis skills
Presentation skills
Data visualization skills with Tableau and Excel
Machine Learning skills



The data analysis process

Data cleaning and wrangling

The data was cleaned in Google Sheets by removing duplicate rows and dropping columns with little data. We also created a highest-education column and grouped the values provided into categories such as bachelor's degree, master's degree and diploma. We reformatted the location column so that it contained only states. We checked the data for spelling errors and corrected them as required.
My team and I created matching IDs and merged the data on those IDs.
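
The cleaning itself was done in Google Sheets, so the following is only a rough sketch of what the same steps might look like in pandas. The tables, column names, values and the degree-grouping rules are all illustrative assumptions, not the actual Novustack data, and treating a missing outcome as "not admitted" is also an assumption.

```python
import pandas as pd

# Hypothetical application and outcome tables (illustrative values only).
applications = pd.DataFrame({
    "applicant_id": [1, 2, 2, 3],
    "location": ["Lagos, Nigeria", "abuja", "abuja", " Kano "],
    "degree": ["BSc Computer Science", "MSc Economics",
               "MSc Economics", "Diploma in Accounting"],
})
outcomes = pd.DataFrame({
    "applicant_id": [1, 3],
    "admitted": [1, 0],
})


def highest_education(degree: str) -> str:
    """Group a free-text degree into a coarse category (assumed rules)."""
    text = degree.lower()
    if text.startswith(("msc", "mba", "master")):
        return "Master's degree"
    if text.startswith(("bsc", "beng", "bachelor")):
        return "Bachelor's degree"
    if "diploma" in text:
        return "Diploma"
    return "Other"


# Remove exact duplicate rows.
cleaned = applications.drop_duplicates().copy()

# Keep only the state: text before the first comma, trimmed and title-cased.
cleaned["state"] = (
    cleaned["location"].str.split(",").str[0].str.strip().str.title()
)

# Derive the highest-education group from the degree text.
cleaned["highest_education"] = cleaned["degree"].map(highest_education)

# Merge the two tables on the matching ID; applicants with no outcome row
# are assumed here to be "not admitted" (0).
merged = cleaned.merge(outcomes, on="applicant_id", how="left")
merged["admitted"] = merged["admitted"].fillna(0).astype(int)

print(merged[["applicant_id", "state", "highest_education", "admitted"]])
```

A left merge keeps every applicant even when no outcome row exists, which is why a consistent ID assigned at the application stage matters.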

Data exploration

With respect to the target column, which recorded whether or not an applicant was admitted into the fellowship, different charts were plotted to examine the relationship between the target and other variables such as age, gender, work experience, referral and track.
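
As a minimal sketch of the kind of summary that could sit behind such charts, the snippet below computes the admission rate per category of one variable. The column names and values are made up for illustration, and `admission_rate_by` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

# Illustrative data only: "referral" and "admitted" are assumed column names.
applicants = pd.DataFrame({
    "referral": ["Friend", "Twitter", "Friend", "LinkedIn", "Twitter", "Friend"],
    "admitted": [1, 0, 0, 1, 0, 1],
})


def admission_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count applicants, admitted applicants and admission rate per category."""
    return (
        df.groupby(column)["admitted"]
        .agg(applicants="count", admitted="sum", rate="mean")
        .sort_values("rate", ascending=False)
    )


by_referral = admission_rate_by(applicants, "referral")
print(by_referral)
```

The same helper can be reused for gender, track or any other categorical column, and the resulting table can be plotted as a bar chart.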

Machine Learning

The data was loaded into a pandas DataFrame and modelled with scikit-learn. Logistic regression was used to create the model. Although the model had a high accuracy score and was confident in predicting who would not get into the program, it was much less confident in predicting who would get into the fellowship.
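
The sketch below shows one way to check a logistic regression model per class rather than by accuracy alone. It uses synthetic data from scikit-learn in place of the real applications, and the 90/10 class split is an assumption made purely to mimic a setting where admitted fellows are a small minority. It is not the model that was actually built.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the application data: class 1 ("admitted") is
# assumed to make up roughly 10% of the rows.
X, y = make_classification(
    n_samples=1000,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    weights=[0.9, 0.1],
    random_state=0,
)

# Stratify so the train and test sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Overall accuracy can look high even when the minority class is missed,
# so recall is reported separately for each class.
accuracy = accuracy_score(y_test, predictions)
recall_not_admitted = recall_score(y_test, predictions, pos_label=0)
recall_admitted = recall_score(y_test, predictions, pos_label=1)

# Predicted probability of admission for each test applicant.
admit_probability = model.predict_proba(X_test)[:, 1]

print(f"accuracy:              {accuracy:.2f}")
print(f"recall (not admitted): {recall_not_admitted:.2f}")
print(f"recall (admitted):     {recall_admitted:.2f}")
```

Comparing the two recall values makes it easy to see whether a high accuracy score is mostly driven by the majority class.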

Conclusions and recommendations

The following conclusions and recommendations were given:

The model would be more confident and reliable after a couple more cycles, when the dataset is larger. The larger the dataset, the more confidence we can have in the model.
Online communities should be made a required field in the application form so that the information is available for the next analytics exercise.
Previous education should also be made a required field, preferably with a dropdown menu for universities and a free-form field where candidates can outline what training they received in preparation for the track prior to application.
Information on fellow placement should be updated for both internal and external placements. This would enable a comparison between candidates' years of experience and placement eligibility.
Qualitative performance questions should be added to the application process to evaluate candidates' commitment to Novustack values such as grit, collaboration, etc. This would provide data for a model that can be used to screen candidates prior to the first interview.
A psychometric response scale can be used to measure grit right from the application stage to help determine who gets into the fellowship. Each response option will carry a point value ranging from 1 to 5. A series of questions will suffice, and the total score will be used to identify the applicants with the highest grit scores, which can in turn be used to determine who will be a successful applicant.
On the application page, the nationality and location inputs should be drop-down lists to keep location formats consistent.
Identity numbers should also be given to applicants from the application stage and be updated in the following stages.
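
The grit scale recommended above could be scored along the following lines. This is only a sketch: the number of questions, the applicant IDs and the responses are invented for illustration, and `grit_score` is a hypothetical helper.

```python
def grit_score(responses):
    """Sum a list of 1-to-5 responses into a single grit score."""
    if not responses:
        raise ValueError("at least one response is required")
    for value in responses:
        if value not in range(1, 6):
            raise ValueError(f"response out of range 1-5: {value!r}")
    return sum(responses)


# Hypothetical applicants, keyed by the ID assigned at the application stage.
answers = {
    "A-001": [5, 4, 4, 5],
    "A-002": [3, 2, 4, 3],
    "A-003": [4, 4, 5, 4],
}

scores = {applicant: grit_score(values) for applicant, values in answers.items()}

# Rank applicants from highest to lowest total score.
ranked = sorted(scores, key=scores.get, reverse=True)

for applicant in ranked:
    print(applicant, scores[applicant])
```

Validating the range up front keeps a mistyped response from silently inflating or deflating a total.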
